ExpectationMaximization.jl

This package provides a simple implementation of the Expectation Maximization (EM) algorithm used to fit mixture models. Due to Julia's amazing dispatch system, generic and reusable code spirit, and the Distributions.jl package, the code while being very generic is both very expressive and fast! Take a look at the Benchmark section.

What type of mixtures?

In particular, it works on a lot of mixtures:

Mixture of Univariate continuous distributions
Mixture of Univariate discrete distributions
Mixture of Multivariate distributions (continuous or discrete)
Mixture of mixtures (univariate or multivariate and continuous or discrete)
User defined mixtures (e.g. custom distributions)
More?

How?

Just define a mix::MixtureModel and do fit_mle(mix, y) where y is your observation array (vector or matrix). That's it! For Stochastic EM, just do fit_mle(mix, y, method = StochasticEM()). Take a look at the Examples section.

To work, the only requirements are that the components of the mixture dist ∈ dists = components(mix) considered (custom or coming from an existing package)

Are a subtype of Distribution i.e. dist<:Distribution.
The logpdf(dist, y) is defined (it is used in the E-step)
The fit_mle(dist, y, weights) returns the distribution with the updated parameters maximizing the likelihood. This is used in the M-step of the ClassicalEM algorithm. For the StochasticEM version, only fit_mle(dist, y) is needed. Type or instance version of fit_mle for your dist are accepted thanks to this conversion line.

In general, step 2. is easy, while step 3. is only known explicitly for a few common distributions. In step 3., if the fit_mle is not explicitly known, you can always implement a numerical scheme, if it exists, for fit_mle(dist, y) see Gamma distribution example or use tools like Optimizations.jl. Or, when possible, represent your “difficult” distribution as a mixture of simple terms. (I had this in mind, but it is not directly a mixture model.)

Note

Distributions.jl currently does not allow MixtureModel to both have discrete and continuous components^[2].

Algorithms

So far, the classic EM algorithm and the Stochastic EM are implemented. Look at the Bibliography section for references.

ExpectationMaximization.ClassicEM — Type

ClassicEM<:AbstractEM

The EM algorithm was introduced by A. P. Dempster, N. M. Laird and D. B. Rubin in 1977 in the reference paper Maximum Likelihood from Incomplete Data Via the EM Algorithm.

source

ExpectationMaximization.StochasticEM — Type

Base.@kwdef struct StochasticEM<:AbstractEM
    rng::AbstractRNG = Random.GLOBAL_RNG
end

The Stochastic EM algorithm was introduced by G. Celeux, and J. Diebolt. in 1985 in The SEM Algorithm: A probabilistic teacher algorithm derived from the EM algorithm for the mixture problem.

The default random seed is Random.GLOBAL_RNG but it can be changed via StochasticEM(seed).

source

Main function

Warning

To fit the mixture, use the “instance” version of fit_mle(mix::MixtureModel, ...) as described below and NOT the “Type” version, i.e., fit_mle(Type{MixtureModel}, ...). The provided mix is used as the starting point of the EM algorithm. See Instance vs Type version section for more context.

Distributions.fit_mle — Method

fit_mle(mix::MixtureModel, y::AbstractVecOrMat, weights...; method = ClassicEM(), display=:none, maxiter=1000, atol=1e-3, rtol=nothing, robust=false, infos=false)

Use the an Expectation Maximization (EM) algorithm to maximize the Loglikelihood (fit) the mixture with an i.i.d sample y. The mix input is a mixture that is used to initilize the EM algorithm.

weights when provided, it will compute a weighted version of the EM. (Useful for fitting mixture of mixtures)
method determines the algorithm used.
infos = true returns a Dict with informations on the algorithm (converged, iteration number, loglikelihood).
robust = true will prevent the (log)likelihood to overflow to -∞ or ∞.
atol criteria determining the convergence of the algorithm. If the Loglikelihood difference between two iteration i and i+1 is smaller than atol i.e. |ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<atol, the algorithm stops.
rtol relative tolerance for convergence, |ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<rtol*(|ℓ⁽ⁱ⁺¹⁾| + |ℓ⁽ⁱ⁾|)/2 (does not check if rtol is nothing)
display value can be :none, :iter, :final to display Loglikelihood evolution at each iterations :iter or just the final one :final

source

Distributions.fit_mle — Method

fit_mle(mix::AbstractArray{<:MixtureModel}, y::AbstractVecOrMat, weights...; method = ClassicEM(), display=:none, maxiter=1000, atol=1e-3, rtol=nothing, robust=false, infos=false)

Do the same as fit_mle for each (initial) mixtures in the mix array. Then it selects the one with the largest loglikelihood. Warning: It uses try and catch to avoid errors messages in case EM converges toward a singular solution (probably using robust should be enough in most case to avoid errors).

source

Utilities

ExpectationMaximization.predict — Function

predict(mix::MixtureModel, y::AbstractVector; robust=false)

Evaluate the most likely category for each observations given a MixtureModel.

robust = true will prevent the (log)likelihood to overflow to -∞ or ∞.

source

ExpectationMaximization.predict_proba — Function

predict_proba(mix::MixtureModel, y::AbstractVecOrMat; robust=false)

Evaluate the probability for each observations to belong to a category given a MixtureModel..

robust = true will prevent the (log)likelihood to under(overflow)flow to -∞ (or ∞).

source

`fit_mle` methods that should be in `Distribution.jl`

I opened two PRs, PR#1670 and PR#1676 to add these methods.

Distributions.fit_mle — Method

fit_mle(g::Product, x::AbstractMatrix)
fit_mle(g::Product, x::AbstractMatrix, γ::AbstractVector)

The fit_mle for multivariate Product distributions g is the product_distribution of fit_mle of each components of g. Product is meant to be depreacated in next versions of Distribution.jl. Use the analog VectorOfUnivariateDistribution type instead.

source

Distributions.fit_mle — Method

fit_mle(::Type{<:Dirac}, x::AbstractArray{<:Real}[, w::AbstractArray{<:Real}])

fit_mle for Dirac distribution (weighted or not) data sets.

source

Distributions.fit_mle — Method

fit_mle(::Type{<:Laplace}, x::AbstractArray{<:Real}, w::AbstractArray{<:Real})

fit_mle for Laplace distribution weighted data sets.

source

1Have a look at the Benchmark section.
2Rain is a good example of a mixture having both a discrete (Delta distribution in 0) and continuous (Exponential, Gamma, ...) component.