ExpectationMaximization.jl

This package provides a simple implementation of the Expectation Maximization (EM) algorithm used to fit mixture models. Due to Julia's amazing dispatch system, generic and reusable code spirit, and the Distributions.jl package, the code while being very generic is both very expressive and fast! Take a look at the Benchmark section.

What type of mixtures?

In particular, it works on a lot of mixtures:

  • Mixture of Univariate continuous distributions
  • Mixture of Univariate discrete distributions
  • Mixture of Multivariate distributions (continuous or discrete)
  • Mixture of mixtures (univariate or multivariate and continuous or discrete)
  • User defined mixtures (e.g. custom distributions)
  • More?

How?

Just define a mix::MixtureModel and do fit_mle(mix, y) where y is your observation array (vector or matrix). That's it! For Stochastic EM, just do fit_mle(mix, y, method = StochasticEM()). Take a look at the Examples section.

To work, the only requirements are that the components of the mixture dist ∈ dists = components(mix) considered (custom or coming from an existing package)

  1. Are a subtype of Distribution i.e. dist<:Distribution.
  2. The logpdf(dist, y) is defined (it is used in the E-step)
  3. The fit_mle(dist, y, weights) returns the distribution with the updated parameters maximizing the likelihood. This is used in the M-step of the ClassicalEM algorithm. For the StochasticEM version, only fit_mle(dist, y) is needed. Type or instance version of fit_mle for your dist are accepted thanks to this conversion line.

In general, step 2. is easy, while step 3. is only known explicitly for a few common distributions. In step 3., if the fit_mle is not explicitly known, you can always implement a numerical scheme, if it exists, for fit_mle(dist, y) see Gamma distribution example or use tools like Optimizations.jl. Or, when possible, represent your “difficult” distribution as a mixture of simple terms. (I had this in mind, but it is not directly a mixture model.)

Note

Distributions.jl currently does not allow MixtureModel to both have discrete and continuous components[2].

Algorithms

So far, the classic EM algorithm and the Stochastic EM are implemented. Look at the Bibliography section for references.

Main function

Warning

To fit the mixture, use the “instance” version of fit_mle(mix::MixtureModel, ...) as described below and NOT the “Type” version, i.e., fit_mle(Type{MixtureModel}, ...). The provided mix is used as the starting point of the EM algorithm. See Instance vs Type version section for more context.

Distributions.fit_mleMethod
fit_mle(mix::MixtureModel, y::AbstractVecOrMat, weights...; method = ClassicEM(), display=:none, maxiter=1000, atol=1e-3, rtol=nothing, robust=false, infos=false)

Use the an Expectation Maximization (EM) algorithm to maximize the Loglikelihood (fit) the mixture with an i.i.d sample y. The mix input is a mixture that is used to initilize the EM algorithm.

  • weights when provided, it will compute a weighted version of the EM. (Useful for fitting mixture of mixtures)
  • method determines the algorithm used.
  • infos = true returns a Dict with informations on the algorithm (converged, iteration number, loglikelihood).
  • robust = true will prevent the (log)likelihood to overflow to -∞ or .
  • atol criteria determining the convergence of the algorithm. If the Loglikelihood difference between two iteration i and i+1 is smaller than atol i.e. |ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<atol, the algorithm stops.
  • rtol relative tolerance for convergence, |ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<rtol*(|ℓ⁽ⁱ⁺¹⁾| + |ℓ⁽ⁱ⁾|)/2 (does not check if rtol is nothing)
  • display value can be :none, :iter, :final to display Loglikelihood evolution at each iterations :iter or just the final one :final
source
Distributions.fit_mleMethod
fit_mle(mix::AbstractArray{<:MixtureModel}, y::AbstractVecOrMat, weights...; method = ClassicEM(), display=:none, maxiter=1000, atol=1e-3, rtol=nothing, robust=false, infos=false)

Do the same as fit_mle for each (initial) mixtures in the mix array. Then it selects the one with the largest loglikelihood. Warning: It uses try and catch to avoid errors messages in case EM converges toward a singular solution (probably using robust should be enough in most case to avoid errors).

source

Utilities

ExpectationMaximization.predictFunction
predict(mix::MixtureModel, y::AbstractVector; robust=false)

Evaluate the most likely category for each observations given a MixtureModel.

  • robust = true will prevent the (log)likelihood to overflow to -∞ or .
source
ExpectationMaximization.predict_probaFunction
predict_proba(mix::MixtureModel, y::AbstractVecOrMat; robust=false)

Evaluate the probability for each observations to belong to a category given a MixtureModel..

  • robust = true will prevent the (log)likelihood to under(overflow)flow to -∞ (or ).
source

fit_mle methods that should be in Distribution.jl

I opened two PRs, PR#1670 and PR#1676 to add these methods.

The "instance" version of fit_mle allows passing a distribution instance (e.g., Normal(0,1)) instead of a type (e.g., Normal). This is required for MixtureModel and ProductDistribution support.

Distributions.fit_mleMethod

Now "instance" version of fit_mle is supported (in addition of the current "type" version). Example: fit_mle(Bernoulli(0.2), x) is accepted in addition of fit_mle(Bernoulli, x) this allows compatibility with how fit_mle(g::Product) and fit_mle(g::MixtureModel) are written. #! Not 100% sure it will not cause any issuses or conflic! #! There might be another way to do with the type something like: #! https://discourse.julialang.org/t/ann-copulas-jl-a-fully-distributions-jl-compliant-copula-package/76544/12 #! MyMarginals = Tuple{LogNormal,Pareto,Gamma,Normal}; #! fitted_model = fit(SklarDist{MyCop,MyMarginals},data) #! and initial parameters as kwargs?

source
Distributions.fit_mleMethod
fit_mle(g::Product, x::AbstractMatrix)
fit_mle(g::Product, x::AbstractMatrix, γ::AbstractVector)

The fit_mle for multivariate Product distributions g is the product_distribution of fit_mle of each components of g. Product is meant to be depreacated in next versions of Distribution.jl. Use the analog VectorOfUnivariateDistribution type instead.

source
Distributions.fit_mleMethod
fit_mle(dists::ArrayOfUnivariateDistribution, x::AbstractArray)
fit_mle(dists::ArrayOfUnivariateDistribution, x::AbstractArray, γ::AbstractVector)

The fit_mle for a ArrayOfUnivariateDistribution distributions dists is the product_distribution of fit_mle of each components of dists. VectorOfUnivariateDistribution should act like old Product while ArrayOfUnivariateDistribution are not really tested yet.

source
Distributions.fit_mleMethod
fit_mle(::Type{<:Dirac}, x::AbstractArray{<:Real}[, w::AbstractArray{<:Real}])

fit_mle for Dirac distribution (weighted or not) data sets.

source
Distributions.fit_mleMethod
fit_mle(::Type{<:Laplace}, x::AbstractArray{<:Real}, w::AbstractArray{<:Real})

fit_mle for Laplace distribution weighted data sets.

source
Distributions.fit_mleMethod
fit_mle(::Type{<:Uniform}, x::AbstractArray{<:Real}, w::AbstractArray{<:Real})

fit_mle for Uniform distribution weighted data sets. It is just the same as unweigted (removing zero weighted data).

source

Low-level API

The following functions implement the inner loop of the EM algorithms. They can be extended to support custom behavior.

ExpectationMaximization.fit_mle!Method
fit_mle!(α::AbstractVector, dists::AbstractVector{F} where {F<:Distribution}, y::AbstractVecOrMat, method::ClassicEM; display=:none, maxiter=1000, atol=1e-3, rtol=nothing, robust=false)

Use the EM algorithm to update the Distribution dists and weights α composing a mixture distribution.

  • robust = true will prevent the (log)likelihood to overflow to -∞ or .
  • atol criteria determining the convergence of the algorithm. If the Loglikelihood difference between two iteration i and i+1 is smaller than atol i.e. |ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<atol, the algorithm stops.
  • rtol relative tolerance for convergence, |ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<rtol*(|ℓ⁽ⁱ⁺¹⁾| + |ℓ⁽ⁱ⁾|)/2 (does not check if rtol is nothing)
  • display value can be :none, :iter, :final to display Loglikelihood evolution at each iterations :iter or just the final one :final
source
ExpectationMaximization.fit_mle!Method
fit_mle!(α::AbstractVector, dists::AbstractVector{F} where {F<:Distribution}, y::AbstractVecOrMat, method::StochasticEM; display=:none, maxiter=1000, atol=1e-3, robust=false)

Use the stochastic EM algorithm to update the Distribution dists and weights α composing a mixture distribution.

  • robust = true will prevent the (log)likelihood to overflow to -∞ or .
  • atol criteria determining the convergence of the algorithm. If the Loglikelihood difference between two iteration i and i+1 is smaller than atol i.e. |ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<atol, the algorithm stops.
  • rtol relative tolerance for convergence, |ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<rtol*(|ℓ⁽ⁱ⁺¹⁾| + |ℓ⁽ⁱ⁾|)/2 (does not check if rtol is nothing)
  • display value can be :none, :iter, :final to display Loglikelihood evolution at each iterations :iter or just the final one :final
source
ExpectationMaximization.M_step!Function
M_step!(α, dists, y, γ, method::ClassicEM)

For the ClassicEM the weigths γ computed at E-step for each observation in y are used to update α and dists.

source
M_step!(α, dists, y, cat, method::StochasticEM)

For the StochasticEM the cat drawn at S-step for each observation in y is used to update α and dists.

source

Index

  • 2Rain is a good example of a mixture having both a discrete (Delta distribution in 0) and continuous (Exponential, Gamma, ...) component.