ExpectationMaximization.jl
This package provides a simple implementation of the Expectation Maximization (EM) algorithm used to fit mixture models. Due to Julia's amazing dispatch system, generic and reusable code spirit, and the Distributions.jl package, the code while being very generic is both very expressive and fast! Take a look at the Benchmark section.
What type of mixtures?
In particular, it works on a lot of mixtures:
- Mixture of Univariate continuous distributions
- Mixture of Univariate discrete distributions
- Mixture of Multivariate distributions (continuous or discrete)
- Mixture of mixtures (univariate or multivariate and continuous or discrete)
- User defined mixtures (e.g. custom distributions)
- More?
How?
Just define a mix::MixtureModel and do fit_mle(mix, y) where y is your observation array (vector or matrix). That's it! For Stochastic EM, just do fit_mle(mix, y, method = StochasticEM()). Take a look at the Examples section.
To work, the only requirements are that the components of the mixture dist ∈ dists = components(mix) considered (custom or coming from an existing package)
- Are a subtype of
Distributioni.e.dist<:Distribution. - The
logpdf(dist, y)is defined (it is used in the E-step) - The
fit_mle(dist, y, weights)returns the distribution with the updated parameters maximizing the likelihood. This is used in the M-step of theClassicalEMalgorithm. For theStochasticEMversion, onlyfit_mle(dist, y)is needed. Type or instance version offit_mlefor yourdistare accepted thanks to this conversion line.
In general, step 2. is easy, while step 3. is only known explicitly for a few common distributions. In step 3., if the fit_mle is not explicitly known, you can always implement a numerical scheme, if it exists, for fit_mle(dist, y) see Gamma distribution example or use tools like Optimizations.jl. Or, when possible, represent your “difficult” distribution as a mixture of simple terms. (I had this in mind, but it is not directly a mixture model.)
Distributions.jl currently does not allow MixtureModel to both have discrete and continuous components[2].
Algorithms
So far, the classic EM algorithm and the Stochastic EM are implemented. Look at the Bibliography section for references.
ExpectationMaximization.ClassicEM — Type
ClassicEM<:AbstractEMThe EM algorithm was introduced by A. P. Dempster, N. M. Laird and D. B. Rubin in 1977 in the reference paper Maximum Likelihood from Incomplete Data Via the EM Algorithm.
ExpectationMaximization.StochasticEM — Type
Base.@kwdef struct StochasticEM<:AbstractEM
rng::AbstractRNG = Random.GLOBAL_RNG
endThe Stochastic EM algorithm was introduced by G. Celeux, and J. Diebolt. in 1985 in The SEM Algorithm: A probabilistic teacher algorithm derived from the EM algorithm for the mixture problem.
The default random seed is Random.GLOBAL_RNG but it can be changed via StochasticEM(seed).
Main function
To fit the mixture, use the “instance” version of fit_mle(mix::MixtureModel, ...) as described below and NOT the “Type” version, i.e., fit_mle(Type{MixtureModel}, ...). The provided mix is used as the starting point of the EM algorithm. See Instance vs Type version section for more context.
Distributions.fit_mle — Method
fit_mle(mix::MixtureModel, y::AbstractVecOrMat, weights...; method = ClassicEM(), display=:none, maxiter=1000, atol=1e-3, rtol=nothing, robust=false, infos=false)Use the an Expectation Maximization (EM) algorithm to maximize the Loglikelihood (fit) the mixture with an i.i.d sample y. The mix input is a mixture that is used to initilize the EM algorithm.
weightswhen provided, it will compute a weighted version of the EM. (Useful for fitting mixture of mixtures)methoddetermines the algorithm used.infos = truereturns aDictwith informations on the algorithm (converged, iteration number, loglikelihood).robust = truewill prevent the (log)likelihood to overflow to-∞or∞.atolcriteria determining the convergence of the algorithm. If the Loglikelihood difference between two iterationiandi+1is smaller thanatoli.e.|ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<atol, the algorithm stops.rtolrelative tolerance for convergence,|ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<rtol*(|ℓ⁽ⁱ⁺¹⁾| + |ℓ⁽ⁱ⁾|)/2(does not check ifrtolisnothing)displayvalue can be:none,:iter,:finalto display Loglikelihood evolution at each iterations:iteror just the final one:final
Distributions.fit_mle — Method
fit_mle(mix::AbstractArray{<:MixtureModel}, y::AbstractVecOrMat, weights...; method = ClassicEM(), display=:none, maxiter=1000, atol=1e-3, rtol=nothing, robust=false, infos=false)Do the same as fit_mle for each (initial) mixtures in the mix array. Then it selects the one with the largest loglikelihood. Warning: It uses try and catch to avoid errors messages in case EM converges toward a singular solution (probably using robust should be enough in most case to avoid errors).
Utilities
ExpectationMaximization.predict — Function
predict(mix::MixtureModel, y::AbstractVector; robust=false)Evaluate the most likely category for each observations given a MixtureModel.
robust = truewill prevent the (log)likelihood to overflow to-∞or∞.
ExpectationMaximization.predict_proba — Function
predict_proba(mix::MixtureModel, y::AbstractVecOrMat; robust=false)Evaluate the probability for each observations to belong to a category given a MixtureModel..
robust = truewill prevent the (log)likelihood to under(overflow)flow to-∞(or∞).
fit_mle methods that should be in Distribution.jl
I opened two PRs, PR#1670 and PR#1676 to add these methods.
The "instance" version of fit_mle allows passing a distribution instance (e.g., Normal(0,1)) instead of a type (e.g., Normal). This is required for MixtureModel and ProductDistribution support.
Distributions.fit_mle — Method
Now "instance" version of fit_mle is supported (in addition of the current "type" version). Example: fit_mle(Bernoulli(0.2), x) is accepted in addition of fit_mle(Bernoulli, x) this allows compatibility with how fit_mle(g::Product) and fit_mle(g::MixtureModel) are written. #! Not 100% sure it will not cause any issuses or conflic! #! There might be another way to do with the type something like: #! https://discourse.julialang.org/t/ann-copulas-jl-a-fully-distributions-jl-compliant-copula-package/76544/12 #! MyMarginals = Tuple{LogNormal,Pareto,Gamma,Normal}; #! fitted_model = fit(SklarDist{MyCop,MyMarginals},data) #! and initial parameters as kwargs?
Distributions.fit_mle — Method
fit_mle(g::Product, x::AbstractMatrix)
fit_mle(g::Product, x::AbstractMatrix, γ::AbstractVector)The fit_mle for multivariate Product distributions g is the product_distribution of fit_mle of each components of g. Product is meant to be depreacated in next versions of Distribution.jl. Use the analog VectorOfUnivariateDistribution type instead.
Distributions.fit_mle — Method
fit_mle(dists::ArrayOfUnivariateDistribution, x::AbstractArray)
fit_mle(dists::ArrayOfUnivariateDistribution, x::AbstractArray, γ::AbstractVector)The fit_mle for a ArrayOfUnivariateDistribution distributions dists is the product_distribution of fit_mle of each components of dists. VectorOfUnivariateDistribution should act like old Product while ArrayOfUnivariateDistribution are not really tested yet.
Distributions.fit_mle — Method
fit_mle(::Type{<:Dirac}, x::AbstractArray{<:Real}[, w::AbstractArray{<:Real}])fit_mle for Dirac distribution (weighted or not) data sets.
Distributions.fit_mle — Method
fit_mle(::Type{<:Laplace}, x::AbstractArray{<:Real}, w::AbstractArray{<:Real})fit_mle for Laplace distribution weighted data sets.
Distributions.fit_mle — Method
fit_mle(::Type{<:Uniform}, x::AbstractArray{<:Real}, w::AbstractArray{<:Real})fit_mle for Uniform distribution weighted data sets. It is just the same as unweigted (removing zero weighted data).
Low-level API
The following functions implement the inner loop of the EM algorithms. They can be extended to support custom behavior.
ExpectationMaximization.fit_mle! — Method
fit_mle!(α::AbstractVector, dists::AbstractVector{F} where {F<:Distribution}, y::AbstractVecOrMat, method::ClassicEM; display=:none, maxiter=1000, atol=1e-3, rtol=nothing, robust=false)Use the EM algorithm to update the Distribution dists and weights α composing a mixture distribution.
robust = truewill prevent the (log)likelihood to overflow to-∞or∞.atolcriteria determining the convergence of the algorithm. If the Loglikelihood difference between two iterationiandi+1is smaller thanatoli.e.|ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<atol, the algorithm stops.rtolrelative tolerance for convergence,|ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<rtol*(|ℓ⁽ⁱ⁺¹⁾| + |ℓ⁽ⁱ⁾|)/2(does not check ifrtolisnothing)displayvalue can be:none,:iter,:finalto display Loglikelihood evolution at each iterations:iteror just the final one:final
ExpectationMaximization.fit_mle! — Method
fit_mle!(α::AbstractVector, dists::AbstractVector{F} where {F<:Distribution}, y::AbstractVecOrMat, method::StochasticEM; display=:none, maxiter=1000, atol=1e-3, robust=false)Use the stochastic EM algorithm to update the Distribution dists and weights α composing a mixture distribution.
robust = truewill prevent the (log)likelihood to overflow to-∞or∞.atolcriteria determining the convergence of the algorithm. If the Loglikelihood difference between two iterationiandi+1is smaller thanatoli.e.|ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<atol, the algorithm stops.rtolrelative tolerance for convergence,|ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<rtol*(|ℓ⁽ⁱ⁺¹⁾| + |ℓ⁽ⁱ⁾|)/2(does not check ifrtolisnothing)displayvalue can be:none,:iter,:finalto display Loglikelihood evolution at each iterations:iteror just the final one:final
ExpectationMaximization.M_step! — Function
M_step!(α, dists, y, γ, method::ClassicEM)For the ClassicEM the weigths γ computed at E-step for each observation in y are used to update α and dists.
M_step!(α, dists, y, cat, method::StochasticEM)For the StochasticEM the cat drawn at S-step for each observation in y is used to update α and dists.
Index
ExpectationMaximization.ClassicEMExpectationMaximization.StochasticEMDistributions.fit_mleDistributions.fit_mleDistributions.fit_mleDistributions.fit_mleDistributions.fit_mleDistributions.fit_mleDistributions.fit_mleDistributions.fit_mleExpectationMaximization.M_step!ExpectationMaximization.fit_mle!ExpectationMaximization.fit_mle!ExpectationMaximization.predictExpectationMaximization.predict_proba
- 2Rain is a good example of a mixture having both a discrete (
Deltadistribution in0) and continuous (Exponential,Gamma, ...) component.