ExpectationMaximization.jl
This package provides a simple implementation of the Expectation Maximization (EM) algorithm used to fit mixture models. Thanks to Julia's amazing dispatch system, its spirit of generic and reusable code, and the Distributions.jl package, the code, while very generic, is both expressive and fast[1]!
What type of mixtures?
In particular, it works on a lot of mixtures:
- Mixture of Univariate continuous distributions
- Mixture of Univariate discrete distributions
- Mixture of Multivariate distributions (continuous or discrete)
- Mixture of mixtures (univariate or multivariate and continuous or discrete)
- More?
How?
Just define a `mix::MixtureModel` and call `fit_mle(mix, y)` with your data `y`, and that's it! Have a look at the examples section.
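For instance, a minimal sketch (the data `y` and the starting parameters here are made up for illustration):

```julia
using Distributions, ExpectationMaximization

# Made-up data from a two-component Gaussian mixture.
y = vcat(-2 .+ randn(500), 3 .+ 2 .* randn(500))

# Initial guess: the EM algorithm starts from this mixture.
mix_guess = MixtureModel([Normal(-1.0, 1.0), Normal(2.0, 2.0)], [0.5, 0.5])

# Fit by maximum likelihood with the EM algorithm.
mix_fitted = fit_mle(mix_guess, y)
```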
To work, the only requirements are that the components of the mixture `dist ∈ dists = components(mix)` considered (custom or coming from an existing package):
1. Are a subtype of `Distribution`, i.e. `dist <: Distribution`.
2. Have `logpdf(dist, y)` defined (it is used in the E-step).
3. Have `fit_mle(dist, y, weights)` defined, returning the distribution with parameters equal to the MLE. This is used in the M-step of the `ClassicEM` algorithm. For the `StochasticEM` version, only `fit_mle(dist, y)` is needed. Type or instance versions of `fit_mle` for your `dist` are accepted thanks to this conversion line.
In general, 2. is easy, while 3. is only known explicitly for a few common distributions. In case 3. is not known explicitly, you can always implement a numerical scheme, if it exists, for `fit_mle(dist, y)`; see the `Gamma` distribution example. Or, when possible, represent your "difficult" distribution as a mixture of simpler terms. (I had this in mind, but it is not directly a mixture model.)
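As a sketch of what the three requirements above mean for a component that is already supported, take `Normal` from Distributions.jl (the data and weights below are arbitrary):

```julia
using Distributions

y = randn(100)          # sample
w = rand(100)           # weights

dist = Normal(0.0, 1.0)
dist isa Distribution   # 1. the component is a Distribution
logpdf(dist, y[1])      # 2. logpdf is defined (used in the E-step)
fit_mle(Normal, y, w)   # 3. weighted MLE is defined (used in the M-step of ClassicEM)
```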
Distributions.jl currently does not allow a `MixtureModel` to have both discrete and continuous components[2].
Algorithms
`ExpectationMaximization.ClassicEM` — Type

    ClassicEM <: AbstractEM

The EM algorithm was introduced by A. P. Dempster, N. M. Laird and D. B. Rubin in 1977 in the reference paper Maximum Likelihood from Incomplete Data Via the EM Algorithm.
`ExpectationMaximization.StochasticEM` — Type

    Base.@kwdef struct StochasticEM <: AbstractEM
        rng::AbstractRNG = Random.GLOBAL_RNG
    end
The Stochastic EM algorithm was introduced by G. Celeux and J. Diebolt in 1985 in The SEM Algorithm: A probabilistic teacher algorithm derived from the EM algorithm for the mixture problem.
The default random number generator is `Random.GLOBAL_RNG`, but it can be changed via `StochasticEM(seed)`.
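As a hedged sketch, the stochastic variant could be selected like this (the `mix_guess` and `y` names are placeholders for an initial mixture and data):

```julia
using Random, ExpectationMaximization

# Stochastic EM with an explicit RNG instead of the default Random.GLOBAL_RNG.
sem = StochasticEM(MersenneTwister(2022))
# mix_fitted = fit_mle(mix_guess, y; method = sem)
```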
Main function
To fit the mixture, use the "instance" version of `fit_mle(mix::MixtureModel, ...)` as described below, and NOT the "Type" version, i.e., `fit_mle(Type{MixtureModel}, ...)`. The provided `mix` is used as the starting point of the EM algorithm. See the Instance vs Type version section for more context.
`Distributions.fit_mle` — Method

    fit_mle(mix::MixtureModel, y::AbstractVecOrMat, weights...; method = ClassicEM(), display=:none, maxiter=1000, atol=1e-3, rtol=nothing, robust=false, infos=false)

Use an Expectation Maximization (EM) algorithm to maximize the loglikelihood of (fit) the mixture with an i.i.d. sample `y`. The `mix` input is a mixture that is used to initialize the EM algorithm.
- `weights`: when provided, a weighted version of the EM is computed (useful for fitting mixtures of mixtures).
- `method` determines the algorithm used.
- `infos = true` returns a `Dict` with information on the algorithm (converged, iteration number, loglikelihood).
- `robust = true` will prevent the (log)likelihood from overflowing to `-∞` or `∞`.
- `atol`: criterion determining the convergence of the algorithm. If the loglikelihood difference between two iterations `i` and `i+1` is smaller than `atol`, i.e. `|ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾| < atol`, the algorithm stops.
- `rtol`: relative tolerance for convergence, `|ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾| < rtol*(|ℓ⁽ⁱ⁺¹⁾| + |ℓ⁽ⁱ⁾|)/2` (not checked if `rtol` is `nothing`).
- `display` can be `:none`, `:iter`, or `:final`, to display the loglikelihood evolution at each iteration (`:iter`) or just the final one (`:final`).
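A sketch of a call spelling out these keyword arguments (with `mix_guess` and `y` standing in for your initial mixture and data):

```julia
mix_fitted = fit_mle(mix_guess, y;
    method  = ClassicEM(),
    display = :final,   # only print the final loglikelihood
    maxiter = 500,
    atol    = 1e-5,
    rtol    = nothing,
    robust  = true,
    infos   = false,
)
```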
`Distributions.fit_mle` — Method

    fit_mle(mix::AbstractArray{<:MixtureModel}, y::AbstractVecOrMat, weights...; method = ClassicEM(), display=:none, maxiter=1000, atol=1e-3, rtol=nothing, robust=false, infos=false)

Do the same as `fit_mle` for each of the (initial) mixtures in the `mix` array, then select the one with the largest loglikelihood. Warning: it uses try/catch to avoid error messages in case the EM converges toward a singular solution (using `robust` should probably be enough in most cases to avoid errors).
Utilities
`ExpectationMaximization.predict` — Function

    predict(mix::MixtureModel, y::AbstractVector; robust=false)

Evaluate the most likely category for each observation given a `MixtureModel`.
`robust = true` will prevent the (log)likelihood from overflowing to `-∞` or `∞`.
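For instance, assuming the `mix_fitted` and `y` from the earlier sketches:

```julia
labels = predict(mix_fitted, y)   # index of the most likely component for each observation
```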
`ExpectationMaximization.predict_proba` — Function

    predict_proba(mix::MixtureModel, y::AbstractVecOrMat; robust=false)

Evaluate the probability for each observation to belong to each category given a `MixtureModel`.
`robust = true` will prevent the (log)likelihood from underflowing to `-∞` or overflowing to `∞`.
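And for the membership probabilities (same assumed `mix_fitted` and `y`):

```julia
probs = predict_proba(mix_fitted, y)   # probability of each observation belonging to each component
```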
`fit_mle` methods that should be in Distributions.jl
I opened two PRs, PR#1670 and PR#1676, to add these methods.
`Distributions.fit_mle` — Method

    fit_mle(g::Product, x::AbstractMatrix)
    fit_mle(g::Product, x::AbstractMatrix, γ::AbstractVector)

The `fit_mle` for multivariate `Product` distributions `g` is the `product_distribution` of the `fit_mle` of each component of `g`. `Product` is meant to be deprecated in the next version of Distributions.jl. Use the analogous `VectorOfUnivariateDistribution` type instead.
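A hedged sketch of what this method does, assuming a column-per-observation layout for the matrix `x` and that `product_distribution` yields a `Product` here:

```julia
using Distributions

g = product_distribution([Normal(0.0, 1.0), Exponential(1.0)])
x = rand(g, 200)     # 2×200 matrix, one observation per column
ĝ = fit_mle(g, x)    # product of the component-wise MLEs
```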
`Distributions.fit_mle` — Method

    fit_mle(::Type{<:Dirac}, x::AbstractArray{<:Real}[, w::AbstractArray{<:Real}])

`fit_mle` for the `Dirac` distribution on weighted or unweighted data sets.
`Distributions.fit_mle` — Method

    fit_mle(::Type{<:Laplace}, x::AbstractArray{<:Real}, w::AbstractArray{<:Real})

`fit_mle` for the `Laplace` distribution on weighted data sets.
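A quick sketch of these fits (arbitrary data and weights):

```julia
using Distributions

x = randn(100)
w = rand(100)
fit_mle(Laplace, x, w)          # weighted MLE for a Laplace distribution
fit_mle(Dirac, fill(1.0, 10))   # Dirac fitted to constant data
```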
Index
- `ExpectationMaximization.ClassicEM`
- `ExpectationMaximization.StochasticEM`
- `Distributions.fit_mle`
- `Distributions.fit_mle`
- `Distributions.fit_mle`
- `Distributions.fit_mle`
- `Distributions.fit_mle`
- `ExpectationMaximization.predict`
- `ExpectationMaximization.predict_proba`
- [1] Have a look at the Benchmark section.
- [2] Rain is a good example of a mixture having both a discrete component (a Delta distribution at `0`) and continuous components (`Exponential`, `Gamma`, ...).