Comparison with other packages

This benchmark was inspired by this post. The full benchmark code is available as a Jupyter notebook and here.

Scope and limitations of competing packages

A key distinction of ExpectationMaximization.jl is its genericity: it works with any mixture of distributions supported by Distributions.jl (univariate, multivariate, continuous, discrete, or custom), without any modification to the core algorithm. The competing packages benchmarked here are, in contrast, largely restricted to Gaussian mixtures:

PackageLanguageGaussian only?Notes
SklearnPythonYesHardcoded Gaussian[2]
mixtoolsRMostlySupports some other families but not extensible
mixemPythonMostlyNumerically fragile[3]
GaussianMixtures.jlJuliaYesHardcoded Gaussian
ExpectationMaximization.jlJuliaNoAny Distributions.jl distribution

The benchmark below only tests the Gaussian mixture case (the most common), which is deliberately the strongest case for the specialized packages. Despite this, ExpectationMaximization.jl remains highly competitive.

Why is ExpectationMaximization.jl fast?

No heavy programming tricks are used. The performance comes from standard Julia best practices:

  • E-step: pre-allocated memory, @views, type-stable code, and logsumexp! from LogExpFunctions.jl.
  • M-step: delegates to fit_mle from Distributions.jl, which is well-optimized for each distribution (e.g., see the Multivariate Normal implementation).
Clean Julia code

Many more optimizations are possible, however, I'd like to keep the code as simple and readable as possible.

Results

timing_K_2

Or the ratio view:

timing_K_2_ratio

Conclusion: for Gaussian mixtures, ExpectationMaximization.jl is about 4× faster than Sklearn (Python) and 7× faster than mixtools (R), while being only slightly slower than the Gaussian-specialized GaussianMixtures.jl. Crucially, unlike all competing packages, ExpectationMaximization.jl handles arbitrary mixture distributions out of the box.

If you have comments to improve these benchmarks, they are welcome.

Benchmarking methodology

Cross-language comparisons are inherently imperfect[1]. PythonCall.jl and RCall.jl introduce a small overhead (~few milliseconds), which was verified to be negligible here.

  • 1@btime with RCall.jl and PythonCall.jl may add a small overhead; see this discussion. Timings were cross-checked against R microbenchmark and Python timeit, which gave consistent results. BenchmarkTools.jl automatically determines the number of repetitions needed for a reliable estimate.
  • 2Sklearn's GaussianMixture used to run K-means initialization even when initial conditions are explicitly provided — see this discussion, issue, and PR. I should be fix by now.
  • 3mixem overflows for $n \gtrsim 500$ due to a fragile logsumexp implementation and was excluded from the benchmark.