Benchmarks

I was inspired by this benchmark. I am not sure how to make a fully fair comparison across languages^[1]. Calling Python and R through PythonCall.jl and RCall.jl adds a small overhead; in my experiments it was small (roughly a few milliseconds). Here is the Jupyter notebook of the benchmark.

I test only the Gaussian Mixture case, since it is the most common type of mixture (remember that this package allows plenty of other mixtures).

In the code, I did not use (too many) fancy programming tricks; the speed comes mostly from the usual Julia performance tips:

E-step: pre-allocating memory, using @views, and writing type-stable code (this could still be improved), plus the LogExpFunctions.jl package for its logsumexp! function.
M-step: fit_mle from the Distributions.jl package, called for each distribution. In principle, this should be quite fast. For example, look at the Multivariate Normal code.
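The two steps above can be sketched for the univariate Gaussian case. This is not the package's Julia code: it is a minimal pure-Python illustration, and the names `logsumexp`, `normal_logpdf`, `e_step`, and `m_step` are hypothetical helpers written for this example. The E-step works in log space and normalizes with a max-shifted logsumexp; the M-step is a weighted maximum-likelihood fit of each component, which is the role fit_mle plays in the package.

```python
import math

def logsumexp(values):
    """Stable log(sum(exp(v))): shift by the max so the largest term is exp(0)."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def normal_logpdf(x, mu, sigma):
    """Log-density of a univariate Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def e_step(xs, weights, mus, sigmas):
    """Return (responsibilities, total log-likelihood) for the current parameters."""
    resp, loglik = [], 0.0
    for x in xs:
        log_terms = [math.log(w) + normal_logpdf(x, mu, s)
                     for w, mu, s in zip(weights, mus, sigmas)]
        log_norm = logsumexp(log_terms)  # log of the mixture density at x
        loglik += log_norm
        resp.append([math.exp(t - log_norm) for t in log_terms])
    return resp, loglik

def m_step(xs, resp):
    """Weighted maximum-likelihood update of weights, means, and standard deviations."""
    n, k = len(xs), len(resp[0])
    weights, mus, sigmas = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)  # effective number of points in component j
        mu = sum(r[j] * x for r, x in zip(resp, xs)) / nj
        var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nj
        weights.append(nj / n)
        mus.append(mu)
        sigmas.append(math.sqrt(var))
    return weights, mus, sigmas

# Toy data with two well-separated groups; the initial parameters are arbitrary.
xs = [-2.1, -1.9, -2.0, 3.0, 3.2, 2.8]
params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
for _ in range(20):
    resp, loglik = e_step(xs, *params)
    params = m_step(xs, resp)
```

The Julia performance tips listed above (pre-allocation, views, type stability) have no counterpart in this sketch; it only shows the structure of the computation.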

Univariate Gaussian mixture with 2 components

I compare with Sklearn.py^[2], mixtools.R, and mixem.py^[3]. I wanted to try mclust, but I did not manage to specify initial conditions.

Overall, mixtools.R and mixem.py were built in a similar spirit to this package, which made them easy for me to use. Sklearn.py is built to match the Sklearn format (all in one). GaussianMixturesModel.jl is built with a similar vibe.

If you have comments to improve these benchmarks, they are welcome.

You can find the benchmark code here.

timing_K_2

Or the ratio view:

timing_K_2_ratio

Conclusion: for Gaussian mixtures, ExpectationMaximization.jl is about 4 times faster than the Python Sklearn implementation and 7 times faster than the R mixtools implementation, and slightly slower than the specialized Julia package GaussianMixturesModel.jl.

I also compared against R's microbenchmark and Python's timeit, and they produced very similar timings. In my experience, though, BenchmarkTools.jl is smarter and simpler to use: it figures out the number of repetitions from the run itself.
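The idea of choosing the repetition count from the run itself can be illustrated on the Python side with the standard library's `timeit.Timer.autorange`, which increases the number of calls until the total run time reaches at least 0.2 seconds. This is only a sketch of the same idea, not a description of how BenchmarkTools.jl works internally; the helper name `calibrated_seconds_per_call` and the toy workload are made up for this example.

```python
import timeit

def calibrated_seconds_per_call(func):
    """Let timeit pick the repetition count, then return (count, seconds per call)."""
    timer = timeit.Timer(func)
    # autorange() tries 1, 2, 5, 10, 20, ... calls until the total time is >= 0.2 s
    number, total = timer.autorange()
    return number, total / number

# Toy workload standing in for one EM fit.
number, per_call = calibrated_seconds_per_call(lambda: sum(range(1000)))
```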

Lastly, the step-by-step likelihood reported by Sklearn differs from the one output by ExpectationMaximization.jl and mixtools.R (which agree with each other), so I am a bit suspicious.
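One thing worth ruling out when comparing likelihood values across libraries is normalization: scikit-learn's `GaussianMixture.score` is documented as returning the per-sample average log-likelihood, whereas other tools may report the total. Whether this explains the discrepancy above is not established here. The sketch below only shows the two conventions side by side for a univariate Gaussian mixture; the function names are hypothetical, and the density is computed directly rather than in log space, which is fine for a small example but not numerically robust.

```python
import math

def mixture_loglik(xs, weights, mus, sigmas):
    """Total log-likelihood: sum over samples of log(mixture density)."""
    total = 0.0
    for x in xs:
        density = sum(
            w * math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
            for w, mu, s in zip(weights, mus, sigmas)
        )
        total += math.log(density)
    return total

def per_sample_loglik(xs, weights, mus, sigmas):
    """Average log-likelihood per sample: the total divided by the sample count."""
    return mixture_loglik(xs, weights, mus, sigmas) / len(xs)

# Same data and parameters, two conventions: the values differ by a factor len(xs).
xs = [-2.0, -1.5, 0.3, 2.9, 3.1]
args = ([0.4, 0.6], [-1.5, 3.0], [1.0, 0.5])
total = mixture_loglik(xs, *args)
average = per_sample_loglik(xs, *args)
```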

[1] Note that @btime with RCall.jl and PythonCall.jl might add a small time overhead compared to the pure R/Python time; see here for example.
[2] A suspicious warning about K-means is triggered, even though I do not want to use K-means here. I asked a question here, which led to this issue and that PR. It turns out that even when initial conditions were provided, K-means was still computed. However, to this day (23-11-29), with scikit-learn 1.3.2, I still get the warning. Maybe the fix will be in the next release? I also noted this recent PR.
[3] It overflows very quickly, for $n>500$ or so. I think this is due to its implementation of logsumexp, so in the end I did not include its results in the benchmark.