Coding Stochastic Weather Generators: Challenges and Perspectives

Author

David Métivier

Published

December 8, 2025

The notebook is the companion to the talk Coding Stochastic Weather Generators: Challenges and Perspectives given at the SWGEN 2025 - Conference on Stochastic Weather Generators.

It showcases two challenges encountered when developing Stochastic Weather Generators (SWGs):

cd(@__DIR__)
import Pkg
Pkg.activate(".");
using PrettyTables
  Activating project at `C:\Users\metivier\Dropbox\PC (2)\Documents\GitHub\Pluto_export\SWG_coding_challenges`
# import PythonCall #if you want to test Python version
import RCall

Benchmarking HMM Simulation across languages

All implementations have been generated by a LLM. All versions could be further optimized, but the goal here is to illustrate typical performance differences “out of the box”. We will illustrate with a simple Hidden Markov Model (HMM) with 4 states and 12-dimensional multivariate normal emissions.

TipUsing R, Python and C from Julia

Using packages such as RCall.jl and PythonCall.jl, Julia can seamlessly call R and Python code from within Julia scripts or notebooks. C code can be compiled into shared libraries and called using Julia’s ccall interface.

Julia Version

In the Julia version, we use Distributions.jl for an arbitrary distribution interface. Thanks to Julia’s multiple dispatch, switching to other emission distributions (e.g., Poisson, Gamma) is straightforward.

using Distributions, LinearAlgebra

function rand_HMM(Q, dist, N)
    Z = zeros(Int, N)
    d = length(dist[1])  # dimension of observations
    Y = zeros(d, N)
    Z[1] = 1
    Y[:, 1] = rand(dist[Z[1]])
    for t in 2:N
        Z[t] = rand(Categorical(Q[Z[t-1], :]))
        Y[:, t] = rand(dist[Z[t]])
    end
    return Z, Y
end
rand_HMM (generic function with 1 method)

R Version

In the R version, we use the mvtnorm package for multivariate normal sampling. Compared to the Julia version above, the MvNormal distribution has to be hard-coded.

library(mvtnorm)

rand_HMM <- function(Q, means_list, Sigma_list, N) {
    n_states <- nrow(Q)
    d <- length(means_list[[1]])  # dimension
    Z <- integer(N)
    Y <- matrix(0, nrow = d, ncol = N)
    
    Z[1] <- 1
    Y[, 1] <- rmvnorm(1, mean = means_list[[Z[1]]], sigma = Sigma_list[[Z[1]]])
    
    for (t in 2:N) {
        Z[t] <- sample(1:n_states, size = 1, prob = Q[Z[t-1], ])
        Y[, t] <- rmvnorm(1, mean = means_list[[Z[t]]], sigma = Sigma_list[[Z[t]]])
    }
    return(list(Z = Z, Y = Y))
};
Warning: RCall.jl: Warning: package 'mvtnorm' was built under R version 4.4.2
@ RCall C:\Users\metivier\.julia\packages\RCall\HIJsY\src\io.jl:171
RCall.RFunction{RCall.RObject{RCall.ClosSxp}}(RCall.RObject{RCall.ClosSxp}
function (Q, means_list, Sigma_list, N) 
{
    n_states <- nrow(Q)
    d <- length(means_list[[1]])
    Z <- integer(N)
    Y <- matrix(0, nrow = d, ncol = N)
    Z[1] <- 1
    Y[, 1] <- rmvnorm(1, mean = means_list[[Z[1]]], sigma = Sigma_list[[Z[1]]])
    for (t in 2:N) {
        Z[t] <- sample(1:n_states, size = 1, prob = Q[Z[t - 1], 
            ])
        Y[, t] <- rmvnorm(1, mean = means_list[[Z[t]]], sigma = Sigma_list[[Z[t]]])
    }
    return(list(Z = Z, Y = Y))
}
)

Python Version

The Python version also has to hard-code the MvNormal distribution. It was not tested because Python is not installed on my computer.

import numpy as np

def rand_HMM(Q, means_list, Sigma_list, N):
    n_states = len(Q)
    d = len(means_list[0])  # dimension
    Z = np.zeros(N, dtype=int)
    Y = np.zeros((d, N))
    
    Z[0] = 0
    Y[:, 0] = np.random.multivariate_normal(means_list[Z[0]], Sigma_list[Z[0]])
    
    for t in range(1, N):
        Z[t] = np.random.choice(n_states, p=Q[Z[t-1], :])
        Y[:, t] = np.random.multivariate_normal(means_list[Z[t]], Sigma_list[Z[t]])
    
    return Z, Y

C Version (called from Julia)

Here is the C code generated by the LLM on the first try. It performs manual Cholesky decomposition and Box-Muller sampling of standard normals.

Tip

This C code could be further optimized using optimizations flags e.g. -O3 -ffast-math -msse3 during compilation, but the goal here is to illustrate typical performance of C code written by a non-expert. This type of optimization is also possible in Julia with macros such as @inbounds, @fastmath and @simd. Note that this type of optimization trades off safety for speed and should be used with caution.

using Libdl # Dynamic Linker to find path for C

The C code is folded below.

Code
C_code_v1 = """
#include <stddef.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>

// Cholesky decomposition (simple version, assumes positive definite)
void cholesky(double *A, double *L, size_t d) {
    memset(L, 0, d * d * sizeof(double));
    for (size_t i = 0; i < d; i++) {
        for (size_t j = 0; j <= i; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < j; k++) {
                sum += L[i * d + k] * L[j * d + k];
            }
            if (i == j) {
                L[i * d + j] = sqrt(A[i * d + i] - sum);
            } else {
                L[i * d + j] = (A[i * d + j] - sum) / L[j * d + j];
            }
        }
    }
}

void rand_HMM_c_v1(size_t N, size_t d, double *Q, size_t n_states, 
                double *means, double *Sigmas, 
                int *Z, double *Y, unsigned int seed) {
    srand(seed);
    
    // Precompute Cholesky decompositions
    double *L_all = (double *)malloc(n_states * d * d * sizeof(double));
    for (size_t s = 0; s < n_states; s++) {
        cholesky(&Sigmas[s * d * d], &L_all[s * d * d], d);
    }
    
    Z[0] = 0;
    // Sample first observation
    for (size_t j = 0; j < d; j++) {
        double z = 0.0;
        for (size_t k = 0; k < d; k++) {
            double u1 = (double)rand() / RAND_MAX;
            double u2 = (double)rand() / RAND_MAX;
            double std_normal = sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
            z += L_all[Z[0] * d * d + j * d + k] * std_normal;
        }
        Y[j * N + 0] = means[Z[0] * d + j] + z;
    }
    
    for (size_t t = 1; t < N; t++) {
        // Sample next state
        double r = (double)rand() / RAND_MAX;
        double cum_prob = 0.0;
        for (size_t s = 0; s < n_states; s++) {
            cum_prob += Q[Z[t-1] * n_states + s];
            if (r <= cum_prob) {
                Z[t] = s;
                break;
            }
        }
        
        // Sample multivariate normal observation
        for (size_t j = 0; j < d; j++) {
            double z = 0.0;
            for (size_t k = 0; k < d; k++) {
                double u1 = (double)rand() / RAND_MAX;
                double u2 = (double)rand() / RAND_MAX;
                double std_normal = sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
                z += L_all[Z[t] * d * d + j * d + k] * std_normal;
            }
            Y[j * N + t] = means[Z[t] * d + j] + z;
        }
    }
    
    free(L_all);
}
"""
"#include <stddef.h>\n#include <stdlib.h>\n#include <math.h>\n#include <string.h>\n\n// Cholesky decomposition (simple version, assumes positive definite)\nvoid cholesky(double *A, double *L, size_t d) {\n    memset(L, 0, d * d * sizeof(double));\n    for (size_t i = 0; i < d; i++) {\n   " ⋯ 1872 bytes ⋯ "double)rand() / RAND_MAX;\n                double std_normal = sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);\n                z += L_all[Z[t] * d * d + j * d + k] * std_normal;\n            }\n            Y[j * N + t] = means[Z[t] * d + j] + z;\n        }\n    }\n    \n    free(L_all);\n}\n"

The Julia wrapper to compile and call the C function:

const Clib_path_v1 = tempname()

open(`gcc -fPIC -xc -shared -o $(Clib_path_v1 * "." * Libdl.dlext) -`, "w") do f
    print(f, C_code_v1)
end

function c_rand_HMM_v1(Q::Matrix{Float64}, means::Vector{Vector{Float64}},
    Sigmas::Vector{Matrix{Float64}}, N::Int)
    n_states = size(Q, 1)
    d = length(means[1])
    Z = zeros(Int32, N)
    Y = zeros(Float64, d, N)

    # Flatten means and Sigmas for C
    means_flat = reduce(vcat, means)
    Sigmas_flat = reduce(vcat, [vec(Sigmas[s]') for s in 1:n_states])

    seed = rand(UInt32)
    ccall(("rand_HMM_c_v1", Clib_path_v1), Cvoid,
        (Csize_t, Csize_t, Ptr{Float64}, Csize_t, Ptr{Float64}, Ptr{Float64},
            Ptr{Int32}, Ptr{Float64}, UInt32),
        N, d, Q, n_states, means_flat, Sigmas_flat, Z, Y, seed)
    return Z, Y
end
c_rand_HMM_v1 (generic function with 1 method)

Here is another version assuming that the Cholesky factors of the covariance matrices are precomputed in Julia and passed to C, along with standard normal random variables. So the C code only handles a loop, state sampling and matrix-vector products. After some try, I could not get a pure C version with native C libraries for Cholesky and normal sampling to work on my computer.

The C code is folded below.

Code
C_code_v2 = """
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// rand_HMM_c: assumes Julia provides L_all (Cholesky factors) and E (standard normals)
// Q: n_states x n_states row-stochastic transition matrix
// means: n_states x d means (flattened row-major per state)
// L_all: n_states x d x d lower-triangular Cholesky factors (flattened)
// E: d x N matrix of standard normals (flattened column-major as passed from Julia)
void rand_HMM_c_v2(size_t N, size_t d, double *Q, size_t n_states,
                double *means, double *L_all, double *E,
                int *Z, double *Y, unsigned int seed) {
    // Simple LCG for reproducible state sampling (optional: srand)
    srand(seed);

    // Initial state
    Z[0] = 0;

    // First observation: y = μ + L * e
    for (size_t j = 0; j < d; j++) {
        double z = 0.0;
        for (size_t k = 0; k < d; k++) {
            double e = E[k * N + 0];
            z += L_all[Z[0] * d * d + j * d + k] * e;
        }
        Y[j * N + 0] = means[Z[0] * d + j] + z;
    }

    for (size_t t = 1; t < N; t++) {
        // Sample next state from Q[Z[t-1], :]
        double r = (double)rand() / RAND_MAX;
        double cum_prob = 0.0;
        for (size_t s = 0; s < n_states; s++) {
            cum_prob += Q[Z[t-1] * n_states + s];
            if (r <= cum_prob) {
                Z[t] = s;
                break;
            }
        }

        // Observation: y = μ + L * e
        for (size_t j = 0; j < d; j++) {
            double z = 0.0;
            for (size_t k = 0; k < d; k++) {
                double e = E[k * N + t];
                z += L_all[Z[t] * d * d + j * d + k] * e;
            }
            Y[j * N + t] = means[Z[t] * d + j] + z;
        }
    }
}
"""
"#include <stddef.h>\n#include <stdlib.h>\n#include <string.h>\n\n// rand_HMM_c: assumes Julia provides L_all (Cholesky factors) and E (standard normals)\n// Q: n_states x n_states row-stochastic transition matrix\n// means: n_states x d means (flattened row-major per state)\n// L_all" ⋯ 1173 bytes ⋯ "_t j = 0; j < d; j++) {\n            double z = 0.0;\n            for (size_t k = 0; k < d; k++) {\n                double e = E[k * N + t];\n                z += L_all[Z[t] * d * d + j * d + k] * e;\n            }\n            Y[j * N + t] = means[Z[t] * d + j] + z;\n        }\n    }\n}\n"
const Clib_path_v2 = tempname()

open(`gcc -fPIC -xc -shared -o $(Clib_path_v2 * "." * Libdl.dlext) -`, "w") do f
    print(f, C_code_v2)
end

function c_rand_HMM_v2(Q::Matrix{Float64}, means::Vector{Vector{Float64}},
    Sigmas::Vector{Matrix{Float64}}, N::Int)
    n_states = size(Q, 1)
    d = length(means[1])
    Z = zeros(Int32, N)
    Y = zeros(Float64, d, N)
    # Flatten means for C
    means_flat = reduce(vcat, means)
    # Precompute Cholesky factors in Julia and flatten for C
    L_all = reduce(vcat, [vec(cholesky(Sigmas[s]).L') for s in 1:n_states])
    # Generate standard normals in Julia (d x N) and flatten column-major
    E = randn(d, N)

    seed = rand(UInt32)
    ccall(("rand_HMM_c_v2", Clib_path_v2), Cvoid,
        (Csize_t, Csize_t, Ptr{Float64}, Csize_t, Ptr{Float64}, Ptr{Float64}, Ptr{Float64},
            Ptr{Int32}, Ptr{Float64}, UInt32),
        N, d, Q, n_states, means_flat, L_all, E, Z, Y, seed)
    return Z, Y
end
c_rand_HMM_v2 (generic function with 1 method)

Benchmark Setup

Define a 4-state HMM with 12-dimensional multivariate normal emissions:

using BenchmarkTools
using Random

Random.seed!(123)

# Transition matrix
Q = [0.7 0.15 0.10 0.05;
    0.2 0.6 0.15 0.05;
    0.1 0.2 0.6 0.10;
    0.05 0.1 0.15 0.7]

# Generate parameters for 4 states, 12 dimensions
d = 12
n_states = 4
means_vec = [randn(d) .+ i for i in 1:n_states]
Sigmas_vec = [Matrix(Diagonal(rand(d) .+ 0.5)) for _ in 1:n_states]

# Julia version with Distributions.jl
dist = [MvNormal(means_vec[i], Sigmas_vec[i]) for i in 1:n_states]

N = 10_000*10  # 10k*10 time steps ≃ 27*10 years of daily data
100000

Julia Benchmark

time_julia = @belapsed rand_HMM(Q, dist, N);

Pass parameters to R

RCall.@rput Q
RCall.@rput N
RCall.@rput d
RCall.@rput n_states
RCall.@rput means_vec
RCall.@rput Sigmas_vec;

R Benchmark

time_R = @belapsed RCall.R"rand_HMM(Q, means_vec, Sigmas_vec, N)";

C Benchmark

time_C_v1 = @belapsed c_rand_HMM_v1(Q, means_vec, Sigmas_vec, N);
time_C_v2 = @belapsed c_rand_HMM_v2(Q, means_vec, Sigmas_vec, N);

Results Summary

Code
ratio_C_v1 = time_C_v1 / time_julia
ratio_C_v2 = time_C_v2 / time_julia
ratio_R = time_R / time_julia
results_table = [
    "Julia (baseline)" time_julia 1.0;
    "C v1" time_C_v1 ratio_C_v1;
    "C v2" time_C_v2 ratio_C_v2;
    "R" time_R ratio_R
]

pretty_table(results_table,
    column_labels=["Language", "Time (s)", "Relative Speed"], backend=:html)
Language Time (s) Relative Speed
Julia (baseline) 0.128532 1.0
C v1 6.46593 50.306
C v2 0.192528 1.4979
R 89.3993 695.542
  1. R loops are unavoidably slow: R is ~700.0× slower than Julia for this loop-heavy simulation. While vectorization helps for some operations, SWG simulations require sequential time-stepping that cannot be vectorized. Optimizing R code is limited. Typically, one would try to parallelize with mclapply gaining only limited speedup (in practice speedup due to distributed computing increases slower than the number of cores) or rewrite bottlenecks in C/C++ via Rcpp, but this adds complexity and maintenance burden.

2-v1. C is not always faster: Surprisingly, the first C implementation (v1) produced by the LLM is ~50.0× slower than Julia! This demonstrates that:

  • Bad C code can easily be slower than high-level languages with good JIT compilers
  • The complexity of C makes it unsuitable for rapid prototyping in scientific applications

2-v2. As a sanity check the v2 C version, that uses precomputed Cholesky factors and standard normals from dedicated (Julia) packages, is much faster. It has the same speed as the Julia version (the small difference might be due to the interface between Julia and C).

  1. Julia “solves” the two-language problem: Julia provides C-like performance with Python/R-like ease of use. No need to rewrite bottlenecks in a low-level language - the prototype is the production code.

Fitting Gaussian Random Fields with Matérn Covariance

Maximum Likelihood Estimation is the most common method to fit SWGs to data. Typical workflows involve using a package or directly using the optim R function (that is used under the hood by many R packages). The default method is Nelder-Mead, a gradient-free optimization method. The second choice is the BFGS quasi-Newton method where the gradients are approximated with finite differences. Both approaches can be very slow and/or inaccurate for complex models with many parameters. See the optim documentation for more details.

R as well as native Python/Matlab lack automatic differentiation (AD) capabilities.

In Julia, the support of AD is automatic: almost any Julia code can be differentiated with Automatic Differentiation. Of course, Julia optimization packages also support finite differences and derivative-free methods.

Mathematical Setup

We consider \(N\) independent samples \(Y^{(1)}, \ldots, Y^{(N)}\) from a spatial Gaussian Random Field (GRF) at \(d\) locations, where each sample follows:

\[ Y^{(i)} \sim \mathcal{N}(0, \sigma^2 R(\nu, \rho)) \]

The correlation matrix \(R\) has a Matérn structure:

\[ R_{jk} = \rho_{\text{Matérn}}(h_{jk}; \nu, \rho) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{h_{jk}}{\rho}\right)^\nu K_\nu\left(\frac{h_{jk}}{\rho}\right) \]

where:

  • \(h_{jk}\) is the Euclidean distance between locations \(j\) and \(k\)
  • \(\nu > 0\) is the smoothness parameter (higher \(\nu\) = smoother fields)
  • \(\rho > 0\) is the range parameter (controls spatial correlation decay)
  • \(\sigma > 0\) is the variance parameter (controls overall variance)
  • \(K_\nu\) is the modified Bessel function of the second kind
  • \(\Gamma\) is the gamma function

Log-Likelihood

For i.i.d. samples, the log-likelihood is:

\[ \ell(\nu, \rho) = -\frac{N}{2}\left[\log|R(\nu, \rho)| + d\log(2\pi)\right] - \frac{1}{2}\sum_{i=1}^{N} (Y^{(i)})^\top R(\nu, \rho)^{-1} Y^{(i)} \]

We estimate \((\nu, \rho, \sigma)\) by minimizing \(\ell\) using gradient-based optimization with automatic differentiation.

Set up

Generate synthetic data from a Matérn GRF with known parameters.

using GaussianRandomFields
using BesselK: _gamma, adbesselkxv
using Distances
using LinearAlgebra
using Random
using Optimization, OptimizationOptimJL
import FiniteDiff, ForwardDiff
Random.seed!(1234)

# Create a d×d grid of 2D spatial locations
d = 15  # number of spatial locations per dimension (total: d²)
x_pts = range(0, stop=10, length=d)
y_pts = range(0, stop=10, length=d)

# Build location matrix (d² × 2)
locs = [(x, y) for x in x_pts, y in y_pts]
locs = hcat([[loc[1], loc[2]] for loc in locs]...)'

# Compute pairwise distances
H = pairwise(Euclidean(), locs, dims=1)
225×225 Matrix{Float64}:
  0.0        0.714286   1.42857   …  13.1708    13.6464    14.1421
  0.714286   0.0        0.714286     12.7175    13.1708    13.6464
  1.42857    0.714286   0.0          12.289     12.7175    13.1708
  2.14286    1.42857    0.714286     11.8881    12.289     12.7175
  2.85714    2.14286    1.42857      11.5175    11.8881    12.289
  3.57143    2.85714    2.14286   …  11.1803    11.5175    11.8881
  4.28571    3.57143    2.85714      10.8797    11.1803    11.5175
  5.0        4.28571    3.57143      10.6186    10.8797    11.1803
  5.71429    5.0        4.28571      10.4002    10.6186    10.8797
  6.42857    5.71429    5.0          10.227     10.4002    10.6186
  ⋮                               ⋱                        
 10.8797    10.6186    10.4002        4.28571    5.0        5.71429
 11.1803    10.8797    10.6186        3.57143    4.28571    5.0
 11.5175    11.1803    10.8797        2.85714    3.57143    4.28571
 11.8881    11.5175    11.1803        2.14286    2.85714    3.57143
 12.289     11.8881    11.5175    …   1.42857    2.14286    2.85714
 12.7175    12.289     11.8881        0.714286   1.42857    2.14286
 13.1708    12.7175    12.289         0.0        0.714286   1.42857
 13.6464    13.1708    12.7175        0.714286   0.0        0.714286
 14.1421    13.6464    13.1708        1.42857    0.714286   0.0

Matérn Covariance Function

Define the Matérn correlation function with the adbesselkxv and _gamma functions from the BesselK.jl package as they are AD-compatible:

function cov_matérn(h, ν)
    iszero(h) && return one(h)  # Return type-stable 1
    return 2^(1 - ν) / _gamma(ν) * adbesselkxv(ν, h)
end
cov_matérn (generic function with 1 method)

Fixed variance \(\sigma = 1\) case

Data Generation

Generate \(N\) i.i.d. samples from a Matérn GRF

θ_true = [2.5, 1.2] # (ρ, ν)
2-element Vector{Float64}:
 2.5
 1.2
cov = CovarianceFunction(2, Matern(θ_true[1], θ_true[2]))
grf = GaussianRandomField(cov, GaussianRandomFields.Cholesky(), x_pts, y_pts)
Gaussian random field with 2d Matérn covariance function (λ=2.5, ν=1.2, σ=1.0, p=2.0) on a 15×15 structured grid, using a Cholesky decomposition
N = 700
Y = [sample(grf) for _ in 1:N]

println("Generated $N samples on a $(d)×$(d) grid ($(size(H,1)) locations)")
Generated 700 samples on a 15×15 grid (225 locations)

Negative Log-Likelihood

Implement the NLL using:

function nll(u, p)
    ρ, ν = u[1], u[2]
    Y, H = p[1], p[2]
    N = length(Y)
    d = size(H, 1)

    # Build correlation matrix R(ν, ρ) with broadcasting
    R = cov_matérn.(H / ρ, ν)

    # Cholesky decomposition (handles Σ⁻¹ and log|Σ| efficiently)
    Σ = cholesky!(Symmetric(R))

    # Negative log-likelihood:
    # ℓ = (N/2)[log|R| + d·log(2π)] + (1/2)∑ᵢ yᵢᵀ R⁻¹ yᵢ
= N / 2 * (logdet(Σ) + d * log(2π)) +
        sum(dot(vec(y), Σ \ vec(y)) for y in Y) / 2

    return
end
nll (generic function with 1 method)

Parameter Estimation

Compare optimization methods: Automatic Differentiation vs Finite Differences vs Derivative-Free

# Initial guess (true values: ρ=2.5, ν=1.2)
θ₀ = [5.2, 0.5]

# 1. Automatic Differentiation (AD) - exact gradients
optf = OptimizationFunction(nll, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf, θ₀, [Y, H], lb=[0.0, 0.0], ub=[Inf, Inf])
# 2. Finite Differences (FD) - approximate gradients
optfFD = OptimizationFunction(nll, Optimization.AutoFiniteDiff())
probFD = OptimizationProblem(optfFD, θ₀, [Y, H], lb=[0.0, 0.0], ub=[Inf, Inf])
OptimizationProblem. In-place: true
u0: 2-element Vector{Float64}:
 5.2
 0.5

Apparently, Nelder-Mead uses one evaluation of gradient for some simplex stuff see here and here. So we use the Finite Differences version for consistency with the R version.

To limit computation time, we limit the maximum number of iterations as Finite Differences is super slow. You can increase it to reach better convergence.

maxiters = 400
400

Optimization Results

(solAD, timeAD) = @timed solve(prob, IPNewton(), maxiters=maxiters)
solAD.stats
Warning: The selected optimization algorithm requires second order derivatives, but `SecondOrder` ADtype was not provided.
        So a `SecondOrder` with ADTypes.AutoForwardDiff() for both inner and outer will be created, this can be suboptimal and not work in some cases so
        an explicit `SecondOrder` ADtype is recommended.
@ OptimizationBase C:\Users\metivier\.julia\packages\OptimizationBase\R2xIG\src\cache.jl:51
SciMLBase.OptimizationStats
Number of iterations:                              29
Time in seconds:                                   192.830000
Number of function evaluations:                    74
Number of gradient evaluations:                    74
Number of hessian evaluations:                     29
(solFD, timeFD) = @timed solve(probFD, IPNewton(), maxiters=maxiters)
solFD.stats
Warning: The selected optimization algorithm requires second order derivatives, but `SecondOrder` ADtype was not provided.
        So a `SecondOrder` with ADTypes.AutoFiniteDiff() for both inner and outer will be created, this can be suboptimal and not work in some cases so
        an explicit `SecondOrder` ADtype is recommended.
@ OptimizationBase C:\Users\metivier\.julia\packages\OptimizationBase\R2xIG\src\cache.jl:51
SciMLBase.OptimizationStats
Number of iterations:                              400
Time in seconds:                                   453.919000
Number of function evaluations:                    529
Number of gradient evaluations:                    529
Number of hessian evaluations:                     401
(solNoDiff, timeNoDiff) = @timed solve(probFD, NelderMead(), maxiters=maxiters)
solNoDiff.stats
SciMLBase.OptimizationStats
Number of iterations:                              5
Time in seconds:                                   15.423000
Number of function evaluations:                    487
Number of gradient evaluations:                    1
Number of hessian evaluations:                     0

Results Summary

Code
results_table = [
    "True" θ_true[1] θ_true[2] "N/A";
    "AD" round(solAD.u[1], digits=3) round(solAD.u[2], digits=3) timeAD;
    "FD" round(solFD.u[1], digits=3) round(solFD.u[2], digits=3) timeFD;
    "NM" round(solNoDiff.u[1], digits=3) round(solNoDiff.u[2], digits=3) timeNoDiff
]

pretty_table(results_table,
    column_labels=["Method", "ρ", "ν", "Time (s)"], backend=:html)
Method ρ ν Time (s)
True 2.5 1.2 N/A
AD 2.516 1.195 205.186
FD 2.871 1.118 457.771
NM 2.516 1.195 16.2413

Free variance parameter \(\sigma\)

Data Generation

θσ_true = [2.5, 1.2, 4.0] # (ρ, ν, σ)
3-element Vector{Float64}:
 2.5
 1.2
 4.0
covσ = CovarianceFunction(2, Matern(θσ_true[1], θσ_true[2], σ=θσ_true[3]))
grfσ = GaussianRandomField(covσ, GaussianRandomFields.Cholesky(), x_pts, y_pts)

= [sample(grfσ) for _ in 1:N]
700-element Vector{Matrix{Float64}}:
 [-2.908789397950098 -3.8131606954589463 … 2.1799794522902656 4.596670635099563; 0.11906230828325759 -1.1200414240148966 … 0.7310653607067716 3.2218717476072607; … ; -3.3298667384113565 -3.4485990712482915 … 1.851762239582804 0.33598070020734483; -3.5281571196689114 -2.469843084132078 … 2.0786003052783015 2.2067623466468334]
 [-0.313092254671193 -0.6952943404963746 … 7.093191293580664 7.255686785741399; -3.04940693514618 -3.202361341070152 … 6.3909650094538275 6.578454974732634; … ; -4.477612774250883 -4.244753240458948 … 4.949807486617989 2.6337494394464143; -4.067743540401492 -3.4918743980425733 … 2.121380102158777 1.609712371730212]
 [-0.08451936980921823 1.3443976005655083 … -0.47142832756923886 0.2625890895516912; 0.7750323392359102 3.1527001162621002 … 0.44974661369119606 0.16415799665088204; … ; -8.165703436450121 -8.813877348866782 … -3.1156279769047415 -0.3879050059615239; -7.554554583789133 -7.758913834321561 … -3.3778080532296975 -1.240276468606131]
 [-5.676515966859646 -6.716685318367137 … 1.1369109918964289 3.2347819481159226; -5.895749134693372 -5.098952852195997 … 1.2606823402520817 3.686720551891458; … ; 3.5131749565860018 4.189357149735303 … -6.934798611064781 -4.662758977888864; 5.599097883613813 5.9553517953183706 … -4.8282843374871565 -3.3682827499020185]
 [1.7339207170392155 2.322741455121328 … -8.127329784300995 -5.727244717650702; 2.085657998190487 1.8108420652039547 … -8.472794353242957 -6.491141320348742; … ; 5.213523793252757 3.376616624133374 … -3.3237930098913777 -2.06791067069522; 6.481376200083426 5.381115750848656 … -2.8116397062662832 -0.05224306321142902]
 [-1.633560704076629 -0.3595365630862855 … -0.24969665672829425 -0.34394631395533665; -1.4779095730876675 -0.9413336061672843 … -2.0973063192688306 -2.016989620303522; … ; -1.386754134015919 -1.6217364408241104 … -1.8362684460254384 -1.1043748550866055; -0.9229258249564791 0.0392410697080412 … -3.299544916763974 -1.5806815955476854]
 [-5.290246384521325 -4.5022009087722035 … -7.666662515900084 -5.984660828256332; -4.372175018547575 -3.8002572482946015 … -7.549476927796233 -5.734528976775111; … ; -4.672732410194486 -4.788264512294825 … 5.0552972364680615 5.51746250758299; -6.778288457511182 -6.326729073888694 … 4.925936375368744 5.416843346158407]
 [-3.0832222044749904 -4.302480890420956 … 1.1941700367472539 0.6218192805152737; -1.336121120806356 -2.223844366280061 … 1.8840192198774357 1.5929577583256038; … ; -0.42063772223176654 -1.2150581056241614 … 1.4106669967056704 0.050247108332530876; 0.6177976934503271 0.0793484404884428 … 3.1293367349276053 1.881441216724007]
 [-2.4546302654166614 -2.016642806126867 … -0.7019843958779142 -0.29714555390124786; -2.1951287727681925 -1.8126456819829178 … -0.7444362147916765 -0.15346935586833887; … ; -2.480543584657277 -3.2317242833242146 … 3.027928672397817 5.324174042677577; -3.0314321624627656 -4.373592980519557 … 2.507429939440178 2.798379151642449]
 [4.78204397472337 2.4463424940950604 … 0.8210860973949246 0.23207574303935044; 5.7413773724474595 3.4747108943183544 … 1.9078907374287999 1.481235504196272; … ; -1.4097992333052762 -3.161642103908008 … 1.6329258061555523 4.486541843425547; -2.0954955820260173 -3.371506553563967 … 1.7525186624116396 2.7787361175836813]
 ⋮
 [2.526504307776522 -0.5847799761052339 … 0.29965532278756324 0.4240473963351052; -2.021392457653805 -2.648417948989092 … 0.917961973589297 0.28738887483136855; … ; 6.59707784165706 6.042459463855335 … 2.6017701942788856 3.2747496694198404; 5.309849811493078 5.204631071316478 … 2.576088218521617 3.1914362348104146]
 [-0.44049596592253704 -1.9777331875924107 … 0.412011950200139 2.9056766799873466; 0.9038349644701564 -0.561041692044341 … -0.3618454049533446 1.3488068409579839; … ; 5.412064157780901 2.391218999855472 … -9.282943772912104 -9.233555804007667; 4.266229309591138 2.8374530505903968 … -9.425347389768108 -9.754728455949365]
 [-0.28475469701010214 1.1001766540481925 … 1.600514853699297 2.7118075329270934; -0.5697853430646705 1.4401770090838288 … -0.6828174553503688 1.6657797982748135; … ; 3.7440139090116857 2.0768368409453264 … 5.396916300495214 5.129191480855952; 6.015021207004874 4.257866286559388 … 5.769484677895831 4.71061678311146]
 [-3.188028190815699 -3.359640584677632 … 1.3557516107975274 -0.1388938388383145; -3.8710585453041375 -4.001455922305908 … 0.4343308250061231 -0.5233291343188942; … ; -3.967809264805045 -3.184588880537353 … 0.6489500382398137 1.0592860503183903; -2.3893513789524996 -0.859509260753168 … 0.9214782762785829 0.23437136986225815]
 [-0.09670717025197507 -0.19189193937892765 … -0.4860045994573195 -0.1752761301483978; -1.4829104528641177 0.0025063582641504933 … 0.9999686829041503 0.1639107576190973; … ; 3.2668570795780245 5.716426108602874 … 1.8149671149761357 1.344971325398709; 1.8126088479126672 1.7850094922579074 … -0.27854407132346115 -0.40689749020303384]
 [-5.822843100723238 -3.973250899903822 … -4.12918582209273 -2.6027512827605546; -6.602395251810106 -6.131332389022809 … -3.2096059155636265 -3.72348450736055; … ; -0.7785456273546474 -0.9186385412820087 … 8.263413235836842 7.030564212468362; -2.380483246214559 -2.4546152198065996 … 8.18636463419122 7.813930742438426]
 [3.960708464997427 5.4964917833571585 … -1.304882317952616 -1.7013950196041603; 2.4827678410065404 4.5965405775169055 … -1.881955529462397 -1.6175246759169428; … ; 10.444819508881277 10.67813598858717 … -4.134015386176744 -4.035045443593478; 9.507305235402903 10.13729267232085 … -4.305964195395933 -5.568150978574627]
 [-1.0823772157612597 1.105657976804282 … 8.311003893951643 6.428549247345561; 0.6840986172085235 2.85366358824699 … 8.301786420234777 6.214449730539613; … ; -2.1260710615001566 -4.501645010387021 … -4.0008100305918175 -2.9475338619153515; 0.46249287120664107 -1.7888746933039674 … -3.780282406516182 -1.9722418922947416]
 [3.8484383176364076 5.44552172636197 … -0.7565132972602653 1.0855799275964333; 2.4396625814313637 4.271836880977826 … 0.8054201037994785 2.4563274869622727; … ; 5.463672975178838 5.6590107771176505 … 2.5074168712247427 0.7505902477446413; 5.029091961300201 5.238688979569602 … 3.222128418942532 3.2620639216563845]

Negative Log-Likelihood

function nllσ(u, p)
    ρ, ν, σ = u[1], u[2], u[3]
    Y, H = p[1], p[2]
    N = length(Y)
    d = size(H, 1)

    # Build correlation matrix R(ν, ρ) with broadcasting
    R = σ^2 * cov_matérn.(H / ρ, ν)

    # Cholesky decomposition (handles Σ⁻¹ and log|Σ| efficiently)
    Σ = cholesky!(Symmetric(R))

    # Negative log-likelihood:
    # ℓ = (N/2)[log|R| + d·log(2π)] + (1/2)∑ᵢ yᵢᵀ R⁻¹ yᵢ
= N / 2 * (logdet(Σ) + d * log(2π)) +
        sum(dot(vec(y), Σ \ vec(y)) for y in Y) / 2

    return
end
nllσ (generic function with 1 method)

Parameter Estimation

Compare optimization methods: Automatic Differentiation vs Finite Differences vs Derivative-Free

# Initial guess (true values: ρ=2.5, ν=1.2, σ = 4.0)
θσ₀ = [0.2, 0.5, 0.5]

# With automatic differentiation
optfσ = OptimizationFunction(nllσ, Optimization.AutoForwardDiff())
probσ = OptimizationProblem(optfσ, θσ₀, [Yσ, H], lb=[0.0, 0.0, 0.0], ub=[Inf, Inf, Inf])
# With finite differences
optfσFD = OptimizationFunction(nllσ, Optimization.AutoFiniteDiff())
probσFD = OptimizationProblem(optfσFD, θσ₀, [Yσ, H], lb=[0.0, 0.0, 0.0], ub=[Inf, Inf, Inf])
OptimizationProblem. In-place: true
u0: 3-element Vector{Float64}:
 0.2
 0.5
 0.5

Optimization Results

(solAD, timeAD) = @timed solve(probσ, IPNewton(), maxiters=maxiters)
solAD.stats
Warning: The selected optimization algorithm requires second order derivatives, but `SecondOrder` ADtype was not provided.
        So a `SecondOrder` with ADTypes.AutoForwardDiff() for both inner and outer will be created, this can be suboptimal and not work in some cases so
        an explicit `SecondOrder` ADtype is recommended.
@ OptimizationBase C:\Users\metivier\.julia\packages\OptimizationBase\R2xIG\src\cache.jl:51
SciMLBase.OptimizationStats
Number of iterations:                              32
Time in seconds:                                   145.094000
Number of function evaluations:                    76
Number of gradient evaluations:                    76
Number of hessian evaluations:                     32

Since Finite Differences is very slow, we catch potential failures to converge within the iteration budget.

(solFD, timeFD) = @timed try
    solve(probσFD, IPNewton(), maxiters=maxiters)
catch
    @warn "The solve failed."
    (u=[missing, missing, missing], stats=nothing)
end
solFD.stats
Warning: The selected optimization algorithm requires second order derivatives, but `SecondOrder` ADtype was not provided.
        So a `SecondOrder` with ADTypes.AutoFiniteDiff() for both inner and outer will be created, this can be suboptimal and not work in some cases so
        an explicit `SecondOrder` ADtype is recommended.
@ OptimizationBase C:\Users\metivier\.julia\packages\OptimizationBase\R2xIG\src\cache.jl:51
SciMLBase.OptimizationStats
Number of iterations:                              400
Time in seconds:                                   1360.517000
Number of function evaluations:                    783
Number of gradient evaluations:                    783
Number of hessian evaluations:                     401

In this example, Nelder-Mead also fails to converge (producing non positive definite matrices).

(solNoDiff, timeNoDiff) = @timed try
    solve(probσ, NelderMead(), maxiters=maxiters)
catch
    @warn "The solve failed."
    (u=[missing, missing, missing], stats=nothing)
end
solNoDiff.stats
Warning: The solve failed.
@ Main.Notebook C:\Users\metivier\Dropbox\PC (2)\Documents\GitHub\Pluto_export\SWG_coding_challenges\Talk_DM_quarto.qmd:698

Results Summary

Code
results_table = [
    "True" θσ_true[1] θσ_true[2] θσ_true[3] "N/A";
    "AD" round(solAD.u[1], digits=3) round(solAD.u[2], digits=3) round(solAD.u[3], digits=3) timeAD;
    "FD" round(solFD.u[1], digits=3) round(solFD.u[2], digits=3) round(solFD.u[3], digits=3) timeFD;
    "NM" round(solNoDiff.u[1], digits=3) round(solNoDiff.u[2], digits=3) round(solNoDiff.u[3], digits=3) timeNoDiff
]

pretty_table(results_table,
    column_labels=["Method", "ρ", "ν", "σ", "Time (s)"], backend=:html)
Method ρ ν σ Time (s)
True 2.5 1.2 4.0 N/A
AD 2.508 1.196 3.987 146.412
FD 2.405 1.207 3.87 1364.05
NM missing missing missing 38.5543

Summary

To fit a two or three parameter Matérn GRF model to spatial data on a small 2D grid with 700 i.i.d. samples:

  • Finite Differences is extremely slow and may fail to converge within a reasonable time budget. When it converges, the estimates are optimal.
  • Gradient-Free (Nelder-Mead) is faster in this case and accurate for the two-parameter \((\rho, \nu)\) case. When adding variance to the estimation, it failed to converge. Note that it does converge when the number of samples is larger.
  • Automatic Differentiation is both fast and converges in both settings.

This does not mean that AD is always the best choice, but it should be seriously considered when fitting complex SWG models with many parameters.

Julia, Computer and Packages settings

Code
using InteractiveUtils
InteractiveUtils.versioninfo()

Pkg.status();
Julia Version 1.12.1
Commit ba1e628ee4 (2025-10-17 13:02 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 28 × 13th Gen Intel(R) Core(TM) i7-13850HX
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, alderlake)
  GC: Built with stock GC
Threads: 1 default, 0 interactive, 1 GC (on 28 virtual cores)
Environment:
  JULIA_LOAD_PATH = @;@stdlib
  JULIA_PROJECT = @.
Status `C:\Users\metivier\Dropbox\PC (2)\Documents\GitHub\Pluto_export\SWG_coding_challenges\Project.toml`
  [6e4b80f9] BenchmarkTools v1.6.3
  [432ab697] BesselK v0.5.7
  [a93c6f00] DataFrames v1.8.1
  [b4f34e82] Distances v0.10.12
  [31c24e10] Distributions v0.25.122
  [6a86dc24] FiniteDiff v2.29.0
  [f6369f11] ForwardDiff v1.3.0
  [e4b2fa32] GaussianRandomFields v2.2.7
 [7f7a1694] Optimization v5.1.0
  [36348300] OptimizationOptimJL v0.4.8
  [08abe8d2] PrettyTables v3.1.2
  [6f49c342] RCall v0.14.10
  [b77e0a4c] InteractiveUtils v1.11.0
  [8f399da3] Libdl v1.11.0
  [37e2e46d] LinearAlgebra v1.12.0
Info Packages marked with  have new versions available and may be upgradable.