Rainfall

Todo

Amongst the rich literature of SWG there are many interesting models. For example

WGEN — Multisite latent Gaussian Rain Occurrence Model

Overview

The WGEN model is a weather generator approach using latent Gaussian variables to generate multisite rain occurrences with order $p$ Markov chain dependence at each site. The model was first proposed in (Wilks, 1998) and extended to order 4 Markov chains in (Srikanthan and Pegram, 2009).

WGEN model

In the literature, the name WGEN refers to different models e.g. to (Richardson, 1981) !

Model Components

  • Single site Temporal Occurrence: Binary rain/no-rain determined by Markov chain of order $p$ at each station.
  • Spatial correlation: Gaussian latent variable with spatial covariance, i.e. Gaussian copula.
  • Amount Layer: Can be added trained on top of the occurrence layer using parametric distributions (e.g., Gamma, mixture of exponential) fitted to positive rain days. See (Evin et al., 2018) for a more complex rainfall amount model and associated R package GWEX (TODO implementation).
  • Seasonality: All parameters are fitted monthly.
Todo
  • Rewrite the Markov chain efficiently (the generation is slow).
  • Current implementation concatenates each month into the likelihood to fit -> add different fit options:
    • Sum the likelihood of each month
    • Estimate at each month of each year and take the mean/median of each month

Assumptions & Limitations

  • Computational complexity scales as $\mathcal{O}(S^2)$ where $S$ is the station number, making it less suitable for high-dimensional station networks.
  • Lacks explicit hidden states or large-scale regime representation.
  • Time and space are fitted separately
  • Ideal for small regions.
Todo

Model selection

Mathematical description

TODO

Usage

See the WGEN model section of the tutorial for an example of fit and simulation.

StochasticWeatherGenerators.wgenType
wgen
wgen(p::Integer, D::Integer, ncat::Integer=2)
wgen(p::Integer, D::Integer, idxs::AbstractArray, ncat::Integer=2)

Structure for a multisite rain occurrence model as described in (Wilks 1998) and generalized to higher order by Srikanthan et al. (2009). It contains Markov chain transition probabilities of order p at each sites j ∈ 1:D, and a correlation matrix for the unobserved Multivariate Gaussian variable (used to generate correlated rain occurrences). When idxs is provided it is an Vector of size 12 (typically) with index of days corresponding to each months. ncat can be larger than 2 to model generic categorical variables (experimental feature).

Example

p = 4
date_range = Date(1956):Day(1):Date(2019,12,31)
D = 5
Y = rand(Bool, length(date_range), D)
idx_months = [findall(x -> month.(x) == m, date_range) for m in 1:12]
# Fit returning a `wgen` struct
wgen4_model = fit_wgen(Y, idx_months, wgen_order)
# Simulate from `wgen` struct
Y_simu = rand(wgen4_model, 1956:2019; Y_ini=rand(Bool, p, D))

References

  • Wilks, D. S. "Multisite generalization of a daily stochastic precipitation generation model". Journal of Hydrology, (1998). https://doi.org/10.1016/S0022-1694(98)00186-3.
  • Srikanthan, Ratnasingham, et Geoffrey G. S. Pegram. "A nested multisite daily rainfall stochastic generation model". Journal of Hydrology 2009. https://doi.org/10.1016/j.jhydrol.2009.03.025.
source
StochasticWeatherGenerators.fit_wgenFunction
fit_wgen(Y::AbstractMatrix, idxs, p::Integer; kwargs...)

Use fit_markov_chain at each sites and each set of idxs to fit Markov chain of order p. Then it fits the mutlivariate Gaussian copula for occurrences (each month) with fit_Ω. Return a wgen structure.

source
StochasticWeatherGenerators.fit_ΩFunction
fit_Ω(transition_probs, Y; ω0=(-0.999, 0.999))
fit_Ω(transition_probs::AbstractMatrix, Y, idxs; ω0=(-0.999, 0.999))

Fit the mutlisite mutlivariate Gaussian variable.

source
Base.randMethod
rand(model::wgen, years::AbstractArray{<:Integer}; Y_ini)

Takes a wgen struct TODO

source

For simple Markov chain of order $p$

StochasticWeatherGenerators.fit_markov_chainFunction
fit_markov_chain(Y::AbstractVector{<:Integer}, p::Integer)
fit_markov_chain(Y::AbstractMatrix{<:Integer}, p::Integer)
fit_markov_chain(Y::AbstractMatrix{<:Integer}, idxs::AbstractArray, p::Integer)
y = rand(Bool, 1000)
markov = fit_markov_chain(y, 2)
markov[[0,1]]

TODO: add doc

source
StochasticWeatherGenerators.simulate_markov_gaussianFunction
simulate_markov_gaussian(N::Integer, ΩX::AbstractMatrix, transition_probs; Y_ini)
simulate_markov_gaussian(years::AbstractArray{<:Integer}, ΩX, transition_probs; Y_ini)

Function to simulate Xt given a correlation matrix ΩX and transition probabilities.

  • If N is an integer then it return a sequence of size N with fixed parameters.
  • If not then it
source

SHHMM — Seasonal Hierarchical Hidden Markov Model

Overview

SHHMM is a multisite stochastic weather generator (SWG) based on a seasonal Hidden Markov Model (HMM). The model was proposed in (Gobet et al., Sep 2024) by one author of this package. It is designed to generate temporally and spatially correlated precipitation sequences over large regions (e.g., France), while maintaining full interpretability. It has a Hierarchical Structure as it combines a discrete HMM for occurrences with a seasonal rainfall amount model conditional on the hidden states.

Model Components

  • Hidden Weather Regimes: The model infers interpretable hidden states (weather regimes) from daily rain occurrence data, without relying on exogenous variables.
  • Rain Occurrence Layer: Discrete Bernoulli HMM modeling rain/no-rain across stations.
  • Local Memory: Incorporates autoregressive memory for dry/wet persistence (Auto-Regressive HMM). The local memory can be per site. TODO: doc
  • Rain Amount Layer: Gaussian copula-based model to assign amounts conditioned on the occurrence and hidden states. The parametric form can be an arbitrary continuous distribution from Distributions.jl.
  • Seasonality: All model parameters vary continuously over the year.

Assumptions & Limitations

  • Relies on the conditional independence assumption — dense station networks may violate this.
  • Computational complexity is $\mathcal{O}(S)$ and $\mathcal{O}(K^2)$ (number of hidden states).
  • Tail behavior/extreme events can challenge the stability or identifiability of the hidden states.
  • Trade-off between model interpretability and spatial resolution.
  • Ideal to studying climate variability over large regions.
Todo

Add model selection details

Mathematical description

TODO

Schematic of the Autoregressive Seasonal Hidden Markov Model

Usage

For a complete example see the this section that both serve as a tutorial to fit, simulate and visualize the SHHMM model and the reproducible supplementary material of the paper (Gobet et al., Sep 2024). Some element of the paper are not included in the tutorial to limit runtime of CI (model selection, Weather Regimes illustration, comparison with climate models).

Fit

Fit functions condition on given hidden states. Fix K=1 or z=ones(size(Y,A)) if you do not have hidden states.

SmoothPeriodicStatsModels.fit_mle_ROFunction
fit_mle_RO(df::DataFrame, K, T, degree, local_order)

Given a DataFrame df with known hidden states column z ∈ 1:K. The rain occurrences of the new station are fitted conditionally to the hidden state. For local_order>0 the model is also autoregressive with its past.

source
StochasticWeatherGenerators.fit_mle_RRFunction
fit_mle_RR(df::DataFrame, local_order,  K = length(unique(df.z)); maxiter=5000, tol=2e-4, robust=true, silence=true, warm_start=true, display=:none, mix₀=mix_ini(length(unique(dayofyear_Leap.(df.DATE)))))

Fit the strictly positive rain amounts RR>0 distribution $g_{k,t}(r)$ w.r.t. to each hidden states k∈[1,K] (provided in a column of df.z). The fitted model could in principle be any seasonal model. For now by default it is a double exponential model,

$g_{k,t}(r) = \alpha(t,k)\exp(-r/\theta_1(t,k))/\theta_1(t,k) + (1-\alpha(t,k))\exp(-r/\theta_2(t,k))/\theta_2(t,k).$

source

TODO add SmoothPeriodicStatsModels.fit_mle

Spatial correlations

StochasticWeatherGenerators.cor_RRFunction
cor_RR(dfs::AbstractArray{<:DataFrame}[, K]; cor_method=Σ_Spearman2Pearson, force_PosDef = true)

Compute the (strictly positive) rain pair correlations cor(Rs₁ > 0, Rs₂ > 0) between each pair of stations s₁, s₂ for each hidden state Z = k.

Input: a array dfs of df::DataFrame of length S (number of station) where each df have :DATE, :RR, :z (same :z for each df).

Output: K correlation matrix of size S×S

Options:

  • force_PosDef will enforce Positive Definite matrix with NearestCorrelationMatrix.jl.
  • cor_method: typically Σ_Spearman2Pearson or Σ_Kendall2Pearson
  • impute_missing: if nothing, missing will be outputted when two stations do not have at least two rain days in common. Otherwise the value impute_missing will be set.
ΣRR = cor_RR(data_stations, K)
source
StochasticWeatherGenerators.cov_RRFunction
cov_RR(dfs::AbstractArray{<:DataFrame}[, K]; cor_method=Σ_Spearman2Pearson, force_PosDef = true)

Compute the (strictly positive) rain pair covariance cov(Rs₁ > 0, Rs₂ > 0) between each pair of stations s₁, s₂ for each hidden state Z = k.

Input: a array dfs of df::DataFrame of length S (number of station) where each df have :DATE, :RR, :z (same :z for each df).

Output: K covariance matrix of size S×S

Options:

  • force_PosDef will enforce Positive Definite matrix with NearestCorrelationMatrix.jl.
  • cor_method: typically Σ_Spearman2Pearson or Σ_Kendall2Pearson
  • impute_missing: if nothing, missing will be outputted when two stations do not have at least two rain days in common. Otherwise the value impute_missing will be set.
ΣRR = cov_RR(data_stations, K)
source

Generation

Function to generate rain amounts

StochasticWeatherGenerators.rand_RRFunction
rand_RR(mixs::AbstractArray{<:MixtureModel}, n2t::AbstractVector, z::AbstractVector, y::AbstractMatrix, Σk::AbstractArray)

Generate a (nonhomegenous) sequence of length length(n2t) of rain amounts conditionally to a given dry/wet matrix y and (hidden) state sequence z. Univariate distribution are given by mixs while correlations are given by covariance matrix Σk.

source

References

  • Ailliot, P.; Thompson, C. and Thomson, P. (2009). Space Time Modelling of Precipitation by Using a Hidden Markov Model and Censored Gaussian Distributions. Journal of the Royal Statistical Society: Series C (Applied Statistics) 58, 405–426.
  • Evin, G.; Favre, A.-C. and Hingray, B. (2018). Stochastic Generation of Multi-Site Daily Precipitation Focusing on Extreme Events. Hydrology and Earth System Sciences 22, 655–672.
  • Gobet, E.; Métivier, D. and Parey, S. (Sep 2024). Interpretable Seasonal Hidden Markov Model for Spatio-Temporal Stochastic Rain Generation in France.
  • Najibi, N.; Mukhopadhyay, S. and Steinschneider, S. (2021). Identifying Weather Regimes for Regional-Scale Stochastic Weather Generators. International Journal of Climatology 41, 2456–2479.
  • Richardson, C. W. (1981). Stochastic Simulation of Daily Precipitation, Temperature, and Solar Radiation. Water Resources Research 17, 182–190.
  • Srikanthan, R. and Pegram, G. G. (2009). A Nested Multisite Daily Rainfall Stochastic Generation Model. Journal of Hydrology 371, 142–153.
  • Vaittinada Ayar, P.; Blanchet, J.; Paquet, E. and Penot, D. (2020). Space-Time Simulation of Precipitation Based on Weather Pattern Sub-Sampling and Meta-Gaussian Model. Journal of Hydrology 581, 124451.
  • Wilks, D. S. (1998). Multisite Generalization of a Daily Stochastic Precipitation Generation Model. Journal of Hydrology 210, 178–191.