Rainfall
- Rainfall
Amongst the rich literature of SWG there are many interesting models. For example
- Censored Latent Gaussian models as in (Vaittinada Ayar et al., 2020; Ailliot et al., 2009), etc. In these models, the rainfall occurrence and amounts are generated using the same underlying Gaussian latent variable. It makes possible having very complex spatiotemporal covariance structures.
- The author of (Najibi et al., 2021) created a Git repo of their model, which is always cool and interesting. -> investigate
WGEN — Multisite latent Gaussian Rain Occurrence Model
Overview
The WGEN model is a weather generator approach using latent Gaussian variables to generate multisite rain occurrences with order $p$ Markov chain dependence at each site. The model was first proposed in (Wilks, 1998) and extended to order 4 Markov chains in (Srikanthan and Pegram, 2009).
In the literature, the name WGEN refers to different models e.g. to (Richardson, 1981) !
Model Components
- Single site Temporal Occurrence: Binary rain/no-rain determined by Markov chain of order $p$ at each station.
- Spatial correlation: Gaussian latent variable with spatial covariance, i.e. Gaussian copula.
- Amount Layer: Can be added trained on top of the occurrence layer using parametric distributions (e.g., Gamma, mixture of exponential) fitted to positive rain days. See (Evin et al., 2018) for a more complex rainfall amount model and associated R package GWEX (TODO implementation).
- Seasonality: All parameters are fitted monthly.
- Rewrite the Markov chain efficiently (the generation is slow).
- Current implementation concatenates each month into the likelihood to fit -> add different fit options:
- Sum the likelihood of each month
- Estimate at each month of each year and take the mean/median of each month
Assumptions & Limitations
- Computational complexity scales as $\mathcal{O}(S^2)$ where $S$ is the station number, making it less suitable for high-dimensional station networks.
- Lacks explicit hidden states or large-scale regime representation.
- Time and space are fitted separately
- Ideal for small regions.
Model selection
Mathematical description
TODO
Usage
See the WGEN model section of the tutorial for an example of fit and simulation.
StochasticWeatherGenerators.wgen
— Typewgen
wgen(p::Integer, D::Integer, ncat::Integer=2)
wgen(p::Integer, D::Integer, idxs::AbstractArray, ncat::Integer=2)
Structure for a multisite rain occurrence model as described in (Wilks 1998) and generalized to higher order by Srikanthan et al. (2009). It contains Markov chain transition probabilities of order p
at each sites j ∈ 1:D
, and a correlation matrix for the unobserved Multivariate Gaussian variable (used to generate correlated rain occurrences). When idxs
is provided it is an Vector
of size 12 (typically) with index of days corresponding to each months. ncat
can be larger than 2 to model generic categorical variables (experimental feature).
Example
p = 4
date_range = Date(1956):Day(1):Date(2019,12,31)
D = 5
Y = rand(Bool, length(date_range), D)
idx_months = [findall(x -> month.(x) == m, date_range) for m in 1:12]
# Fit returning a `wgen` struct
wgen4_model = fit_wgen(Y, idx_months, wgen_order)
# Simulate from `wgen` struct
Y_simu = rand(wgen4_model, 1956:2019; Y_ini=rand(Bool, p, D))
References
- Wilks, D. S. "Multisite generalization of a daily stochastic precipitation generation model". Journal of Hydrology, (1998). https://doi.org/10.1016/S0022-1694(98)00186-3.
- Srikanthan, Ratnasingham, et Geoffrey G. S. Pegram. "A nested multisite daily rainfall stochastic generation model". Journal of Hydrology 2009. https://doi.org/10.1016/j.jhydrol.2009.03.025.
StochasticWeatherGenerators.fit_wgen
— Functionfit_wgen(Y::AbstractMatrix, idxs, p::Integer; kwargs...)
Use fit_markov_chain
at each sites and each set of idxs
to fit Markov chain of order p
. Then it fits the mutlivariate Gaussian copula for occurrences (each month) with fit_Ω
. Return a wgen
structure.
StochasticWeatherGenerators.fit_Ω
— Functionfit_Ω(transition_probs, Y; ω0=(-0.999, 0.999))
fit_Ω(transition_probs::AbstractMatrix, Y, idxs; ω0=(-0.999, 0.999))
Fit the mutlisite mutlivariate Gaussian variable.
Base.rand
— Methodrand(model::wgen, years::AbstractArray{<:Integer}; Y_ini)
Takes a wgen
struct TODO
For simple Markov chain of order $p$
StochasticWeatherGenerators.fit_markov_chain
— Functionfit_markov_chain(Y::AbstractVector{<:Integer}, p::Integer)
fit_markov_chain(Y::AbstractMatrix{<:Integer}, p::Integer)
fit_markov_chain(Y::AbstractMatrix{<:Integer}, idxs::AbstractArray, p::Integer)
y = rand(Bool, 1000)
markov = fit_markov_chain(y, 2)
markov[[0,1]]
TODO: add doc
StochasticWeatherGenerators.simulate_markov_gaussian
— Functionsimulate_markov_gaussian(N::Integer, ΩX::AbstractMatrix, transition_probs; Y_ini)
simulate_markov_gaussian(years::AbstractArray{<:Integer}, ΩX, transition_probs; Y_ini)
Function to simulate Xt
given a correlation matrix ΩX
and transition probabilities.
- If
N
is an integer then it return a sequence of sizeN
with fixed parameters. - If not then it
SHHMM — Seasonal Hierarchical Hidden Markov Model
Overview
SHHMM
is a multisite stochastic weather generator (SWG) based on a seasonal Hidden Markov Model (HMM). The model was proposed in (Gobet et al., Sep 2024) by one author of this package. It is designed to generate temporally and spatially correlated precipitation sequences over large regions (e.g., France), while maintaining full interpretability. It has a Hierarchical Structure as it combines a discrete HMM for occurrences with a seasonal rainfall amount model conditional on the hidden states.
Model Components
- Hidden Weather Regimes: The model infers interpretable hidden states (weather regimes) from daily rain occurrence data, without relying on exogenous variables.
- Rain Occurrence Layer: Discrete Bernoulli HMM modeling rain/no-rain across stations.
- Local Memory: Incorporates autoregressive memory for dry/wet persistence (Auto-Regressive HMM). The local memory can be per site. TODO: doc
- Rain Amount Layer: Gaussian copula-based model to assign amounts conditioned on the occurrence and hidden states. The parametric form can be an arbitrary continuous distribution from
Distributions.jl
. - Seasonality: All model parameters vary continuously over the year.
Assumptions & Limitations
- Relies on the conditional independence assumption — dense station networks may violate this.
- Computational complexity is $\mathcal{O}(S)$ and $\mathcal{O}(K^2)$ (number of hidden states).
- Tail behavior/extreme events can challenge the stability or identifiability of the hidden states.
- Trade-off between model interpretability and spatial resolution.
- Ideal to studying climate variability over large regions.
Add model selection details
Mathematical description
TODO
Usage
For a complete example see the this section that both serve as a tutorial to fit, simulate and visualize the SHHMM model and the reproducible supplementary material of the paper (Gobet et al., Sep 2024). Some element of the paper are not included in the tutorial to limit runtime of CI (model selection, Weather Regimes illustration, comparison with climate models).
Fit
Fit functions condition on given hidden states. Fix K=1
or z=ones(size(Y,A))
if you do not have hidden states.
SmoothPeriodicStatsModels.fit_mle_RO
— Functionfit_mle_RO(df::DataFrame, K, T, degree, local_order)
Given a DataFrame df
with known hidden states column z ∈ 1:K
. The rain occurrences of the new station are fitted conditionally to the hidden state. For local_order>0
the model is also autoregressive with its past.
StochasticWeatherGenerators.fit_mle_RR
— Functionfit_mle_RR(df::DataFrame, local_order, K = length(unique(df.z)); maxiter=5000, tol=2e-4, robust=true, silence=true, warm_start=true, display=:none, mix₀=mix_ini(length(unique(dayofyear_Leap.(df.DATE)))))
Fit the strictly positive rain amounts RR>0
distribution $g_{k,t}(r)$ w.r.t. to each hidden states k∈[1,K]
(provided in a column of df.z
). The fitted model could in principle be any seasonal model. For now by default it is a double exponential model,
$g_{k,t}(r) = \alpha(t,k)\exp(-r/\theta_1(t,k))/\theta_1(t,k) + (1-\alpha(t,k))\exp(-r/\theta_2(t,k))/\theta_2(t,k).$
TODO add SmoothPeriodicStatsModels.fit_mle
Spatial correlations
StochasticWeatherGenerators.cor_RR
— Functioncor_RR(dfs::AbstractArray{<:DataFrame}[, K]; cor_method=Σ_Spearman2Pearson, force_PosDef = true)
Compute the (strictly positive) rain pair correlations cor(Rs₁ > 0, Rs₂ > 0)
between each pair of stations s₁, s₂
for each hidden state Z = k
.
Input: a array dfs
of df::DataFrame
of length S
(number of station) where each df
have :DATE, :RR, :z (same :z for each df).
Output: K
correlation matrix of size S×S
Options:
force_PosDef
will enforce Positive Definite matrix with NearestCorrelationMatrix.jl.cor_method
: typicallyΣ_Spearman2Pearson
orΣ_Kendall2Pearson
impute_missing
: ifnothing
,missing
will be outputted when two stations do not have at least two rain days in common. Otherwise the valueimpute_missing
will be set.
ΣRR = cor_RR(data_stations, K)
StochasticWeatherGenerators.cov_RR
— Functioncov_RR(dfs::AbstractArray{<:DataFrame}[, K]; cor_method=Σ_Spearman2Pearson, force_PosDef = true)
Compute the (strictly positive) rain pair covariance cov(Rs₁ > 0, Rs₂ > 0)
between each pair of stations s₁, s₂
for each hidden state Z = k
.
Input: a array dfs
of df::DataFrame
of length S
(number of station) where each df
have :DATE, :RR, :z (same :z for each df).
Output: K
covariance matrix of size S×S
Options:
force_PosDef
will enforce Positive Definite matrix with NearestCorrelationMatrix.jl.cor_method
: typicallyΣ_Spearman2Pearson
orΣ_Kendall2Pearson
impute_missing
: ifnothing
,missing
will be outputted when two stations do not have at least two rain days in common. Otherwise the valueimpute_missing
will be set.
ΣRR = cov_RR(data_stations, K)
StochasticWeatherGenerators.Σ_Spearman2Pearson
— FunctionΣ_Spearman2Pearson(M::AbstractMatrix)
Compute the Pearson correlation coefficient i.e. the classic one from the Spearman correlation #TODO Add ref
StochasticWeatherGenerators.Σ_Kendall2Pearson
— FunctionΣ_Kendall2Pearson(M::AbstractMatrix)
Compute the Pearson correlation coefficient i.e. the classic one from the Kendall correlation #TODO Add ref
StochasticWeatherGenerators.joint_rain
— Functionjoint_rain(M::AbstractMatrix, j1::Integer, j2::Integer, r = 0)
Select all the rows of M
with values for (two) columns above some value r
.
Generation
Function to generate rain amounts
StochasticWeatherGenerators.rand_RR
— Functionrand_RR(mixs::AbstractArray{<:MixtureModel}, n2t::AbstractVector, z::AbstractVector, y::AbstractMatrix, Σk::AbstractArray)
Generate a (nonhomegenous) sequence of length length(n2t)
of rain amounts conditionally to a given dry/wet matrix y
and (hidden) state sequence z
. Univariate distribution are given by mixs
while correlations are given by covariance matrix Σk.
References
- Ailliot, P.; Thompson, C. and Thomson, P. (2009). Space Time Modelling of Precipitation by Using a Hidden Markov Model and Censored Gaussian Distributions. Journal of the Royal Statistical Society: Series C (Applied Statistics) 58, 405–426.
- Evin, G.; Favre, A.-C. and Hingray, B. (2018). Stochastic Generation of Multi-Site Daily Precipitation Focusing on Extreme Events. Hydrology and Earth System Sciences 22, 655–672.
- Gobet, E.; Métivier, D. and Parey, S. (Sep 2024). Interpretable Seasonal Hidden Markov Model for Spatio-Temporal Stochastic Rain Generation in France.
- Najibi, N.; Mukhopadhyay, S. and Steinschneider, S. (2021). Identifying Weather Regimes for Regional-Scale Stochastic Weather Generators. International Journal of Climatology 41, 2456–2479.
- Richardson, C. W. (1981). Stochastic Simulation of Daily Precipitation, Temperature, and Solar Radiation. Water Resources Research 17, 182–190.
- Srikanthan, R. and Pegram, G. G. (2009). A Nested Multisite Daily Rainfall Stochastic Generation Model. Journal of Hydrology 371, 142–153.
- Vaittinada Ayar, P.; Blanchet, J.; Paquet, E. and Penot, D. (2020). Space-Time Simulation of Precipitation Based on Weather Pattern Sub-Sampling and Meta-Gaussian Model. Journal of Hydrology 581, 124451.
- Wilks, D. S. (1998). Multisite Generalization of a Daily Stochastic Precipitation Generation Model. Journal of Hydrology 210, 178–191.