import Pkg;
Multisite daily Stochastic Weather Generator
This tutorial describes the numerical applications described in the paper Interpretable Seasonal Hidden Markov Model for spatio-temporal stochastic rain generation in France by Emmanuel Gobet (CMAP - École Polytechnique), David Métivier (MISTEA – INRAE) and Sylvie Parey (R&D – EDF). It shows a fully reproducible example on how to use the package StochasticWeatherGenerators.jl
to reproduce, step-by-step, exactly (almost) all the figures of the paper.
The paper describes the construction of a Stochastic Weather Generator with an Autoregressive Seasonal Hidden Markov Model (SHMM). The SHMM is trained with French weather stations, and the hidden states are interpreted as weather regimes. The model is validated with simulations, especially for its ability to reproduce extreme weather, e.g. droughts. In the paper, the model is also used with Climate Change RCP scenarios (not shown here).
Set up
Package and functions
There are several ways to add
a package before using
, one way is for this tutorial to copy-paste (it might take a while):
import Pkg
Pkg.add(["CSV", "JLD", "DelimitedFiles", "DataFrames", "DataFramesMeta", "StatsBase", "Random", "Distributions", "StatsPlots", "LaTeXStrings"])
using CSV, JLD, DelimitedFiles # File Read/Load/Save
using DataFrames, DataFramesMeta # DataFrames
using Dates
using StatsBase, Random
using Distributions
The two main packages for this tutorial are not yet registered in the official Julia registry, since they are not quite fully ready. They can be either add
ed through my local Julia registry with the LocalRegistry.jl package i.e.
using Pkg
pkg"registry add https://github.com/dmetivie/LocalRegistry"
Pkg.add("SmoothPeriodicStatsModels")
Pkg.add("StochasticWeatherGenerators")
Or directly on the master branch with add
ed via url i.e.
import Pkg
Pkg.add(url = "https://github.com/dmetivie/SmoothPeriodicStatsModels.jl")
Pkg.add(url = "https://github.com/dmetivie/StochasticWeatherGenerators.jl")
using SmoothPeriodicStatsModels # Name might change. Small collection of smooth periodic models e.g. AR, HMM
using StochasticWeatherGenerators # interface to use with SmoothPeriodicStatsModels.jl
Random.seed!(1234)
Random.TaskLocalRNG()
Settings for plotting
Some settings and packages to have nice plots.
using StatsPlots, LaTeXStrings
using StatsPlots.PlotMeasures # To play with margin in Plots
gr() # plotly() # for interactive plots
default(fontfamily="Computer Modern")
cur_colors = get_color_palette(:auto, 100);
my_palette(K) = palette(vcat(cur_colors[1], [cur_colors[c] for c in 3:4], cur_colors[2]), K)
file_for_plot_utilities = download("https://raw.githubusercontent.com/dmetivie/StochasticWeatherGenerators.jl/master/examples/utilities_plot.jl")
include(file_for_plot_utilities)
cyclic (generic function with 1 method)
To plot maps, we use GeoMakie.jl
+ NaturalEarth.jl
. Note that using cartopy
with PyCall.jl
also works very well.
For the following code to work you will need to add the following packages
import Pkg
Pkg.add("NaturalEarth", "GeoMakie", "CairoMakie")
file_for_maps_with_geomakie = download("https://raw.githubusercontent.com/dmetivie/StochasticWeatherGenerators.jl/master/examples/utilities_geo_makie_features.jl") # download file from a GitHub repo
include(file_for_maps_with_geomakie)
savefigcrop (generic function with 4 methods)
Global Parameters
Number of days in a year (choice here is 366)
T = 366
366
Define the French area for map (Longitude and latitude) plot and the precision of the map precision_scale
precision_scale = 50 # meter
LON_min = -5 # West
LON_max = 10 # East
LAT_min = 41 # South
LAT_max = 52 # North
52
conversion_factor
for rain amounts RR
in 0.1 mm to mm
conversion_factor = 0.1 # 0.1 mm -> mm
0.1
HMM Hyperparameters
Number of hidden states
K = 4
my_pal = my_palette(K); # just colors I like for plotting weather regime!
Degree 𝐃𝐞𝐠
of the trigonometric expansion It could be an array different for each station and variable. Not implemented yet though.
𝐃𝐞𝐠 = 1
1
Local memory order i.e. at station $j$, $\mathbb{P}(Y_n^{(j)} = y_n^{(j)} \mid Z = k, Y_{n-1:n-\texttt{local memory}}^{(j)} = y_{n-1:n-\texttt{local memory}}^{(j)})$
local_order = 1
1
The local_order
and/or 𝐃𝐞𝐠
could be a vector/matrix of size D
and different for each station, and also different depending on wet or dry past. Not yet implemented.
size_order = 2^local_order
println("K = $K, ", "local_order = $local_order, ", "degree = $𝐃𝐞𝐠")
K = 4, local_order = 1, degree = 1
Data
Select relevant stations from the station.txt
file
Here we
- Remove white space at the right of the
CN
, STANAME which is caused by imperfect CVS importation - Select only the stations with 100% valid data for the period
Date(1955,12,31) .≤ :DATE .≤ Date(2019,12,31)
- Shorten station names
begin
station_file = Base.download("https://raw.githubusercontent.com/dmetivie/StochasticWeatherGenerators.jl/master/weather_files/stations.txt")
station_all = CSV.read(station_file, DataFrame, header=18, normalizenames=true, ignoreemptyrows=true)
station_all = @chain station_all begin
@transform(:CN = rstrip.(:CN), :STANAME = rstrip.(:STANAME))
# @subset(:CN .∈ tuple(["FR", "BE", "LU", "CH"])) # Choose that if you want to look at all stations in France, Belgium, Luxembourg and Switzerland.
@subset(:STAID .∈ tuple([32, 33, 34, 36, 39, 203, 322, 323, 434, 736, 737, 738, 740, 742, 745, 749, 750, 755, 756, 757, 758, 786, 793, 2192, 2203, 2205, 2207, 2209, 11244, 11245, 11247, 11249]))
@transform(:STANAME = shortname.(:STANAME))
end
end
selected_station_name = ["BOURGES", "TOULOUSE", "MARIGNANE", "LUXEMBOURG", "LILLE", "EMBRUN", "BASTIA", "LA HAGUE", "CHASSIRON", "ORLY"]
10-element Vector{String}:
"BOURGES"
"TOULOUSE"
"MARIGNANE"
"LUXEMBOURG"
"LILLE"
"EMBRUN"
"BASTIA"
"LA HAGUE"
"CHASSIRON"
"ORLY"
You can change the selected stations. However, keep in mind that for the model to work, the conditional independence hypothesis must hold between stations i.e. $\mathbb{P}(Y_1 = y_1, \cdots, Y_S = y_s\mid Z = k) = \prod_{s=1}^S \mathbb{P}(Y_s = y_s)$. Hence stations must be sufficiently far apart. Check out this MNIST example to see Bernoulli mixtures in action!
station = @subset(station_all, :STANAME .∈ tuple(selected_station_name))
STAID = station.STAID #[32, 33, 39, 203, 737, 755, 758, 793, 11244, 11249];
station_name = station.STANAME
10-element Vector{SubString{String}}:
"BOURGES"
"TOULOUSE"
"MARIGNANE"
"LUXEMBOURG"
"LILLE"
"EMBRUN"
"BASTIA"
"LA HAGUE"
"CHASSIRON"
"ORLY"
Sort stations (index) by latitude. It is useful for plotting from North to South.
staid_lat = sortperm(station.LAT, rev=true);
Station number
D = length(STAID)
10
Date range
date_start = Date(1956)
1956-01-01
Date including the previous days used in the initial condition (in case local_memory > 0
)
date_start_w_memory = date_start - Day(local_order)
date_end = Date(2020) - Day(1)
every_year = date_start:Day(1):date_end
every_year_w_memory = date_start_w_memory:Day(1):date_end
n2t = dayofyear_Leap.(every_year)
N = length(n2t)
23376
Treat data
Load into a DataFrame
the (ECA) RR files (rain). It filters by date and valid data. It also adds a column :RO
for rain occurrences (0: dry, 1: wet).
begin
data_stations = collect_data_ECA.(STAID, date_start_w_memory, date_end, "https://raw.githubusercontent.com/dmetivie/StochasticWeatherGenerators.jl/master/weather_files/ECA_blend_rr/RR_", portion_valid_data=1, skipto=22, header=21, url=true)
for i = eachindex(data_stations)
@transform!(data_stations[i], :RO = onefy.(:RR))
end
end
Binary matrix version of the rain event at the D
stations.
Yall = BitMatrix(reduce(hcat, [data_stations[j].RO for j = 1:D]))
Y_past = BitMatrix(Yall[1:local_order, :]) # rand(Bool, local_order, D)
ξ = [1; zeros(K - 1)]; # 1 jan 1956 was most likely a type Z = 1 wet day all over France
Y = Yall[1+local_order:end, :]
23376×10 BitMatrix:
1 1 1 1 1 1 1 1 1 1
1 1 0 1 1 0 1 0 1 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0 1
1 1 1 0 1 0 1 1 1 1
1 1 1 1 1 1 0 0 1 1
0 1 1 1 1 1 1 1 1 0
1 0 0 1 1 1 0 1 1 1
⋮ ⋮
1 1 0 1 1 1 0 1 1 1
1 0 0 1 1 0 0 0 0 0
0 0 0 1 0 0 0 1 1 0
1 0 0 1 1 0 0 1 0 1
1 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 1 0 0 0 1 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1 0
Map of stations
Convert LAT/LON coordinates from DMS to DD which seems most widely accepted format.
LAT_idx = dms_to_dd.(station.LAT)
LON_idx = dms_to_dd.(station.LON)
long_spell = [longuest_spell(y) for y in eachcol(Y)]
FR_map_spell = map_with_stations(LON_idx, LAT_idx, long_spell; station_name=station_name, show_value=true, colorbar_show=true, precision_scale = precision_scale, colorbar_label = "Days")
Fit the seasonal HMM
We assume all models e.g. HMM, rain mixture to have parameters evolving smoothly with periodicity $T$ for $t \in [1, T]$. For example a Bernoulli parameter will write
\[p(t) = \dfrac{1}{1 + e^{P(t)}} \in [0, 1],\]
with
\[ P_c(t) = c_0 + \sum_{j=1}^{\texttt{Deg}} \left(c_{2j-1}\cos\left(\dfrac{2\pi}{T}j t\right) + c_{2j}\sin\left(\dfrac{2\pi}{T}j t\right)\right).\]
Fit slice: naive estimation
Before inferring the HMM parameters with the EM (Baum-Welch) algorithm, we do a first naive inference that will be used as initial condition for the EM.
The reference station ref_station
is used to sort the hidden states obtained via the slide initialization Here we choose j=1
$\to$ STAID=32
$\to$ BOURGES
because it is a central station for France
ref_station = 1
1
This generates a random Periodic HMM that we then fit slice by slice (day by day). See paper.
hmm_random = randARPeriodicHMM(K, T, D, local_order; ξ=ξ, ref_station=ref_station);
@time "FitMLE SHMM (Slice)" hmm_slice = fit_mle_all_slices(hmm_random, Y, Y_past; n2t=n2t, robust=true, rand_ini=true, Dirichlet_α=0.8, history=false, n_random_ini=1, Yₜ_extanted=[-12, -7, 0, 6, 13]);
θᴬ_slice, θᴮ_slice = fit_θ!(hmm_slice, 𝐃𝐞𝐠);
FitMLE SHMM (Slice): 27.979578 seconds (343.91 M allocations: 29.818 GiB, 14.78% gc time, 15.72% compilation time)
Fit with Baum Welch using the slice estimate as a starting point
With the Slice estimate as a good starting point for the full (seasonal) Baum Welch EM algorithm we fit the model!
To accelerate the fitting procedure (especially for larger models or when testing various model hyperparameters), one can do
using Distributed
addprocs(10) # number of worker to add
@everywhere using SmoothPeriodicStatsModels # load the pkg on each worker
Then the fitting loop inside fit_mle
will be distributed. See the official Julia doc for more info. On smaller models it does not worth it since adding workers add some compilation and communication time.
@time "FitMLE SHMM (Baum Welch)" hmm_fit, θq_fit, θy_fit, hist, histo_A, histo_B = fit_mle(hmm_slice, θᴬ_slice, θᴮ_slice, Y, Y_past,
maxiter=10000, robust=true; display=:final, silence=true, tol=1e-3, θ_iters=true, n2t=n2t);
EM converged in 58 iterations, logtot = -117127.72964034643
FitMLE SHMM (Baum Welch): 37.021429 seconds (187.89 M allocations: 18.842 GiB, 5.91% gc time, 9.85% compilation time)
Uncomment to load a previously computed hmm
# hmm_infos = load("save_tuto_path/hmm_fit.jld")
# hmm_fit = hmm_infos["hmm"]
# hist = hmm_infos["hist"]
# θq_fit = hmm_infos["Q_param"]
# θy_fit = hmm_infos["Y_param"]
Visualization of the HMM parameters
Transition matrix
begin
pA = [plot(legendfont=14, foreground_color_legend=nothing, background_color_legend=nothing, legend_columns=4, tickfont=12, legendfontsize=16) for k in 1:K]
for k in 1:K
[plot!(pA[k], hmm_fit.A[k, l, :], c=my_color(l, K), label=L"Q_{%$(k)\to %$(l)}", legend=:top, lw=1.75) for l in 1:K]
hline!(pA[k], [0.5], c=:black, label=:none, s=:dot)
xticks!(pA[k], vcat(dayofyear_Leap.(Date.(2000, 1:12)), 366), vcat(string.(monthabbr.(1:12)), ""), xlims=(0, 367), ylims=(0, 1))
end
pallA = plot(pA..., size=(1000, 500))
end
Rain probabilities
begin
mm = 1
jt = D
pB = [plot(legendfont=14, title="$(station_name[j])", titlefontsize=17, tickfont=14, legendfontsize = 16) for j in 1:jt]
for j in 1:jt
[plot!(pB[j], succprob.(hmm_fit.B[k, :, j, mm]), c=my_color(k, K), label=islabel(j, 3, L"\mathbb{P}(Y = \textrm{wet}\mid Z = %$k, H = \textrm{dry})"), lw=2) for k in 1:K]
hline!(pB[j], [0.5], c=:black, label=:none, s=:dot)
xticks!(
pB[j],
vcat(dayofyear_Leap.(Date.(2000, 1:12)), 366),
vcat(string.(first.(monthabbr.(1:12))))
)
xlims!(pB[j], (0, 367))
ylims!(pB[j], (0, 1))
end
pallB = plot(pB[staid_lat]..., size=(3000 / 2, 1000 / 1), layout=(2, 5))
end
Spatial Rain probability
memory_past_cat = 1
1
h = 1 (day before dry) or 2 (day before wet) $\mathbb{P}(Y = \text{Rain}\mid Z = k, H = h)$ with h = memory_past_cat
For now there are some scale rendering issues due to an GeoMakie.jl issue so it might be tiny.
p_FR_map_mean_prob = map_with_stations(LON_idx, LAT_idx, [[mean(succprob.(hmm_fit.B[k, :, j, memory_past_cat])) for j in 1:length(STAID)] for k in 1:K], colorbar_show=true, colorbar_label = L"\mathbb{P}(Y = \text{Rain}\mid Z = k, H = 1)", precision_scale = precision_scale)
Inference of the historical hidden states
Viterbi algorithm
ẑ = viterbi(hmm_fit, Y, Y_past; n2t=n2t)
data_stations_z = map(data_stations) do df
@transform(df, :z = [fill(missing, local_order); ẑ])
end
ẑ_per_cat = [findall(ẑ .== k) for k in 1:K]
4-element Vector{Vector{Int64}}:
[1, 7, 8, 12, 17, 23, 29, 44, 80, 81 … 23356, 23357, 23358, 23359, 23362, 23364, 23365, 23366, 23367, 23368]
[2, 10, 11, 14, 18, 20, 21, 22, 25, 26 … 23328, 23333, 23351, 23352, 23355, 23360, 23361, 23369, 23370, 23371]
[6, 9, 13, 15, 16, 24, 27, 30, 31, 40 … 23325, 23330, 23331, 23334, 23335, 23339, 23345, 23346, 23350, 23363]
[3, 4, 5, 19, 32, 33, 34, 38, 46, 48 … 23295, 23332, 23347, 23348, 23349, 23372, 23373, 23374, 23375, 23376]
Visualization of the historical sequences of hidden states
year_range = unique(year.(data_stations[1][1+local_order:end, :DATE]));
idx_year = [findall(x -> year.(x) == m, data_stations[1][1+local_order:end, :DATE]) for m in year_range];
select_year = unique(sort([4:10:length(year_range); 21; 48; 64]))
begin
year_nb = length(select_year)
z_hat_mat = zeros(year_nb, 366)
for (i, y) in enumerate(select_year)
if isleapyear(year_range[y])
z_hat_mat[i, :] = ẑ[idx_year[y]]
else
z_hat_mat[i, :] = [ẑ[idx_year[y]]; 0]
end
end
thick = 1
heatmap(z_hat_mat, colorbar=:none, c=my_palette(K), minorticks=:false, framestyle=:xbox, grid=:none, thickness_scaling=thick)
xticks!(vcat(dayofyear_Leap.(Date.(2000, 1:12)), 366), vcat(string.(monthabbr.(1:12)), ""), xlims=(0, 367), xtickfontsize=14 / thick, ytickfontsize=14 / thick)
hline!((1:year_nb) .+ 0.5, c=:black, legend=:none, lw=4)
ylims!(0.5, year_nb + 0.5)
pviterbi = yticks!(1:year_nb, string.(year_range[select_year]), size=(1000, 600))
end
Adding Rain amounts to the model
Marginal distribution
We fit the marginals of the rain amount $R>0$ at each station $s$ and for each hidden state $Z$ independently. We use a mixture of exponential functions
\[g(r) = w \dfrac{e^{-{\frac {r}{\vartheta_1}}}}{\vartheta_1} + (1-w) \dfrac{e^{-{\frac {r}{\vartheta_2}}}}{\vartheta_2}.\]
whose parameters $w(t)$, $\vartheta_1(t)$ and $\vartheta_2(t)$ are smooth periodic functions of the day of the year.
@time "FitMLE RR" mix_allE = fit_mle_RR.(data_stations_z, local_order, mix₀=StochasticWeatherGenerators.mix_ini(T));
FitMLE RR: 79.899330 seconds (381.61 M allocations: 36.519 GiB, 5.97% gc time, 1.43% compilation time)
Thanks to Distributions.jl PR #1389 (September 2nd, 2021) and Julia multiple dispatch, the quantile function of Mixtures can be very efficiently computed.
Rain correlations
We fit a Gaussian copula to each pair of stations for joint rainy days only.
For some hidden states corresponding to dry weather, it might happen that for some pair of stations, there are not enough simultaneous rain occurrences in a rain category $Z = k$ to estimate a correlation coefficient. In that case a missing
coefficient is returned by cov_RR
or it returns the value impute_missing
if specified (to avoid missing). The option force_PosDef
(enabled by default) ensures having positive definite matrix. This is necessary to use gaussian copula.
Σ²RR = cov_RR(data_stations_z, K)
4-element Vector{Matrix{Float64}}:
[3879.3410237455955 643.1969973888454 … 1044.722558290334 1422.1412369837526; 643.1969973888454 3906.8065809319683 … 421.21726685970293 261.80966034462176; … ; 1044.722558290334 421.21726685970293 … 3782.5212727685685 797.3469589896836; 1422.1412369837526 261.80966034462176 … 797.3469589896836 3100.798265766617]
[2301.9313610952054 485.18478831568194 … 770.578089634074 863.0300234394578; 485.18478831568194 2006.8630718718769 … 428.13538892508905 271.67257203655475; … ; 770.578089634074 428.13538892508905 … 2440.5330122895875 592.8827865026185; 863.0300234394578 271.67257203655475 … 592.8827865026185 1893.078927735636]
[2948.5928743209097 750.2824568975376 … 621.8223025654515 827.3791552074431; 750.2824568975376 5838.066822339216 … 369.18199793741627 352.11239948672414; … ; 621.8223025654515 369.18199793741627 … 2871.8548379173185 652.5033493795553; 827.3791552074431 352.11239948672414 … 652.5033493795553 2070.968133316776]
[2578.6677735877342 939.3858870148969 … 533.7956218371148 -108.18202437342076; 939.3858870148969 3052.544041679537 … 895.8296379061048 212.7174426012282; … ; 533.7956218371148 895.8296379061048 … 1706.7038943029056 -415.34586196345674; -108.18202437342076 212.7174426012282 … -415.34586196345674 803.7842493374762]
Simulation
Now we are ready to generate samples from the SWG model.
Nb
is the number of realization
Nb = 1000
1000
Sample the (seasonal) HMM model and output the sequence of hidden states and multi-site dry/wet.
begin
zs = zeros(Int, N, Nb)
ys = zeros(Bool, N, D, Nb)
@time "Simulations Z, Y" for i in 1:Nb
zs[:, i], ys[:, :, i] = rand(hmm_fit, n2t; y_ini=Yall[1:local_order, :], z_ini=1, seq=true)
end
end
Simulations Z, Y: 21.397614 seconds (374.64 M allocations: 23.083 GiB, 15.32% gc time, 2.07% compilation time)
Given the hidden states and dry/wet, it generates the rain amounts at each station (correlated with a Gaussian Copula).
begin
rs = zeros(D, N, Nb)
@time "Simulations RR>0" for i in 1:Nb
rs[:, :, i] = rand_RR(mix_allE, n2t, zs[:, i], ys[:, :, i]', Σ²RR)
end
end
Simulations RR>0: 256.679054 seconds (447.21 M allocations: 35.475 GiB, 2.13% gc time, 0.22% compilation time)
Results
Spell distribution
select_month
to choose the month where to compute the spell distributions (summer month, winter, etc.) select_month = 1:12
corresponds to all months.
select_month = 1:12
idx_months = [findall(x -> month.(x) == m, data_stations[1][1+local_order:end, :DATE]) for m in 1:12]
idx_month_vcat = vcat(idx_months[select_month]...)
idx_all = [intersect(yea, mon) for yea in idx_year, mon in idx_months];
Historic spells
len_spell_hist = [pmf_spell(Y[idx_month_vcat, j], dw) for j in 1:D, dw in 0:1];
Simulation spells
len_spell_simu = [pmf_spell(ys[idx_month_vcat, j, i], dw) for i in 1:Nb, j in 1:D, dw in 0:1];
Dry spell
make_range(y, step=1) = range(extrema(y)..., step=step)
begin
dry_or_wet = 1 # dry
p_spell_dry = [plot(ylims=(1e-4, 1e-0), tickfont=11, legendfontsize=13) for j = 1:D]
for j = 1:D
all_spells = len_spell_simu[:, j, dry_or_wet]
errorlinehist!(p_spell_dry[j], all_spells, groupcolor=:grey, legend=:topright, label=islabel(j, staid_lat[[1]], L"Simu $q_{0,100}$"), norm=:probability, bins=make_range(reduce(vcat, all_spells)), errortype=:percentile, percentiles=[0, 100], fillalpha=0.4, centertype=:median)
errorlinehist!(p_spell_dry[j], all_spells, groupcolor=:red, label=islabel(j, staid_lat[[1]], L"Simu $q_{25,75}$"), norm=:probability, bins=make_range(reduce(vcat, all_spells)), errortype=:percentile, percentiles=[25, 75], fillalpha=0.5, centertype=:median)
histo_spell = len_spell_hist[j, dry_or_wet]
errorlinehist!(p_spell_dry[j], [histo_spell], label=islabel(j, staid_lat[[1]], "Obs"), groupcolor=:blue, lw=1.5, norm=:probability, bins=make_range(histo_spell), errortype=:percentile)
xlims!(p_spell_dry[j], 0, 2 + maximum(1.5maximum.(histo_spell)))
yaxis!(:log10)
end
[xlabel!(p_spell_dry[j], "Nb of days", xlabelfontsize=12) for j in staid_lat[6:10]]
[ylabel!(p_spell_dry[j], "PMF", ylabelfontsize=12) for j in staid_lat[[1, 6]]]
[title!(p_spell_dry[j], station_name[j], titlefontsize=13) for j = 1:D]
pall_spell_dry = plot(p_spell_dry[staid_lat]..., size=(3000 / 2.5, 1000 / 1.5), layout=(2, 5), left_margin=0.5cm, bottom_margin=0.275cm)
end
Wet spell
begin
dry_or_wet = 2 # wet
p_spell_wet = [plot(ylims=(1e-4, 1e-0), tickfont=11, legendfontsize=13) for j = 1:D]
for j = 1:D
all_spells = len_spell_simu[:, j, dry_or_wet]
errorlinehist!(p_spell_wet[j], all_spells, groupcolor=:grey, legend=:topright, label=islabel(j, staid_lat[[1]], L"Simu $q_{0,100}$"), norm=:probability, bins=make_range(reduce(vcat, all_spells)), errortype=:percentile, percentiles=[0, 100], fillalpha=0.4, centertype=:median)
errorlinehist!(p_spell_wet[j], all_spells, groupcolor=:red, label=islabel(j, staid_lat[[1]], L"Simu $q_{25,75}$"), norm=:probability, bins=make_range(reduce(vcat, all_spells)), errortype=:percentile, percentiles=[25, 75], fillalpha=0.5, centertype=:median)
histo_spell = len_spell_hist[j, dry_or_wet]
errorlinehist!(p_spell_wet[j], [histo_spell], label=islabel(j, staid_lat[[1]], "Obs"), groupcolor=:blue, lw=1.5, norm=:probability, bins=make_range(histo_spell), errortype=:percentile)
xlims!(p_spell_wet[j], 0, 2 + maximum(1.5maximum.(histo_spell)))
yaxis!(:log10)
end
[xlabel!(p_spell_wet[j], "Nb of days", xlabelfontsize=12) for j in staid_lat[6:10]]
[ylabel!(p_spell_wet[j], "PMF", ylabelfontsize=12) for j in staid_lat[[1, 6]]]
[title!(p_spell_wet[j], station_name[j], titlefontsize=13) for j = 1:D]
pall_spell_wet = plot(p_spell_wet[staid_lat]..., size=(3000 / 2.5, 1000 / 1.5), layout=(2, 5), left_margin=0.5cm, bottom_margin=0.275cm)
end
Rain
Interpretation: Mean Rain per weather regime $R > 0 \mid Z = k$.
We plot the empirical (strictly) positive mean rain amounts per weather regime. The results are smoothed using a cyclic_moving_average
with a time window of $\pm 15$ days and the Epanechnikov kernel.
begin
p_rainpercat = [plot(tickfont=12, ylabelfontsize=14, titlefontsize=14, legendfontsize=13) for j = 1:D]
for j = 1:D
df_j = @chain data_stations_z[j] begin
dropmissing
@transform(:day = dayofyear_Leap.(:DATE))
@subset(:RR .> 0)
@by([:day, :z], :MEAN_RR = mean(:RR))
groupby(:z)
end
# Uncomment to see how the double exponential mixtures compare to the empirical data.
# [plot!(p_rainpercat[j], 1:T, t -> conversion_factor * mean(mix_allE[j][k, t]), label=:none, c=my_color(k, K), lw=1.5, legend = :topleft) for k in 1:K]
for k in 1:K
cycle_avg = replace(cyclic_moving_average(df_j[k].MEAN_RR, df_j[k].day, T, 15), 0 => missing)
@df df_j[k] plot!(p_rainpercat[j], 1:T, conversion_factor * cycle_avg, c=my_color(k, K), alpha=1, label=islabel(j, staid_lat[[4]], L"Z = %$k"), lw=1.5)
end
ylims!(p_rainpercat[j], 0, Inf)
end
[ylabel!(p_rainpercat[j], L"Rain (mm/m$^2$)") for j in staid_lat[[1, 6]]]
[xticks!(
p_rainpercat[j],
vcat(dayofyear_Leap.(Date.(2000, 1:12)), 366),
vcat(string.(first.(string.(monthabbr.(1:12)))))
) for j in 1:D]
[title!(p_rainpercat[j], station_name[j]) for j = 1:D]
plt_rain_cat_mix = plot(p_rainpercat[staid_lat]..., size=(3000 / 2.2, 1000 / 1.5), layout=(2, 5), left_margin=25px)
end
Univariate Rain distributions
Historical vs Nb
simulations distribution
begin
p_uniR = [plot(yaxis=:log10, ylims=(1e-4, 1e-0), tickfont=11, legendfontsize=13, titlefontsize=13) for j = 1:D]
for j = 1:D
dists_RR_positive_j = conversion_factor * [filter(!iszero, rs[j, :, i]) for i in 1:Nb]
errorlinehist!(p_uniR[j], dists_RR_positive_j, groupcolor=:grey, legend=:topright, label=islabel(j, staid_lat[[1]], L"Simu $q_{0,100}$"), norm=:pdf, errortype=:percentile, percentiles=[0, 100], fillalpha=0.4, centertype=:median)
errorlinehist!(p_uniR[j], dists_RR_positive_j, groupcolor=:red, label=islabel(j, staid_lat[[1]], L"Simu $q_{25,75}$"), norm=:pdf, errortype=:percentile, percentiles=[25, 75], fillalpha=0.5, centertype=:median)
errorlinehist!(p_uniR[j], [conversion_factor * filter(!iszero, data_stations[j].RR)], label=islabel(j, staid_lat[[1]], "Obs"), groupcolor=:blue, lw=1.5, norm=:pdf, errortype=:percentile)
xlims!(p_uniR[j], 0.0, Inf)
end
[plot!(p_uniR[j], xlabel=L"Rain (mm/m$^2$)") for j in staid_lat[6:10]]
[plot!(p_uniR[j], ylabel="PDF") for j in staid_lat[[1, 6]]]
[title!(p_uniR[j], station_name[j]) for j = 1:D]
pall_R = plot(p_uniR[staid_lat]..., size=(3000 / 2.5, 1000 / 1.5), layout=(2, 5), bottom_margin=11px, left_margin=15px)
end
Monthly quantile amount
rh = reduce(hcat, [df[1+local_order:end, :RR] for df in data_stations])
month_rain_simu = [monthly_agg(rs[j, :, i], idx_all) for j in 1:D, i in 1:Nb];
month_rain_histo = [monthly_agg(rh[:, j], idx_all) for j in 1:D]
qs = [0.9, 0.5, 0.1]
@time "Plot monthly quantile" begin
p_month_RR = [scatter(xtickfontsize=10, ytickfontsize=11, ylabelfontsize=12, legendfontsize = 12, foreground_color_legend=nothing) for j = 1:D]
for j = 1:D
for (α, per) in enumerate([[0, 100], [25, 75]])
for (cc, q) in enumerate(qs)
errorline!(p_month_RR[j], [quantile(month_rain_simu[j, i][:, m], q) * conversion_factor for m in 1:12, i in 1:Nb], label=(α == 1 ? islabel(j, 9,L"Simu $q_{%$(Int(q*100))}$") : :none), fillalpha=0.18 * α^2, centertype=:median, errortype=:percentile, percentiles=per, groupcolor=my_palette(length(qs))[cc])
end
end
for q in qs
scatter!(p_month_RR[j], m -> quantile(month_rain_histo[j][:, m], q) * conversion_factor, 1:12, label=(q == qs[1] ? islabel(j, 3,"Obs") : :none), legend = :topleft, ms=2.5, c=:blue)
plot!(p_month_RR[j], m -> quantile(month_rain_histo[j][:, m], q) * conversion_factor, 1:12, label=:none, c=:blue, lw=1.75)
end
xticks!(p_month_RR[j], 1:12, string.(first.(monthabbr.(1:12))))
ylims!(p_month_RR[j], 0, Inf)
end
[ylabel!(p_month_RR[j], L"Rain (mm/m$^2$)") for j in staid_lat[[1, 6]]]
[title!(p_month_RR[j], station_name[j], titlefontsize=12) for j = 1:D]
pall_month_RR = plot(p_month_RR[staid_lat]..., size=(1190, 500), layout=(2, 5), left_margin=19px)
end
Correlations
Rain event dry/wet
cor_bin_hist = cor(reduce(hcat, [df.RO for df in data_stations]));
cor_bin_mean_simu = mean(cor(ys[:, :, i]) for i in 1:Nb);
begin
plots_cor_bin = [plot(-0.1:0.1:0.8, -0.1:0.1:0.8, aspect_ratio=true, label=:none, xlabelfontsize=16, ylabelfontsize=16, tickfont=11, legendfontsize=13) for _ in 1:1]
scatter!(plots_cor_bin[1], vec_triu(cor_bin_hist), vec_triu(cor_bin_mean_simu), label="Correlations", xlabel="Observations", ylabel="Simulations", c=2)
[xlims!(plots_cor_bin[i], -0.1, 1) for i in 1:1]
[ylims!(plots_cor_bin[i], -0.1, 1) for i in 1:1]
annotate!(0.2, 0.7, "MSE ≃ $(round(mean(abs2, vec_triu(cor_bin_hist) - vec_triu(cor_bin_mean_simu)), digits = 4))")
plot_cor_bin = plot(plots_cor_bin...)
end
The largest pair correlation error for rain occurence comes from the pair
println("$(station_name[findmax(cor_bin_hist - cor_bin_mean_simu)[2][1]]) and $(station_name[findmax(cor_bin_hist - cor_bin_mean_simu)[2][2]])")
CHASSIRON and LA HAGUE
Rain amount
cor_hist = cor(reduce(hcat, [df.RR for df in data_stations]));
corT_hist = corTail(reduce(hcat, [df.RR for df in data_stations]));
cor_mean_simu = mean(cor(rs[:, :, i]') for i in 1:Nb);
corT_mean_simu = mean(corTail(rs[:, :, i]') for i in 1:Nb);
begin
plots_cor = [plot(-0.1:0.1:0.8, -0.1:0.1:0.8, aspect_ratio=true, label=:none, xlabelfontsize=16, ylabelfontsize=16, tickfont=11, legendfontsize=13) for _ in 1:2]
scatter!(plots_cor[1], vec_triu(cor_hist), vec_triu(cor_mean_simu), label="Correlations", xlabel="Observations", ylabel="Simulations", c=2)
annotate!(plots_cor[1], 0.3, 0.7, "MSE ≃ $(round(mean(abs2, vec_triu(cor_hist) - vec_triu(cor_mean_simu)), digits = 4))")
scatter!(plots_cor[2], vec_triu(corT_hist), vec_triu(corT_mean_simu), label="Tail index", xlabel="Observations", ylabel="Simulations", c=3)
annotate!(plots_cor[2], 0.3, 0.7, "MSE ≃ $(round(mean(abs2, vec_triu(corT_hist) - vec_triu(corT_mean_simu)), digits = 4))")
[xlims!(plots_cor[i], -0.1, 1) for i in 1:2]
[ylims!(plots_cor[i], -0.1, 1) for i in 1:2]
plot_cor_all = plot(plots_cor..., size=(800, 400), left_margin=15px, right_margin = 8px)
end
The largest pair correlation error for rain (zero and non zero amounts) comes from the pair
println("$(station_name[findmax(cor_hist - cor_mean_simu)[2][1]]) and $(station_name[findmax(cor_hist - cor_mean_simu)[2][2]])")
EMBRUN and MARIGNANE
Gaussian copula hypothesis
For a pair of stations, we transform the marginal $R_s>0$ to $\mathcal{N}(0,1)$. We compare the obtained bi-variate Normal distribution with the Mahalanobis distance to the theoretical $\chi^2(2)$-distriubtion.
corΣ = cov2cor.(Σ²RR)
begin
j1 = 10
j2 = 8
plt_qqp_copula = plot(0:25, 0:25, aspect_ratio=:equal, legendfontsize=14, c=:black, label=:none, tickfont=12, ylabelfontsize=13, xlabelfontsize=13)
df_12 = leftjoin(data_stations_z[j1], data_stations_z[j2], on=:DATE, makeunique=true)
@subset!(df_12, :RR .> 0, :RR_1 .> 0)
for k in 1:K
df_X = @chain df_12 begin
@subset(:z .== k)
dropmissing
@aside u = StochasticWeatherGenerators.Copulas.pseudos(permutedims(hcat(_.RR, _.RR_1)))
@transform(:X = quantile(Normal(), u[1,:]), :X_1 = quantile(Normal(), u[2,:]))
end
X = hcat(df_X.X, df_X.X_1)
cor_sigma = [1 corΣ[k][j1,j2]; corΣ[k][j1,j2] 1]
Σ⁻¹ = inv(cor_sigma)
X2 = [(x' * Σ⁻¹ * x) for x in eachrow(X)] |> sort
ecdfX2 = ecdf(X2)(X2) * length(X2) / (length(X2) + 1)
plot!(quantile(Chisq(2), ecdfX2), X2, xlabel=L"$\chi^2(2)$-quantile", c=my_color(k, K), ylabel="Observed squared Mahalanobis distance", label=L"Z = %$k ", legend=:topleft, lw=2)
end
title!("$(station_name[j1])-$(station_name[j2]) $(ifelse(j1 == 10 && j2 == 8, "(334 km)", ""))")
xlims!(0, 22)
ylims!(0, 22)
end
Reproducibility
using InteractiveUtils
InteractiveUtils.versioninfo()
Julia Version 1.11.2
Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 4 × AMD EPYC 7763 64-Core Processor
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
Package list and version
import Pkg; Pkg.status()
Status `~/work/StochasticWeatherGenerators.jl/StochasticWeatherGenerators.jl/docs/Project.toml`
[336ed68f] CSV v0.10.15
⌃ [13f3f980] CairoMakie v0.12.18
[a93c6f00] DataFrames v1.7.0
[1313f7d8] DataFramesMeta v0.15.4
[8bb1440f] DelimitedFiles v1.9.1
[31c24e10] Distributions v0.25.116
[e30172f5] Documenter v1.8.0
[db073c08] GeoMakie v0.7.10
[4138dd39] JLD v0.13.5
[b964fa9f] LaTeXStrings v1.4.0
[98b081ad] Literate v2.20.1
[436b0209] NaturalEarth v0.1.0
[3d8e1505] SmoothPeriodicStatsModels v2.0.1-DEV
[2913bbd2] StatsBase v0.34.4
[f3b207a7] StatsPlots v0.15.7
[3eadb0b9] StochasticWeatherGenerators v1.2.1-DEV `~/work/StochasticWeatherGenerators.jl/StochasticWeatherGenerators.jl`
[fd094767] Suppressor v0.2.8
[ade2ca70] Dates v1.11.0
[b77e0a4c] InteractiveUtils v1.11.0
[9a3f8284] Random v1.11.0
Info Packages marked with ⌃ have new versions available and may be upgradable.
This page was generated using Literate.jl.