StochasticWeatherGenerators.jl
Documentation for StochasticWeatherGenerators.jl
In construction! Note that the main functions to fit HMM, AR etc are currently in SmoothPeriodicStatsModels.jl. This will change when these packages are rebased.
A Julia package, to define, fit and use a Stochastic Weather Generator (SWG) as proposed in the Interpretable Seasonal Hidden Markov Model for spatio-temporal stochastic rain generation in France paper. This SWG relies on some "Seasonal Hidden Markov Models" currently implemented in the package SmoothPeriodicStatsModels.jl.
The objective of this package is not only to show my model, but also to propose several classic (and newer) SWG model. Hence, feel free to open an issue or open PR with ideas and models. This would allow easy model comparison and, in some cases, combination. I'll try to implement the simple (and historic) model, i.e. the Richardson - Water resources research, 1981.
Go check the documentation and the fully reproducible tutorial associated with the paper.
Stochastics Weather Generators are probabilistic weather models. Like random number generators, they can quickly generate multiple random sequences, except that the produced sequences correctly reproduce some statistics of interest, e.g. spatial-temporal correlations, extremes, etc. They can be used to study climate variability.
API
Fit function
SmoothPeriodicStatsModels.fit_mle_stations
— Functionfit_mle_stations(df::DataFrame, K, T, degree, local_order)
Given a DataFrame df
with known hidden states column z ∈ 1:K
. The rain occurrences of the new station are fitted conditionally to the hidden state. For local_order>0
the model is also autoregressive with its past.
StochasticWeatherGenerators.fit_mle_RR
— Functionfit_mle_RR(df::DataFrame, K, local_order; maxiter=5000, tol=2e-4, robust=true, silence=true, warm_start=true, display=:none, mix₀=mix_ini(T))
mix_allE = fit_mle_RR.(data_stations, K, local_order)
StochasticWeatherGenerators.fit_TN
— Functionfit_TN(df_full::DataFrame, 𝐃𝐞𝐠, T; kwargs...)
Fit the variable TN
(daily minimum temperature). In fact it fits the difference ΔT = TX - TN
to ensure a positive difference between TX
and TN
StochasticWeatherGenerators.fit_AR1
— Functionfit_AR1(df_full::DataFrame, X, 𝐃𝐞𝐠, T, K)
Fit a Seasonal AR(1) model of period T
and with K
hidden states for the variable X
of the DataFrame df_full
. $X_{n+1} = \mu(t_n) + \phi(t_n) X_t + \sigma(t_n)\xi$
Climate indexes
StochasticWeatherGenerators.VCX3
— FunctionVCX3(df; y_col, nb = 3)
Yearly Max of nb = 3
days sliding mean for y
for every year. By default, y_col
is the first column not with a Date
type
using DataFrames, Dates, RollingFunctions
time_range = Date(1956):Day(1):Date(2019,12,31)
df = DataFrame(:DATE => time_range, :Temperature => 20 .+ 5*randn(length(time_range)))
VCX3(df)
VCX3(y, idxs; nb = 3)
Yearly Max of nb = 3
days sliding mean for y
. Here idxs
can be a vector of vector (or range) corresponds to the index of every year.
using DataFrames, Dates, RollingFunctions
time_range = Date(1956):Day(1):Date(2019,12,31)
year_range = unique(year.(time_range))
df = DataFrame(:DATE => time_range, :Temperature => 20 .+ 5*randn(length(time_range)))
idx_year = [findall(x-> year.(x) == m, df[:, :DATE]) for m in year_range]
VCX3(df.Temperature, idx_year)
StochasticWeatherGenerators.cum_monthly
— Functioncum_monthly(y::AbstractArray, idxs)
using DataFrames, Dates, RollingFunctions
time_range = Date(1956):Day(1):Date(2019,12,31)
year_range = unique(year.(time_range))
df = DataFrame(:DATE => time_range, :Temperature => 20 .+ 5*randn(length(time_range)))
idx_year = [findall(x-> year.(x) == m, df[:, :DATE]) for m in year_range]
idx_month = [findall(x-> month.(x) == m, df[:, :DATE]) for m in 1:12]
idx_all = [intersect(yea, mon) for yea in idx_year, mon in idx_month]
cum_monthly(y, idx_all)
StochasticWeatherGenerators.corTail
— FunctioncorTail(x::AbstractMatrix, q = 0.95)
Compute the (symmetric averaged) tail index matrix M
of a vector x, i.e. M[i, j] = (ℙ(x[:,j] > Fxⱼ(q) ∣ x[:,i] > Fxᵢ(q)) + ℙ(x[:,i] > Fxᵢ(q) ∣ x[:,j] > Fxⱼ(q)))/2 where Fx(q) is the CDF of x. Note it uses the same convention as cor
function i.e. observations in rows and features in column.
StochasticWeatherGenerators.longuest_spell
— Functionlonguest_spell(y::AbstractArray; value=0)
Compute the length of the longuest consecutive sequence of value
in y
StochasticWeatherGenerators.pmf_spell
— Functionpmf_spell(y::AbstractVector, value)
Return the distribution of spells (consecutive sequence of with the same value) length of value
in y
Simulations
StochasticWeatherGenerators.rand_RR
— Functionrand_RR(mixs::AbstractArray{<:MixtureModel}, n2t::AbstractVector, z::AbstractVector, y::AbstractMatrix, Σk::AbstractArray)
Generate a (nonhomegenous) sequence of length length(n2t)
of rain amounts conditionally to a given dry/wet matrix y
and (hidden) state sequence z
. Univariate distribution are given by mixs
while correlations are given by covariance matrix Σk.
StochasticWeatherGenerators.rand_cond
— Functionrand_cond(ϵ, z, θ_uni, θ_cor, n2t, T)
Genererate a random variable conditionnaly to another one Using Copula
\[X_1 \mid X_2 = ϵ \sim \mathcal{N}\left(\mu_1 + \dfrac{\sigma_1}{\sigma_2}\rho (a - \mu_2), (1-\rho^2)\sigma_1^2 \right)\]
For two N(0,1)
\[X_1 \mid X_2 = ϵ \sim \mathcal{N}\left(\rho a , (1-\rho^2) \right)\]
Correlation utilities
For temperature
StochasticWeatherGenerators.cor_groupby
— FunctionCompute and fit the `cor` between two `var` with a smooth function for each `z`.
StochasticWeatherGenerators.cor_groupbyTXTN
— FunctionCompute and fit the `cor` between `:TX` and `:TX-:TN` with a smooth function for each `z`.
StochasticWeatherGenerators.cov_ar1
— FunctionFit residual to constant (in time) cov matrices for each weather regime Example:
cov_ar1(data_stations, ar1sTX, :TX, K)
For rain
StochasticWeatherGenerators.cov_RR
— Functioncov_RR(dfs::AbstractArray{<:DataFrame}, K)
Each df must have :DATE, :RR, :z (same :z for each df)
Σ²RR = cov_RR(data_stations, K)
StochasticWeatherGenerators.Σ_Spearman2Pearson
— FunctionΣ_Spearman2Pearson(M::AbstractMatrix)
Compute the Pearson correlation coefficient i.e. the classic one from the Spearman correlation #TODO Add ref
StochasticWeatherGenerators.Σ_Kendall2Pearson
— FunctionΣ_Kendall2Pearson(M::AbstractMatrix)
Compute the Pearson correlation coefficient i.e. the classic one from the Kendall correlation #TODO Add ref
StochasticWeatherGenerators.joint_rain
— Functionjoint_rain(M::AbstractMatrix, j1::Integer, j2::Integer, r = 0)
Select all the rows of M
with values for (two) columns above some value r
.
Map utilities
StochasticWeatherGenerators.distance_x_to_y
— Functiondistance_x_to_y
Distance in km between two stations.
StochasticWeatherGenerators.dms_to_dd
— Functiondms_to_dd(l)
Convert Degrees Minutes Seconds to Decimal Degrees. Inputs are strings of the form
- LAT : Latitude in degrees:minutes:seconds (+: North, -: South)
- LON : Longitude in degrees:minutes:seconds (+: East, -: West)
Data manipulation
StochasticWeatherGenerators.collect_data_ECA
— Functioncollect_data_ECA(STAID::Integer, path::String, var::String="RR"; skipto=19, header = 18)
path
gives the path where all data files are stored in a vector
collect_data_ECA(STAID, date_start::Date, date_end::Date, path::String, var::String="RR"; portion_valid_data=1, skipto=19, header = 18, return_nothing = true)
path
gives the path where all data files are stored in a vector- Filter the
DataFrame
s.t.date_start ≤ :DATE ≤ date_end
- var = "RR", "TX" etc.
portion_valid_data
is the portion of valid data we are ok with. If we don't want any missing, fix it to1
.skipto
andheader
forcsv
files with meta informations/comments at the beginning of files. SeeCSV.jl
.return_nothing
iftrue
it will returnnothing
is the file does not exists or does not have enough valid data.
StochasticWeatherGenerators.select_in_range_df
— Functionselect_in_range_df(datas, start_Date, interval_Date, [portion])
Select station with some data availability in dates and quality (portion of valid data). Input is a vector
(array) of DataFrame
(one for each station for example) or a Dict
of DataFrame
. If 0 < portion ≤ 1
is specified, it will authorize some portion of data to be missing.
StochasticWeatherGenerators.shortname
— Functionshortname(name::String)
Experimental function that returns only the most relevant part of a station name.
long_name = "TOULOUSE-BLAGNAC"
shortname(long_name) # "TOULOUSE"
Generic utilities
StochasticWeatherGenerators.my_color
— Functionmy_color(k, K)
Convenience for plot colors pattern and hidden states to blue for k=1 (∼wetter) and orange for k=K (∼driest)