StochasticWeatherGenerators.jl

Documentation for StochasticWeatherGenerators.jl

Warning

Under construction! Note that the main functions to fit HMM, AR, etc. are currently in SmoothPeriodicStatsModels.jl. This will change when these packages are rebased.

A Julia package to define, fit and use a Stochastic Weather Generator (SWG) as proposed in the paper "Interpretable Seasonal Hidden Markov Model for spatio-temporal stochastic rain generation in France". This SWG relies on "Seasonal Hidden Markov Models", currently implemented in the package SmoothPeriodicStatsModels.jl.

Note

The objective of this package is not only to show my model, but also to propose several classic (and newer) SWG models. Hence, feel free to open an issue or a PR with ideas and models. This would allow easy model comparison and, in some cases, combination. I'll try to implement the simple (and historic) model of Richardson (Water Resources Research, 1981).

Go check the documentation and the fully reproducible tutorial associated with the paper.

Stochastic Weather Generators

Stochastic Weather Generators are probabilistic weather models. Like random number generators, they can quickly generate many random sequences, except that the generated sequences reproduce selected statistics of interest, e.g. spatio-temporal correlations, extremes, etc. They can be used to study climate variability.

API

Fit function

SmoothPeriodicStatsModels.fit_mle_stations - Function
fit_mle_stations(df::DataFrame, K, T, degree, local_order)

Given a DataFrame df with a column z ∈ 1:K of known hidden states, fit the rain occurrences of the new station conditionally on the hidden state. For local_order > 0, the model is also autoregressive, i.e. it depends on its own past.

source
StochasticWeatherGenerators.fit_mle_RR - Function
fit_mle_RR(df::DataFrame, K, local_order; maxiter=5000, tol=2e-4, robust=true, silence=true, warm_start=true, display=:none, mix₀=mix_ini(T))
mix_allE = fit_mle_RR.(data_stations, K, local_order)
source
StochasticWeatherGenerators.fit_TN - Function
fit_TN(df_full::DataFrame, 𝐃𝐞𝐠, T; kwargs...)

Fit the variable TN (daily minimum temperature). In practice, the difference ΔT = TX - TN is fitted, which ensures that TX - TN remains positive.

source
StochasticWeatherGenerators.fit_AR1 - Function
fit_AR1(df_full::DataFrame, X, 𝐃𝐞𝐠, T, K)

Fit a seasonal AR(1) model of period T and with K hidden states for the variable X of the DataFrame df_full: $X_{n+1} = \mu(t_n) + \phi(t_n) X_n + \sigma(t_n)\xi$

source
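
To illustrate the recursion in the fit_AR1 docstring, here is a minimal simulation sketch. It is not the package's implementation: the coefficient functions μ, ϕ and σ below are hypothetical trigonometric functions chosen for illustration, the noise ξ is assumed to be standard Gaussian, and the dependence on the K hidden states is ignored.

```julia
using Random

# Hypothetical periodic coefficients (illustration only, not fitted values).
μ(t, T) = 10 + 5 * cos(2π * t / T)
ϕ(t, T) = 0.6 + 0.2 * sin(2π * t / T)   # stays within [0.4, 0.8], so the process is stable
σ(t, T) = 1 + 0.5 * cos(2π * t / T)     # stays positive

# Simulate X_{n+1} = μ(t_n) + ϕ(t_n) X_n + σ(t_n) ξ with ξ assumed N(0,1).
function simulate_seasonal_ar1(rng, N, T; x1 = 0.0)
    x = zeros(N)
    x[1] = x1
    for n in 1:N-1
        t = mod1(n, T)  # position of step n within the period
        x[n+1] = μ(t, T) + ϕ(t, T) * x[n] + σ(t, T) * randn(rng)
    end
    return x
end

x = simulate_seasonal_ar1(MersenneTwister(42), 2 * 366, 366)
```

Because all three coefficients are periodic in t, the simulated series has a seasonal mean, a seasonal persistence and a seasonal variance.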

Climate indexes

StochasticWeatherGenerators.VCX3 - Function
VCX3(df; y_col, nb = 3)

Yearly maximum of the nb-day (nb = 3 by default) sliding mean of y, computed for every year. By default, y_col is the first column whose type is not Date.

using DataFrames, Dates, RollingFunctions
time_range = Date(1956):Day(1):Date(2019,12,31)
df = DataFrame(:DATE => time_range, :Temperature => 20 .+ 5*randn(length(time_range)))
VCX3(df)
source
VCX3(y, idxs; nb = 3)

Yearly maximum of the nb-day (nb = 3 by default) sliding mean of y. Here idxs is a vector of vectors (or ranges), each containing the indices of one year.

using DataFrames, Dates, RollingFunctions
time_range = Date(1956):Day(1):Date(2019,12,31)
year_range = unique(year.(time_range))
df = DataFrame(:DATE => time_range, :Temperature => 20 .+ 5*randn(length(time_range)))
idx_year = [findall(x-> year.(x) == m, df[:, :DATE]) for m in year_range]
VCX3(df.Temperature, idx_year)
source
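
The quantity described by VCX3 can be sketched without any rolling-window package. This is only an illustration under assumptions: the helper names are hypothetical, and windows are assumed to be taken within each year (not across year boundaries), which may differ from the package's behavior.

```julia
using Dates, Statistics

# Mean over every window of nb consecutive values.
sliding_mean(y, nb) = [mean(@view y[i:i+nb-1]) for i in 1:length(y)-nb+1]

# For each year (given by its indices), the maximum of the nb-day sliding mean.
yearly_max_sliding_mean(y, idxs; nb = 3) =
    [maximum(sliding_mean(y[idx], nb)) for idx in idxs]

dates = Date(2000):Day(1):Date(2001, 12, 31)
y = collect(1.0:length(dates))  # toy increasing series
idx_year = [findall(d -> year(d) == yr, dates) for yr in unique(year.(dates))]
vcx3_sketch = yearly_max_sliding_mean(y, idx_year)
```

With an increasing series, the maximum for each year is the mean of its last three days.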
StochasticWeatherGenerators.cum_monthly - Function
cum_monthly(y::AbstractArray, idxs)
using DataFrames, Dates, RollingFunctions
time_range = Date(1956):Day(1):Date(2019,12,31)
year_range = unique(year.(time_range))
df = DataFrame(:DATE => time_range, :Temperature => 20 .+ 5*randn(length(time_range)))
idx_year = [findall(x-> year.(x) == m, df[:, :DATE]) for m in year_range]
idx_month = [findall(x-> month.(x) == m, df[:, :DATE]) for m in 1:12]
idx_all = [intersect(yea, mon) for yea in idx_year, mon in idx_month]
cum_monthly(df.Temperature, idx_all)
source
StochasticWeatherGenerators.corTail - Function
corTail(x::AbstractMatrix, q = 0.95)

Compute the (symmetric averaged) tail index matrix M of a matrix x, i.e. M[i, j] = (ℙ(x[:,j] > Fxⱼ(q) ∣ x[:,i] > Fxᵢ(q)) + ℙ(x[:,i] > Fxᵢ(q) ∣ x[:,j] > Fxⱼ(q)))/2, where Fxᵢ(q) denotes the quantile of order q of x[:,i] (the inverse CDF evaluated at q). Note that it uses the same convention as the cor function, i.e. observations in rows and features in columns.

source
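
The definition above can be estimated empirically by counting joint exceedances. The following is a sketch under assumptions, not the package's code: tail_matrix_sketch is a hypothetical name, thresholds are taken as empirical quantiles, and every column is assumed to have at least one exceedance.

```julia
using Random, Statistics

function tail_matrix_sketch(x::AbstractMatrix, q = 0.95)
    d = size(x, 2)
    thresholds = [quantile(x[:, j], q) for j in 1:d]  # empirical q-quantile per column
    exceed = x .> thresholds'                          # n × d matrix of exceedance flags
    M = zeros(d, d)
    for i in 1:d, j in 1:d
        both = count(exceed[:, i] .& exceed[:, j])     # joint exceedances
        p_j_given_i = both / count(exceed[:, i])
        p_i_given_j = both / count(exceed[:, j])
        M[i, j] = (p_j_given_i + p_i_given_j) / 2      # symmetric average
    end
    return M
end

x = randn(MersenneTwister(1), 2000, 3)  # observations in rows, features in columns
M = tail_matrix_sketch(x)
```

By construction, M is symmetric with ones on the diagonal.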

Simulations

StochasticWeatherGenerators.rand_RR - Function
rand_RR(mixs::AbstractArray{<:MixtureModel}, n2t::AbstractVector, z::AbstractVector, y::AbstractMatrix, Σk::AbstractArray)

Generate a (nonhomogeneous) sequence of length length(n2t) of rain amounts conditionally on a given dry/wet matrix y and (hidden) state sequence z. Univariate distributions are given by mixs, while correlations are given by the covariance matrices Σk.

source
StochasticWeatherGenerators.rand_cond - Function
rand_cond(ϵ, z, θ_uni, θ_cor, n2t, T)

Generate a random variable conditionally on another one, using a copula.

\[X_1 \mid X_2 = ϵ \sim \mathcal{N}\left(\mu_1 + \dfrac{\sigma_1}{\sigma_2}\rho (ϵ - \mu_2), (1-\rho^2)\sigma_1^2 \right)\]

For two standard normal variables N(0,1), this reduces to

\[X_1 \mid X_2 = ϵ \sim \mathcal{N}\left(\rho ϵ , (1-\rho^2) \right)\]

source
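
The standard normal case of the conditional distribution can be sampled directly. This is a minimal sketch, not the package's rand_cond: cond_normal_sketch is a hypothetical helper, and ρ is a fixed illustrative correlation rather than a fitted, time-dependent one.

```julia
using Random, Statistics

# Draw X_1 given X_2 = ϵ for a standard bivariate normal with correlation ρ:
# conditional mean ρ ϵ, conditional variance 1 - ρ^2.
cond_normal_sketch(rng, ϵ, ρ) = ρ * ϵ + sqrt(1 - ρ^2) * randn(rng)

rng = MersenneTwister(1234)
ρ = 0.7
ϵ = randn(rng, 100_000)
x1 = [cond_normal_sketch(rng, e, ρ) for e in ϵ]
empirical_cor = cor(x1, ϵ)  # should be close to ρ
```

Marginally, x1 is again standard normal, and its empirical correlation with ϵ approaches ρ as the sample size grows.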

Correlation utilities

For temperature

For rain

Map utilities

StochasticWeatherGenerators.dms_to_dd - Function
dms_to_dd(l)

Convert Degrees Minutes Seconds to Decimal Degrees. Inputs are strings of the form:

  • LAT : Latitude in degrees:minutes:seconds (+: North, -: South)
  • LON : Longitude in degrees:minutes:seconds (+: East, -: West)
source
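
The conversion itself is simple arithmetic. Below is a sketch assuming input strings such as "+48:51:24" (sign, then degrees:minutes:seconds); dms_to_dd_sketch is a hypothetical name and the package's own parsing may differ.

```julia
function dms_to_dd_sketch(l::AbstractString)
    deg, m, s = parse.(Float64, split(strip(l), ":"))
    # Read the sign from the string itself so that "-00:30:00" stays negative.
    sgn = startswith(strip(l), "-") ? -1 : 1
    return sgn * (abs(deg) + m / 60 + s / 3600)
end

lat = dms_to_dd_sketch("+48:51:24")  # North
lon = dms_to_dd_sketch("-00:30:00")  # West
```

Taking the sign from the string rather than from the parsed degrees avoids losing it when the degrees part is zero.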

Data manipulation

StochasticWeatherGenerators.collect_data_ECA - Function
collect_data_ECA(STAID::Integer, path::String, var::String="RR"; skipto=19, header = 18)

path gives the path where all data files are stored in a vector

source
collect_data_ECA(STAID, date_start::Date, date_end::Date, path::String, var::String="RR"; portion_valid_data=1, skipto=19, header = 18, return_nothing = true)
  • path gives the path where all data files are stored in a vector
  • Filter the DataFrame s.t. date_start ≤ :DATE ≤ date_end
  • var = "RR", "TX" etc.
  • portion_valid_data is the minimum portion of valid data that is acceptable. To allow no missing values at all, set it to 1.
  • skipto and header are for CSV files with metadata/comments at the beginning of the file. See CSV.jl.
  • return_nothing: if true, the function returns nothing if the file does not exist or does not have enough valid data.
source
StochasticWeatherGenerators.select_in_range_df - Function
select_in_range_df(datas, start_Date, interval_Date, [portion])

Select stations based on data availability over the given dates and on quality (portion of valid data). The input is a vector (array) of DataFrames (one per station, for example) or a Dict of DataFrames. If 0 < portion ≤ 1 is specified, some portion of the data is allowed to be missing.

source
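
The "portion of valid data" criterion can be illustrated on a plain vector. This sketch rests on assumptions: invalid entries are represented by missing, a station is kept when its valid portion is at least portion, and both helper names are hypothetical.

```julia
# Fraction of entries that are not missing.
valid_portion(v) = count(!ismissing, v) / length(v)

# Keep a series only if enough of it is valid; portion = 1 allows no missing value.
keep_station(v; portion = 1) = valid_portion(v) >= portion

v = [1.0, missing, 3.0, 4.0]
valid_portion(v)
keep_station(v; portion = 0.7)
keep_station(v)
```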
StochasticWeatherGenerators.shortname - Function
shortname(name::String)

Experimental function that returns only the most relevant part of a station name.

long_name = "TOULOUSE-BLAGNAC"
shortname(long_name) # "TOULOUSE"
source

Generic utilities