API

Fit function

SmoothPeriodicStatsModels.fit_mle_RO - Function
fit_mle_RO(df::DataFrame, K, T, degree, local_order)

Given a DataFrame df with a column z of known hidden states (z ∈ 1:K), fit the rain occurrences of the new station conditionally on the hidden state. For local_order > 0, the model is also autoregressive, i.e. it depends on its own past.

StochasticWeatherGenerators.fit_mle_RR - Function
fit_mle_RR(df::DataFrame, local_order,  K = length(unique(df.z)); maxiter=5000, tol=2e-4, robust=true, silence=true, warm_start=true, display=:none, mix₀=mix_ini(length(unique(dayofyear_Leap.(df.DATE)))))

Fit the distribution $g_{k,t}(r)$ of the strictly positive rain amounts RR > 0 for each hidden state k ∈ 1:K (provided in the column df.z). The fitted model could in principle be any seasonal model. For now, the default is a double exponential model, i.e. a mixture of two exponential distributions:

$g_{k,t}(r) = \alpha(t,k)\exp(-r/\theta_1(t,k))/\theta_1(t,k) + (1-\alpha(t,k))\exp(-r/\theta_2(t,k))/\theta_2(t,k).$
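As an illustration of the density above for one fixed pair (t, k), here is a minimal sketch in plain Julia. The parameter values are made up for the example and are not fitted values; the helper name `g` is only a stand-in for $g_{k,t}$.

```julia
# Density of the two-component exponential mixture for one fixed (t, k).
g(r, α, θ₁, θ₂) = α * exp(-r / θ₁) / θ₁ + (1 - α) * exp(-r / θ₂) / θ₂

# Illustrative (hypothetical) parameter values, not fitted ones.
α, θ₁, θ₂ = 0.7, 1.5, 10.0

# The mean of the mixture is the weighted average of the component means.
mixture_mean = α * θ₁ + (1 - α) * θ₂

# Crude midpoint Riemann sum to check that g integrates to about 1 on r > 0.
dr = 0.001
rs = dr/2:dr:400
total_mass = sum(g(r, α, θ₁, θ₂) for r in rs) * dr

println("mean = ", mixture_mean, ", mass ≈ ", round(total_mass, digits = 4))
```

The first component (small θ₁) describes light rain and the second (large θ₂) heavier rain, with α giving the weight of the first.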

StochasticWeatherGenerators.fit_TN - Function
fit_TN(df_full::DataFrame, 𝐃𝐞𝐠, T; kwargs...)

Fit the variable TN (daily minimum temperature). More precisely, it fits the difference ΔT = TX - TN, to ensure that the difference between TX and TN stays positive.

StochasticWeatherGenerators.fit_AR1 - Function
fit_AR1(df_full::DataFrame, var, 𝐃𝐞𝐠, K = length(unique(df_full.z)), T = length(unique(n2t)))

Fit a seasonal AR(1) model of period T with K hidden states for the variable var (denoted X below) of the DataFrame df_full. The hidden states must be given in the column z, i.e. df_full.z. The correspondence between the index n in the time series and the day of the year t must be given in the column n2t, i.e. df_full.n2t.

$X_{n+1} = \mu(t_n, z_n) + \phi(t_n, z_n) X_n + \sigma(t_n, z_n)\xi$

with $\xi \sim \mathcal{N}(0,1)$.
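To make the recursion concrete, the following sketch simulates such a seasonal AR(1) sequence in plain Julia. The coefficient functions `μ`, `ϕ`, `σ`, the helper `simulate_ar1` and the random hidden states are all hypothetical choices for the example; the parametrization used by the fitted model may differ.

```julia
using Random

T = 366  # period (days of the year, leap day included)
K = 2    # number of hidden states

# Hypothetical smooth periodic coefficients (assumptions for this sketch).
μ(t, k) = 10 + 5 * cos(2π * t / T) + k
ϕ(t, k) = 0.6 + 0.1 * sin(2π * t / T)
σ(t, k) = 1.0 + 0.2 * k

# Iterate X_{n+1} = μ(t_n, z_n) + ϕ(t_n, z_n) X_n + σ(t_n, z_n) ξ with ξ ~ N(0, 1).
function simulate_ar1(rng, x₁, n2t, z)
    N = length(n2t)
    x = zeros(N)
    x[1] = x₁
    for n in 1:N-1
        t, k = n2t[n], z[n]
        x[n+1] = μ(t, k) + ϕ(t, k) * x[n] + σ(t, k) * randn(rng)
    end
    return x
end

rng = MersenneTwister(1)
N = 2 * T
n2t = [mod1(n, T) for n in 1:N]  # day-of-year index of each time step
z = rand(rng, 1:K, N)            # placeholder hidden states
x = simulate_ar1(rng, 0.0, n2t, z)
```

Here n2t plays the same role as the column df_full.n2t: it maps the position n in the series to the day of the year t.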

Climate indexes

StochasticWeatherGenerators.VCX3 - Function
VCX3(df; y_col, nb = 3)

Yearly maximum of the nb-day (by default nb = 3) sliding mean of y, for every year. By default, y_col is the first column whose type is not Date.

using DataFrames, Dates, RollingFunctions
time_range = Date(1956):Day(1):Date(2019,12,31)
df = DataFrame(:DATE => time_range, :Temperature => 20 .+ 5*randn(length(time_range)))
VCX3(df)

VCX3(y, idxs; nb = 3)

Yearly maximum of the nb-day (by default nb = 3) sliding mean of y. Here idxs is a vector of vectors (or ranges) giving the indices of each year.

using DataFrames, Dates, RollingFunctions
time_range = Date(1956):Day(1):Date(2019,12,31)
year_range = unique(year.(time_range))
df = DataFrame(:DATE => time_range, :Temperature => 20 .+ 5*randn(length(time_range)))
idx_year = [findall(x-> year.(x) == m, df[:, :DATE]) for m in year_range]
VCX3(df.Temperature, idx_year)
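
For intuition, the index can be reproduced by hand on a toy series. This is only a sketch under the assumption that sliding windows are taken within each year. The helpers `sliding_mean` and `vcx3_sketch` are hypothetical names and not part of the package, whose handling of window edges may differ.

```julia
using Statistics

# Mean over every window of `nb` consecutive values.
sliding_mean(y, nb) = [mean(@view y[i:i+nb-1]) for i in 1:length(y)-nb+1]

# Maximum of the nb-value sliding mean, one value per index set in `idxs`.
vcx3_sketch(y, idxs; nb = 3) = [maximum(sliding_mean(y[idx], nb)) for idx in idxs]

y = [1.0, 2.0, 6.0, 3.0, 0.0, 9.0, 9.0, 0.0]
idxs = [1:4, 5:8]  # two toy "years" of four days each

result = vcx3_sketch(y, idxs)
# First "year": windows (1,2,6) and (2,6,3) have means 3 and 11/3, so the max is 11/3.
# Second "year": windows (0,9,9) and (9,9,0) both have mean 6.
```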

StochasticWeatherGenerators.monthly_agg - Function
monthly_agg(y::AbstractArray, idxs)
using DataFrames, Dates
time_range = Date(1956):Day(1):Date(2019,12,31)
year_range = unique(year.(time_range))
df = DataFrame(:DATE => time_range, :Temperature => 20 .+ 5*randn(length(time_range)))
monthly_agg(df, :Temperature) 
monthly_agg(df, :Temperature, mean) 
# or
idx_year = [findall(x-> year.(x) == m, df[:, :DATE]) for m in year_range]
idx_month = [findall(x-> month.(x) == m, df[:, :DATE]) for m in 1:12]
idx_all = [intersect(yea, mon) for yea in idx_year, mon in idx_month]
monthly_agg(df.Temperature, idx_all)

StochasticWeatherGenerators.corTail - Function
corTail(x::AbstractMatrix, q = 0.95)

Compute the (symmetric averaged) tail index matrix M of a matrix x, i.e. M[i, j] = (ℙ(x[:,j] > Fxⱼ(q) ∣ x[:,i] > Fxᵢ(q)) + ℙ(x[:,i] > Fxᵢ(q) ∣ x[:,j] > Fxⱼ(q)))/2, where Fxᵢ(q) denotes the quantile of level q of column x[:,i]. Note that it uses the same convention as the cor function, i.e. observations in rows and features in columns.
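
The definition can be sketched with empirical frequencies as follows. This is an illustrative re-implementation under the assumption that empirical quantiles from `Statistics.quantile` are used as thresholds. `cortail_sketch` is a hypothetical name, and the package's own implementation may differ in details.

```julia
using Statistics

function cortail_sketch(x::AbstractMatrix, q = 0.95)
    d = size(x, 2)
    # Entry (n, j) is true when observation n of column j exceeds
    # that column's empirical q-quantile.
    exceed = hcat([x[:, j] .> quantile(x[:, j], q) for j in 1:d]...)
    M = zeros(d, d)
    for i in 1:d, j in 1:d
        joint = count(exceed[:, i] .& exceed[:, j])
        # Average of the two conditional exceedance frequencies.
        # (NaN if a column has no exceedance, e.g. a constant column.)
        M[i, j] = (joint / count(exceed[:, i]) + joint / count(exceed[:, j])) / 2
    end
    return M
end

# Two columns whose large values never occur on the same rows.
x = hcat(1.0:100.0, 100.0:-1:1.0)
M = cortail_sketch(x, 0.9)
```

With these two columns the extremes never coincide, so the off-diagonal entries are 0 while the diagonal is 1.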

Simulations

StochasticWeatherGenerators.rand_RR - Function
rand_RR(mixs::AbstractArray{<:MixtureModel}, n2t::AbstractVector, z::AbstractVector, y::AbstractMatrix, Σk::AbstractArray)

Generate a (nonhomogeneous) sequence of rain amounts of length length(n2t), conditionally on a given dry/wet matrix y and a (hidden) state sequence z. The univariate distributions are given by mixs, while the correlations are given by the covariance matrices Σk.

StochasticWeatherGenerators.rand_cond - Function
rand_cond(ϵ, z, θ_uni, θ_cor, n2t, T)

Generate a random variable conditionally on another one, assuming a Gaussian copula dependence with correlation ρₜ(t / T, θ_cor) (depending on the day of the year). ϵ is assumed to follow Normal(0,1).
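
The Gaussian copula step can be sketched on the normal scores alone: if (ϵ, η) is bivariate standard normal with correlation ρ, then η given ϵ follows N(ρϵ, 1 - ρ²). The code below illustrates only this step. The seasonal correlation `ρ_sketch` and the helper `rand_cond_sketch` are hypothetical stand-ins, and the transformation of η to the target marginal distribution (through its quantile function) is not shown.

```julia
using Random, Statistics

T = 366
# Hypothetical seasonal correlation, a stand-in for ρₜ(t / T, θ_cor).
ρ_sketch(s) = 0.5 + 0.3 * cos(2π * s)

# Draw η given ϵ so that (ϵ, η) is bivariate standard normal
# with correlation ρ: η | ϵ ~ N(ρ ϵ, 1 - ρ²).
rand_cond_sketch(rng, ϵ, ρ) = ρ * ϵ + sqrt(1 - ρ^2) * randn(rng)

rng = MersenneTwister(42)
t = 100                      # day of the year
ρ = ρ_sketch(t / T)
ϵ = randn(rng, 100_000)
η = [rand_cond_sketch(rng, e, ρ) for e in ϵ]

empirical_ρ = cor(ϵ, η)      # close to ρ for a large sample
```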

Correlation utilities

For temperature

StochasticWeatherGenerators.cov_ar1 - Function
cov_ar1(dfs::AbstractArray{<:DataFrame}, ar1s, var, K = length(unique(dfs[1].z)))

Fit the covariance matrix of the residuals ϵ of several AR(1) models ar1s. One matrix is fitted per hidden state. The hidden state z must be given in df.z. Note that the covariance matrices are assumed constant in time.

For rain

StochasticWeatherGenerators.cor_RR - Function
cor_RR(dfs::AbstractArray{<:DataFrame}[, K]; cor_method=Σ_Spearman2Pearson, force_PosDef = true)

Compute the (strictly positive) rain pair correlations cor(Rs₁ > 0, Rs₂ > 0) between each pair of stations s₁, s₂ for each hidden state Z = k.

Input: an array dfs of length S (number of stations) of df::DataFrame, where each df has the columns :DATE, :RR, :z (same :z for each df).

Output: K correlation matrices of size S×S

Options:

  • force_PosDef enforces a positive definite matrix with NearestCorrelationMatrix.jl.
  • cor_method: typically Σ_Spearman2Pearson or Σ_Kendall2Pearson
  • impute_missing: if nothing, missing is returned when two stations do not have at least two rain days in common. Otherwise, the value impute_missing is used instead.
ΣRR = cor_RR(data_stations, K)
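
The restriction to common rain days and the role of impute_missing can be illustrated for a single pair of stations. This sketch makes several simplifying assumptions: it uses the plain Pearson correlation from `Statistics.cor` rather than a rank-based cor_method, it ignores the hidden states, and `wet_cor_sketch` is a hypothetical helper that is not part of the package.

```julia
using Statistics

# Pearson correlation of two rain series restricted to the days where
# both stations are wet. With fewer than two such days, return `missing`
# (if impute_missing is nothing) or the value impute_missing.
function wet_cor_sketch(r₁, r₂; impute_missing = nothing)
    both_wet = (r₁ .> 0) .& (r₂ .> 0)
    if count(both_wet) < 2
        return isnothing(impute_missing) ? missing : impute_missing
    end
    return cor(r₁[both_wet], r₂[both_wet])
end

r₁ = [0.0, 1.0, 2.0, 0.0, 4.0]
r₂ = [3.0, 2.0, 4.0, 0.0, 8.0]

c = wet_cor_sketch(r₁, r₂)                       # uses days 2, 3 and 5 only
no_overlap = wet_cor_sketch([0.0, 1.0], [1.0, 0.0])
imputed = wet_cor_sketch([0.0, 1.0], [1.0, 0.0]; impute_missing = 0.0)
```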

StochasticWeatherGenerators.cov_RR - Function
cov_RR(dfs::AbstractArray{<:DataFrame}[, K]; cor_method=Σ_Spearman2Pearson, force_PosDef = true)

Compute the (strictly positive) rain pair covariance cov(Rs₁ > 0, Rs₂ > 0) between each pair of stations s₁, s₂ for each hidden state Z = k.

Input: an array dfs of length S (number of stations) of df::DataFrame, where each df has the columns :DATE, :RR, :z (same :z for each df).

Output: K covariance matrices of size S×S

Options:

  • force_PosDef enforces a positive definite matrix with NearestCorrelationMatrix.jl.
  • cor_method: typically Σ_Spearman2Pearson or Σ_Kendall2Pearson
  • impute_missing: if nothing, missing is returned when two stations do not have at least two rain days in common. Otherwise, the value impute_missing is used instead.
ΣRR = cov_RR(data_stations, K)

Map utilities

StochasticWeatherGenerators.dms_to_dd - Function
dms_to_dd(l)

Convert l from degrees, minutes, seconds (DMS) to decimal degrees (DD). Inputs are strings of the form

  • LAT : Latitude in degrees:minutes:seconds (+: North, -: South)
  • LON : Longitude in degrees:minutes:seconds (+: East, -: West)
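
The conversion itself is degrees + minutes/60 + seconds/3600, with the sign taken from the leading character. A minimal sketch, assuming well-formed inputs such as "+48:30:00" (`dms_to_dd_sketch` is a hypothetical helper written for this example, not the package function):

```julia
function dms_to_dd_sketch(l::AbstractString)
    d, m, s = parse.(Float64, split(l, ":"))
    # Take the sign from the string itself so that "-00:30:00" stays negative.
    sgn = startswith(l, "-") ? -1 : 1
    return sgn * (abs(d) + m / 60 + s / 3600)
end

lat = dms_to_dd_sketch("+48:30:00")   # 48 degrees 30 minutes North
lon = dms_to_dd_sketch("-01:30:00")   # 1 degree 30 minutes West
```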

Data manipulation

StochasticWeatherGenerators.collect_data_ECA - Function
collect_data_ECA(STAID::Integer, path::String, var::String="RR"; skipto=19, header = 18)

path gives the path where all data files are stored in a vector

collect_data_ECA(STAID, date_start::Date, date_end::Date, path::String, var::String="RR"; portion_valid_data=1, skipto=19, header = 18, return_nothing = true)
  • path gives the path where all data files are stored in a vector
  • Filter the DataFrame s.t. date_start ≤ :DATE ≤ date_end
  • var = "RR", "TX" etc.
  • portion_valid_data is the minimum portion of valid data that is acceptable. To allow no missing data, set it to 1.
  • skipto and header are for csv files with metadata/comments at the beginning of the files. See CSV.jl.
  • return_nothing: if true, nothing is returned when the file does not exist or does not have enough valid data.

StochasticWeatherGenerators.collect_data_INRAE - Function
collect_data_INRAE(station_path::String; show_warning=false, impute_missing=[])

Read INRAE-formatted weather station data from a file and transform it to match the ECA standard naming conventions.

  • impute_missing expects a vector of the column name(s) in which missing values are imputed with Impute.Interpolate, e.g. impute_missing=[:TX].
  • show_warning: warn about missing data. Use false for no column, true for all variable columns, or a vector of column names for selected columns, e.g. show_warning = [:TX].

StochasticWeatherGenerators.collect_data_MeteoFrance - Function
collect_data_MeteoFrance(STAID; show_warning=false, impute_missing=[], period="1950-2021", variables = "all")

Given a STAID (station ID given by Météo France), it returns a DataFrame with data in period and for the variables.

  • STAID can be an integer or string.
  • Options for period are "1846-1949", "1950-2021", "2022-2023"
  • Options for variables are "all", "RR-T-Wind", "others"
  • impute_missing expects a vector of the column name(s) in which missing values are imputed with Impute.Interpolate, e.g. impute_missing=[:TX].
  • show_warning: warn about missing data. Use false for no column, true for all variable columns, or a vector of column names for selected columns, e.g. show_warning = [:TX].

The data is available through the API of the French Data.gouv.fr website. Data may be updated without notice. See the following two links for information on the "RR-T-Wind" and "others" variables (in French)

  • https://object.files.data.gouv.fr/meteofrance/data/synchroftp/BASE/QUOT/QdescriptifchampsRR-T-Vent.csv
  • https://object.files.data.gouv.fr/meteofrance/data/synchroftp/BASE/QUOT/Qdescriptifchampsautres-parametres.csv

Or see the SICLIMA website for information (in French) about the computation and conversion of some weather variables/indexes.

StochasticWeatherGenerators.clean_data - Function
clean_data(df::DataFrame; show_warning=false, impute_missing=[])

Impute missing values and show warnings about missing values. It assumes that the first two columns are not numeric.

  • impute_missing expects a vector of the column name(s) in which missing values are imputed with Impute.Interpolate, e.g. impute_missing=[:TX].
  • show_warning: warn about missing data. Use false for no column, true for all variable columns, or a vector of column names for selected columns, e.g. show_warning = [:TX].

StochasticWeatherGenerators.select_in_range_df - Function
select_in_range_df(datas, start_Date, interval_Date, [portion])

Select the stations whose data are available over the requested dates and of sufficient quality (portion of valid data). The input is a vector (array) of DataFrames (for example, one per station) or a Dict of DataFrames. If 0 < portion ≤ 1 is specified, a portion of the data is allowed to be missing.

StochasticWeatherGenerators.shortname - Function
shortname(name::String)

Experimental function that returns only the most relevant part of a station name.

long_name = "TOULOUSE-BLAGNAC"
shortname(long_name) # "TOULOUSE"

Generic utilities