API
Fit function
SmoothPeriodicStatsModels.fit_mle_RO
— Functionfit_mle_RO(df::DataFrame, K, T, degree, local_order)
Given a DataFrame df
with known hidden states column z ∈ 1:K
. The rain occurrences of the new station are fitted conditionally to the hidden state. For local_order>0
the model is also autoregressive with its past.
StochasticWeatherGenerators.fit_mle_RR
— Functionfit_mle_RR(df::DataFrame, local_order, K = length(unique(df.z)); maxiter=5000, tol=2e-4, robust=true, silence=true, warm_start=true, display=:none, mix₀=mix_ini(length(unique(dayofyear_Leap.(df.DATE)))))
Fit the strictly positive rain amounts RR>0
distribution $g_{k,t}(r)$ w.r.t. to each hidden states k∈[1,K]
(provided in a column of df.z
). The fitted model could in principle be any seasonal model. For now by default it is a double exponential model,
$g_{k,t}(r) = \alpha(t,k)\exp(-r/\theta_1(t,k))/\theta_1(t,k) + (1-\alpha(t,k))\exp(-r/\theta_2(t,k))/\theta_2(t,k).$
StochasticWeatherGenerators.fit_TN
— Functionfit_TN(df_full::DataFrame, 𝐃𝐞𝐠, T; kwargs...)
Fit the variable TN
(daily minimum temperature). In fact it fits the difference ΔT = TX - TN
to ensure a positive difference between TX
and TN
StochasticWeatherGenerators.fit_AR1
— Functionfit_AR1(df_full::DataFrame, var, 𝐃𝐞𝐠, K = length(unique(df_full.z)), T = length(unique(n2t)))
Fit a Seasonal AR(1) model of period T
and with K
hidden states for the variable X
of the DataFrame df_full
. The hidden states must be given in a the column z
of i.e. df_full.z
. The correspondance between day of the year t
and index in the time series n
must be given in the column n2t
i.e. df_full.n2t
.
$X_{n+1} = \mu(t_n, z_n) + \phi(t_n, z_n) X_n + \sigma(t_n, z_n)\xi$
with $\xi \sim \mathcal{N}(0,1)$.
Climate indexes
StochasticWeatherGenerators.VCX3
— FunctionVCX3(df; y_col, nb = 3)
Yearly Max of nb = 3
days sliding mean for y
for every year. By default, y_col
is the first column not with a Date
type
using DataFrames, Dates, RollingFunctions
time_range = Date(1956):Day(1):Date(2019,12,31)
df = DataFrame(:DATE => time_range, :Temperature => 20 .+ 5*randn(length(time_range)))
VCX3(df)
VCX3(y, idxs; nb = 3)
Yearly Max of nb = 3
days sliding mean for y
. Here idxs
can be a vector of vector (or range) corresponds to the index of every year.
using DataFrames, Dates, RollingFunctions
time_range = Date(1956):Day(1):Date(2019,12,31)
year_range = unique(year.(time_range))
df = DataFrame(:DATE => time_range, :Temperature => 20 .+ 5*randn(length(time_range)))
idx_year = [findall(x-> year.(x) == m, df[:, :DATE]) for m in year_range]
VCX3(df.Temperature, idx_year)
StochasticWeatherGenerators.monthly_agg
— Functionmonthly_agg(y::AbstractArray, idxs)
using DataFrames, Dates
time_range = Date(1956):Day(1):Date(2019,12,31)
year_range = unique(year.(time_range))
df = DataFrame(:DATE => time_range, :Temperature => 20 .+ 5*randn(length(time_range)))
monthly_agg(df, :Temperature)
monthly_agg(df, :Temperature, mean)
# or
idx_year = [findall(x-> year.(x) == m, df[:, :DATE]) for m in year_range]
idx_month = [findall(x-> month.(x) == m, df[:, :DATE]) for m in 1:12]
idx_all = [intersect(yea, mon) for yea in idx_year, mon in idx_month]
monthly_agg(df.Temperature, idx_all)
StochasticWeatherGenerators.corTail
— FunctioncorTail(x::AbstractMatrix, q = 0.95)
Compute the (symmetric averaged) tail index matrix M
of a vector x, i.e. M[i, j] = (ℙ(x[:,j] > Fxⱼ(q) ∣ x[:,i] > Fxᵢ(q)) + ℙ(x[:,i] > Fxᵢ(q) ∣ x[:,j] > Fxⱼ(q)))/2 where Fx(q) is the CDF of x. Note it uses the same convention as cor
function i.e. observations in rows and features in column.
StochasticWeatherGenerators.longuest_spell
— Functionlonguest_spell(y::AbstractArray; value=0)
Compute the length of the longuest consecutive sequence of value
in y
StochasticWeatherGenerators.pmf_spell
— Functionpmf_spell(y::AbstractVector, value)
Return the distribution of spells (consecutive sequence of with the same value) length of value
in y
Simulations
StochasticWeatherGenerators.rand_RR
— Functionrand_RR(mixs::AbstractArray{<:MixtureModel}, n2t::AbstractVector, z::AbstractVector, y::AbstractMatrix, Σk::AbstractArray)
Generate a (nonhomegenous) sequence of length length(n2t)
of rain amounts conditionally to a given dry/wet matrix y
and (hidden) state sequence z
. Univariate distribution are given by mixs
while correlations are given by covariance matrix Σk.
StochasticWeatherGenerators.rand_cond
— Functionrand_cond(ϵ, z, θ_uni, θ_cor, n2t, T)
Generate a random variable conditionally to another one assuming a Gaussian copula dependance with correlation ρₜ(t / T, θ_cor)
(depending on the day of the year). ϵ
is assumed Normal(0,1)
.
Correlation utilities
For temperature
StochasticWeatherGenerators.cor_groupby
— Functioncor_groupby(df::DataFrame, var1, var2, T::Integer; θ0 = [0, 0.0, 0.0])
Compute and fit the cor
between two var
with a smooth function for each z
.
StochasticWeatherGenerators.cor_groupbyTXTN
— Functioncor_groupbyTXTN(df::DataFrame, T::Integer; θ0 = [0, 0.0, 0.0])
Compute and fit the cor
between :TX
and :TX-:TN
with a smooth function for each z
.
StochasticWeatherGenerators.cov_ar1
— Functioncov_ar1(dfs::AbstractArray{<:DataFrame}, ar1s, var, K = length(unique(dfs[1].z)))
Fit the covariance matrix of the residual ϵ
of several AR(1) models ar1s
. One matrix is fitted per hidden state. The hidden state z
must be given in df.z
. Note that we consider constant in time the covariance matrices.
For rain
StochasticWeatherGenerators.cor_RR
— Functioncor_RR(dfs::AbstractArray{<:DataFrame}[, K]; cor_method=Σ_Spearman2Pearson, force_PosDef = true)
Compute the (strictly positive) rain pair correlations cor(Rs₁ > 0, Rs₂ > 0)
between each pair of stations s₁, s₂
for each hidden state Z = k
.
Input: a array dfs
of df::DataFrame
of length S
(number of station) where each df
have :DATE, :RR, :z (same :z for each df).
Output: K
correlation matrix of size S×S
Options:
force_PosDef
will enforce Positive Definite matrix with NearestCorrelationMatrix.jl.cor_method
: typicallyΣ_Spearman2Pearson
orΣ_Kendall2Pearson
impute_missing
: ifnothing
,missing
will be outputted when two stations do not have at least two rain days in common. Otherwise the valueimpute_missing
will be set.
ΣRR = cor_RR(data_stations, K)
StochasticWeatherGenerators.cov_RR
— Functioncov_RR(dfs::AbstractArray{<:DataFrame}[, K]; cor_method=Σ_Spearman2Pearson, force_PosDef = true)
Compute the (strictly positive) rain pair covariance cov(Rs₁ > 0, Rs₂ > 0)
between each pair of stations s₁, s₂
for each hidden state Z = k
.
Input: a array dfs
of df::DataFrame
of length S
(number of station) where each df
have :DATE, :RR, :z (same :z for each df).
Output: K
covariance matrix of size S×S
Options:
force_PosDef
will enforce Positive Definite matrix with NearestCorrelationMatrix.jl.cor_method
: typicallyΣ_Spearman2Pearson
orΣ_Kendall2Pearson
impute_missing
: ifnothing
,missing
will be outputted when two stations do not have at least two rain days in common. Otherwise the valueimpute_missing
will be set.
ΣRR = cov_RR(data_stations, K)
StochasticWeatherGenerators.Σ_Spearman2Pearson
— FunctionΣ_Spearman2Pearson(M::AbstractMatrix)
Compute the Pearson correlation coefficient i.e. the classic one from the Spearman correlation #TODO Add ref
StochasticWeatherGenerators.Σ_Kendall2Pearson
— FunctionΣ_Kendall2Pearson(M::AbstractMatrix)
Compute the Pearson correlation coefficient i.e. the classic one from the Kendall correlation #TODO Add ref
StochasticWeatherGenerators.joint_rain
— Functionjoint_rain(M::AbstractMatrix, j1::Integer, j2::Integer, r = 0)
Select all the rows of M
with values for (two) columns above some value r
.
Map utilities
StochasticWeatherGenerators.distance_x_to_y
— Functiondistance_x_to_y(station_x, station_y)
Distance in km between two stations. Does not take into account altitude. station
must have a field LAT
and LON
in Decimal Degree.
StochasticWeatherGenerators.dms_to_dd
— Functiondms_to_dd(l)
Convert l
in Degrees Minutes Seconds to Decimal Degrees. Inputs are strings of the form
- LAT : Latitude in degrees:minutes:seconds (+: North, -: South)
- LON : Longitude in degrees:minutes:seconds (+: East, -: West)
Data manipulation
StochasticWeatherGenerators.collect_data_ECA
— Functioncollect_data_ECA(STAID::Integer, path::String, var::String="RR"; skipto=19, header = 18)
path
gives the path where all data files are stored in a vector
collect_data_ECA(STAID, date_start::Date, date_end::Date, path::String, var::String="RR"; portion_valid_data=1, skipto=19, header = 18, return_nothing = true)
path
gives the path where all data files are stored in a vector- Filter the
DataFrame
s.t.date_start ≤ :DATE ≤ date_end
- var = "RR", "TX" etc.
portion_valid_data
is the portion of valid data we are ok with. If we don't want any missing, fix it to1
.skipto
andheader
forcsv
files with meta informations/comments at the beginning of files. SeeCSV.jl
.return_nothing
iftrue
it will returnnothing
is the file does not exists or does not have enough valid data.
StochasticWeatherGenerators.collect_data_INRAE
— Functioncollect_data_INRAE(station_path::String; show_warning=false, impute_missing=[])
Read from a file an INRAE formatted weather station data and transform it to match ECA standard naming conventions.
impute_missing
expects a vector of column name(s) where to impute missing withImpute.Interpolate
e.g.impute_missing=[:TX]
.show_warning
in case of missing data.false
for no column,true
for all variables columns and for selected columns e.g.show_warning = [:TX]
.
StochasticWeatherGenerators.collect_data_MeteoFrance
— Functioncollect_data_MeteoFrance(STAID; show_warning=false, impute_missing=[], period="1950-2021", variables = "all")
Given a STAID
(station ID given by Météo France), it returns a DataFrame
with data in period
and for the variables
.
STAID
can be an integer or string.- Option for
period
are "1846-1949", "1950-2021", "2022-2023" - Option for
variables
areall
, "RR-T-Wind", "others" impute_missing
expects a vector of column name(s) where to impute missing withImpute.Interpolate
e.g.impute_missing=[:TX]
.show_warning
in case of missing data.false
for no column,true
for all variables columns and for selected columns e.g.show_warning = [:TX]
.
The data is available through the French Data.gouv.fr website api. Data may be updated without notice. See the following two links to get informations on the "RR-T-Wind" and "others" variables (in French)
- https://object.files.data.gouv.fr/meteofrance/data/synchroftp/BASE/QUOT/QdescriptifchampsRR-T-Vent.csv
- https://object.files.data.gouv.fr/meteofrance/data/synchroftp/BASE/QUOT/Qdescriptifchampsautres-parametres.csv
Or the the SICLIMA website with information (in French) about computation and conversion for some weather variables/index.
StochasticWeatherGenerators.download_data_MeteoFrance
— Functiondownload_data_MeteoFrance(STAID, period = "1950-2021", variables = "all")
- Option for
period
are "1846-1949", "1950-2021", "2022-2023" - Option for
variables
areall
, "RR-T-Wind", "others"
StochasticWeatherGenerators.clean_data
— Functionclean_data(df::DataFrame; show_warning=false, impute_missing=[])
Impute missing and show warning for missings. It assumes that the first two columns are not numeric.
impute_missing
expects a vector of column name(s) where to impute missing withImpute.Interpolate
e.g.impute_missing=[:TX]
.show_warning
in case of missing data.false
for no column,true
for all variables columns and for selected columns e.g.show_warning = [:TX]
.
StochasticWeatherGenerators.select_in_range_df
— Functionselect_in_range_df(datas, start_Date, interval_Date, [portion])
Select station with some data availability in dates and quality (portion of valid data). Input is a vector
(array) of DataFrame
(one for each station for example) or a Dict
of DataFrame
. If 0 < portion ≤ 1
is specified, it will authorize some portion of data to be missing.
StochasticWeatherGenerators.shortname
— Functionshortname(name::String)
Experimental function that returns only the most relevant part of a station name.
long_name = "TOULOUSE-BLAGNAC"
shortname(long_name) # "TOULOUSE"
Generic utilities
StochasticWeatherGenerators.my_color
— Functionmy_color(k, K)
Convenience for plot colors pattern and hidden states to blue for k=1 (∼wetter) and orange for k=K (∼driest)