📅 Weather Data

Several weather station formats (with more to come) can be read and transformed into a DataFrame.

ECA dataset

Data from the European Climate Assessment & Dataset project is available at this link as a zip of all stations for each variable, and at this link for custom manual queries. I asked them about an API to automatically extract a specific file directly, but they answered that one is not currently available. I tried unzip-http but could not get it working with the ECA website[1].

using StochasticWeatherGenerators, DataFrames, Dates
collect_data_ECA(33, Date(1956), Date(2019, 12, 31), "https://raw.githubusercontent.com/dmetivie/StochasticWeatherGenerators.jl/master/weather_files/ECA_blend_rr/RR_", portion_valid_data=1, skipto=22, header=21, url=true)[1:10,:]
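10×5 DataFrame
 Row │ STAID  SOUID  DATE        RR     Q_RR
     │ Int64  Int64  Date        Int64  Int64
─────┼───────────────────────────────────────
   1 │    33    105  1956-01-01     23      0
   2 │    33    105  1956-01-02      1      0
   3 │    33    105  1956-01-03      0      0
   4 │    33    105  1956-01-04      0      0
   5 │    33    105  1956-01-05      0      0
   6 │    33    105  1956-01-06     21      0
   7 │    33    105  1956-01-07      3      0
   8 │    33    105  1956-01-08     25      0
   9 │    33    105  1956-01-09     20      0
  10 │    33    105  1956-01-10      0      0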
StochasticWeatherGenerators.collect_data_ECA - Function
collect_data_ECA(STAID::Integer, path::String, var::String="RR"; skipto=19, header = 18)

path gives the path where all data files are stored in a vector

collect_data_ECA(STAID, date_start::Date, date_end::Date, path::String, var::String="RR"; portion_valid_data=1, skipto=19, header = 18, return_nothing = true)
  • path gives the path where all data files are stored in a vector
  • Filter the DataFrame s.t. date_start ≤ :DATE ≤ date_end
  • var = "RR", "TX" etc.
  • portion_valid_data is the portion of valid data we accept. If we don't want any missing values, set it to 1.
  • skipto and header are for CSV files with metadata/comments at the beginning of the file. See CSV.jl.
  • return_nothing: if true, the function returns nothing when the file does not exist or does not have enough valid data.
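
To make the role of these keywords concrete, here is a minimal sketch, not the package's implementation. It reads a small in-memory file with CSV.jl's `header` and `skipto` keywords, then filters on dates and checks the portion of valid data. The helper `filter_valid` and the toy file content are hypothetical; the sketch assumes the ECA convention that a quality code of 0 marks a valid value and 9 a missing one, and it uses ISO dates for simplicity, whereas real ECA files have a longer preamble and a different date format.

```julia
using CSV, DataFrames, Dates

# Toy file mimicking an ECA layout: metadata lines, then a header, then data.
raw = """
Toy ECA-like file (hypothetical content)
Metadata and comments sit above the header
STAID,SOUID,DATE,RR,Q_RR
33,105,1956-01-01,23,0
33,105,1956-01-02,1,0
33,105,1956-01-03,0,9
33,105,1956-01-04,0,0
"""

# `header` is the line holding the column names, `skipto` the first data line.
df = CSV.read(IOBuffer(raw), DataFrame; header = 3, skipto = 4, delim = ',')

# Hypothetical helper: keep rows with date_start ≤ DATE ≤ date_end and return
# `nothing` when the share of valid values (Q_RR == 0) is too low.
function filter_valid(df, date_start, date_end; portion_valid_data = 1)
    sub = filter(:DATE => d -> date_start <= d <= date_end, df)
    nrow(sub) == 0 && return nothing
    portion = count(==(0), sub.Q_RR) / nrow(sub)
    return portion >= portion_valid_data ? sub : nothing
end

strict = filter_valid(df, Date(1956, 1, 1), Date(1956, 1, 4))
relaxed = filter_valid(df, Date(1956, 1, 1), Date(1956, 1, 4); portion_valid_data = 0.75)
```

With the toy data, `strict` is `nothing` because one of the four values is flagged as missing, while `relaxed` keeps all four rows since three out of four are valid.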

Météo France

Météo France has a version of this data, accessible through an API on the Data.gouv.fr website. This package provides a simple command to extract the data of one station (given its STAtionID, STAID) from the API.

collect_data_MeteoFrance(34154001)[1:10,:] # Montpellier Airport
10×59 DataFrame
 Row │ __id   STAID     STANAME               LAT      LON      ALTI   DATE        RR       QRR    ⋯
     │ Int64  Int64     String31              Float64  Float64  Int64  Date        Float64  Int64  ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────────────────
   1 │  5688  34154001  MONTPELLIER-AEROPORT  43.5762  3.96467      1  2024-01-01      0.0      1  ⋯
   2 │  5689  34154001  MONTPELLIER-AEROPORT  43.5762  3.96467      1  2024-01-02      0.0      1  ⋯
   3 │  5690  34154001  MONTPELLIER-AEROPORT  43.5762  3.96467      1  2024-01-03      0.0      1  ⋯
   4 │  5691  34154001  MONTPELLIER-AEROPORT  43.5762  3.96467      1  2024-01-04      0.0      1  ⋯
   5 │  5692  34154001  MONTPELLIER-AEROPORT  43.5762  3.96467      1  2024-01-05      0.0      1  ⋯
   6 │  5693  34154001  MONTPELLIER-AEROPORT  43.5762  3.96467      1  2024-01-06      0.0      1  ⋯
   7 │  5694  34154001  MONTPELLIER-AEROPORT  43.5762  3.96467      1  2024-01-07      0.0      1  ⋯
   8 │  5695  34154001  MONTPELLIER-AEROPORT  43.5762  3.96467      1  2024-01-08      0.0      1  ⋯
   9 │  5696  34154001  MONTPELLIER-AEROPORT  43.5762  3.96467      1  2024-01-09      0.0      1  ⋯
  10 │  5697  34154001  MONTPELLIER-AEROPORT  43.5762  3.96467      1  2024-01-10     44.9      1  ⋯
                                                                                 50 columns omitted
Omitted columns: TN, QTN, HTN, QHTN, TX, QTX, HTX, QHTX, TM, QTM, TNTXM, QTNTXM, TAMPLI, QTAMPLI, TNSOL, QTNSOL, TN50, QTN50, DG, QDG, FFM, QFFM, FF2M, QFF2M, FXY, QFXY, DXY, QDXY, HXY, QHXY, FXI, QFXI, DXI, QDXI, HXI, QHXI, FXI2, QFXI2, DXI2, QDXI2, HXI2, QHXI2, FXI3S, QFXI3S, DXI3S, QDXI3S, HXI3S, QHXI3S, DRR, QDRR
Warning

Because it is rather new, this Data.gouv.fr/Météo France API may change in the future, which would break this function. The function is currently not fully working, so one would have to call the Data.gouv.fr API directly.

StochasticWeatherGenerators.collect_data_MeteoFrance - Function
collect_data_MeteoFrance(STAID; show_warning=false, impute_missing=[], period="1950-2021", variables = "all")

Given a STAID (station ID given by Météo France), it returns a DataFrame with the data for the requested period and variables.

  • STAID can be an integer or string.
  • Options for period are "1846-1949", "1950-2021", "2022-2023".
  • Options for variables are "all", "RR-T-Wind", "others".
  • impute_missing expects a vector of column name(s) in which to impute missing values with Impute.Interpolate, e.g. impute_missing=[:TX].
  • show_warning controls warnings about missing data: false for no column, true for all variable columns, or a vector of selected columns, e.g. show_warning = [:TX].

The data is available through the French Data.gouv.fr website API. Data may be updated without notice. See the following two links for information (in French) on the "RR-T-Wind" and "others" variables:

  • https://object.files.data.gouv.fr/meteofrance/data/synchroftp/BASE/QUOT/QdescriptifchampsRR-T-Vent.csv
  • https://object.files.data.gouv.fr/meteofrance/data/synchroftp/BASE/QUOT/Qdescriptifchampsautres-parametres.csv

Or see the SICLIMA website for information (in French) about the computation and conversion of some weather variables and indices.

StochasticWeatherGenerators.download_data_MeteoFrance - Function
download_data_MeteoFrance(STAID, period = "2024-2025", variables = "all")

This function no longer works reliably, as the API changed in 2024.

  • Options for period are "1846-1949", "1950-2021", "2022-2023".
  • Options for variables are "all", "RR-T-Wind", "others".

The data is available through the French Data.gouv.fr website API. Data may be updated without notice. In particular, the path to the data may change.


INRAE

The INRAE CLIMATIK platform (Delannoy et al., 2022) (https://agroclim.inrae.fr/climatik/, in French), managed by the AgroClim laboratory in Avignon, France, gives access to data from INRAE weather stations. However, its API is not open access.

StochasticWeatherGenerators.collect_data_INRAE - Function
collect_data_INRAE(station_path::String; show_warning=false, impute_missing=[])

Reads INRAE-formatted weather station data from a file and transforms it to match ECA standard naming conventions.

  • impute_missing expects a vector of column name(s) in which to impute missing values with Impute.Interpolate, e.g. impute_missing=[:TX].
  • show_warning controls warnings about missing data: false for no column, true for all variable columns, or a vector of selected columns, e.g. show_warning = [:TX].

Others

Data manipulation

StochasticWeatherGenerators.clean_data - Function
clean_data(df::DataFrame; show_warning=false, impute_missing=[])

Imputes missing values and shows warnings about them. It assumes that the first two columns are not numeric.

  • impute_missing expects a vector of column name(s) in which to impute missing values with Impute.Interpolate, e.g. impute_missing=[:TX].
  • show_warning controls warnings about missing data: false for no column, true for all variable columns, or a vector of selected columns, e.g. show_warning = [:TX].
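
As an illustration of what imputing by interpolation means, here is a minimal base-Julia sketch of linear interpolation over interior gaps. It is a hand-written stand-in for the idea behind Impute.Interpolate, not the code used by the package or by Impute.jl; the function name `interpolate_missing` is hypothetical.

```julia
# Hypothetical stand-in: fill interior `missing` values by linear interpolation
# between the nearest observed neighbours. Leading and trailing gaps are kept.
function interpolate_missing(v::AbstractVector{<:Union{Missing,Real}})
    out = Vector{Union{Missing,Float64}}(v)
    known = findall(!ismissing, v)          # indices of observed values
    for (a, b) in zip(known[1:end-1], known[2:end])
        for i in a+1:b-1                    # empty when a and b are adjacent
            out[i] = v[a] + (i - a) * (v[b] - v[a]) / (b - a)
        end
    end
    return out
end

tx = [10.0, missing, missing, 16.0, missing]   # toy daily maximum temperatures
filled = interpolate_missing(tx)
```

Here the two interior gaps are filled on the straight line between 10.0 and 16.0, while the trailing `missing` stays as it is because it has no right-hand neighbour.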
StochasticWeatherGenerators.select_in_range_df - Function
select_in_range_df(datas, start_Date, interval_Date, [portion])

Selects the stations whose data meet requirements on availability (date range) and quality (portion of valid data). The input is a vector (array) of DataFrames (for example, one per station) or a Dict of DataFrames. If 0 < portion ≤ 1 is specified, some portion of the data is allowed to be missing.
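
The selection idea can be sketched as follows. This is only an illustration under stated assumptions, not the package's implementation: the helper `covers` is hypothetical, it takes an explicit end date instead of the interval argument of select_in_range_df, it assumes one row per day with no duplicated dates, and it interprets portion as the minimum share of days that must be present.

```julia
using DataFrames, Dates

# Hypothetical helper: does `df` contain at least `portion` of the days
# between date_start and date_end (inclusive)?
function covers(df, date_start, date_end; portion = 1)
    expected = length(date_start:Day(1):date_end)
    present = count(d -> date_start <= d <= date_end, df.DATE)
    return present / expected >= portion
end

# Toy stations: "A" covers all of 2000, "B" only June to December.
datas = Dict(
    "A" => DataFrame(DATE = Date(2000, 1, 1):Day(1):Date(2000, 12, 31)),
    "B" => DataFrame(DATE = Date(2000, 6, 1):Day(1):Date(2000, 12, 31)),
)

d1, d2 = Date(2000, 1, 1), Date(2000, 12, 31)
full = filter(kv -> covers(last(kv), d1, d2), datas)                    # complete coverage only
partial = filter(kv -> covers(last(kv), d1, d2; portion = 0.5), datas)  # at least half the days
```

With the toy data, `full` keeps only station "A", while `partial` keeps both stations because "B" covers more than half of the year.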

StochasticWeatherGenerators.shortname - Function
shortname(name::String)

Experimental function that returns only the most relevant part of a station name.

long_name = "TOULOUSE-BLAGNAC"
shortname(long_name) # "TOULOUSE"

References

  • Delannoy, D.; Maury, O. and Décome, J. (2022). CLIMATIK: système d’information pour les données du réseau agroclimatique INRAE.
  • [1] I don't remember exactly, in fact.