API
Index
- ECAD.GTS_FALLBACK_PARTICIPANT_ID
- ECAD.StationData
- ECAD.VariableData
- ECAD.all_variables
- ECAD.camelcase_to_words
- ECAD.camelcase_to_words!
- ECAD.canonical_NAME
- ECAD.canonical_name
- ECAD.dataset_zip
- ECAD.detect_header_row
- ECAD.dms2arcsec
- ECAD.download_progress
- ECAD.from_name
- ECAD.human_bytes
- ECAD.intersect_stations
- ECAD.is_header_row
- ECAD.load_elements
- ECAD.load_observations
- ECAD.load_sources
- ECAD.load_stations
- ECAD.longname
- ECAD.normalize_embedded_commas
- ECAD.patch_first_digit!
- ECAD.patch_substring!
- ECAD.pretty_name
- ECAD.raw_observations
- ECAD.raw_station
- ECAD.repair_source_ids!
- ECAD.resolve_invalid_source_ids
- ECAD.resolve_participant_id
- ECAD.station_ids
- ECAD.variable
- ECAD.zipcontent
- ECAD.zipfile
- ECAD.@defvariable
Public API
ECAD.StationData — Type
StationData(
var_data::Union{VariableData, AbstractVector{<:VariableData}},
station_id::Integer;
include_sources = false
) -> StationData
A struct representing all the data available for a given station, including its metadata and observations for multiple variables.
Fields
- id::Int: The station ID. Example: 123.
- name::String: The name of the station. Example: "VAEXJOE".
- latitude::Int: The latitude of the station in arcseconds. Example: 123451 for 34.2919°.
- longitude::Int: The longitude of the station in arcseconds. Example: -123451 for -34.2919°.
- elevation::Int: The elevation of the station in meters. Example: 100 for 100 meters above sea level.
- variables::Vector{VariableData}: The list of variables for which observations are included in this station data.
- observations::DataFrame: A data frame containing the observations for this station, with one row per date. The columns are:
  - station_id::Int: The station ID. Example: 123.
  - date::Date: The date of the observation. Example: Date(1950, 1, 1).
  - For each variable, the following columns, where variable stands for the variable's name:
    - variable_value: The value of the observation for this variable. Example: 150 for a temperature of 15.0°C if the unit is 0.1°C.
    - variable_quality: The quality flag of the observation for this variable. One of "valid", "suspect" or "missing". Example: "valid".
    - variable_element_id: The element ID of the observation for this variable. Example: TX1 for the first element of the :tx (temperature max) variable.
    - variable_source_id: The source ID of the observation for this variable. Only present when include_sources = true.
ECAD.VariableData — Type
VariableData(variable::Variable, filepath::String; memory_map = true)
A variable with its associated data. If you are unsure what memory_map does, keep the default (true).
Fields
- variable::Variable: The variable this data corresponds to.
- filepath::String: The path to the zip file containing the data for this variable.
- content::ZipReader{<:AbstractVector{UInt8}}: A ZipReader instance for reading the contents of the zip file.
Accessor methods are provided for convenience:
- variable(var::VariableData): Get the variable.
- zipfile(var::VariableData): Get the path to the zip file.
- zipcontent(var::VariableData): Get the ZipReader for the zip file.
- ZipArchives.zip_names(var::VariableData): Get the list of file names in the zip archive.
- ZipArchives.zip_readentry(var::VariableData, args...; kwargs...): Read a specific entry from the zip archive.
To load specific data, use the provided functions:
- load_sources(var::VariableData): Load the sources data frame from the zip file.
- load_stations(var::VariableData): Load the stations data frame from the zip file.
- load_elements(var::VariableData): Load the elements data frame from the local elements file.
- load_elements(var::VariableData, file): Load the elements data frame from a specified file.
- load_observations(var::VariableData, station_id): Load the observations data frame for a specific station ID from the zip file.
ECAD.canonical_name — Method
canonical_name(var::Variable) -> Symbol
Return the canonical name of a variable as a symbol. The canonical name is the primary identifier for a variable, and all aliases for that variable map to this canonical name.
Example
julia> canonical_name(CloudCover())
:cc
ECAD.dataset_zip — Method
dataset_zip(var)
Get the path to the zip file for the given variable. The variable can be specified using any of its aliases, e.g. :tx, :temperature_max, or "temperature_max".
ECAD.intersect_stations — Method
intersect_stations(vars::AbstractVector{<:VariableData}) -> DataFrame
Find all stations that are present across all provided variables, returning their shared metadata.
This is useful when you want to build a StationData for multiple variables and need to know which station IDs are valid for all of them — i.e. which stations have observations for every variable in vars.
Returns a DataFrame with one row per common station, with columns:
- id::Int: The station ID.
- name::String: The name of the station.
- country_code::String: The ISO country code.
- latitude_arcsec::Int: The latitude in arcseconds.
- longitude_arcsec::Int: The longitude in arcseconds.
- height_meter::Int: The elevation in meters above sea level.
Throws
ArgumentError if vars is empty.
Example
vars = VariableData.([:tg, :tx, :tn])
common = intersect_stations(vars)
station_data = StationData.(Ref(vars), common.id)
ECAD.load_elements — Method
load_elements() -> DataFrame
load_elements(var::VariableData, file = nothing) -> DataFrame
Load the elements for the given variable, or all elements when called without arguments. Elements are variants of the same variable that may differ in units or observation method. For example, for :tg (temperature mean), one element computes the mean of the daily maximum and minimum, while another computes the mean of the hourly observations.
Columns
- element_id::Int: The element ID. Example: TX1 for the first element of the :tx (temperature max) variable.
- description::String: A description of the element. Example: "Daily maximum temperature calculated from the maximum of the daily observations".
- unit::String: The unit of the element. Example: "0.1°C".
- variable_id::String: The canonical name of the variable this element belongs to. Example: "TX" for :tx (temperature max).
- variable_name::String: A human-readable name of the variable this element belongs to. Example: "MAX TEMPERATURE" for :tx (temperature max).
ECAD.load_observations — Method
load_observations(var::VariableData, station_id::Integer) -> DataFrame
Load the observations for the given variable and station ID from the zip file. The station ID must be one of the IDs returned by station_ids(var).
A warning is emitted if the observations contain more than one element, meaning they may mix different units or observation methods. In that case, consult the elements data frame returned by load_elements(var) to see what each element ID means.
Columns
- station_id::Int: The station ID. Example: 123.
- source_id::Int: The source ID associated with this observation. Example: 1. See load_sources for more info on sources.
- element_id::Int: The element ID associated with this observation. Example: TX1 for the first element of the :tx (temperature max) variable. See load_elements for more info on elements.
- date::Date: The date of the observation. Example: Date(1950, 1, 1).
- value::Union{Int64, Missing}: The value of the observation, or missing if not available. The unit depends on the element; see the unit column of the data frame returned by load_elements.
- quality::String: The quality flag of the observation. One of "valid", "suspect" or "missing".
ECAD.load_sources — Method
load_sources(var::VariableData) -> DataFrame
Load the sources data frame from the zip file for the given variable.
Columns
- id::Int: The source ID. Example: 1.
- name::String: The name of the source. Example: "VAEXJOE".
- station_id::Int: The station ID associated with this source. Example: 123.
- start_date::Date: The first date of observations from this source. Example: Date(1950, 1, 1).
- end_date::Date: The last date of observations from this source. Example: Date(2020, 12, 31).
- country_code::String: The ISO country code of the source. Example: "SE" for Sweden.
- longitude_arcsec::Int: The longitude of the source in arcseconds.
- latitude_arcsec::Int: The latitude of the source in arcseconds.
- height_meter::Int: The height of the source in meters.
- participant_id::Int: The ID of the participant that provided this source. Example: 42.
- participant_name::String: The name of the participant that provided this source. Example: "Marcus Flarup".
- element_id::Int: The ID of the element observed by this source. See load_elements for more info.
ECAD.load_stations — Method
load_stations(var::VariableData) -> DataFrame
Load the stations data frame from the zip file for the given variable.
Columns
- id::Int: The station ID. Example: 123.
- name::String: The name of the station. Example: "VAEXJOE".
- country_code::String: The ISO country code of the station. Example: "SE" for Sweden.
- longitude_arcsec::Int: The longitude of the station in arcseconds.
- latitude_arcsec::Int: The latitude of the station in arcseconds.
- height_meter::Int: The height of the station in meters.
ECAD.longname — Method
longname(var::Variable) -> String
Return a human-readable long name for the variable, suitable for display in plots and tables. By default, this is generated by converting the variable type name from CamelCase to an underscore-separated lowercase string. For example, CloudCover becomes cloud_cover.
You can override this by providing a custom long name when defining the variable type.
Example
julia> longname(CloudCover())
:cloud_cover
ECAD.pretty_name — Method
pretty_name(var::Variable) -> String
Return a pretty name for the variable, which can be used in contexts where a more concise or stylized name is desired. By default, this is generated by converting the variable type name from CamelCase to a space-separated lowercase string. For example, CloudCover becomes cloud cover.
You can override this by providing a custom pretty name when defining the variable type.
Example
julia> pretty_name(CloudCover())
"cloud cover"
ECAD.station_ids — Method
station_ids(var::VariableData) -> Vector{Int}
Return the list of station IDs available in the zip file for the given variable.
ECAD.variable — Method
variable(var::VariableData) -> Variable
Return the variable associated with the given VariableData.
ECAD.zipcontent — Method
zipcontent(var::VariableData) -> ZipReader
Return the ZipReader instance for the zip file associated with the given VariableData.
ECAD.zipfile — Method
zipfile(var::VariableData) -> String
Return the path to the zip file associated with the given VariableData.
ECAD.@defvariable — Macro
@defvariable TypeName canonical_name
Define a Variable subtype and automatically register canonical_name and from_name methods for it.
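A minimal sketch of what a macro like @defvariable could generate, assuming the canonical name is passed as a bare identifier and that from_name dispatches on Val. Everything here is a local stand-in so the block runs without ECAD: Variable, canonical_name, from_name and canonical_NAME mirror the documented names, while @defvariable_sketch is hypothetical and not the package's macro.

```julia
# Hypothetical stand-ins so this block runs without ECAD loaded.
abstract type Variable end

function canonical_name end
function from_name end

# Sketch of what a @defvariable-style macro could expand to (assumption, not ECAD's code).
macro defvariable_sketch(T, name)
    sym = QuoteNode(name)                  # e.g. :cc, spliced in as a literal symbol
    return esc(quote
        struct $T <: Variable end          # singleton type, e.g. CloudCover
        canonical_name(::$T) = $sym        # CloudCover() -> :cc
        from_name(::Val{$sym}) = $T()      # Val(:cc)     -> CloudCover()
    end)
end

# Symbol and String inputs are funneled into the Val-based methods.
from_name(x::Symbol) = from_name(Val(x))
from_name(x::AbstractString) = from_name(Symbol(x))

# Uppercase string form, as described for canonical_NAME.
canonical_NAME(v::Variable) = uppercase(string(canonical_name(v)))

@defvariable_sketch CloudCover cc
```

With this pattern, registering a new variable is a single macro call, and from_name(:cc) resolves through ordinary method dispatch on Val{:cc}.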
Private API
ECAD.GTS_FALLBACK_PARTICIPANT_ID — Constant
const GTS_FALLBACK_PARTICIPANT_ID = Int(typemax(Int32))
Some GTS sources are missing participant IDs but have a consistent name ("Synoptical message from GTS"). This constant is used as a fallback participant ID for those sources to allow them to be included in the dataset without losing the information that they are GTS sources.
ECAD.all_variables — Method
all_variables() -> Vector{Variable}
Return a vector of instances of all defined Variable subtypes.
ECAD.camelcase_to_words — Function
camelcase_to_words(s::String, sep=' ') -> String
Convert a CamelCase string s to a sep-separated lowercase string. Returns the converted string.
Example
julia> camelcase_to_words("CamelCaseToWords")
"camel case to words"
julia> camelcase_to_words("CamelCaseToWords", '_')
"camel_case_to_words"
See also: camelcase_to_words! for a version that writes to an IO stream instead of returning a new string.
ECAD.camelcase_to_words! — Function
camelcase_to_words!(io::IO, s::AbstractString, sep)
Convert a CamelCase string s to a sep-separated lowercase string, writing the result to the provided IO stream io instead of allocating a new string. The input string is not modified; the function returns io.
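The conversion can be sketched as a single pass over the characters that writes to the IO stream, with the allocating variant wrapping it in an IOBuffer. This is a minimal sketch under the assumption that every uppercase letter starts a new word; the _sketch names are hypothetical and not ECAD's implementation.

```julia
# Sketch: write `s` to `io`, lowercasing each uppercase letter and putting
# `sep` in front of it, except at the very start of the string.
function camelcase_to_words_sketch!(io::IO, s::AbstractString, sep = ' ')
    for (i, c) in enumerate(s)            # i counts characters, not bytes
        if isuppercase(c)
            i > 1 && print(io, sep)
            print(io, lowercase(c))
        else
            print(io, c)
        end
    end
    return io
end

# Allocating wrapper built on the IO version.
camelcase_to_words_sketch(s::AbstractString, sep = ' ') =
    String(take!(camelcase_to_words_sketch!(IOBuffer(), s, sep)))
```

Under that assumption, acronyms are split letter by letter ("GTSSource" would become "g t s source"), so a real implementation may treat runs of capitals differently.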
Example
julia> io = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)
julia> camelcase_to_words!(io, "CamelCaseToWords", ' ')
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=19, maxsize=Inf, ptr=20, mark=-1)
julia> String(take!(io))
"camel case to words"
See also: camelcase_to_words for a version that returns a new string instead of writing to an IO stream.
ECAD.canonical_NAME — Method
canonical_NAME(var::Variable) -> String
Return the canonical name of a variable as an uppercase string. This is useful for matching variable names in datasets that use uppercase naming conventions.
Example
julia> canonical_NAME(CloudCover())
"CC"
ECAD.detect_header_row — Method
detect_header_row(io::IO) -> Int
Detects the line number of the header row in a CSV file by reading through the lines of the provided IO stream and checking each line with is_header_row. If a header row is found, it returns the line number. If no header row is found after reading through the entire file, it throws an ArgumentError.
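The scan described above can be sketched as follows. To keep the block self-contained, the header test is passed in as a predicate argument instead of calling is_header_row; that extra parameter and the _sketch name are assumptions, not the documented signature.

```julia
# Sketch: return the 1-based line number of the first line accepted by `is_header`.
function detect_header_row_sketch(io::IO, is_header)
    for (lineno, line) in enumerate(eachline(io))
        is_header(line) && return lineno
    end
    throw(ArgumentError("no header row found in the stream"))
end

# Usage with a deliberately simple predicate:
io = IOBuffer("EUROPEAN CLIMATE ASSESSMENT & DATASET (ECA&D)\n\nSTAID,STANAME,CN\n1,VAEXJOE,SE\n")
detect_header_row_sketch(io, line -> startswith(line, "STAID"))   # returns 3
```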
ECAD.dms2arcsec — Method
dms2arcsec(dms::AbstractString) -> Int
Converts a DMS (Degrees, Minutes, Seconds) string to total arcseconds as an integer. Divide the result by 3600 to get decimal degrees.
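A minimal sketch of the conversion, assuming a signed, colon-separated "D:M:S" string with integer fields such as "+56:52:00" (assumed here to be the format used in the ECA&D station files); dms2arcsec_sketch is a hypothetical name.

```julia
# Sketch: parse a signed "D:M:S" string into total arcseconds.
# Assumes colon-separated integer fields; the sign applies to the whole value.
function dms2arcsec_sketch(dms::AbstractString)
    s = strip(dms)
    negative = startswith(s, "-")
    fields = split(lstrip(s, ['+', '-']), ':')
    length(fields) == 3 || throw(ArgumentError("expected D:M:S, got $(repr(dms))"))
    d, m, sec = parse.(Int, fields)
    total = 3600d + 60m + sec
    return negative ? -total : total
end

dms2arcsec_sketch("+56:52:00")          # returns 204720
dms2arcsec_sketch("+56:52:00") / 3600   # about 56.8667 decimal degrees
```

Reading the sign from the string, rather than from the degrees field, matters for values such as "-00:30:00", where the degrees part alone would parse as zero.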
ECAD.download_progress — Method
download_progress(remote_filepath, local_directory_path)
Download a file from remote_filepath to local_directory_path while displaying a progress bar.
ECAD.from_name — Method
from_name(x::Union{Symbol, String, Val}) -> Variable
Convert a canonical variable name to its corresponding Variable type instance.
Example
julia> from_name(:cc)
CloudCover()
ECAD.human_bytes — Method
human_bytes(bytes::Integer) -> String
Convert a byte count to a human-readable string with appropriate units (B, KB, MB, GB, TB, PB).
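A minimal sketch of the conversion, assuming powers of 1024 (which the examples imply: 123456 bytes gives "120.56 KB") and two decimals; human_bytes_sketch is a hypothetical name.

```julia
using Printf

# Sketch: divide by 1024 until the value drops below 1024
# (or the unit list runs out), then format with two decimals.
function human_bytes_sketch(bytes::Integer)
    units = ("B", "KB", "MB", "GB", "TB", "PB")
    value = float(bytes)
    i = 1
    while abs(value) >= 1024 && i < length(units)
        value /= 1024
        i += 1
    end
    return @sprintf("%.2f %s", value, units[i])
end
```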
Example
julia> human_bytes(123)
"123.00 B"
julia> human_bytes(123456)
"120.56 KB"
julia> human_bytes(123456789)
"117.74 MB"
ECAD.is_header_row — Method
is_header_row(line) -> Bool
Checks if a given line from the CSV file is likely to be the header row by verifying that it contains at least two comma-separated parts and that all parts are uppercase (ignoring whitespace).
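The check can be sketched as below. The description does not say how parts without letters (such as a numeric ID) are treated; this sketch assumes they do not count as uppercase, so that data rows are rejected. The function name is hypothetical.

```julia
# Sketch: a header candidate splits into at least two comma-separated parts,
# and every stripped part contains at least one letter and no lowercase letters.
function is_header_row_sketch(line::AbstractString)
    parts = strip.(split(line, ','))
    length(parts) >= 2 || return false
    return all(p -> any(isletter, p) && !any(islowercase, p), parts)
end
```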
ECAD.normalize_embedded_commas — Method
normalize_embedded_commas(io::IO, header_line, col_name) -> IOBuffer
Streams io into an IOBuffer, quoting any unquoted commas inside the field col_name.
Arguments
- io: An IO stream containing the CSV data.
- header_line: The line number of the header row in the CSV data.
- col_name: The name of the column to check for embedded commas.
Returns
- A new IOBuffer containing the normalized CSV data, with embedded commas in col_name properly quoted.
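A minimal sketch of the normalization, assuming that all surplus commas in a data row belong to col_name and that the field is not already quoted; the function name is hypothetical. It copies the preamble and header verbatim and rewrites only rows that have more fields than the header.

```julia
# Sketch: wrap the `col_name` field in double quotes whenever a data row has
# more comma-separated fields than the header (fields containing quotes are not handled).
function normalize_embedded_commas_sketch(io::IO, header_line::Integer, col_name::AbstractString)
    out = IOBuffer()
    ncols = 0
    col = 0
    for (lineno, line) in enumerate(eachline(io))
        if lineno <= header_line
            if lineno == header_line
                header = strip.(split(line, ','))
                ncols = length(header)
                found = findfirst(==(col_name), header)
                found === nothing && throw(ArgumentError("column $col_name not in header"))
                col = found
            end
            println(out, line)                 # preamble and header are copied verbatim
            continue
        end
        fields = split(line, ',')
        extra = length(fields) - ncols         # commas assumed to be embedded in `col_name`
        if extra > 0
            merged = join(fields[col:col+extra], ',')
            fields = vcat(fields[1:col-1], ["\"$merged\""], fields[col+extra+1:end])
        end
        println(out, join(fields, ','))
    end
    seekstart(out)                             # rewind so the caller reads from the start
    return out
end
```

Because the returned buffer is rewound, it can be handed directly to a CSV reader.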
ECAD.patch_first_digit! — Method
patch_first_digit!(fixes::Dict, bad_sources, available_sources)
A fix for resolve_invalid_source_ids that matches each of the bad_sources to an entry of available_sources that is identical except for the first character (e.g. the bad source "12345" matches the available source "22345").
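A minimal sketch of this heuristic, assuming source IDs are compared through their decimal string form and that a fix is recorded only when exactly one available source qualifies; both choices and the _sketch name are assumptions.

```julia
# Sketch: match each bad ID to an available ID that agrees with it after
# dropping the first character (e.g. 12345 vs 22345). Ambiguous cases are skipped.
function patch_first_digit_sketch!(fixes::Dict, bad_sources, available_sources)
    for bad in bad_sources
        haskey(fixes, bad) && continue                 # already fixed by another heuristic
        rest = chop(string(bad); head = 1, tail = 0)   # drop the first character
        candidates = [a for a in available_sources
                      if chop(string(a); head = 1, tail = 0) == rest]
        length(candidates) == 1 && (fixes[bad] = only(candidates))
    end
    return fixes
end
```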
ECAD.patch_substring! — Method
patch_substring!(fixes::Dict, bad_sources, available_sources)
A fix for resolve_invalid_source_ids that attempts to match any bad_sources that are substrings of any available_sources.
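The substring heuristic can be sketched the same way, assuming IDs are compared as decimal strings and that only a unique match is accepted; the function name is hypothetical.

```julia
# Sketch: accept an available ID whose string form contains the bad ID's
# string form (e.g. 2345 inside 12345), but only when the match is unique.
function patch_substring_sketch!(fixes::Dict, bad_sources, available_sources)
    for bad in bad_sources
        haskey(fixes, bad) && continue
        needle = string(bad)
        candidates = [a for a in available_sources if occursin(needle, string(a))]
        length(candidates) == 1 && (fixes[bad] = only(candidates))
    end
    return fixes
end
```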
ECAD.raw_observations — Method
raw_observations(observations::DataFrame, var_data::VariableData; include_sources = false) -> DataFrame
Extract the raw observations for a given station ID from the observations data frame and variable data, and rename the columns to include the variable name.
The columns are renamed as follows:
- value -> variable_value
- quality -> variable_quality
- element_id -> variable_element_id
- source_id -> variable_source_id (only if include_sources = true)
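The renaming rule can be sketched as a list of old => new pairs, the form accepted by DataFrames' rename and rename!. The helper renamed_columns is hypothetical, and it assumes that variable in the new names stands for the variable's canonical name (e.g. :tx).

```julia
# Sketch: build the old => new column-name pairs for one variable.
function renamed_columns(varname::Symbol; include_sources = false)
    cols = [:value, :quality, :element_id]
    include_sources && push!(cols, :source_id)   # source column only on request
    return [c => Symbol(varname, "_", c) for c in cols]
end

renamed_columns(:tx)
# returns [:value => :tx_value, :quality => :tx_quality, :element_id => :tx_element_id]
```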
ECAD.raw_station — Method
raw_station(observations::DataFrame, var_data::VariableData, station_id::Integer) -> DataFrameRow
Extract the raw station metadata for a given station ID from the observations data frame and variable data. This is used internally to ensure that the station metadata is consistent across multiple variables when building a StationData for multiple variables.
Returns
A DataFrameRow with the following attributes:
- id::Int: The station ID. Example: 123.
- name::String: The name of the station. Example: "VAEXJOE".
- country_code::String: The ISO country code of the station. Example: "SE" for Sweden.
- latitude_arcsec::Int: The latitude of the station in arcseconds. Example: 123451 for 34.2919°.
- longitude_arcsec::Int: The longitude of the station in arcseconds. Example: -123451 for -34.2919°.
- height_meter::Int: The height of the station in meters. Example: 100 for 100 meters above sea level.
ECAD.repair_source_ids! — Method
repair_source_ids!(observation_df, sources_df)
Checks that:
- There is only one unique station_id in observation_df.
- All source_id values in observation_df match an id in sources_df for the corresponding station_id.
If there are source_id values in observation_df that do not match any id in sources_df for the same station_id, the function attempts to repair them using a series of heuristics defined in resolve_invalid_source_ids.
Arguments
- observation_df: A DataFrame containing the observations, with at least station_id and source_id columns.
- sources_df: A DataFrame containing the sources, with at least id and station_id columns.
Returns
- true if all source_id values in observation_df are valid or were successfully repaired.
- false if there are still invalid source_id values after attempting repairs, or if there are multiple station_id values in observation_df. Also logs warnings in these cases.
ECAD.resolve_invalid_source_ids — Method
resolve_invalid_source_ids(obs_sources, available_sources; fix_funcs!)
Given a list of obs_sources and available_sources, attempts to find matches for any obs_sources that are not in available_sources using a series of provided fixing functions (fix_funcs!).
Arguments
- obs_sources: A collection of source IDs from the observations that need to be checked.
- available_sources: A collection of valid source IDs that can be matched against.
- fix_funcs!: An optional collection of functions that implement heuristics to find matches for bad source IDs. Each function should have the signature fix!(fixes::Dict, bad_sources, available_sources) and should record every match it finds as fixes[bad_source] = matched_available_source.
Returns
- fixes: A dictionary mapping any obs_sources that were successfully matched to their corresponding available_sources.
- bad_sources: A collection of any obs_sources that could not be matched to any available_sources after all fixing functions were applied.
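The overall flow can be sketched as: find the unmatched IDs, then run each heuristic on whatever is still unresolved. This is a minimal sketch; the keyword is named fixers here as a stand-in for fix_funcs!, and stopping early once everything is fixed is an assumption.

```julia
# Sketch: collect observation source IDs missing from `available_sources`,
# then let each heuristic fill `fixes` for the IDs that are still unresolved.
function resolve_invalid_source_ids_sketch(obs_sources, available_sources; fixers = ())
    available = Set(available_sources)
    bad = [s for s in unique(obs_sources) if s ∉ available]
    fixes = Dict{eltype(obs_sources), eltype(available_sources)}()
    for fixer in fixers
        remaining = [s for s in bad if !haskey(fixes, s)]
        isempty(remaining) && break                  # nothing left to repair
        fixer(fixes, remaining, available_sources)   # heuristics mutate `fixes`
    end
    still_bad = [s for s in bad if !haskey(fixes, s)]
    return fixes, still_bad
end
```

Any function with the fix!(fixes, bad_sources, available_sources) shape, such as patch_first_digit! or patch_substring!, fits into fixers.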
ECAD.resolve_participant_id — Method
Resolve the participant ID for GTS rows whose participant ID is missing but whose name matches "Synoptical message from GTS", assigning them GTS_FALLBACK_PARTICIPANT_ID.
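A per-row sketch of this resolution, assuming the name being matched is the participant name and that it may carry padding spaces. GTS_SOURCE_NAME and the function name are hypothetical; the constant's value is the one documented for GTS_FALLBACK_PARTICIPANT_ID.

```julia
# Value of the documented constant; the name string comes from its description.
const GTS_FALLBACK_PARTICIPANT_ID = Int(typemax(Int32))
const GTS_SOURCE_NAME = "Synoptical message from GTS"   # hypothetical constant name

# Sketch for one row: keep an existing ID, fall back to the GTS ID when the
# name matches, and leave anything else missing.
function resolve_participant_id_sketch(participant_id, name::AbstractString)
    ismissing(participant_id) || return participant_id
    return strip(name) == GTS_SOURCE_NAME ? GTS_FALLBACK_PARTICIPANT_ID : missing
end

# Broadcasting applies it to whole columns:
participant_ids   = [42, missing, missing]
participant_names = ["Marcus Flarup", "Synoptical message from GTS", "Unknown"]
resolve_participant_id_sketch.(participant_ids, participant_names)   # [42, 2147483647, missing]
```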