API

Index

Public API

ECAD.StationDataType
StationData(
    var_data::Union{VariableData, AbstractVector{<:VariableData}},
    station_id::Integer;
    include_sources = false
) -> StationData

A struct representing all the data available for a given station, including its metadata and observations for multiple variables.

Fields

  • id::Int: The station ID. Example: 123.
  • name::String: The name of the station. Example: "VAEXJOE".
  • latitude::Int: The latitude of the station in arcseconds. Example: 123456 for approximately 34.2933°.
  • longitude::Int: The longitude of the station in arcseconds. Example: -123456 for approximately -34.2933°.
  • elevation::Int: The elevation of the station in meters. Example: 100 for 100 meters above sea level.
  • variables::Vector{VariableData}: The list of variables for which observations are included in this station data.
  • observations::DataFrame: A data frame containing the observations for this station, with one row per date and variable. The columns are:
    • station_id::Int: The station ID. Example: 123.
    • date::Date: The date of the observation. Example: Date(1950, 1, 1).
    • For each variable, there are three columns:
      • variable_value: The value of the observation for this variable. Example: 150 for a temperature of 15.0°C if the unit is 0.1°C.
      • variable_quality: The quality flag of the observation for this variable. Is one of "valid", "suspect" or "missing". Example: "valid".
      • variable_element_id: The element ID of the observation for this variable. Example: TX1 for the first element of the :tx (temperature max) variable.
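
The integer-encoded fields above can be converted to conventional units with a few lines of arithmetic. The following is a minimal sketch; the helper names are hypothetical and are not part of ECAD.

```julia
# Hypothetical helpers, not part of ECAD: convert the integer-encoded
# fields described above into floating-point values.

# Arcseconds to decimal degrees (3600 arcseconds per degree).
arcsec_to_degrees(arcsec::Integer) = arcsec / 3600

# Raw observation value to physical units. `per_unit` is the number of raw
# steps per physical unit, e.g. 10 when the element's unit is "0.1°C".
scaled_value(raw::Integer, per_unit::Integer) = raw / per_unit

arcsec_to_degrees(123456)  # roughly 34.2933
scaled_value(150, 10)      # 15.0
```

Dividing by an integer scale keeps the result exact for values such as 150, which multiplying by 0.1 does not guarantee in floating point.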
source
ECAD.VariableDataType
VariableData(variable::Variable, filepath::String; memory_map = true)

A variable with its associated data. If you are unsure what memory_map does, leave it set to true.

Fields

  • variable::Variable: The variable this data corresponds to.
  • filepath::String: The path to the zip file containing the data for this variable.
  • content::ZipReader{<:AbstractVector{UInt8}}: A ZipReader instance for reading the contents of the zip file.

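Accessor methods are provided for convenience: variable, zipfile, and zipcontent.

To load specific data, use the provided functions: station_ids, load_stations, load_sources, load_elements, and load_observations.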

source
ECAD.canonical_nameMethod
canonical_name(var::Variable) -> Symbol

Return the canonical name of a variable as a symbol. The canonical name is the primary identifier for a variable, and all aliases for that variable will map to this canonical name.

Example

julia> canonical_name(CloudCover())
:cc
source
ECAD.dataset_zipMethod
dataset_zip(var)

Get the path to the zip file for the given variable. The variable can be specified using any of its aliases, e.g. :tx, :temperature_max, or "temperature_max".

source
ECAD.intersect_stationsMethod
intersect_stations(vars::AbstractVector{<:VariableData}) -> DataFrame

Find all stations that are present across all provided variables, returning their shared metadata.

This is useful when you want to build a StationData for multiple variables and need to know which station IDs are valid for all of them — i.e. which stations have observations for every variable in vars.

Returns a DataFrame with one row per common station, with columns:

  • id::Int: The station ID.
  • name::String: The name of the station.
  • country_code::String: The ISO country code.
  • latitude_arcsec::Int: The latitude in arcseconds.
  • longitude_arcsec::Int: The longitude in arcseconds.
  • height_meter::Int: The elevation in meters above sea level.

Throws

  • ArgumentError if vars is empty.

Example

vars = VariableData.([:tg, :tx, :tn])
common = intersect_stations(vars)
station_data = StationData.(Ref(vars), common.id)
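
The core idea, keeping only the station IDs that appear for every variable, can be sketched with plain vectors. This is an illustration under assumptions, not ECAD's implementation: the helper name common_ids and the ID values are made up, and the real function also returns station metadata.

```julia
# Station IDs per variable (made-up values for illustration).
ids_tg = [1, 2, 3, 5]
ids_tx = [2, 3, 5, 8]
ids_tn = [3, 5, 13]

# Hypothetical helper: IDs present in every list. Throws on empty input,
# mirroring the documented ArgumentError.
function common_ids(id_lists)
    isempty(id_lists) && throw(ArgumentError("at least one variable is required"))
    return reduce(intersect, id_lists)
end

common_ids([ids_tg, ids_tx, ids_tn])  # [3, 5]
```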
source
ECAD.load_elementsMethod
load_elements() -> DataFrame
load_elements(var::VariableData, file = nothing) -> DataFrame

Load the elements for the given variable, or all elements when called without arguments. Elements are variants of the same variable that may have different units or observation methods. For example, for :tg (temperature mean), one element computes the mean of the daily maximum and minimum, while another computes the mean of the hourly observations.

Columns

  • element_id::Int: The element ID. Example: TX1 for the first element of the :tx (temperature max) variable.
  • description::String: A description of the element. Example: "Daily maximum temperature calculated from the maximum of the daily observations".
  • unit::String: The unit of the element. Example: "0.1°C".
  • variable_id::String: The canonical name of the variable this element belongs to. Example: "TX" for :tx (temperature max).
  • variable_name::String: A human-readable name of the variable this element belongs to. Example: "MAX TEMPERATURE" for :tx (temperature max).
source
ECAD.load_observationsMethod
load_observations(var::VariableData, station_id::Integer) -> DataFrame

Load the observations for the given variable and station ID from the zip file. The station ID must be one of the IDs returned by station_ids(var).

A warning is emitted if there are multiple elements in the observations, meaning that the observations may have different units / observation methods. In that case, the elements data frame returned by load_elements(var) should be consulted to see what each element ID means.

Columns

  • station_id::Int: The station ID. Example: 123.
  • source_id::Int: The source ID associated with this observation. Example: 1. See load_sources for more info on sources.
  • element_id::Int: The element ID associated with this observation. Example: TX1 for the first element of the :tx (temperature max) variable. See load_elements for more info on elements.
  • date::Date: The date of the observation. Example: Date(1950, 1, 1).
  • value::Int64?: The value of the observation, or missing if not available. The unit depends on the element. See the unit column in the data frame returned by load_elements for more info on the unit of each element.
  • quality::String: The quality flag of the observation. Is one of "valid", "suspect" or "missing".
source
ECAD.load_sourcesMethod
load_sources(var::VariableData) -> DataFrame

Load the sources data frame from the zip file for the given variable.

Columns

  • id::Int: The source ID. Example: 1.
  • name::String: The name of the source. Example: "VAEXJOE".
  • station_id::Int: The station ID associated with this source. Example: 123.
  • start_date::Date: The first date of observations from this source. Example: Date(1950, 1, 1).
  • end_date::Date: The last date of observations from this source. Example: Date(2020, 12, 31).
  • country_code::String: The ISO country code of the source. Example: "SE" for Sweden.
  • longitude_arcsec::Int: The longitude of the source in arcseconds.
  • latitude_arcsec::Int: The latitude of the source in arcseconds.
  • height_meter::Int: The height of the source in meters.
  • participant_id::Int: The ID of the participant that provided this source. Example: 42.
  • participant_name::String: The name of the participant that provided this source. Example: "Marcus Flarup".
  • element_id::Int: The ID of the element observed by this source. See load_elements for more info.
source
ECAD.load_stationsMethod
load_stations(var::VariableData) -> DataFrame

Load the stations data frame from the zip file for the given variable.

Columns

  • id::Int: The station ID. Example: 123.
  • name::String: The name of the station. Example: "VAEXJOE".
  • country_code::String: The ISO country code of the station. Example: "SE" for Sweden.
  • longitude_arcsec::Int: The longitude of the station in arcseconds.
  • latitude_arcsec::Int: The latitude of the station in arcseconds.
  • height_meter::Int: The height of the station in meters.
source
ECAD.longnameMethod
longname(var::Variable) -> String

Return a human-readable long name for the variable, suitable for display in plots and tables. By default, this is generated by converting the variable type name from CamelCase to an underscore-separated lowercase string. For example, CloudCover would become cloud_cover.

You can override this by providing a custom long name when defining the variable type.

Example

julia> longname(CloudCover())
"cloud_cover"
source
ECAD.pretty_nameMethod
pretty_name(var::Variable) -> String

Return a pretty name for the variable, which can be used in contexts where a more concise or stylized name is desired. By default, this is generated by converting the variable type name from CamelCase to a space-separated lowercase string. For example, CloudCover would become cloud cover.

You can override this by providing a custom pretty name when defining the variable type.

Example

julia> pretty_name(CloudCover())
"cloud cover"
source
ECAD.station_idsMethod
station_ids(var::VariableData) -> Vector{Int}

The list of station IDs available in the zip file for the given variable.

source
ECAD.variableMethod
variable(var::VariableData) -> Variable

The variable associated with the given VariableData.

source
ECAD.zipcontentMethod
zipcontent(var::VariableData) -> ZipReader

The ZipReader instance for the zip file associated with the given VariableData.

source
ECAD.zipfileMethod
zipfile(var::VariableData) -> String

The path to the zip file associated with the given VariableData.

source
ECAD.@defvariableMacro
@defvariable TypeName canonical_name

Define a Variable subtype and automatically register canonical_name and from_name methods for it.
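
To illustrate what such a macro could expand to, here is a minimal sketch. It is an assumption about the general shape, not ECAD's actual implementation: the macro body, the Val-based dispatch, and the Symbol/String fallbacks are all hypothetical.

```julia
abstract type Variable end

# Hypothetical macro: defines a singleton subtype of Variable and registers
# canonical_name and from_name methods for it.
macro defvariable(T, name)
    qname = QuoteNode(name)
    return esc(quote
        struct $T <: Variable end
        canonical_name(::$T) = $qname
        from_name(::Val{$qname}) = $T()
    end)
end

# Fallbacks so that symbols and strings dispatch to the Val method.
from_name(x::Symbol) = from_name(Val(x))
from_name(x::AbstractString) = from_name(Symbol(x))

@defvariable CloudCover cc

canonical_name(CloudCover())  # :cc
from_name("cc")               # CloudCover()
```

Dispatching on Val keeps the name lookup in the method table, so each new variable only adds methods and needs no central registry.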

source

Private API

ECAD.GTS_FALLBACK_PARTICIPANT_IDConstant
const GTS_FALLBACK_PARTICIPANT_ID = Int(typemax(Int32))

Some GTS sources are missing participant IDs but have a consistent name ("Synoptical message from GTS"). This constant is used as a fallback participant ID for those sources to allow them to be included in the dataset without losing the information that they are GTS sources.

source
ECAD.all_variablesMethod
all_variables() -> Vector{Variable}

Return a vector of instances of all defined Variable subtypes.

source
ECAD.camelcase_to_wordsFunction
camelcase_to_words(s::String, sep=' ') -> String

Convert a CamelCase string s to a sep-separated lowercase string. Returns the converted string.

Example

julia> camelcase_to_words("CamelCaseToWords")
"camel case to words"

julia> camelcase_to_words("CamelCaseToWords", '_')
"camel_case_to_words"

See also: camelcase_to_words! for an in-place version that writes to an IO stream.
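
The conversion itself is a single pass over the characters. The following sketch is a hypothetical re-implementation for illustration (camelcase_sketch is not an ECAD function, and ECAD's own code may handle edge cases differently).

```julia
# Hypothetical re-implementation for illustration only.
function camelcase_sketch(s::AbstractString, sep = ' ')
    io = IOBuffer()
    for (i, c) in enumerate(s)
        # Start a new word at every uppercase letter except the first character.
        isuppercase(c) && i > 1 && print(io, sep)
        print(io, lowercase(c))
    end
    return String(take!(io))
end

camelcase_sketch("CamelCaseToWords")       # "camel case to words"
camelcase_sketch("CamelCaseToWords", '_')  # "camel_case_to_words"
```

Note that this simple rule splits every uppercase letter, so an acronym would be broken into single letters.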

source
ECAD.camelcase_to_words!Function
camelcase_to_words!(io::IO, s::AbstractString, sep=' ')

Convert a CamelCase string s to a sep-separated lowercase string, writing the result to the provided IO stream io. The input string is not modified; the function writes to io and returns it.

Example

julia> io = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)

julia> camelcase_to_words!(io, "CamelCaseToWords", ' ')
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=19, maxsize=Inf, ptr=20, mark=-1)

julia> String(take!(io))
"camel case to words"

See also: camelcase_to_words for a version that returns a new string instead of writing to an IO stream.

source
ECAD.canonical_NAMEMethod
canonical_NAME(var::Variable) -> String

Return the canonical name of a variable as an uppercase string. This is useful for matching variable names in datasets that use uppercase naming conventions.

Example

julia> canonical_NAME(CloudCover())
"CC"
source
ECAD.detect_header_rowMethod
detect_header_row(io::IO) -> Int

Detects the line number of the header row in a CSV file by reading through the lines of the provided IO stream and checking each line with is_header_row. If a header row is found, it returns the line number. If no header row is found after reading through the entire file, it throws an ArgumentError.
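
The scan described above can be sketched as follows. The predicate looks_like_header is a simplified, hypothetical stand-in for is_header_row, and the sample text is made up.

```julia
# Simplified, hypothetical stand-in for the real header check.
looks_like_header(line) =
    occursin(',', line) && any(isletter, line) && line == uppercase(line)

# Return the 1-based line number of the first header-like line,
# or throw if the stream ends without one.
function detect_header_sketch(io::IO)
    for (lineno, line) in enumerate(eachline(io))
        looks_like_header(line) && return lineno
    end
    throw(ArgumentError("no header row found"))
end

sample = "File description\nsome notes, in lowercase\nSTAID, SOUID, DATE\n1, 2, 19500101\n"
detect_header_sketch(IOBuffer(sample))  # 3
```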

source
ECAD.dms2arcsecMethod
dms2arcsec(dms::AbstractString) -> Int

Converts a DMS (Degrees, Minutes, Seconds) string to total arcseconds as an integer. Divide the result by 3600 to get decimal degrees.
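
As an illustration of the arithmetic, here is a minimal sketch. It assumes the input looks like "+DD:MM:SS" with an optional leading sign and colon separators; that format and the name dms2arcsec_sketch are assumptions, not something this page specifies.

```julia
# Hypothetical sketch; assumes a "+DD:MM:SS" style string.
function dms2arcsec_sketch(dms::AbstractString)
    s = strip(dms)
    negative = startswith(s, '-')
    # Drop an optional leading sign, then split into degrees, minutes, seconds.
    d, m, sec = parse.(Int, split(lstrip(s, ['+', '-']), ':'))
    total = d * 3600 + m * 60 + sec
    return negative ? -total : total
end

dms2arcsec_sketch("+56:52:00")          # 204720
dms2arcsec_sketch("-00:30:00") / 3600   # -0.5
```

The sign is handled separately because parsing "-00" as an integer yields 0 and would lose the sign for coordinates between -1° and 0°.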

source
ECAD.download_progressMethod
download_progress(remote_filepath, local_directory_path)

Download a file from remote_filepath to local_directory_path while displaying a progress bar.

source
ECAD.from_nameMethod
from_name(x::Union{Symbol, String, Val}) -> Variable

Convert a canonical variable name to its corresponding Variable type instance.

Example

julia> from_name(:cc)
CloudCover()
source
ECAD.human_bytesMethod
human_bytes(bytes::Integer) -> String

Convert a byte count to a human-readable string with appropriate units (B, KB, MB, GB, TB, PB).

Example

julia> human_bytes(123)
"123.00 B"

julia> human_bytes(123456)
"120.56 KB"

julia> human_bytes(123456789)
"117.74 MB"
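
The outputs above are consistent with repeated division by 1024 and two-decimal formatting. A minimal sketch under that assumption (human_bytes_sketch is a hypothetical name, and the real implementation may differ):

```julia
using Printf

# Hypothetical sketch: scale by powers of 1024 and format with two decimals.
function human_bytes_sketch(bytes::Integer)
    units = ("B", "KB", "MB", "GB", "TB", "PB")
    value = float(bytes)
    i = 1
    # Divide by 1024 until the value fits the current unit or units run out.
    while value >= 1024 && i < length(units)
        value /= 1024
        i += 1
    end
    return @sprintf("%.2f %s", value, units[i])
end

human_bytes_sketch(123456)  # "120.56 KB"
```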
source
ECAD.is_header_rowMethod
is_header_row(line) -> Bool

Checks if a given line from the CSV file is likely to be the header row by verifying that it contains at least two comma-separated parts and that all parts are uppercase (ignoring whitespace).
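
A minimal sketch of such a check is shown below. It interprets "uppercase" as "contains at least one letter and no lowercase letters", which is an assumption; the name is_header_sketch is hypothetical.

```julia
# Hypothetical sketch of the header heuristic.
function is_header_sketch(line::AbstractString)
    parts = strip.(split(line, ','))
    length(parts) >= 2 || return false
    # Assumption: a part counts as uppercase when it has at least one letter
    # and no lowercase letters, so purely numeric data rows are rejected.
    return all(p -> any(isletter, p) && !any(islowercase, p), parts)
end

is_header_sketch("STAID, SOUID, DATE")  # true
is_header_sketch("1, 2, 19500101")      # false
```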

source
ECAD.normalize_embedded_commasMethod
normalize_embedded_commas(io::IO, header_line, col_name) -> IOBuffer

Streams io into an IOBuffer, quoting any unquoted commas inside the field col_name.

Arguments

  • io: An IO stream containing the CSV data.
  • header_line: The line number of the header row in the CSV data.
  • col_name: The name of the column to check for embedded commas.

Returns

  • A new IOBuffer containing the normalized CSV data with embedded commas in col_name properly quoted.
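
One way to implement this kind of normalization is to compare each row's field count with the header's and attribute any surplus commas to col_name. The sketch below follows that approach; it is an assumption about the technique, the function name is hypothetical, and it does not handle fields that are already quoted.

```julia
# Hypothetical sketch: rows with more fields than the header are assumed to
# have their extra commas inside `col_name`, which is then wrapped in quotes.
function normalize_commas_sketch(io::IO, header_line::Integer, col_name::AbstractString)
    out = IOBuffer()
    ncols, idx = 0, 0
    for (lineno, line) in enumerate(eachline(io))
        if lineno == header_line
            cols = strip.(split(line, ','))
            ncols = length(cols)
            found = findfirst(==(col_name), cols)
            found === nothing && throw(ArgumentError("column $col_name not found"))
            idx = found
        elseif lineno > header_line
            parts = String.(split(line, ','))
            extra = length(parts) - ncols
            if extra > 0
                # Re-join the pieces of the split field and quote the result.
                merged = "\"" * join(parts[idx:idx+extra], ',') * "\""
                parts = vcat(parts[1:idx-1], [merged], parts[idx+extra+1:end])
            end
            line = join(parts, ',')
        end
        println(out, line)
    end
    return seekstart(out)
end

csv = "NAME,CITY,ID\nA,Paris, France,1\nB,Oslo,2\n"
read(normalize_commas_sketch(IOBuffer(csv), 1, "CITY"), String)
```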
source
ECAD.patch_first_digit!Method
patch_first_digit!(fixes::Dict, bad_sources, available_sources)

A fix for resolve_invalid_source_ids that matches each entry of bad_sources to an entry of available_sources that is identical once the first character is removed (e.g. the bad source "12345" matches the available source "22345"). Matches are recorded in fixes.
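
A minimal sketch of this heuristic follows. The function name is hypothetical, and accepting only unambiguous matches is an assumption rather than documented behavior.

```julia
# Hypothetical sketch of the "ignore the first character" heuristic.
function patch_first_digit_sketch!(fixes::Dict, bad_sources, available_sources)
    tail(id) = string(id)[2:end]  # the ID without its first character
    for bad in bad_sources
        candidates = [a for a in available_sources if tail(a) == tail(bad)]
        # Assumption: only accept a match when exactly one candidate exists.
        length(candidates) == 1 && (fixes[bad] = only(candidates))
    end
    return fixes
end

fixes = Dict{Int,Int}()
patch_first_digit_sketch!(fixes, [12345], [22345, 99999])
fixes  # Dict(12345 => 22345)
```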

source
ECAD.raw_observationsMethod
raw_observations(observations::DataFrame, var_data::VariableData; include_sources = false) -> DataFrame

Extract the raw observations for a given station ID from the observations data frame and variable data, and rename the columns to include the variable name.

The columns are renamed as follows:

  • value -> variable_value
  • quality -> variable_quality
  • element_id -> variable_element_id
  • source_id -> variable_source_id (only if include_sources = true)
source
ECAD.raw_stationMethod
raw_station(observations::DataFrame, var_data::VariableData, station_id::Integer) -> DataFrameRow

Extract the raw station metadata for a given station ID from the observations data frame and variable data. This is used internally to ensure that the station metadata is consistent across multiple variables when building a StationData for multiple variables.

Returns

A DataFrameRow with the following attributes:

  • id::Int: The station ID. Example: 123.
  • name::String: The name of the station. Example: "VAEXJOE".
  • country_code::String: The ISO country code of the station. Example: "SE" for Sweden.
  • latitude_arcsec::Int: The latitude of the station in arcseconds. Example: 123456 for approximately 34.2933°.
  • longitude_arcsec::Int: The longitude of the station in arcseconds. Example: -123456 for approximately -34.2933°.
  • height_meter::Int: The height of the station in meters. Example: 100 for 100 meters above sea level.
source
ECAD.repair_source_ids!Method
repair_source_ids!(observation_df, sources_df)

Checks if:

  • There is only one unique station_id in observation_df.
  • source_id values in observation_df match any id in sources_df for the corresponding station_id.

If there are source_id values in observation_df that do not match any id in sources_df for the same station_id, the function attempts to repair them using a series of heuristics defined in resolve_invalid_source_ids.

Arguments

  • observation_df: A DataFrame containing the observations, with at least station_id and source_id columns.
  • sources_df: A DataFrame containing the sources, with at least id and station_id columns.

Returns

  • true if all source_id values in observation_df are valid or were successfully repaired.
  • false if there are still invalid source_id values after attempting repairs, or if there are multiple station_id values in observation_df. Also logs warnings in these cases.
source
ECAD.resolve_invalid_source_idsMethod
resolve_invalid_source_ids(obs_sources, available_sources; fix_funcs!)

Given a list of obs_sources and available_sources, attempts to find matches for any obs_sources that are not in available_sources using a series of provided fixing functions (fix_funcs!).

Arguments

  • obs_sources: A collection of source IDs from the observations that need to be checked.
  • available_sources: A collection of valid source IDs that can be matched against.
  • fix_funcs!: An optional collection of functions that implement heuristics to find matches for bad source IDs. Each function should have the signature fix!(fixes::Dict, bad_sources, available_sources) and should update the fixes dictionary with fixes[bad_source] = matched_available_source for any matches it finds.

Returns

  • fixes: A dictionary mapping any obs_sources that were successfully matched to their corresponding available_sources.
  • bad_sources: A collection of any obs_sources that could not be matched to any available_sources after all fixing functions were applied.
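
The overall flow can be sketched as a loop over the fixing functions. Everything below is illustrative: resolve_sketch and drop_trailing_zero! are hypothetical names, the toy heuristic is invented for the example, and the real function may order or filter its work differently.

```julia
# Hypothetical sketch of the driver loop.
function resolve_sketch(obs_sources, available_sources; fix_funcs! = ())
    fixes = Dict{eltype(obs_sources), eltype(available_sources)}()
    # Start with every observed source that has no valid counterpart.
    bad = setdiff(obs_sources, available_sources)
    for fix! in fix_funcs!
        isempty(bad) && break
        fix!(fixes, bad, available_sources)
        # Keep only the sources that are still unmatched.
        bad = filter(s -> !haskey(fixes, s), bad)
    end
    return fixes, bad
end

# Toy heuristic for the example: a trailing zero was appended by mistake.
function drop_trailing_zero!(fixes, bad, available)
    for b in bad
        if b % 10 == 0 && (b ÷ 10) in available
            fixes[b] = b ÷ 10
        end
    end
end

fixes, bad = resolve_sketch([1, 20, 7], [1, 2, 3]; fix_funcs! = (drop_trailing_zero!,))
# fixes == Dict(20 => 2), bad == [7]
```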
source