API

Index

ECAD.GTS_FALLBACK_PARTICIPANT_ID
ECAD.StationData
ECAD.VariableData
ECAD.all_variables
ECAD.camelcase_to_words
ECAD.camelcase_to_words!
ECAD.canonical_NAME
ECAD.canonical_name
ECAD.dataset_zip
ECAD.detect_header_row
ECAD.dms2arcsec
ECAD.download_progress
ECAD.from_name
ECAD.human_bytes
ECAD.intersect_stations
ECAD.is_header_row
ECAD.load_elements
ECAD.load_observations
ECAD.load_sources
ECAD.load_stations
ECAD.longname
ECAD.normalize_embedded_commas
ECAD.patch_first_digit!
ECAD.patch_substring!
ECAD.pretty_name
ECAD.raw_observations
ECAD.raw_station
ECAD.repair_source_ids!
ECAD.resolve_invalid_source_ids
ECAD.resolve_participant_id
ECAD.station_ids
ECAD.variable
ECAD.zipcontent
ECAD.zipfile
ECAD.@defvariable

Public API

ECAD.StationData — Type

StationData(
    var_data::Union{VariableData, AbstractVector{<:VariableData}},
    station_id::Integer;
    include_sources = false
) -> StationData

A struct representing all the data available for a given station, including its metadata and observations for multiple variables.

Fields

id::Int: The station ID. Example: 123.
name::String: The name of the station. Example: "VAEXJOE".
latitude::Int: The latitude of the station in arcseconds. Example: 123451 for 34.2919° (123451 / 3600).
longitude::Int: The longitude of the station in arcseconds. Example: -123451 for -34.2919°.
elevation::Int: The elevation of the station in meters. Example: 100 for 100 meters above sea level.
variables::Vector{VariableData}: The list of variables for which observations are included in this station data.
observations::DataFrame: A data frame containing the observations for this station, with one row per date. The columns are:
- station_id::Int: The station ID. Example: 123.
- date::Date: The date of the observation. Example: Date(1950, 1, 1).
- For each variable, three columns (four with include_sources = true), where variable is replaced by the variable's name:
  - variable_value: The value of the observation for this variable. Example: 150 for a temperature of 15.0°C if the unit is 0.1°C.
  - variable_quality: The quality flag of the observation for this variable. One of "valid", "suspect" or "missing". Example: "valid".
  - variable_element_id: The element ID of the observation for this variable. Example: "TX1" for the first element of the :tx (temperature max) variable.
  - variable_source_id: The source ID of the observation. Present only when include_sources = true.

ECAD.VariableData — Type

VariableData(variable::Variable, filepath::String; memory_map = true)

A variable with its associated data. If you are unsure what memory_map does, keep it set to true.

Fields

variable::Variable: The variable this data corresponds to.
filepath::String: The path to the zip file containing the data for this variable.
content::ZipReader{<:AbstractVector{UInt8}}: A ZipReader instance for reading the contents of the zip file.

Accessor methods are provided for convenience:

variable(var::VariableData): Get the variable.
zipfile(var::VariableData): Get the path to the zip file.
zipcontent(var::VariableData): Get the ZipReader for the zip file.
ZipArchives.zip_names(var::VariableData): Get the list of file names in the zip archive.
ZipArchives.zip_readentry(var::VariableData, args...; kwargs...): Read a specific entry from the zip archive.

To load specific data, use the provided functions:

load_sources(var::VariableData): Load the sources data frame from the zip file.
load_stations(var::VariableData): Load the stations data frame from the zip file.
load_elements(var::VariableData): Load the elements data frame from the local elements file.
load_elements(var::VariableData, file): Load the elements data frame from a specified file.
load_observations(var::VariableData, station_id): Load the observations data frame for a specific station ID from the zip file.

ECAD.canonical_name — Method

canonical_name(var::Variable) -> Symbol

Return the canonical name of a variable as a symbol. The canonical name is the primary identifier for a variable, and all aliases for that variable will map to this canonical name.

Example

julia> canonical_name(CloudCover())
:cc

ECAD.dataset_zip — Method

dataset_zip(var)

Get the path to the zip file for the given variable. The variable can be specified using any of its aliases, e.g. :tx, :temperature_max, or "temperature_max".

ECAD.intersect_stations — Method

intersect_stations(vars::AbstractVector{<:VariableData}) -> DataFrame

Find all stations that are present across all provided variables, returning their shared metadata.

This is useful when you want to build a StationData for multiple variables and need to know which station IDs are valid for all of them — i.e. which stations have observations for every variable in vars.

Returns a DataFrame with one row per common station, with columns:

id::Int: The station ID.
name::String: The name of the station.
country_code::String: The ISO country code.
latitude_arcsec::Int: The latitude in arcseconds.
longitude_arcsec::Int: The longitude in arcseconds.
height_meter::Int: The elevation in meters above sea level.

Throws

ArgumentError if vars is empty.

Example

vars = VariableData.([:tg, :tx, :tn])
common = intersect_stations(vars)
station_data = StationData.(Ref(vars), common.id)
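The core of this function is a set intersection over station IDs. The sketch below shows only that step, in plain Julia, assuming each variable's station IDs are available as an integer vector (for example from station_ids(var)). common_station_ids is a hypothetical helper; the real function also returns the station metadata columns listed above.

```julia
# Hypothetical helper: the station IDs shared by every list.
# Like intersect_stations, it throws an ArgumentError on empty input.
function common_station_ids(id_lists::AbstractVector{<:AbstractVector{<:Integer}})
    isempty(id_lists) && throw(ArgumentError("at least one list of station IDs is required"))
    # `unique` always returns a new vector (reduce returns the input itself
    # when there is a single list), so the in-place sort! never touches the caller's data.
    return sort!(unique(reduce(intersect, id_lists)))
end

common_station_ids([[1, 2, 3, 7], [2, 3, 4, 7], [7, 3, 2]])  # [2, 3, 7]
```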

ECAD.load_elements — Method

load_elements() -> DataFrame
load_elements(var::VariableData, file = nothing) -> DataFrame

Load the elements for the given variable, or for all variables when called without arguments. Elements are variants of the same variable that may differ in unit or observation method. For example, for :tg (temperature mean), one element simply averages the daily maximum and minimum, while another averages the hourly observations.

Columns

element_id::String: The element ID. Example: "TX1" for the first element of the :tx (temperature max) variable.
description::String: A description of the element. Example: "Daily maximum temperature calculated from the maximum of the daily observations".
unit::String: The unit of the element. Example: "0.1°C".
variable_id::String: The canonical name of the variable this element belongs to. Example: "TX" for :tx (temperature max).
variable_name::String: A human-readable name of the variable this element belongs to. Example: "MAX TEMPERATURE" for :tx (temperature max).

ECAD.load_observations — Method

load_observations(var::VariableData, station_id::Integer) -> DataFrame

Load the observations for the given variable and station ID from the zip file. The station ID must be one of the IDs returned by station_ids(var).

A warning is emitted if the observations contain more than one element, since they may then mix units or observation methods. In that case, consult the data frame returned by load_elements(var) to see what each element ID means.

Columns

station_id::Int: The station ID. Example: 123.
source_id::Int: The source ID associated with this observation. Example: 1. See load_sources for more info on sources.
element_id::String: The element ID associated with this observation. Example: "TX1" for the first element of the :tx (temperature max) variable. See load_elements for more info on elements.
date::Date: The date of the observation. Example: Date(1950, 1, 1).
value::Union{Int64, Missing}: The value of the observation, or missing if not available. The unit depends on the element; see the unit column of the data frame returned by load_elements.
quality::String: The quality flag of the observation. One of "valid", "suspect" or "missing".
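For context, the raw ECA&D text files store values and quality flags as integers: -9999 marks a missing value, and the quality code is 0 (valid), 1 (suspect) or 9 (missing). Assuming that encoding, decoding into the value and quality columns above could look like this sketch. decode_value and quality_label are hypothetical names, and this is not necessarily how load_observations works internally.

```julia
# Assumed raw ECA&D encoding; helper names are hypothetical.
const QUALITY_LABELS = Dict(0 => "valid", 1 => "suspect", 9 => "missing")

quality_label(code::Integer) = QUALITY_LABELS[code]

# -9999 is the raw missing-value sentinel; map it to `missing`.
decode_value(raw::Integer) = raw == -9999 ? missing : raw

raw_values  = [150, -9999, 162]
raw_quality = [0, 9, 1]
vals  = decode_value.(raw_values)    # [150, missing, 162]
flags = quality_label.(raw_quality)  # ["valid", "missing", "suspect"]
```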

ECAD.load_sources — Method

load_sources(var::VariableData) -> DataFrame

Load the sources data frame from the zip file for the given variable.

Columns

id::Int: The source ID. Example: 1.
name::String: The name of the source. Example: "VAEXJOE"
station_id::Int: The station ID associated with this source. Example: 123.
start_date::Date: The first date of observations from this source. Example: Date(1950, 1, 1).
end_date::Date: The last date of observations from this source. Example: Date(2020, 12, 31).
country_code::String: The ISO country code of the source. Example: "SE" for Sweden.
longitude_arcsec::Int: The longitude of the source in arcseconds.
latitude_arcsec::Int: The latitude of the source in arcseconds.
height_meter::Int: The height of the source in meters.
participant_id::Int: The ID of the participant that provided this source. Example: 42.
participant_name::String: The name of the participant that provided this source. Example: "Marcus Flarup".
element_id::String: The ID of the element observed by this source. See load_elements for more info.

ECAD.load_stations — Method

load_stations(var::VariableData) -> DataFrame

Load the stations data frame from the zip file for the given variable.

Columns

id::Int: The station ID. Example: 123.
name::String: The name of the station. Example: "VAEXJOE".
country_code::String: The ISO country code of the station. Example: "SE" for Sweden.
longitude_arcsec::Int: The longitude of the station in arcseconds.
latitude_arcsec::Int: The latitude of the station in arcseconds.
height_meter::Int: The height of the station in meters.

ECAD.longname — Method

longname(var::Variable) -> Symbol

Return a human-readable long name for the variable, suitable for display in plots and tables. By default, this is generated by converting the variable type name from CamelCase to an underscore-separated lowercase name. For example, CloudCover would become :cloud_cover.

You can override this by providing a custom long name when defining the variable type.

Example

julia> longname(CloudCover())
:cloud_cover

ECAD.pretty_name — Method

pretty_name(var::Variable) -> String

Return a pretty name for the variable, which can be used in contexts where a more concise or stylized name is desired. By default, this is generated by converting the variable type name from CamelCase to a space-separated lowercase string. For example, CloudCover would become cloud cover.

You can override this by providing a custom pretty name when defining the variable type.

Example

julia> pretty_name(CloudCover())
"cloud cover"

ECAD.station_ids — Method

station_ids(var::VariableData) -> Vector{Int}

The list of station IDs available in the zip file for the given variable.

ECAD.variable — Method

variable(var::VariableData) -> Variable

The variable associated with the given VariableData.

ECAD.zipcontent — Method

zipcontent(var::VariableData) -> ZipReader

The ZipReader instance for the zip file associated with the given VariableData.

ECAD.zipfile — Method

zipfile(var::VariableData) -> String

The path to the zip file associated with the given VariableData.

ECAD.@defvariable — Macro

@defvariable TypeName canonical_name

Define a Variable subtype and automatically register canonical_name and from_name methods for it.
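To illustrate how such a macro can work, here is a self-contained sketch with hypothetical names (AbstractVariableSketch, canonical_name_sketch, from_name_sketch, @defvariable_sketch). It assumes that from_name dispatches on Val of the canonical name, which fits the Union{Symbol, String, Val} signature of from_name but is not confirmed by the source.

```julia
abstract type AbstractVariableSketch end

# Generic functions that the macro adds methods to.
function canonical_name_sketch end
function from_name_sketch end

# Name lookup: turn the name into Val(name) and let dispatch find the type.
from_name_sketch(name::Symbol) = from_name_sketch(Val(name))
from_name_sketch(name::AbstractString) = from_name_sketch(Symbol(name))

macro defvariable_sketch(typename::Symbol, name::Symbol)
    qname = QuoteNode(name)  # splice the name in as a literal symbol, e.g. :cc
    return esc(quote
        struct $typename <: AbstractVariableSketch end
        canonical_name_sketch(::$typename) = $qname
        from_name_sketch(::Val{$qname}) = $typename()
    end)
end

@defvariable_sketch CloudCoverSketch cc
@defvariable_sketch TemperatureMaxSketch tx
```

Dispatching on Val moves the name-to-type lookup into Julia's method table, so each new variable registers itself without a central dictionary.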

Private API

ECAD.GTS_FALLBACK_PARTICIPANT_ID — Constant

const GTS_FALLBACK_PARTICIPANT_ID = Int(typemax(Int32))

Some GTS sources are missing participant IDs but have a consistent name ("Synoptical message from GTS"). This constant is used as a fallback participant ID for those sources to allow them to be included in the dataset without losing the information that they are GTS sources.

ECAD.all_variables — Method

all_variables() -> Vector{Variable}

Return a vector of instances of all defined Variable subtypes.

ECAD.camelcase_to_words — Function

camelcase_to_words(s::String, sep=' ') -> String

Convert a CamelCase string s to a sep-separated lowercase string. Returns the converted string.

Example

julia> camelcase_to_words("CamelCaseToWords")
"camel case to words"

julia> camelcase_to_words("CamelCaseToWords", '_')
"camel_case_to_words"

See also: camelcase_to_words! for an in-place version that writes to an IO stream.

ECAD.camelcase_to_words! — Function

camelcase_to_words!(io::IO, s::AbstractString, sep)

Convert a CamelCase string s to a sep-separated lowercase string, writing the result to the IO stream io. The input string is not modified; the function writes to io and returns it.

Example

julia> io = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)

julia> camelcase_to_words!(io, "CamelCaseToWords", ' ')
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=19, maxsize=Inf, ptr=20, mark=-1)

julia> String(take!(io))
"camel case to words"

See also: camelcase_to_words for a version that returns a new string instead of writing to an IO stream.
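One way to implement the conversion is a single pass that writes sep before every uppercase character except the first and lowercases everything. The sketch below assumes that rule (so a run of capitals such as "ID" becomes "i d"); the _sketch names are hypothetical, and the package's handling of such edge cases may differ.

```julia
# Write the converted string to `io`, then return `io`.
function camelcase_to_words_sketch!(io::IO, s::AbstractString, sep = ' ')
    for (i, c) in enumerate(s)
        # Every uppercase letter after the first character starts a new word.
        isuppercase(c) && i > 1 && print(io, sep)
        print(io, lowercase(c))
    end
    return io
end

# String-returning wrapper built on the IO version.
camelcase_to_words_sketch(s::AbstractString, sep = ' ') =
    String(take!(camelcase_to_words_sketch!(IOBuffer(), s, sep)))

camelcase_to_words_sketch("CamelCaseToWords", '_')  # "camel_case_to_words"
```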

ECAD.canonical_NAME — Method

canonical_NAME(var::Variable) -> String

Return the canonical name of a variable as an uppercase string. This is useful for matching variable names in datasets that use uppercase naming conventions.

Example

julia> canonical_NAME(CloudCover())
"CC"

ECAD.detect_header_row — Method

detect_header_row(io::IO) -> Int

Detects the line number of the header row in a CSV file by reading through the lines of the provided IO stream and checking each line with is_header_row. If a header row is found, it returns the line number. If no header row is found after reading through the entire file, it throws an ArgumentError.

ECAD.dms2arcsec — Method

dms2arcsec(dms::AbstractString) -> Int

Converts a DMS (Degrees, Minutes, Seconds) string to total arcseconds as an integer. Divide the result by 3600 to get decimal degrees.
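A minimal sketch, assuming the input looks like "+56:52:00" (degrees, minutes and seconds separated by colons, with an optional leading sign that applies to the whole angle). dms_to_arcsec is a hypothetical name.

```julia
# Hypothetical re-implementation; assumes "[+|-]D:M:S" input.
function dms_to_arcsec(dms::AbstractString)
    s = strip(dms)
    sgn = startswith(s, "-") ? -1 : 1        # sign applies to the whole angle
    d, m, sec = parse.(Int, split(lstrip(s, ['+', '-']), ':'))
    return sgn * (3600d + 60m + sec)
end

dms_to_arcsec("+56:52:00")         # 204720
dms_to_arcsec("+56:52:00") / 3600  # ≈ 56.8667 decimal degrees
```

Handling the sign separately matters for angles such as "-00:30:00": parsing the degrees as a signed integer would give 0 and lose the sign.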

ECAD.download_progress — Method

download_progress(remote_filepath, local_directory_path)

Download a file from remote_filepath to local_directory_path while displaying a progress bar.

ECAD.from_name — Method

from_name(x::Union{Symbol, String, Val}) -> Variable

Convert a canonical variable name to its corresponding Variable type instance.

Example

julia> from_name(:cc)
CloudCover()

ECAD.human_bytes — Method

human_bytes(bytes::Integer) -> String

Convert a byte count to a human-readable string with two decimal places, using binary (1024-based) multiples labelled B, KB, MB, GB, TB and PB.

Example

julia> human_bytes(123)
"123.00 B"

julia> human_bytes(123456)
"120.56 KB"

julia> human_bytes(123456789)
"117.74 MB"

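The examples above are consistent with dividing by 1024 until the value drops below 1024 and formatting with two decimals. A sketch under that assumption, using the Printf standard library (human_bytes_sketch is a hypothetical name):

```julia
using Printf

function human_bytes_sketch(bytes::Integer)
    units = ("B", "KB", "MB", "GB", "TB", "PB")
    value = float(bytes)
    i = 1
    # Step up one unit per factor of 1024, stopping at the largest unit.
    while value >= 1024 && i < length(units)
        value /= 1024
        i += 1
    end
    return @sprintf("%.2f %s", value, units[i])
end

human_bytes_sketch(123456789)  # "117.74 MB"
```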
ECAD.is_header_row — Method

is_header_row(line) -> Bool

Checks if a given line from the CSV file is likely to be the header row by verifying that it contains at least two comma-separated parts and that all parts are uppercase (ignoring whitespace).
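is_header_row and detect_header_row can be sketched together. This version assumes that "uppercase" means every comma-separated part contains at least one letter and no lowercase letters, so purely numeric data rows are rejected. It also assumes that detect_header_row reports 1-based line numbers. The _sketch names are hypothetical.

```julia
# A header has ≥ 2 comma-separated parts, each with letters but no lowercase ones.
function is_header_row_sketch(line::AbstractString)
    parts = strip.(split(line, ','))
    length(parts) >= 2 || return false
    return all(p -> any(isletter, p) && !any(islowercase, p), parts)
end

# Return the 1-based line number of the first header-like line.
function detect_header_row_sketch(io::IO)
    for (lineno, line) in enumerate(eachline(io))
        is_header_row_sketch(line) && return lineno
    end
    throw(ArgumentError("no header row found"))
end

detect_header_row_sketch(IOBuffer("Intro text\n\nSTAID, SOUID, DATE\n1, 2, 19500101\n"))  # 3
```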

ECAD.normalize_embedded_commas — Method

normalize_embedded_commas(io::IO, header_line, col_name) -> IOBuffer

Streams io into an IOBuffer, quoting any unquoted commas inside the field col_name.

Arguments

io: An IO stream containing the CSV data.
header_line: The line number of the header row in the CSV data.
col_name: The name of the column to check for embedded commas.

Returns

A new IOBuffer containing the normalized CSV data with embedded commas in col_name properly quoted.
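One way to do this assumes two things: col_name is the only column that can contain embedded commas, and the header row gives the true column count. Under those assumptions, the surplus fields on a row can be merged back into that column and wrapped in double quotes. A sketch with a hypothetical name (header_line is taken as 1-based):

```julia
function quote_embedded_commas(io::IO, header_line::Integer, col_name::AbstractString)
    out = IOBuffer()
    ncols, col = 0, 0
    for (lineno, line) in enumerate(eachline(io))
        if lineno == header_line
            header = strip.(split(line, ','))
            ncols = length(header)
            idx = findfirst(==(col_name), header)
            idx === nothing && throw(ArgumentError("column $col_name not found in header"))
            col = idx
        elseif lineno > header_line
            parts = split(line, ',')
            extra = length(parts) - ncols  # commas beyond the expected count
            if extra > 0
                # Re-join the fields that belong to col_name and quote them.
                merged = "\"" * join(parts[col:col+extra], ',') * "\""
                line = join([parts[1:col-1]; merged; parts[col+extra+1:end]], ',')
            end
        end
        println(out, line)  # lines before the header are copied unchanged
    end
    return seekstart(out)
end
```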

ECAD.patch_first_digit! — Method

patch_first_digit!(fixes::Dict, bad_sources, available_sources)

A fix for resolve_invalid_source_ids that matches a bad source to an available source when the two are identical after removing their first character (e.g. the bad source "12345" matches the available source "22345").
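A sketch of this heuristic. It assumes source IDs may be integers or strings (both are compared via their string form) and that a fix is recorded only when exactly one available source matches, to avoid guessing between candidates. The function name is hypothetical.

```julia
function patch_first_char_sketch!(fixes::Dict, bad_sources, available_sources)
    for bad in bad_sources
        haskey(fixes, bad) && continue                 # already fixed by an earlier heuristic
        key = chop(string(bad); head = 1, tail = 0)    # drop the first character
        matches = [s for s in available_sources if chop(string(s); head = 1, tail = 0) == key]
        length(matches) == 1 && (fixes[bad] = only(matches))  # accept unambiguous matches only
    end
    return fixes
end
```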

ECAD.patch_substring! — Method

patch_substring!(fixes::Dict, bad_sources, available_sources)

A fix for resolve_invalid_source_ids that attempts to match any bad_sources that are substrings of any available_sources.
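Under the same assumptions (string comparison, unambiguous matches only), the substring heuristic could be sketched as follows; the function name is hypothetical.

```julia
function patch_substring_sketch!(fixes::Dict, bad_sources, available_sources)
    for bad in bad_sources
        haskey(fixes, bad) && continue
        needle = string(bad)
        # Candidate sources whose ID contains the bad ID as a substring.
        matches = [s for s in available_sources if occursin(needle, string(s))]
        length(matches) == 1 && (fixes[bad] = only(matches))
    end
    return fixes
end
```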

ECAD.raw_observations — Method

raw_observations(observations::DataFrame, var_data::VariableData; include_sources = false) -> DataFrame

Extract the raw observations from the observations data frame for the variable in var_data, and rename the columns to include the variable name.

The columns are renamed as follows:

value -> variable_value
quality -> variable_quality
element_id -> variable_element_id
source_id -> variable_source_id (only if include_sources = true)
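The renaming above can be expressed as a list of old => new column-name pairs, which DataFrames.rename! accepts. A sketch, assuming the prefix is a variable name such as :tx (rename_pairs is a hypothetical helper):

```julia
function rename_pairs(varname::Symbol; include_sources::Bool = false)
    cols = [:value, :quality, :element_id]
    include_sources && push!(cols, :source_id)
    # e.g. :value => :tx_value
    return [col => Symbol(varname, "_", col) for col in cols]
end

rename_pairs(:tx)  # [:value => :tx_value, :quality => :tx_quality, :element_id => :tx_element_id]
```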

ECAD.raw_station — Method

raw_station(observations::DataFrame, var_data::VariableData, station_id::Integer) -> DataFrameRow

Extract the raw station metadata for a given station ID from the observations data frame and variable data. This is used internally to ensure that the station metadata is consistent across multiple variables when building a StationData for multiple variables.

Returns

A DataFrameRow with the following attributes:

id::Int: The station ID. Example: 123.
name::String: The name of the station. Example: "VAEXJOE".
country_code::String: The ISO country code of the station. Example: "SE" for Sweden.
latitude_arcsec::Int: The latitude of the station in arcseconds. Example: 123451 for 34.2919° (123451 / 3600).
longitude_arcsec::Int: The longitude of the station in arcseconds. Example: -123451 for -34.2919°.
height_meter::Int: The height of the station in meters. Example: 100 for 100 meters above sea level.

ECAD.repair_source_ids! — Method

repair_source_ids!(observation_df, sources_df)

Checks that:

There is only one unique station_id in observation_df.
Every source_id value in observation_df matches an id in sources_df for the corresponding station_id.

If there are source_id values in observation_df that do not match any id in sources_df for the same station_id, the function attempts to repair them using a series of heuristics defined in resolve_invalid_source_ids.

Arguments

observation_df: A DataFrame containing the observations, with at least station_id and source_id columns.
sources_df: A DataFrame containing the sources, with at least id and station_id columns.

Returns

true if all source_id values in observation_df are valid or were successfully repaired.
false if there are still invalid source_id values after attempting repairs, or if there are multiple station_id values in observation_df. Also logs warnings in these cases.

ECAD.resolve_invalid_source_ids — Method

resolve_invalid_source_ids(obs_sources, available_sources; fix_funcs!)

Given a list of obs_sources and available_sources, attempts to find matches for any obs_sources that are not in available_sources using a series of provided fixing functions (fix_funcs!).

Arguments

obs_sources: A collection of source IDs from the observations that need to be checked.
available_sources: A collection of valid source IDs that can be matched against.
fix_funcs!: An optional collection of functions that implement heuristics to find matches for bad source IDs. Each function should have the signature fix!(fixes::Dict, bad_sources, available_sources) and should set fixes[bad_source] = matched_available_source for every match it finds.

Returns

fixes: A dictionary mapping any obs_sources that were successfully matched to their corresponding available_sources.
bad_sources: A collection of any obs_sources that could not be matched to any available_sources after all fixing functions were applied.
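The control flow described above can be sketched as follows: collect the unmatched IDs, let each heuristic add entries to a shared fixes dictionary, and report what is left. The function name, the fix_funcs keyword and the toy off_by_one! heuristic are hypothetical stand-ins; the package's default heuristics are not reproduced here.

```julia
function resolve_invalid_ids_sketch(obs_sources, available_sources; fix_funcs = ())
    available = Set(available_sources)
    bad = [s for s in unique(obs_sources) if !(s in available)]
    fixes = Dict{eltype(bad), eltype(available_sources)}()
    for fix! in fix_funcs
        fix!(fixes, bad, available_sources)  # each heuristic may add entries to `fixes`
    end
    still_bad = [s for s in bad if !haskey(fixes, s)]
    return fixes, still_bad
end

# Toy heuristic for illustration: an ID that is off by one from a valid ID.
off_by_one!(fixes, bad, avail) = foreach(b -> (b + 1) in avail && (fixes[b] = b + 1), bad)

fixes, still_bad = resolve_invalid_ids_sketch([1, 2, 5, 5], [2, 3]; fix_funcs = (off_by_one!,))
```

Because the heuristics share one dictionary, later ones can skip IDs that earlier ones already fixed, so their order in fix_funcs acts as a priority.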

ECAD.resolve_participant_id — Method

Assign GTS_FALLBACK_PARTICIPANT_ID to rows whose participant ID is missing but whose participant name matches the GTS name ("Synoptical message from GTS").
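A sketch of this fallback on plain vectors. It assumes participant IDs are Union{Int, Missing} and that names may carry surrounding whitespace (hence strip). The constant mirrors GTS_FALLBACK_PARTICIPANT_ID above; the function and constant names are hypothetical.

```julia
const GTS_FALLBACK_ID_SKETCH = Int(typemax(Int32))
const GTS_NAME_SKETCH = "Synoptical message from GTS"

function resolve_participant_id_sketch(ids::AbstractVector, names::AbstractVector{<:AbstractString})
    # Only fill in IDs that are missing *and* belong to the GTS name.
    return [ismissing(id) && strip(name) == GTS_NAME_SKETCH ? GTS_FALLBACK_ID_SKETCH : id
            for (id, name) in zip(ids, names)]
end

resolve_participant_id_sketch([42, missing, missing],
                              ["Marcus Flarup", "Synoptical message from GTS", "Unknown"])
```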
