Dataset#

Classes and functions for defining, finding, and loading data.

Classes:

Dataset(**facets)

Define datasets, find the related files, and load them.

Data:

INHERITED_FACETS

Inherited facets.

Functions:

datasets_to_recipe(datasets[, recipe])

Create or update a recipe from datasets.

class esmvalcore.dataset.Dataset(**facets: str | Sequence[str] | Number)[source]#

Define datasets, find the related files, and load them.

Parameters:: **facets – Facets describing the dataset. See esmvalcore.esgf.facets.FACETS for the mapping between the facet names used by ESMValCore and those used on ESGF.

supplementaries#

List of supplementary datasets.

Type:: list[Dataset]

facets#

Facets describing the dataset.

Type:: esmvalcore.typing.Facets

Methods:

`add_supplementary`(**facets)	Add a supplementary dataset.
`augment_facets`()	Add extra facets.
`copy`(**facets)	Create a copy.
`find_files`()	Find files.
`from_files`()	Create datasets based on the available files.
`from_ranges`()	Create a list of datasets from short notations.
`from_recipe`(recipe, session)	Read datasets from a recipe.
`load`()	Load dataset.
`set_facet`(key, value[, persist])	Set facet.
`set_version`()	Set the `'version'` facet based on the available data.
`summary`([shorten])	Summarize the content of dataset.

Attributes:

`files`	The files associated with this dataset.
`minimal_facets`	Return a dictionary with the persistent facets.
`session`	An `esmvalcore.config.Session` associated with the dataset.

add_supplementary(**facets: str | Sequence[str] | Number) → None[source]#

Add a supplementary dataset.

This is a convenience function that will create a copy of the current dataset, update its facets with the values specified in **facets, and append it to Dataset.supplementaries. For more control over the creation of the supplementary dataset, first create a new Dataset describing the supplementary dataset and then append it to Dataset.supplementaries.

Parameters:: **facets – Facets describing the supplementary variable.

augment_facets() → None[source]#

Add extra facets.

This function will update the dataset with additional facets from various sources.

copy(**facets: str | Sequence[str] | Number) → Dataset[source]#

Create a copy.

Parameters:: **facets – Update these facets in the copy. Note that for supplementary datasets attached to the dataset, the 'short_name' and 'mip' facets will not be updated with these values.
Returns:: A copy of the dataset.
Return type:: Dataset

property files: Sequence[ESGFFile | LocalFile]#: The files associated with this dataset.

find_files() → None[source]#

Find files.

Look for files and populate the Dataset.files property of the dataset and its supplementary datasets.

from_files() → Iterator[Dataset][source]#

Create datasets based on the available files.

The facet values for local files are retrieved from the directory tree, where the directories represent the facet values. Reading facet values from file names is not yet supported. See CMIP data for more information on this kind of file organization.

glob.glob() patterns can be used as facet values to select multiple datasets. If for some of the datasets not all glob patterns can be expanded (e.g. because the required facet values cannot be inferred from the directory names), these datasets will be ignored, unless this happens to be all datasets.

If glob.glob() patterns are used in supplementary variables and multiple matching datasets are found, only the supplementary dataset that has the most facets in common with the main dataset will be attached.

Supplementary datasets will inherit the facet values from the main dataset for those facets listed in INHERITED_FACETS.

Examples

See Discovering data for example use cases.

Yields:: Dataset – Datasets representing the available files.
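The rule for choosing between several matching supplementary datasets can be illustrated with plain dictionaries. This is a minimal sketch of the idea described above, not the library's implementation: the helper names `count_shared_facets` and `pick_closest` are hypothetical, the facet values are made up for illustration, and the way ESMValCore compares facets or breaks ties may differ.

```python
def count_shared_facets(main: dict, candidate: dict) -> int:
    """Count the facets that have the same value in both dictionaries."""
    return sum(1 for key, value in candidate.items() if main.get(key) == value)


def pick_closest(main: dict, candidates: list[dict]) -> dict:
    """Return the candidate sharing the most facet values with `main`.

    Hypothetical helper: on a tie, `max` keeps the first candidate, which
    is an assumption of this sketch.
    """
    return max(candidates, key=lambda cand: count_shared_facets(main, cand))


# Illustrative facets for a main dataset and two matching supplementaries.
main = {
    "short_name": "tas",
    "mip": "Amon",
    "dataset": "CanESM5",
    "exp": "historical",
    "ensemble": "r1i1p1f1",
}
candidates = [
    {"short_name": "areacella", "mip": "fx", "dataset": "CanESM5",
     "exp": "piControl", "ensemble": "r1i1p1f1"},
    {"short_name": "areacella", "mip": "fx", "dataset": "CanESM5",
     "exp": "historical", "ensemble": "r1i1p1f1"},
]

best = pick_closest(main, candidates)
print(best["exp"])
```

Here the second candidate shares three facet values with the main dataset ('dataset', 'exp', 'ensemble') while the first shares only two, so the second one is selected.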

from_ranges() → list['Dataset'][source]#

Create a list of datasets from short notations.

This expands the 'ensemble' and 'sub_experiment' facets in the dataset definition if they are ranges.

For example, 'ensemble'='r(1:3)i1p1f1' will be expanded to three datasets, with 'ensemble' values 'r1i1p1f1', 'r2i1p1f1', and 'r3i1p1f1'.

Returns:: The datasets.
Return type:: list[Dataset]
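The short notation can be illustrated with a small stand-alone function. This is a sketch of the expansion described above, assuming a single '(start:end)' range with an inclusive end; `expand_range` is a hypothetical helper and does not reproduce how ESMValCore parses ranges internally.

```python
import re


def expand_range(value: str) -> list[str]:
    """Expand one '(start:end)' range in a facet value (illustrative only)."""
    match = re.search(r"\((\d+):(\d+)\)", value)
    if match is None:
        # No range notation present: nothing to expand.
        return [value]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = value[:match.start()], value[match.end():]
    # The end of the range is treated as inclusive, as in the example above.
    return [f"{prefix}{number}{suffix}" for number in range(start, end + 1)]


print(expand_range("r(1:3)i1p1f1"))
# ['r1i1p1f1', 'r2i1p1f1', 'r3i1p1f1']
```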

static from_recipe(recipe: Path | str | dict, session: Session) → list['Dataset'][source]#

Read datasets from a recipe.

Parameters:

recipe – Recipe to load the datasets from. The value provided here should be either a path to a file, a recipe file that has been loaded using e.g. yaml.safe_load(), or a str that can be loaded using yaml.safe_load().
session – The esmvalcore.config.Session to associate with the datasets.

Returns:

A list of datasets.

Return type:

list[Dataset]

load() → Cube[source]#

Load dataset.

Raises:: InputFilesNotFound – When no files were found.
Returns:: An iris cube with the data corresponding to the dataset.
Return type:: iris.cube.Cube

property minimal_facets: Dict[str, str | Sequence[str] | Number]#: Return a dictionary with the persistent facets.

property session: Session#: An esmvalcore.config.Session associated with the dataset.

set_facet(key: str, value: str | Sequence[str] | Number, persist: bool = True)[source]#

Set facet.

Parameters:

key – The name of the facet.
value – The value of the facet.
persist – When writing a dataset to a recipe, only persistent facets will get written.
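The relationship between `persist` and `minimal_facets` can be pictured with a toy class. This is only a sketch under stated assumptions: `FacetStore` is a hypothetical stand-in written for this example, not part of ESMValCore, and the facet names and values are illustrative. It tracks which keys were set with `persist=True` and returns only those from `minimal_facets`.

```python
class FacetStore:
    """Toy stand-in for a dataset's facet bookkeeping (hypothetical)."""

    def __init__(self) -> None:
        self.facets: dict = {}
        self._persist: set = set()

    def set_facet(self, key: str, value, persist: bool = True) -> None:
        """Store a facet and remember whether it is persistent."""
        self.facets[key] = value
        if persist:
            self._persist.add(key)

    @property
    def minimal_facets(self) -> dict:
        """Return only the facets that were marked as persistent."""
        return {k: v for k, v in self.facets.items() if k in self._persist}


store = FacetStore()
store.set_facet("short_name", "tas")
store.set_facet("units", "K", persist=False)  # kept in memory only

print(store.facets)
print(store.minimal_facets)
```

In this sketch, 'units' is available in `facets` but is left out of `minimal_facets`, mirroring the statement that only persistent facets are written to a recipe.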

set_version() → None[source]#: Set the 'version' facet based on the available data.

summary(shorten: bool = False) → str[source]#

Summarize the content of dataset.

Parameters:: shorten – Shorten the summary.
Returns:: A summary describing the dataset.
Return type:: str

esmvalcore.dataset.INHERITED_FACETS: list[str] = ['dataset', 'domain', 'driver', 'grid', 'project', 'timerange']#

Inherited facets.

Supplementary datasets created based on the available files using the Dataset.from_files() method will inherit the values of these facets from the main dataset.
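The inheritance can be sketched with dictionaries. This is an illustration rather than the library's code: `inherit_facets` is a hypothetical helper, the facet values are made up, and the sketch assumes that inherited values replace whatever the supplementary dataset had for those facets.

```python
# Copied from the definition above so that the sketch is self-contained.
INHERITED_FACETS = ["dataset", "domain", "driver", "grid", "project", "timerange"]


def inherit_facets(main: dict, supplementary: dict) -> dict:
    """Return supplementary facets updated with inherited values from `main`.

    Assumption of this sketch: inherited values override the supplementary's
    own values, and facets missing from `main` are not added.
    """
    inherited = {key: main[key] for key in INHERITED_FACETS if key in main}
    return {**supplementary, **inherited}


main = {
    "short_name": "tas",
    "mip": "Amon",
    "project": "CMIP6",
    "dataset": "CanESM5",
    "grid": "gn",
    "timerange": "2000/2010",
}
supplementary = {"short_name": "areacella", "mip": "fx", "dataset": "*"}

result = inherit_facets(main, supplementary)
print(result)
```

The 'short_name' and 'mip' facets of the supplementary dataset are left alone because they are not listed in INHERITED_FACETS, while 'dataset', 'grid', 'project', and 'timerange' are taken from the main dataset.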

esmvalcore.dataset.datasets_to_recipe(datasets: Iterable[Dataset], recipe: Path | str | dict[str, Any] | None = None) → dict[source]#

Create or update a recipe from datasets.

Parameters:

datasets – Datasets to use in the recipe.
recipe – Recipe to update with the datasets. The value provided here should be either a path to a file, a recipe file that has been loaded using e.g. yaml.safe_load(), or a str that can be loaded using yaml.safe_load().

Examples

See Composing recipes for example use cases.

Returns:: The recipe with the datasets. To convert the dict to a recipe, use e.g. yaml.safe_dump().
Return type:: dict
Raises:: RecipeError – Raised when a dataset is missing the diagnostic facet.