Making a new dataset

If you are contributing a new dataset, see "Writing a CMORizer script for an additional dataset" for instructions. Please always create separate pull requests for CMORizer scripts, even when introducing a new dataset or updating an existing dataset with a new recipe.

If you are updating a CMORizer script to support a different dataset version, see "Support for multiple versions of a dataset" for how to handle multiple dataset versions.

Dataset documentation

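The documentation required for a CMORizer script is the following:

  • An entry for the new dataset in the table of observations in the section Obtaining input data of the user's guide (generated from doc/sphinx/source/input.rst) and in the file datasets.yml

  • Download instructions and reference(s) in the header of the CMORizer script

  • A BibTeX file <dataset>.bibtex in esmvaltool/references/ defining the reference for the new dataset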

For more general information on writing documentation, see Documentation.

Testing

When contributing a new script, add an entry for the CMORized data to recipes/examples/recipe_check_obs.yml and run the recipe, to make sure the CMOR checks pass without warnings or errors.

To test a pull request for a new CMORizer script:

  1. Download the data following the instructions included in the script and place it in the RAWOBS rootpath specified in your configuration

  2. Alternatively, if a downloader script is available, run esmvaltool data download --config_file <config-file> <dataset>

  3. Run the CMORization with esmvaltool data format --config_file <config-file> <dataset>

  4. Copy the resulting data to the OBS (for CMIP5 compliant data) or OBS6 (for CMIP6 compliant data) rootpath specified in your configuration

  5. Run recipes/examples/recipe_check_obs.yml with the new dataset to check that the data can be used
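The two commands in the steps above can be assembled programmatically, for example when testing several datasets in a row. The following is a minimal sketch, not part of ESMValTool: the helper name build_cmorizer_commands and the placeholder values config-user.yml and MY_DATASET are hypothetical, and the commands use the --config_file form shown in the review checklist on this page. The sketch only builds and prints the commands; it does not run them.

```python
import shlex


def build_cmorizer_commands(config_file, dataset):
    """Return the download and format commands as argument lists.

    Hypothetical helper: it only mirrors the command lines quoted in
    the testing steps and does not execute anything.
    """
    base = ["esmvaltool", "data"]
    download = base + ["download", "--config_file", config_file, dataset]
    fmt = base + ["format", "--config_file", config_file, dataset]
    return download, fmt


# Placeholder config file and dataset name (assumptions for illustration).
download_cmd, format_cmd = build_cmorizer_commands("config-user.yml", "MY_DATASET")

# Print shell-quoted versions for inspection.
print(shlex.join(download_cmd))
print(shlex.join(format_cmd))
```

To execute a command, each list could be passed to subprocess.run(cmd, check=True), which raises an exception if the command exits with a non-zero status.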

Scientific sanity check

When contributing a new dataset, we expect that the numbers and units of the dataset look physically meaningful. The scientific reviewer needs to check this.

Data availability

Once your pull request has been approved by the reviewers, ask a member of @OBS-maintainers to add the new dataset to the data pool at DKRZ and CEDA-Jasmin. This team is in charge of merging CMORizer pull requests.

Detailed checklist for reviews

This (non-exhaustive) checklist provides ideas for things to check when reviewing pull requests for new or updated CMORizer scripts.

Dataset description

Check that the new dataset has been added to the table of observations in the section Obtaining input data of the ESMValTool user's guide (generated from doc/sphinx/source/input.rst). Check that the new dataset has also been added to the file datasets.yml.

BibTeX info file

Check that a BibTeX file named <dataset>.bibtex, defining the reference for the new dataset, has been created in esmvaltool/references/.
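A reviewer could automate this check along the following lines. This is a minimal sketch, not part of ESMValTool: the helper name bibtex_file_exists, the repo_root argument, and the dataset name MY_DATASET are hypothetical, and the file name is assumed to be exactly <dataset>.bibtex as stated above. The demonstration runs on a temporary directory that stands in for a repository checkout.

```python
import tempfile
from pathlib import Path


def bibtex_file_exists(repo_root, dataset):
    """Return True if esmvaltool/references/<dataset>.bibtex exists.

    Hypothetical helper; `repo_root` is the root of a repository checkout.
    """
    bibtex = Path(repo_root) / "esmvaltool" / "references" / f"{dataset}.bibtex"
    return bibtex.is_file()


# Demonstration on a throwaway directory tree standing in for a checkout.
with tempfile.TemporaryDirectory() as tmp:
    references = Path(tmp) / "esmvaltool" / "references"
    references.mkdir(parents=True)
    # No BibTeX file has been created yet.
    missing_before = bibtex_file_exists(tmp, "MY_DATASET")
    # Create a placeholder entry (not a real reference).
    (references / "MY_DATASET.bibtex").write_text("@misc{my_dataset,\n}\n")
    present_after = bibtex_file_exists(tmp, "MY_DATASET")

print(missing_before, present_after)
```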

recipe_check_obs.yml

Check that the new dataset has been added to the testing recipe esmvaltool/recipes/examples/recipe_check_obs.yml.

Downloader script

If present, check that the new downloader script esmvaltool/cmorizers/data/downloaders/datasets/<dataset>.py meets standards. This includes the following items:

  • Code quality checks

    1. Code quality

    2. No Codacy errors reported

CMORizer script

Check that the new CMORizer script esmvaltool/cmorizers/data/formatters/datasets/<dataset>.{py,ncl} meets standards. This includes the following items:

  • In-code documentation (header) contains

    1. Download instructions

    2. Reference(s)

  • Code quality checks

    1. Code quality (e.g. no hardcoded pathnames)

    2. No Codacy errors reported

Config file

If present, check config file <dataset>.yml in esmvaltool/cmorizers/data/cmor_config/ for correctness. Use yamllint to check for syntax errors and common mistakes.

Run downloader script

If available, make sure the downloader script is working by running esmvaltool data download --config_file <config-file> <dataset>

Run CMORizer

Make sure CMORizer is working by running esmvaltool data format --config_file <config-file> <dataset>

Check output of CMORizer

After successfully running the new CMORizer, check that:

  • Output contains (some) valid values (e.g. not only nan or zeros)

  • Metadata is defined properly
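The first check in the list above can be expressed as a small test on the data values. The following is a minimal sketch, not part of ESMValTool: the helper name has_valid_values is hypothetical, and it operates on a plain sequence of numbers. In practice the values would first be read from the CMORized netCDF output, which this sketch does not cover.

```python
import math


def has_valid_values(values):
    """Return True if at least one value is finite and non-zero.

    Hypothetical helper: it flags data that consists only of NaN,
    infinite values, or zeros as suspicious.
    """
    return any(math.isfinite(v) and v != 0 for v in values)


# Illustrative inputs (made-up numbers, not real data).
only_nan_and_zeros = [float("nan"), 0.0, 0.0]
some_valid = [float("nan"), 0.0, 273.15]

print(has_valid_values(only_nan_and_zeros))
print(has_valid_values(some_valid))
```

This check only catches completely empty or degenerate output. It does not replace the scientific sanity check of numbers and units described earlier.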

Run esmvaltool/recipes/examples/recipe_check_obs.yml for the new dataset.

RAW data

Contact the team in charge of the ESMValTool data pool (@OBS-maintainers) and ask them to copy the RAW data to RAWOBS/Tier2 (or RAWOBS/Tier3).

CMORized data

Contact the team in charge of the ESMValTool data pool (@OBS-maintainers) and ask them to:

  • Merge the pull request

  • Copy the CMORized dataset to OBS/Tier2 (or OBS/Tier3)

  • Set file access rights for the new dataset