Making a new dataset

If you are contributing a new dataset, see Writing a CMORizer script for an additional dataset for instructions. Always create a separate pull request for a CMORizer script, even when you are introducing a new dataset or updating an existing dataset together with a new recipe.

If you are updating a CMORizer script to support a different dataset version, see Support for multiple versions of a dataset for how to handle multiple versions.

Dataset documentation

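The documentation items checked for a CMORizer script are listed in the detailed checklist for reviews below (dataset description, BibTeX info file, and the in-code header of the CMORizer script).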

For more general information on writing documentation, see Documentation.

Testing

When contributing a new script, add an entry for the CMORized data to recipes/examples/recipe_check_obs.yml and run the recipe, to make sure the CMOR checks pass without warnings or errors.

To test a pull request for a new CMORizer script:

  1. Download the data following the instructions included in the script and place it in the RAWOBS path specified in your config-user.yml

  2. If available, use the downloader script by running esmvaltool data download --config_file <config-file> <dataset>

  3. Run the CMORization by running esmvaltool data format --config_file <config-file> <dataset>

  4. Copy the resulting data to the OBS (for CMIP5 compliant data) or OBS6 (for CMIP6 compliant data) path specified in your config-user.yml

  5. Run recipes/examples/recipe_check_obs.yml with the new dataset to check that the data can be used
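The download and format commands in the steps above can be sketched as argument lists, which makes them easy to print for inspection or to pass to subprocess.run. This is only an illustration: the helper name build_commands and the example values are hypothetical, and the command lines are the ones quoted in this document.

```python
import shlex


def build_commands(config_file, dataset):
    """Return the download and format command lines as argument lists."""
    # Both sub-commands take the same trailing arguments in this guide.
    common = ["--config_file", config_file, dataset]
    download = ["esmvaltool", "data", "download", *common]
    fmt = ["esmvaltool", "data", "format", *common]
    return download, fmt


if __name__ == "__main__":
    # Hypothetical values; replace them with your own config file and dataset.
    for cmd in build_commands("config-user.yml", "MY-DATASET"):
        print(shlex.join(cmd))
```

To execute a command rather than print it, each list can be handed to subprocess.run(cmd, check=True), so that a failing step stops the sequence.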

Scientific sanity check

When contributing a new dataset, we expect that the numbers and units of the dataset look physically meaningful. The scientific reviewer needs to check this.

Data availability

Once your pull request has been approved by the reviewers, ask @remi-kazeroni to add the new dataset to the data pool at DKRZ and CEDA-Jasmin. He is also the person in charge of merging CMORizer pull requests.

Detailed checklist for reviews

This (non-exhaustive) checklist provides ideas for things to check when reviewing pull requests for new or updated CMORizer scripts.

Dataset description

Check that the new dataset has been added to the table of observations in the Obtaining input data section of the ESMValTool user’s guide (generated from doc/sphinx/source/input.rst). Check that the new dataset has also been added to the file datasets.yml.

BibTeX info file

Check that a BibTeX file, <dataset>.bibtex, defining the reference for the new dataset has been created in esmvaltool/references/.

recipe_check_obs.yml

Check that the new dataset has been added to the testing recipe esmvaltool/recipes/examples/recipe_check_obs.yml.

Downloader script

If present, check that the new downloader script esmvaltool/cmorizers/data/downloaders/datasets/<dataset>.py meets standards. This includes the following items:

  • Code quality checks

    1. Code quality

    2. No Codacy errors reported

CMORizer script

Check that the new CMORizer script esmvaltool/cmorizers/data/formatters/datasets/<dataset>.{py,ncl} meets standards. This includes the following items:

  • In-code documentation (header) contains

    1. Download instructions

    2. Reference(s)

  • Code quality checks

    1. Code quality (e.g. no hardcoded pathnames)

    2. No Codacy errors reported

Config file

If present, check config file <dataset>.yml in esmvaltool/cmorizers/data/cmor_config/ for correctness. Use yamllint to check for syntax errors and common mistakes.
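The file locations named in the checklist items so far can be collected in one small helper, which a reviewer might use to see at a glance what a pull request should contain. This is a sketch under assumptions: expected_files is a hypothetical helper, and it takes <dataset> literally as the file stem, which may not match the naming used for a particular dataset.

```python
from pathlib import PurePosixPath


def expected_files(dataset):
    """Map each checklist item to the repository path(s) it refers to."""
    data = PurePosixPath("esmvaltool/cmorizers/data")
    return {
        "bibtex": [PurePosixPath("esmvaltool/references") / f"{dataset}.bibtex"],
        # Optional: only checked if a downloader script is present.
        "downloader": [data / "downloaders/datasets" / f"{dataset}.py"],
        # The CMORizer script may be written in Python or NCL.
        "cmorizer": [
            data / "formatters/datasets" / f"{dataset}.{ext}"
            for ext in ("py", "ncl")
        ],
        # Optional: only checked if a config file is present.
        "config": [data / "cmor_config" / f"{dataset}.yml"],
    }
```

Existence can then be tested with pathlib.Path(repo_root, path).exists() for each entry, where repo_root is the local checkout of the repository.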

Run downloader script

If available, make sure the downloader script is working by running esmvaltool data download --config_file <config-file> <dataset>

Run CMORizer

Make sure CMORizer is working by running esmvaltool data format --config_file <config-file> <dataset>

Check output of CMORizer

After successfully running the new CMORizer, check that:

  • Output contains (some) valid values (e.g. not only nan or zeros)

  • Metadata is defined properly
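The first check above (output is not only nan or zeros) can be sketched in a few lines. This example assumes the values of a variable have already been read into a flat sequence of floats; reading the NetCDF output itself is left out, and has_valid_values is a hypothetical helper name.

```python
import math


def has_valid_values(values):
    """Return True if at least one value is finite and non-zero."""
    return any(math.isfinite(v) and v != 0 for v in values)
```

A result of False does not prove the data is wrong, but it flags the variable for closer inspection by the reviewer.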

Run esmvaltool/recipes/examples/recipe_check_obs.yml for the new dataset.

RAW data

Contact the person in charge of the ESMValTool data pool (@remi-kazeroni) and ask them to copy the RAW data to RAWOBS/Tier2 (or Tier3).

CMORized data

Contact the person in charge of the ESMValTool data pool (@remi-kazeroni) and ask them to:

  • Merge the pull request

  • Copy CMORized dataset to OBS/Tier2 (Tier3)

  • Set file access rights for new dataset