Making a new dataset

If you are contributing a new dataset, please have a look at Writing a CMORizer script for an additional dataset for how to do so. If you need the new dataset for a new recipe, please make a separate pull for the CMORizer script.

Dataset documentation

The documentation required for a CMORizer script is the following:

  • Make sure that the new dataset is added to the list of Supported datasets

  • The in code documentation should contain clear instructions on how to obtain the data

  • A BibTeX file named <dataset>.bibtex defining the reference for the new dataset should be placed in the directory esmvaltool/references/, see Adding references for detailed instructions.

For more general information on writing documentation, see Documentation.


When contributing a new script, add an entry for the CMORized data to recipes/examples/recipe_check_obs.yml and run the recipe, to make sure the CMOR checks pass without warnings or errors.

To test a pull request for a new CMORizer script:

  1. Download the data following the instructions included in the script and place it in the RAWOBS path specified in your config-user.yml

  2. Run the CMORizer script by running cmorize_obs -c <config-file> -o <dataset>

  3. Copy the resulting data to the OBS (for CMIP5 compliant data) or OBS6 (for CMIP6 compliant data) path specified in your config-user.yml

  4. Run recipes/examples/recipe_check_obs.yml with the new dataset to check that the data can be used

Scientific sanity check

When contributing a new dataset, we expect that the numbers and units of the dataset look physically meaningful. The scientific reviewer needs to check this.

Data availability

Once your pull request has been approved by the reviewers, ask @remi-kazeroni to add the new dataset to the data pool at DKRZ and CEDA-Jasmin. He is also the person in charge of merging CMORizer pull requests.

Detailed checklist for reviews

This (non-exhaustive) checklist provides ideas for things to check when reviewing pull requests for new or updated CMORizer scripts.

Dataset description

Check that new dataset has been added to the table of observations defined in the ESMValTool guide user’s guide in section Obtaining input data (generated from doc/sphinx/source/input.rst).

BibTeX info file

Check that a BibTeX file, i.e. <dataset>.bibtex defining the reference for the new dataset has been created in esmvaltool/references/.


Check that new dataset has been added to the testing recipe esmvaltool/recipes/examples/recipe_check_obs.yml

CMORizer script

Check that the new CMORizer script esmvaltool/cmorizers/obs/cmorize_obs_<dataset>.{py,ncl} meets standards. This includes the following items:

  • In-code documentation (header) contains

    1. Download instructions

    2. Reference(s)

  • Code quality checks

    1. Code quality (e.g. no hardcoded pathnames)

    2. No Codacy errors reported

Config file

If present, check config file <dataset>.yml in esmvaltool/cmorizers/obs/cmor_config/ for correctness. Use yamllint to check for syntax errors and common mistakes.

Run CMORizer

Make sure CMORizer is working by running cmorize_obs -c <config-file> -o <dataset>

Check output of CMORizer

After successfully running the new CMORizer, check that:

  • Output contains (some) valid values (e.g. not only nan or zeros)

  • Metadata is defined properly

Run esmvaltool/recipes/examples/recipe_check_obs.yml for new dataset.

RAW data

Contact person in charge of ESMValTool data pool (@remi-kazeroni) and request to copy RAW data to RAWOBS/Tier2 (Tier3).

CMORized data

Contact person in charge of ESMValTool data pool (@remi-kazeroni) and request to

  • Merge the pull request

  • Copy CMORized dataset to OBS/Tier2 (Tier3)

  • Set file access rights for new dataset