Making a new dataset
If you are contributing a new dataset, please have a look at Writing a CMORizer script for an additional dataset for how to do so. Please always create separate pull requests for CMORizer scripts, even when introducing a new dataset or updating an existing dataset with a new recipe.
If you are updating a CMORizer script to support a different dataset version, please have a look at Support for multiple versions of a dataset for how to handle multiple dataset versions.
Dataset documentation
The documentation required for a CMORizer script is the following:
Make sure that the new dataset is added to the list of Supported datasets for which a CMORizer script is available and to the file datasets.yml.
The code documentation should contain clear instructions on how to obtain the data.
A BibTeX file named <dataset>.bibtex defining the reference for the new dataset should be placed in the directory esmvaltool/references/; see Adding references for detailed instructions.
For more general information on writing documentation, see Documentation.
Testing
When contributing a new script, add an entry for the CMORized data to recipes/examples/recipe_check_obs.yml and run the recipe to make sure that the CMOR checks pass without warnings or errors.
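As a rough illustration of what such an entry contains, the Python sketch below formats one dataset line as it might appear in the recipe. The helper `format_check_obs_entry` is hypothetical, and the chosen keys (`dataset`, `project`, `tier`, `type`, `version`) are an assumption based on how observational datasets are usually described in ESMValTool recipes; `MY-DATASET` and `sat` are placeholder values. Further keys, such as the time range, are typically needed as well, so copy the pattern of the existing entries in recipe_check_obs.yml.

```python
def format_check_obs_entry(dataset, project, tier, data_type, version):
    """Return a YAML flow-style list item describing an observational dataset.

    Hypothetical helper: the keys are assumed from typical OBS/OBS6 entries
    in ESMValTool recipes; compare with recipe_check_obs.yml before use.
    """
    if project not in ("OBS", "OBS6"):
        # OBS holds CMIP5-style data, OBS6 holds CMIP6-style data.
        raise ValueError("project must be 'OBS' or 'OBS6'")
    if tier not in (2, 3):
        # Only the tiers mentioned in this guide are accepted here.
        raise ValueError("tier must be 2 or 3")
    fields = {
        "dataset": dataset,
        "project": project,
        "tier": tier,
        "type": data_type,
        "version": version,
    }
    # Dicts keep insertion order, so the keys appear in the order above.
    body = ", ".join(f"{key}: {value}" for key, value in fields.items())
    return "- {" + body + "}"


print(format_check_obs_entry("MY-DATASET", "OBS6", 2, "sat", 1))
# prints: - {dataset: MY-DATASET, project: OBS6, tier: 2, type: sat, version: 1}
```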
To test a pull request for a new CMORizer script:
Download the data following the instructions included in the script and place it in the RAWOBS path specified in your config-user.yml.
If available, use the downloader script by running
esmvaltool data download --config_file <config-file> <dataset>
Run the CMORization with
esmvaltool data format --config_file <config-file> <dataset>
Copy the resulting data to the OBS (for CMIP5-compliant data) or OBS6 (for CMIP6-compliant data) path specified in your config-user.yml.
Run recipes/examples/recipe_check_obs.yml with the new dataset to check that the data can be used.
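The testing steps can be scripted. The sketch below only builds and prints the command lines in order. It assumes it is run from the root of an ESMValTool checkout, that the recipe lives at `esmvaltool/recipes/examples/recipe_check_obs.yml`, and that `esmvaltool run` accepts `--config_file` in the same way as the `esmvaltool data` commands (check `esmvaltool run --help` for your version). The copy step is left manual because the location of the CMORized output depends on your configuration; `target_rootpath_key` merely encodes the OBS/OBS6 rule stated in this list. All function names are hypothetical.

```python
import shlex

# Assumed recipe location, relative to the root of an ESMValTool checkout.
RECIPE = "esmvaltool/recipes/examples/recipe_check_obs.yml"


def cmorizer_test_steps(config_file, dataset, has_downloader=True):
    """Return the test commands for a new CMORizer as name -> argv list.

    The copy of the CMORized output into the OBS/OBS6 path has to happen
    between the "format" and "check" steps and is not included here.
    """
    steps = {}
    if has_downloader:
        steps["download"] = ["esmvaltool", "data", "download",
                             "--config_file", config_file, dataset]
    steps["format"] = ["esmvaltool", "data", "format",
                       "--config_file", config_file, dataset]
    # Assumption: `esmvaltool run` accepts --config_file in your version.
    steps["check"] = ["esmvaltool", "run", "--config_file", config_file, RECIPE]
    return steps


def target_rootpath_key(cmip_version):
    """Map the CMIP flavour of the CMORized data to its rootpath key."""
    return {"CMIP5": "OBS", "CMIP6": "OBS6"}[cmip_version]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for name, argv in cmorizer_test_steps("config-user.yml", "MY-DATASET").items():
        print(f"{name}: {shlex.join(argv)}")
    print("copy CMORized output to the", target_rootpath_key("CMIP6"), "path")
```

To actually execute a step, each printed command could be passed to `subprocess.run(argv, check=True)`, so that a failing download or CMORization stops the sequence before the recipe is run.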
Scientific sanity check
When contributing a new dataset, we expect that the numbers and units of the dataset look physically meaningful. The scientific reviewer needs to check this.
Data availability
Once your pull request has been approved by the reviewers, ask a member of @OBS-maintainers to add the new dataset to the data pool at DKRZ and CEDA-Jasmin. This team is in charge of merging CMORizer pull requests.
Detailed checklist for reviews
This (non-exhaustive) checklist provides ideas for things to check when reviewing pull requests for new or updated CMORizer scripts.
Dataset description
Check that the new dataset has been added to the table of observations defined in the ESMValTool user's guide in section Obtaining input data (generated from doc/sphinx/source/input.rst).
Check that the new dataset has also been added to the file datasets.yml.
BibTeX info file
Check that a BibTeX file, i.e. <dataset>.bibtex, defining the reference for the new dataset has been created in esmvaltool/references/.
recipe_check_obs.yml
Check that the new dataset has been added to the testing recipe esmvaltool/recipes/examples/recipe_check_obs.yml.
Downloader script
If present, check that the new downloader script esmvaltool/cmorizers/data/downloaders/datasets/<dataset>.py meets standards.
This includes the following items:
Code quality checks
Code quality
No Codacy errors reported
CMORizer script
Check that the new CMORizer script esmvaltool/cmorizers/data/formatters/datasets/<dataset>.{py,ncl} meets standards.
This includes the following items:
In-code documentation (header) contains
Download instructions
Reference(s)
Code quality checks
Code quality (e.g. no hardcoded pathnames)
No Codacy errors reported
Config file
If present, check the config file <dataset>.yml in esmvaltool/cmorizers/data/cmor_config/ for correctness.
Use yamllint to check for syntax errors and common mistakes.
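Part of this checklist is mechanical. The following minimal Python sketch reports which of the files a CMORizer pull request is expected to add are present, given the root of an ESMValTool checkout. The function names are hypothetical, the location `esmvaltool/cmorizers/data/datasets.yml` is an assumption (this guide only names the file), and the substring search for the dataset name is a crude stand-in for actually parsing the YAML. Because file names do not always match the dataset name exactly, the file stem can be passed separately. The sketch does not replace reviewing the content, the Codacy results or the yamllint output.

```python
from pathlib import Path


def _mentions(path, text):
    """Return True if ``path`` is a file whose contents include ``text``."""
    return path.is_file() and text in path.read_text(encoding="utf-8")


def review_file_checks(repo_root, dataset, file_stem=None):
    """Report which expected CMORizer files exist in an ESMValTool checkout.

    ``dataset`` is searched for in the YAML files; ``file_stem`` (default:
    ``dataset``) is used to build the per-dataset file names.
    """
    pkg = Path(repo_root) / "esmvaltool"
    data = pkg / "cmorizers" / "data"
    stem = file_stem or dataset
    formatters = data / "formatters" / "datasets"
    return {
        "bibtex": (pkg / "references" / f"{stem}.bibtex").is_file(),
        "recipe_check_obs": _mentions(
            pkg / "recipes" / "examples" / "recipe_check_obs.yml", dataset),
        # Assumed location of datasets.yml.
        "datasets_yml": _mentions(data / "datasets.yml", dataset),
        "formatter": any((formatters / f"{stem}{ext}").is_file()
                         for ext in (".py", ".ncl")),
        # Optional files: False only means the related checks do not apply.
        "downloader": (data / "downloaders" / "datasets" / f"{stem}.py").is_file(),
        "cmor_config": (data / "cmor_config" / f"{stem}.yml").is_file(),
    }


if __name__ == "__main__":
    for check, ok in review_file_checks(".", "MY-DATASET").items():
        print(f"{check:17} {'ok' if ok else 'missing'}")
```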
Run downloader script
If available, make sure the downloader script is working by running
esmvaltool data download --config_file <config-file> <dataset>
Run CMORizer
Make sure the CMORizer is working by running
esmvaltool data format --config_file <config-file> <dataset>
Check output of CMORizer
After successfully running the new CMORizer, check that:
Output contains (some) valid values (e.g. not only NaN or zeros)
Metadata is defined properly
Run esmvaltool/recipes/examples/recipe_check_obs.yml for the new dataset.
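The first two checks can be partly automated. The sketch below works on plain Python sequences and dictionaries so that it runs without extra dependencies; with a real CMORized file, the data values and variable attributes would first have to be read with a netCDF library such as iris or xarray. The tuple `REQUIRED_VAR_ATTRS` is a hypothetical minimum, not the CMOR specification, and some variables legitimately have no `standard_name`. The CMOR checks run by recipe_check_obs.yml remain the authoritative test.

```python
import math

# Hypothetical minimal set of variable attributes to look for.
REQUIRED_VAR_ATTRS = ("standard_name", "long_name", "units")


def has_valid_values(values):
    """Return True if at least one value is finite and non-zero.

    A field consisting only of NaN, infinities or zeros usually points to
    a problem in the conversion.
    """
    return any(math.isfinite(v) and v != 0 for v in values)


def missing_attributes(attrs, required=REQUIRED_VAR_ATTRS):
    """Return the required attribute names that are absent or empty."""
    return [key for key in required if not attrs.get(key)]


if __name__ == "__main__":
    print(has_valid_values([float("nan"), 0.0, 273.15]))
    print(missing_attributes({"standard_name": "air_temperature", "units": "K"}))
```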
RAW data
Contact the team in charge of the ESMValTool data pool (@OBS-maintainers) and ask them to copy the RAW data to RAWOBS/Tier2 (or RAWOBS/Tier3 for Tier 3 datasets).
CMORized data
Contact the team in charge of the ESMValTool data pool (@OBS-maintainers) and ask them to:
Merge the pull request
Copy the CMORized dataset to OBS/Tier2 (or OBS/Tier3 for Tier 3 datasets)
Set the file access rights for the new dataset