Making a new dataset#
If you are contributing a new dataset, please have a look at Writing a CMORizer script for an additional dataset for how to do so. Please always create separate pull requests for CMORizer scripts, even when introducing a new dataset or updating an existing dataset with a new recipe.
If you are updating a CMORizer script to support a different dataset version, please have a look at Support for multiple versions of a dataset for how to handle multiple dataset versions.
Dataset documentation#
The documentation required for a CMORizer script is the following:
Make sure that the new dataset is added to the list of Supported datasets for which a CMORizer script is available and to the file datasets.yml.
The code documentation should contain clear instructions on how to obtain the data.
A BibTeX file named <dataset>.bibtex defining the reference for the new dataset should be placed in the directory esmvaltool/references/, see Adding references for detailed instructions.
For more general information on writing documentation, see Documentation.
Testing#
When contributing a new script, add an entry for the CMORized data to recipes/examples/recipe_check_obs.yml and run the recipe, to make sure the CMOR checks pass without warnings or errors.
To test a pull request for a new CMORizer script:
Download the data following the instructions included in the script and place it in the RAWOBS rootpath specified in your configuration.
If available, use the downloading script by running
esmvaltool data download --config_file <config-file> <dataset>
Run the cmorization by running
esmvaltool data format --config_file <config-file> <dataset>
Copy the resulting data to the OBS (for CMIP5 compliant data) or OBS6 (for CMIP6 compliant data) rootpath specified in your configuration.
Run recipes/examples/recipe_check_obs.yml with the new dataset to check that the data can be used.
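The download and format steps above can be scripted. The following is a minimal sketch, not part of ESMValTool: the function names `build_cmorizer_commands` and `run_cmorizer_steps` are hypothetical, and the `--config_file` spelling is taken from the commands shown on this page, so it may differ in other ESMValTool versions.

```python
import subprocess


def build_cmorizer_commands(config_file, dataset, download=True):
    """Return the command lines for the download and format steps.

    Hypothetical helper: it only assembles the commands quoted in this guide.
    """
    commands = []
    if download:
        # Only meaningful when a downloader script exists for the dataset.
        commands.append(
            ["esmvaltool", "data", "download",
             "--config_file", config_file, dataset]
        )
    commands.append(
        ["esmvaltool", "data", "format",
         "--config_file", config_file, dataset]
    )
    return commands


def run_cmorizer_steps(config_file, dataset, download=True):
    """Run each step in order and stop at the first failing command."""
    for command in build_cmorizer_commands(config_file, dataset, download):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Copying the output to the OBS or OBS6 rootpath and running the check recipe are left as manual steps here, because the target directories depend on your configuration.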
Scientific sanity check#
When contributing a new dataset, we expect that the numbers and units of the dataset look physically meaningful. The scientific reviewer needs to check this.
Data availability#
Once your pull request has been approved by the reviewers, ask a member of @OBS-maintainers to add the new dataset to the data pool at DKRZ and CEDA-Jasmin. This team is in charge of merging CMORizer pull requests.
Detailed checklist for reviews#
This (non-exhaustive) checklist provides ideas for things to check when reviewing pull requests for new or updated CMORizer scripts.
Dataset description#
Check that the new dataset has been added to the table of observations defined in the ESMValTool user’s guide in section Obtaining input data (generated from doc/sphinx/source/input.rst).
Check that the new dataset has also been added to the file datasets.yml.
BibTeX info file#
Check that a BibTeX file, i.e. <dataset>.bibtex, defining the reference for the new dataset has been created in esmvaltool/references/.
recipe_check_obs.yml#
Check that the new dataset has been added to the testing recipe esmvaltool/recipes/examples/recipe_check_obs.yml.
Downloader script#
If present, check that the new downloader script esmvaltool/cmorizers/data/downloaders/datasets/<dataset>.py meets standards.
This includes the following items:
Code quality checks
Code quality
No Codacy errors reported
CMORizer script#
Check that the new CMORizer script esmvaltool/cmorizers/data/formatters/datasets/<dataset>.{py,ncl} meets standards.
This includes the following items:
In-code documentation (header) contains
Download instructions
Reference(s)
Code quality checks
Code quality (e.g. no hardcoded pathnames)
No Codacy errors reported
Config file#
If present, check the config file <dataset>.yml in esmvaltool/cmorizers/data/cmor_config/ for correctness.
Use yamllint to check for syntax errors and common mistakes.
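A reviewer can quickly confirm that the files named in this checklist exist. The sketch below is an illustration only: `expected_files` and `missing_files` are hypothetical helpers, the directory layout is copied from the paths quoted in this guide, and it assumes you pass the dataset name exactly as it is spelled in the file names.

```python
from pathlib import Path


def expected_files(dataset, repo_root="."):
    """Map each checklist item to the candidate paths quoted in this guide."""
    root = Path(repo_root) / "esmvaltool"
    data = root / "cmorizers" / "data"
    formatters = data / "formatters" / "datasets"
    return {
        "reference": [root / "references" / f"{dataset}.bibtex"],
        # The CMORizer script may be written in Python or NCL.
        "cmorizer": [formatters / f"{dataset}.py",
                     formatters / f"{dataset}.ncl"],
        "downloader (optional)": [
            data / "downloaders" / "datasets" / f"{dataset}.py"
        ],
        "config (optional)": [data / "cmor_config" / f"{dataset}.yml"],
    }


def missing_files(dataset, repo_root="."):
    """Return the checklist items for which no candidate file exists."""
    return [
        label
        for label, candidates in expected_files(dataset, repo_root).items()
        if not any(path.is_file() for path in candidates)
    ]
```

Items marked optional in the returned list are not errors by themselves, since the downloader script and config file are only checked "if present".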
Run downloader script#
If available, make sure the downloader script is working by running
esmvaltool data download --config_file <config-file> <dataset>
Run CMORizer#
Make sure CMORizer is working by running
esmvaltool data format --config_file <config-file> <dataset>
Check output of CMORizer#
After successfully running the new CMORizer, check that:
Output contains (some) valid values (e.g. not only nan or zeros)
Metadata is defined properly
Run esmvaltool/recipes/examples/recipe_check_obs.yml for the new dataset.
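The "valid values" check can be made concrete with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the input is a flat iterable of floats that you have already read from the CMORized file with a tool of your choice, and "valid" is interpreted here as finite and non-zero.

```python
import math


def has_valid_values(values):
    """Return True if at least one value is finite and non-zero.

    Assumption: NaN, infinity and zero all count as "not valid" here.
    """
    return any(math.isfinite(v) and v != 0 for v in values)


def summarize(values):
    """Return (min, max, mean) of the finite values, or None if there are none."""
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return None
    return min(finite), max(finite), sum(finite) / len(finite)
```

The summary is meant to support the scientific sanity check: the reviewer compares the minimum, maximum and mean with what is physically plausible for the variable and its units.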
RAW data#
Contact the team in charge of the ESMValTool data pool (@OBS-maintainers) and ask them to copy the RAW data to RAWOBS/Tier2 (or Tier3).
CMORized data#
Contact the team in charge of the ESMValTool data pool (@OBS-maintainers) and ask them to:
Merge the pull request
Copy CMORized dataset to OBS/Tier2 (Tier3)
Set file access rights for new dataset