Making a new diagnostic or recipe

Getting started

Please discuss your idea for a new diagnostic or recipe with the development team before getting started, to avoid disappointment later. A good way to do this is to open an issue on GitHub. This is also a good way to get help.

Creating a recipe and diagnostic script(s)

First create a recipe in esmvaltool/recipes to define the input data your analysis script needs and optionally preprocessing and other settings. Also create a script in the esmvaltool/diag_scripts directory and make sure it is referenced from your recipe. The easiest way to do this is probably to copy the example recipe and diagnostic script and adjust those to your needs.

If you have no preferred programming language yet, Python 3 is highly recommended, because it is the best supported. However, NCL, R, and Julia scripts are also supported.

Good example recipes for the different languages are:

Good example diagnostics are:

For an explanation of the recipe format, you might want to read about the ESMValTool recipe and have a look at the available preprocessor functions. For further inspiration, check out the already available recipes and diagnostics.

There is a directory esmvaltool/diag_scripts/shared for code that is shared by many diagnostics. This directory contains code for creating common plot types, generating output file names, selecting input data, and other commonly needed functions. See Shared diagnostic script code for the documentation of the shared Python code.

Re-using existing code

Always make sure your code is or can be released under a license that is compatible with the Apache 2.0 license.

If you have existing code in a supported scripting language, you have two options for re-using it. If it is fairly mature and a large amount of code, the preferred way is to package and publish it on the official package repository for that language and add it as a dependency of ESMValTool. If it is just a few simple scripts or packaging is not possible (e.g. for NCL), you can simply copy and paste the source code into the esmvaltool/diag_scripts directory.

If you have existing code in a compiled language like C, C++, or Fortran that you want to re-use, the recommended way to proceed is to add Python bindings and publish the package on PyPI so it can be installed as a Python dependency. You can then call the functions it provides using a Python diagnostic.

Recipe and diagnostic documentation

This section describes how to document a recipe. For more general information on writing documentation, see Documentation.

On readthedocs

Recipes should have a page in the Recipes chapter which describes what the recipe/diagnostic calculates.

When adding a completely new recipe, please start by copying doc/sphinx/source/recipes/recipe_template.rst.template to a new file doc/sphinx/source/recipes/recipe_<name of diagnostic>.rst and do not forget to add your recipe to the index.

Fill all sections from the template:

Add a brief description of the method
Add references
Document recipe options for the diagnostic scripts
Fill in the list of variables required to run the recipe
Add example images

An example image for each type of plot produced by the recipe should be added to the documentation page to show the kind of output the recipe produces. The ‘.png’ files can be stored in a subdirectory specific for the recipe under doc/sphinx/source/recipes/figures and linked from the recipe documentation page. A resolution of 150 dpi is recommended for these image files, as this is high enough for the images to look good on the documentation webpage, but not so high that the files become large.

In the recipe

Fill in the documentation section of the recipe as described in Recipe section: documentation and add a description to each diagnostic entry. Please note that the maintainer entry is not strictly necessary to run a recipe, but it is mandatory for recipes within the ESMValTool repository (enforced by a unit test). If no maintainer is available, use the single entry unmaintained. When reviewing a recipe, check that these entries have been filled with descriptive content.

In the diagnostic scripts

Functions implementing scientific formulas should contain comments with references to the source paper(s) and formula number(s).

When reviewing diagnostic code, check that formulas are implemented according to the referenced paper(s) and/or other resources and that the computed numbers look as expected from literature.

Diagnostic output

Typically, diagnostic scripts create plots, but any other output such as text files or tables is also possible. Figures should be saved in the plot_dir, either in both .pdf and .png format (preferred) or in the format given by the output_file_type specified in the User configuration file. Data should be saved in the work_dir, preferably as a .nc (NetCDF) file, following the CF-Conventions as much as possible.

Have a look at the example scripts for how to access the value of work_dir, plot_dir, and output_file_type from the diagnostic script code. More information on the interface between ESMValCore and the diagnostic script is available in the ESMValCore documentation, and the description of the Output may also help to understand this.

If a diagnostic script creates plots, it should also save the data used to create those plots to a NetCDF file. If at all possible, there should be one NetCDF file for each plot the diagnostic script creates. There are several reasons why it is useful to have the plotted data available in a NetCDF file:

for interactive visualization of the recipe on a website
for automated regression tests, e.g. checking that the numbers are still the same with newer versions of libraries

If the output data is prohibitively large, diagnostics authors can choose to implement a write_netcdf: false diagnostic script option, so writing the NetCDF files can be disabled from the recipe.
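As a rough illustration of the output conventions above, the following Python sketch derives one plot path and one matching NetCDF path per figure from the settings dictionary. The keys plot_dir, work_dir, output_file_type and write_netcdf are the ones named in this section; the helper name output_paths, the fallback to png and the default of write_netcdf: true are assumptions made for this example and are not part of ESMValTool.

```python
import os


def output_paths(cfg, basename):
    """Return (plot_path, netcdf_path) for one figure.

    `cfg` is assumed to be the settings dictionary available to the
    diagnostic script. `netcdf_path` is None when the optional,
    diagnostic-specific `write_netcdf` option is set to False.
    """
    # Figures go to plot_dir, using the configured file type.
    # Falling back to png is an assumption of this sketch.
    extension = cfg.get('output_file_type', 'png')
    plot_path = os.path.join(cfg['plot_dir'], f"{basename}.{extension}")

    # Data go to work_dir, one NetCDF file per plot.
    netcdf_path = None
    if cfg.get('write_netcdf', True):
        netcdf_path = os.path.join(cfg['work_dir'], f"{basename}.nc")
    return plot_path, netcdf_path


# Hypothetical settings, for illustration only.
cfg = {
    'plot_dir': os.path.join('output', 'plots'),
    'work_dir': os.path.join('output', 'work'),
    'output_file_type': 'pdf',
    'write_netcdf': False,
}
plot_path, netcdf_path = output_paths(cfg, 'zonal_mean_tas')
```

Using the same basename for the figure and its data file keeps the pairing between a plot and its NetCDF file obvious to reviewers.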

When doing a scientific review, please check that the figures and data look as expected from the literature and that appropriate references have been added.

Recording provenance

When ESMValCore (the esmvaltool command) runs a recipe, it will first find all data and run the default preprocessor steps plus any additional preprocessing steps defined in the recipe. Next it will run the diagnostic script defined in the recipe and finally it will store provenance information. Provenance information is stored in the W3C PROV XML format and, provided that the provenance tree is small, also plotted in an SVG file for human inspection. In addition to provenance information, a caption is also added to the plots. When contributing a diagnostic, please make sure it records the provenance and that no warnings related to provenance are generated when running the recipe. To allow ESMValCore to keep track of provenance (e.g. which input files were used to create which plots by the diagnostic script), it needs the information provided by the diagnostic script to ESMValCore.

Note

Provenance is recorded by the esmvaltool command provided by the ESMValCore package. No *_provenance.xml files will be generated when re-running just the diagnostic script with the command that is displayed on the screen during a recipe run, because that will only run the diagnostic script.
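One quick way to confirm that a full recipe run recorded provenance is to look for *_provenance.xml files below the output directory of that run. The sketch below does this with the Python standard library only; the function name find_provenance_files is hypothetical, and because no particular directory layout is assumed, the search is recursive. The temporary directory merely stands in for a real output directory.

```python
import tempfile
from pathlib import Path


def find_provenance_files(output_dir):
    """Return all *_provenance.xml files below output_dir, sorted."""
    return sorted(Path(output_dir).rglob('*_provenance.xml'))


# Self-contained demonstration using a temporary directory that
# mimics (very loosely) the output of a recipe run.
with tempfile.TemporaryDirectory() as tmp:
    plot_dir = Path(tmp, 'plots', 'diagnostic', 'script')
    plot_dir.mkdir(parents=True)
    (plot_dir / 'figure.png').touch()
    (plot_dir / 'figure_provenance.xml').touch()
    found = [path.name for path in find_provenance_files(tmp)]
```

An empty result after a run through the esmvaltool command would suggest that the diagnostic script did not log any provenance.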

Provenance items provided by the recipe

Provenance tags can be added in several places in the recipe. The Recipe section: documentation section provides information about the entire recipe.

For each diagnostic in the recipe, ESMValCore supports the following additional information:

realms: a list of high-level modeling components
themes: a list of themes

Please see the (installed version of the) file esmvaltool/config-references.yml for all available information on each item.

Provenance items provided by the diagnostic script

For each output file produced by the diagnostic script, ESMValCore supports the following additional information:

ancestors: a list of input files used to create the plot
caption: a caption text for the plot

Note that the level of detail is limited: the only valid choices for ancestors are files produced by ancestor tasks.

It is also possible to add more information for the implemented diagnostics using the following items:

authors: a list of authors
references: a list of references, see Adding references below
projects: a list of projects
domains: a list of spatial coverage of the dataset
plot_types: a list of plot types if the diagnostic created a plot, e.g. error bar
statistics: a list of types of the statistic, e.g. anomaly

Arbitrarily named other items are also supported.

Please see the (installed version of the) file esmvaltool/config-references.yml for all available information on each item, and see References configuration file for an introduction. In this file, the information is written in the form key: value. The keys are added to the diagnostics and are automatically replaced by their values in the final provenance records. For example, config-references.yml contains a category for plot types:

plot_types:
  errorbar: error bar plot

In the diagnostics, we add the key as:

plot_types: [errorbar]

It is also possible to add custom provenance information by adding items to each category in this file.
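The key-to-value replacement described above can be pictured with a small Python sketch. This is not the ESMValCore implementation: the function name resolve_tags is hypothetical, and keeping unknown tags unchanged is an assumption of this example. The references dictionary mirrors the config-references.yml excerpt shown above.

```python
def resolve_tags(record, references):
    """Replace known tags in list-valued items by their descriptions.

    Returns a new dictionary; the input record is left untouched.
    """
    resolved = {}
    for item, value in record.items():
        lookup = references.get(item)
        if lookup is not None and isinstance(value, list):
            # Unknown tags are kept as they are (assumption of this sketch).
            resolved[item] = [lookup.get(tag, tag) for tag in value]
        else:
            resolved[item] = value
    return resolved


# Mirrors the plot_types category from config-references.yml.
references = {'plot_types': {'errorbar': 'error bar plot'}}
record = {'plot_types': ['errorbar'], 'caption': 'Example caption'}
resolved = resolve_tags(record, references)
```

Items without a matching category, such as the caption here, pass through unchanged.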

In order to communicate with the diagnostic script, two interfaces have been defined, which are described in the ESMValCore documentation. Note that for Python and NCL diagnostics much more convenient methods are available than directly reading and writing the interface files. For other languages these are not implemented (yet).

Depending on your preferred programming language for developing a diagnostic, see the instructions and examples below on how to add provenance information:

Recording provenance in a Python diagnostic script

Always use esmvaltool.diag_scripts.shared.run_diagnostic() at the end of your script:

from esmvaltool.diag_scripts.shared import run_diagnostic

if __name__ == '__main__':
    with run_diagnostic() as config:
        main(config)

And make use of an esmvaltool.diag_scripts.shared.ProvenanceLogger to log provenance:

from esmvaltool.diag_scripts.shared import ProvenanceLogger

with ProvenanceLogger(cfg) as provenance_logger:
    provenance_logger.log(diagnostic_file, provenance_record)

The diagnostic_file can be obtained using esmvaltool.diag_scripts.shared.get_diagnostic_filename.

The provenance_record is a dictionary of provenance items, for example:

provenance_record = {
    'ancestors': ancestor_files,
    'authors': [
        'andela_bouwe',
        'righi_mattia',
    ],
    'caption': caption,
    'domains': ['global'],
    'plot_types': ['zonal'],
    'references': [
        'acknow_project',
    ],
    'statistics': ['mean'],
}

Have a look at the example Python diagnostic in esmvaltool/diag_scripts/examples/diagnostic.py for a complete example.

Recording provenance in an NCL diagnostic script

Always call the log_provenance procedure after plotting from your NCL diag_script:

log_provenance(nc-file,plot_file,caption,statistics,domain,plottype,authors,references,input-files)

For example:

log_provenance(ncdf_outfile, \
               map@outfile, \
               "Mean of variable: " + var0, \
               "mean", \
               "global", \
               "geo", \
               (/"righi_mattia", "gottschaldt_klaus-dirk"/), \
               (/"acknow_author"/), \
               metadata_att_as_array(info0, "filename"))

Have a look at the example NCL diagnostic in esmvaltool/diag_scripts/examples/diagnostic.ncl for a complete example.

Recording provenance in a Julia diagnostic script

The provenance information is written to a diagnostic_provenance.yml file located in run_dir. For example, the provenance_records can be stored in this file as follows (the example writes JSON, which is also valid YAML):

import JSON

provenance_file = string(run_dir, "/diagnostic_provenance.yml")

open(provenance_file, "w") do io
    JSON.print(io, provenance_records, 4)
end

The provenance_records can be defined as a dictionary of provenance items. For example:

provenance_records = Dict()

provenance_record = Dict(
    "ancestors" => [input_file],
    "authors" => ["vonhardenberg_jost", "arnone_enrico"],
    "caption" => "Example diagnostic in Julia",
    "domains" => ["global"],
    "projects" => ["crescendo", "c3s-magic"],
    "references" => ["zhang11wcc"],
    "statistics" => ["other"],
)

provenance_records[output_file] = provenance_record

Have a look at the example Julia diagnostic in esmvaltool/diag_scripts/examples/diagnostic.jl for a complete example.

Recording provenance in an R diagnostic script

The provenance information is written to a diagnostic_provenance.yml file located in run_dir. For example, the provenance_records can be stored in a YAML file as:

library(yaml)

provenance_file <- paste0(run_dir, "/", "diagnostic_provenance.yml")
write_yaml(provenance_records, provenance_file)

The provenance_records can be defined as a list of provenance items. For example:

provenance_records <- list()

provenance_record <- list(
  ancestors = input_filenames,
  authors = list("hunter_alasdair", "perez-zanon_nuria"),
  caption = title,
  projects = list("c3s-magic"),
  statistics = list("other")
)

provenance_records[[output_file]] <- provenance_record

Adding references

Recipes and diagnostic scripts can include references. When a recipe is run, citation information is stored in BibTeX format. Follow the steps below to add a reference to a recipe (or a diagnostic):

make a tag that is representative of the reference entry. For example, righi15gmd shows the last name of the first author, year and journal abbreviation.
add the tag to the references section in the recipe (or the diagnostic script provenance, see Recording provenance).
make a BibTeX file for the reference entry. There are some online tools to convert a DOI to BibTeX format, such as https://doi2bib.org/
rename the file to the tag, keep the .bibtex extension.
add the file to the folder esmvaltool/references.

Note: the references section in config-references.yml has been replaced by the folder esmvaltool/references.
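The tag convention from the first step (last name of the first author, two-digit year, journal abbreviation) and the file naming from the later steps can be sketched in Python as follows. The function names are hypothetical, and stripping every character that is not an ASCII letter is an assumption of this example; names with accented characters would need to be transliterated by hand first.

```python
import re


def make_reference_tag(first_author_last_name, year, journal_abbreviation):
    """Build a tag such as 'righi15gmd' from author, year and journal."""
    # Keep only ASCII letters so the tag is safe to use as a file name
    # (assumption of this sketch).
    author = re.sub(r'[^a-z]', '', first_author_last_name.lower())
    journal = re.sub(r'[^a-z]', '', journal_abbreviation.lower())
    return f"{author}{year % 100:02d}{journal}"


def reference_filename(tag):
    """Return the file name to use inside esmvaltool/references."""
    return f"{tag}.bibtex"


tag = make_reference_tag('Righi', 2015, 'GMD')
filename = reference_filename(tag)
```

Here tag is righi15gmd and filename is righi15gmd.bibtex, matching the example tag given in the steps above.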

Testing recipes

To test a recipe, you can run it yourself on your local infrastructure or you can ask the @esmvalbot to run it for you. To request a run of recipe_xyz.yml, write the following comment below a pull request:

@esmvalbot Please run recipe_xyz.yml

Note that only members of the @ESMValGroup/esmvaltool-developmentteam can request runs. The memory of the @esmvalbot is limited to 16 GB and it only has access to data available at DKRZ.

When reviewing a pull request, at the very least check that a recipe runs without any modifications. For a more thorough check, you might want to try out different datasets or change some settings if the diagnostic scripts support those. A simple tool is available for testing recipes with various settings.

Detailed checklist for reviews

This (non-exhaustive) checklist provides ideas for things to check when reviewing pull requests for new or updated recipes and/or diagnostic scripts.

Technical reviews

Documentation

Check that the scientific documentation of the new diagnostic has been added to the user’s guide:

A file doc/sphinx/source/recipes/recipe_<diagnostic>.rst exists
New documentation is included in doc/sphinx/source/recipes/index.rst
Documentation follows template doc/sphinx/source/recipes/recipe_template.rst.template
Description of configuration options
Description of variables
Valid image files
Resolution of image files (~150 dpi is usually enough; file size should be kept small)

Recipe

Check YAML syntax (with yamllint) and that the new recipe contains:

Documentation: description, authors, maintainer, references, projects
Provenance tags: themes, realms
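Part of this recipe check could be automated. The sketch below takes a recipe that has already been parsed into a Python dictionary and reports which of the entries listed above are missing or empty. It assumes that the recipe has top-level documentation and diagnostics sections and that themes, realms and the description are set per diagnostic; the function name and the example values are hypothetical, and the check is not an ESMValTool feature.

```python
REQUIRED_DOCUMENTATION = ('description', 'authors', 'maintainer',
                          'references', 'projects')
REQUIRED_DIAGNOSTIC = ('description', 'themes', 'realms')


def missing_recipe_entries(recipe):
    """Return a sorted list of missing entries as dotted key paths."""
    missing = []
    documentation = recipe.get('documentation') or {}
    for key in REQUIRED_DOCUMENTATION:
        if not documentation.get(key):
            missing.append(f"documentation.{key}")
    for name, diagnostic in (recipe.get('diagnostics') or {}).items():
        for key in REQUIRED_DIAGNOSTIC:
            if not (diagnostic or {}).get(key):
                missing.append(f"diagnostics.{name}.{key}")
    return sorted(missing)


# Illustrative recipe content; the values are placeholders.
recipe = {
    'documentation': {
        'description': 'Example recipe.',
        'authors': ['andela_bouwe'],
        'maintainer': ['unmaintained'],
        'references': ['acknow_project'],
    },
    'diagnostics': {
        'map': {'description': 'Global map.', 'themes': ['phys']},
    },
}
missing = missing_recipe_entries(recipe)
```

For this example the function reports that diagnostics.map.realms and documentation.projects still need to be filled in. Whether the entries contain descriptive content remains a matter for the human reviewer.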

Diagnostic script

Check that the new diagnostic script(s) meet(s) standards. This includes the following items:

In-code documentation (comments, docstrings)
Code quality (e.g. no hardcoded pathnames)
No Codacy errors reported
Re-use of existing functions whenever possible
Provenance implemented

Run recipe

Make sure the new diagnostic(s) work(s) by running ESMValTool with the recipe.

Check output of diagnostic

After successfully running the new recipe, check that:

NetCDF output has been written
Output contains (some) valid values (e.g. not only nan or zeros)
Provenance information has been written
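The check for valid values can be expressed as a small predicate. The sketch below works on a plain sequence of numbers so that it stays free of dependencies; in practice the values would first be read from the NetCDF file with a suitable library, which is left out here. The function name is hypothetical, and treating infinite values as invalid is an assumption of this example.

```python
import math


def has_valid_values(values):
    """Return True if at least one value is finite and non-zero."""
    return any(math.isfinite(value) and value != 0 for value in values)


# Output consisting only of NaN and zeros is suspicious ...
only_nan_or_zero = [float('nan'), 0.0, 0.0]
# ... while a single plausible number is enough to pass this check.
some_valid = [float('nan'), 0.0, 287.4]

suspicious = not has_valid_values(only_nan_or_zero)
plausible = has_valid_values(some_valid)
```

Passing this check only rules out obviously empty output; whether the numbers are scientifically reasonable is part of the scientific review.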

Check automated tests

Check for errors reported by automated tests

Codacy
CircleCI
Documentation build

Scientific reviews

Documentation added to user’s guide

Check that the scientific documentation of the new diagnostic in doc/sphinx/source/recipes/recipe_<diagnostic>.rst:

Meets scientific documentation standard and:
  Contains brief description of method
  Contains references
Check for typos / broken text
Documentation is complete and written in an understandable language
References are complete

Recipe

Check that new recipe contains valid:

Documentation: description, references
Provenance tags: themes, realms

Diagnostic script

Check that the new diagnostic script(s) meet(s) scientific standards. This can include the following items:

Clear and understandable in-code documentation including brief description of diagnostic
References
Method / equations match reference(s) given

Run recipe

Make sure the new diagnostic(s) work(s) by running ESMValTool.

Check output of diagnostic

After successfully running the new recipe, check that:

Output contains (some) valid values (e.g. not only nan or zeros)
If applicable, check plots and compare with corresponding plots in the paper(s) cited