Making a new diagnostic or recipe

Getting started

Please discuss your idea for a new diagnostic or recipe with the development team before getting started, to avoid disappointment later. A good way to do this is to open an issue on GitHub. This is also a good way to get help.

Creating a recipe and diagnostic script(s)

First create a recipe in esmvaltool/recipes to define the input data your analysis script needs and optionally preprocessing and other settings. Also create a script in the esmvaltool/diag_scripts directory and make sure it is referenced from your recipe. The easiest way to do this is probably to copy the example recipe and diagnostic script and adjust those to your needs.

If you have no preferred programming language yet, Python 3 is highly recommended, because it is the best supported. However, NCL, R, and Julia scripts are also supported.

Good example recipes for the different languages are:

Good example diagnostics are:

Unfortunately not much documentation is available at this stage, so have a look at the other recipes and diagnostics for further inspiration.

Re-using existing code

Always make sure your code is or can be released under a license that is compatible with the Apache 2 license.

If you have existing code in a supported scripting language, you have two options for re-using it. If it is fairly mature and a large amount of code, the preferred way is to package and publish it on the official package repository for that language and add it as a dependency of ESMValTool. If it is just a few simple scripts or packaging is not possible (e.g. for NCL), you can simply copy and paste the source code into the esmvaltool/diag_scripts directory.

If you have existing code in a compiled language like C, C++, or Fortran that you want to re-use, the recommended way to proceed is to add Python bindings and publish the package on PyPI so it can be installed as a Python dependency. You can then call the functions it provides using a Python diagnostic.

Additional dependencies

Add any additional dependencies needed for the diagnostic script to setup.py, esmvaltool/install/R/r_requirements.txt or esmvaltool/install/Julia/Project.toml (depending on the language of your script) and also to package/meta.yaml for conda dependencies (includes Python and others, but not R/Julia). Also check that the license of the dependency you want to add, and the licenses of its own dependencies, are compatible with Apache 2.0.

Recording provenance

When ESMValCore (the esmvaltool command) runs a recipe, it will first find all data and run the default preprocessor steps plus any additional preprocessing steps defined in the recipe. Next it will run the diagnostic script defined in the recipe and finally it will store provenance information. Provenance information is stored in the W3C PROV XML format and also plotted in an SVG file for human inspection. In addition to provenance information, a caption is also added to the plots. When contributing a diagnostic, please make sure it records the provenance, and that no warnings related to provenance are generated when running the recipe.

Provenance items provided by the recipe

For each diagnostic in the recipe, ESMValCore supports the following additional information:

  • realms: a list of high-level modeling components

  • themes: a list of themes

Please see the (installed version of the) file esmvaltool/config-references.yml for all available information on each item.

Provenance items provided by the diagnostic script

For each output file produced by the diagnostic script, ESMValCore supports the following additional information:

  • ancestors: a list of input files used to create the plot

  • caption: a caption text for the plot

Note that the level of detail is limited: the only valid choices for ancestors are files produced by ancestor tasks.

It is also possible to add more information for the implemented diagnostics using the following items:

  • authors: a list of authors

  • references: a list of references, see Adding references below

  • projects: a list of projects

  • domains: a list of spatial domains covered by the dataset

  • plot_types: a list of plot types if the diagnostic created a plot, e.g. error bar

  • statistics: a list of types of statistics, e.g. anomaly

Arbitrarily named other items are also supported.

Please see the (installed version of the) file esmvaltool/config-references.yml for all available information on each item. In this file, the information is written in the form of key: value. Note that we add the keys to the diagnostics. The keys will automatically be replaced by their values in the final provenance records. For example, in the config-references.yml there is a category for types of the plots:

plot_types:
  errorbar: error bar plot

In the diagnostics, we add the key as:

plot_types: [errorbar]

It is also possible to add custom provenance information by adding items to each category in this file.
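
The key-to-value replacement described above can be pictured with a minimal Python sketch. This is not ESMValCore's actual implementation: the function name expand_tags and the TAGS dictionary are hypothetical, and the dictionary only mirrors the plot_types excerpt shown above so that no YAML library is needed.

```python
# Hypothetical excerpt of config-references.yml, written as a Python dict
# so that the sketch only depends on the standard library.
TAGS = {
    "plot_types": {"errorbar": "error bar plot"},
}


def expand_tags(record, tags):
    """Return a copy of record with known keys replaced by their values."""
    expanded = {}
    for item, keys in record.items():
        if item in tags:
            # Look up every key in its category; a KeyError here points
            # at a key that is missing from that category.
            expanded[item] = [tags[item][key] for key in keys]
        else:
            # Items without a category (e.g. caption) pass through unchanged.
            expanded[item] = keys
    return expanded


record = {"plot_types": ["errorbar"], "caption": "Example plot"}
expanded_record = expand_tags(record, TAGS)
print(expanded_record)
```

In this sketch a key that is absent from its category raises a KeyError, which is one possible way to surface a typo in a diagnostic; how ESMValCore itself reports such a case is not covered here.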

In order to communicate with the diagnostic script, two interfaces have been defined, which are described in the ESMValCore documentation. Note that for Python and NCL diagnostics much more convenient methods are available than directly reading and writing the interface files. For other languages these are not implemented (yet).
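
To give an idea of what writing the provenance interface file directly involves, here is a minimal sketch using only the Python standard library. It assumes the layout used in the Julia and R sections of this page: a file named diagnostic_provenance.yml in run_dir that maps each output file to its provenance record. The helper name write_provenance_file and the file names are hypothetical. Python diagnostics should normally prefer the more convenient methods mentioned above.

```python
import json
import tempfile
from pathlib import Path


def write_provenance_file(run_dir, provenance_records):
    """Write the records to diagnostic_provenance.yml inside run_dir."""
    provenance_file = Path(run_dir) / "diagnostic_provenance.yml"
    # JSON text can also be read as YAML, so the json module is sufficient.
    with provenance_file.open("w", encoding="utf-8") as file:
        json.dump(provenance_records, file, indent=4)
    return provenance_file


# Hypothetical file names and values, for illustration only.
provenance_records = {
    "output.nc": {
        "ancestors": ["input.nc"],
        "caption": "Example diagnostic",
        "statistics": ["mean"],
    },
}

# A temporary directory stands in for the real run_dir.
with tempfile.TemporaryDirectory() as run_dir:
    written = write_provenance_file(run_dir, provenance_records)
    round_trip = json.loads(written.read_text(encoding="utf-8"))
```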

Depending on your preferred programming language for developing a diagnostic, see the instructions and examples below on how to add provenance information:

Recording provenance in a Python diagnostic script

Always use esmvaltool.diag_scripts.shared.run_diagnostic() at the end of your script:

from esmvaltool.diag_scripts.shared import run_diagnostic

if __name__ == '__main__':
    with run_diagnostic() as config:
        main(config)

Use an esmvaltool.diag_scripts.shared.ProvenanceLogger to log provenance:

from esmvaltool.diag_scripts.shared import ProvenanceLogger

with ProvenanceLogger(cfg) as provenance_logger:
    provenance_logger.log(diagnostic_file, provenance_record)

The diagnostic_file can be obtained using esmvaltool.diag_scripts.shared.get_diagnostic_filename.

The provenance_record is a dictionary of provenance items, for example:

provenance_record = {
    'ancestors': ancestor_files,
    'authors': [
        'andela_bouwe',
        'righi_mattia',
    ],
    'caption': caption,
    'domains': ['global'],
    'plot_types': ['zonal'],
    'references': [
        'acknow_project',
    ],
    'statistics': ['mean'],
}

Have a look at the example Python diagnostic in esmvaltool/diag_scripts/examples/diagnostic.py for a complete example.
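
Because most provenance items are described above as lists, a common slip is to pass a bare string such as 'global' instead of ['global']. The following sketch shows one way to catch this before logging. The helper check_provenance_record is hypothetical and not part of esmvaltool.diag_scripts.shared; the list of item names is taken from the items described on this page.

```python
# Items that this page describes as "a list of ...".
LIST_ITEMS = (
    "ancestors",
    "authors",
    "references",
    "projects",
    "domains",
    "plot_types",
    "statistics",
)


def check_provenance_record(record):
    """Return a list of problems found in record (empty if none)."""
    problems = []
    for item in LIST_ITEMS:
        if item not in record:
            continue  # every item is treated as optional in this sketch
        value = record[item]
        if not isinstance(value, (list, tuple)):
            problems.append(
                f"{item} should be a list, got {type(value).__name__}")
        elif not all(isinstance(entry, str) for entry in value):
            problems.append(f"{item} should contain only strings")
    if not isinstance(record.get("caption", ""), str):
        problems.append("caption should be a string")
    return problems


good_record = {
    "ancestors": ["input.nc"],
    "caption": "Zonal mean",
    "domains": ["global"],
}
bad_record = {"ancestors": "input.nc", "domains": ["global"]}

good_problems = check_provenance_record(good_record)
bad_problems = check_provenance_record(bad_record)
print(bad_problems)
```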

Recording provenance in an NCL diagnostic script

Always call the log_provenance procedure after plotting from your NCL diag_script:

log_provenance(nc-file,plot_file,caption,statistics,domain,plottype,authors,references,input-files)

For example:

log_provenance(ncdf_outfile, \
               map@outfile, \
               "Mean of variable: " + var0, \
               "mean", \
               "global", \
               "geo", \
               (/"righi_mattia", "gottschaldt_klaus-dirk"/), \
               (/"acknow_author"/), \
               metadata_att_as_array(info0, "filename"))

Have a look at the example NCL diagnostic in esmvaltool/diag_scripts/examples/diagnostic.ncl for a complete example.

Recording provenance in a Julia diagnostic script

The provenance information is written to a file named diagnostic_provenance.yml in run_dir. For example, the provenance_records can be stored in this file as follows (the snippet writes JSON, which can also be read as YAML):

import JSON

provenance_file = string(run_dir, "/diagnostic_provenance.yml")

open(provenance_file, "w") do io
    JSON.print(io, provenance_records, 4)
end

The provenance_records can be defined as a dictionary of provenance items. For example:

provenance_records = Dict()

provenance_record = Dict(
    "ancestors" => [input_file],
    "authors" => ["vonhardenberg_jost", "arnone_enrico"],
    "caption" => "Example diagnostic in Julia",
    "domains" => ["global"],
    "projects" => ["crescendo", "c3s-magic"],
    "references" => ["zhang11wcc"],
    "statistics" => ["other"],
)

provenance_records[output_file] = provenance_record

Have a look at the example Julia diagnostic in esmvaltool/diag_scripts/examples/diagnostic.jl for a complete example.

Recording provenance in an R diagnostic script

The provenance information is written to a file named diagnostic_provenance.yml in run_dir. For example, the provenance_records can be stored in this YAML file as:

library(yaml)

provenance_file <- paste0(run_dir, "/", "diagnostic_provenance.yml")
write_yaml(provenance_records, provenance_file)

The provenance_records can be defined as a list of provenance items. For example:

provenance_records <- list()

provenance_record <- list(
  ancestors = input_filenames,
  authors = list("hunter_alasdair", "perez-zanon_nuria"),
  caption = title,
  projects = list("c3s-magic"),
  statistics = list("other")
)

provenance_records[[output_file]] <- provenance_record

Adding references

Recipes and diagnostic scripts can include references. When a recipe is run, citation information is stored in BibTeX format. Follow the steps below to add a reference to a recipe (or a diagnostic):

  • make a tag that is representative of the reference entry. For example, righi15gmd shows the last name of the first author, year and journal abbreviation.

  • add the tag to the references section in the recipe (or the diagnostic).

  • make a BibTeX file for the reference entry. There are some online tools to convert a DOI to BibTeX format, like https://doi2bib.org/

  • rename the file to the tag, keeping the .bibtex extension.

  • add the file to the folder esmvaltool/references.

Note: the references section in config-references.yml has been replaced by the folder esmvaltool/references.
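
The naming steps above can be sketched in Python as follows. Both helper names are hypothetical, and the tag pattern (last name of the first author, two-digit year, journal abbreviation) is inferred from the righi15gmd example; it is a convention rather than an enforced rule. Note that this sketch simply drops any characters other than ASCII letters from the author name.

```python
import re
from pathlib import PurePosixPath


def make_reference_tag(first_author_last_name, year, journal_abbreviation):
    """Build a tag such as righi15gmd from author, year and journal."""
    # Lowercase everything and drop characters that are not ASCII letters
    # (or digits, for the journal abbreviation).
    name = re.sub(r"[^a-z]", "", first_author_last_name.lower())
    journal = re.sub(r"[^a-z0-9]", "", journal_abbreviation.lower())
    # Only the last two digits of the year are used in the tag.
    return f"{name}{year % 100:02d}{journal}"


def reference_path(tag):
    """Return where the BibTeX file for tag belongs in the repository."""
    return PurePosixPath("esmvaltool/references") / f"{tag}.bibtex"


tag = make_reference_tag("Righi", 2015, "GMD")
path = reference_path(tag)
print(tag, path)
```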