Making a new diagnostic or recipe
Getting started
Please discuss your idea for a new diagnostic or recipe with the development team before getting started, to avoid disappointment later. A good way to do this is to open an issue on GitHub. This is also a good way to get help.
Creating a recipe and diagnostic script(s)
First create a recipe in esmvaltool/recipes to define the input data your analysis script needs and optionally preprocessing and other settings. Also create a script in the esmvaltool/diag_scripts directory and make sure it is referenced from your recipe. The easiest way to do this is probably to copy the example recipe and diagnostic script and adjust those to your needs.
If you have no preferred programming language yet, Python 3 is highly recommended, because it is the best supported. However, NCL, R, and Julia scripts are also supported.
Good example recipes for the different languages are:
Good example diagnostics are:
For an explanation of the recipe format, you might want to read about the ESMValTool recipe and have a look at the available preprocessor functions. For further inspiration, check out the already available recipes and diagnostics.
There is a directory esmvaltool/diag_scripts/shared for code that is shared by many diagnostics. This directory contains code for creating common plot types, generating output file names, selecting input data, and other commonly needed functions. See Shared diagnostic script code for the documentation of the shared Python code.
Re-using existing code
Always make sure your code is or can be released under a license that is compatible with the Apache 2.0 license.
If you have existing code in a supported scripting language, you have two options for re-using it. If it is fairly mature and a large amount of code, the preferred way is to package and publish it on the official package repository for that language and add it as a dependency of ESMValTool. If it is just a few simple scripts or packaging is not possible (e.g. for NCL), you can simply copy and paste the source code into the esmvaltool/diag_scripts directory.
If you have existing code in a compiled language like C, C++, or Fortran that you want to re-use, the recommended way to proceed is to add Python bindings and publish the package on PyPI so it can be installed as a Python dependency. You can then call the functions it provides using a Python diagnostic.
Recipe and diagnostic documentation
This section describes how to document a recipe. For more general information on writing documentation, see Documentation.
On readthedocs
Recipes should have a page in the Recipes chapter which describes what the recipe/diagnostic calculates.
When adding a completely new recipe, please start by copying doc/sphinx/source/recipes/recipe_template.rst.template to a new file doc/sphinx/source/recipes/recipe_<name of diagnostic>.rst and do not forget to add your recipe to the index.
Fill all sections from the template:
Add a brief description of the method
Add references
Document recipe options for the diagnostic scripts
Fill in the list of variables required to run the recipe
Add example images
An example image for each type of plot produced by the recipe should be added to the documentation page to show the kind of output the recipe produces. The ‘.png’ files can be stored in a subdirectory specific for the recipe under doc/sphinx/source/recipes/figures and linked from the recipe documentation page. A resolution of 150 dpi is recommended for these image files, as this is high enough for the images to look good on the documentation webpage, but not so high that the files become large.
In the recipe
Fill in the documentation section of the recipe as described in Recipe section: documentation and add a description to each diagnostic entry.
Please note that the maintainer entry is not strictly required to run a recipe, but it is mandatory for recipes within the ESMValTool repository (enforced by a unit test). If no maintainer is available, use the single entry unmaintained.
When reviewing a recipe, check that these entries have been filled with descriptive content.
In the diagnostic scripts
Functions implementing scientific formulas should contain comments with references to the source paper(s) and formula number(s).
When reviewing diagnostic code, check that formulas are implemented according to the referenced paper(s) and/or other resources and that the computed numbers look as expected from literature.
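As an illustration of this convention, the sketch below documents a formula in the docstring of the function that implements it. The function name is hypothetical and the reference is deliberately left as a placeholder; in a real diagnostic it would name the actual paper and equation number.

```python
# Stefan-Boltzmann constant in W m-2 K-4 (CODATA 2018 value).
STEFAN_BOLTZMANN = 5.670374419e-8


def blackbody_emission(temperature):
    """Return the black-body emission (W m-2) for a temperature in K.

    Implements E = sigma * T**4, see <first author> et al. (<year>),
    Eq. (<number>). Replace the placeholders with the actual source.
    """
    return STEFAN_BOLTZMANN * temperature**4
```

A reviewer can then compare the code line by line with the cited equation.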
Diagnostic output
Typically, diagnostic scripts create plots, but any other output, such as text files or tables, is also possible.
Figures should be saved in the plot_dir, either in both .pdf and .png format (preferred), or in the format given by the output_file_type specified in the User configuration file.
Data should be saved in the work_dir, preferably as a .nc (NetCDF) file, following the CF-Conventions as much as possible.
Have a look at the example scripts for how to access the value of work_dir, plot_dir, and output_file_type from the diagnostic script code.
More information on the interface between ESMValCore and the diagnostic script
is available here and
the description of the Output may also help to understand this.
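As a minimal sketch of how these settings might be used, assume the diagnostic receives them as a dictionary cfg with the keys work_dir, plot_dir, and output_file_type. The two helper functions and the example settings below are hypothetical and only illustrate the idea; the shared Python code provides its own helpers for generating output file names.

```python
import os


def data_path(cfg, basename, extension="nc"):
    """Return the path of a data file inside the work directory."""
    return os.path.join(cfg["work_dir"], f"{basename}.{extension}")


def plot_path(cfg, basename):
    """Return the path of a plot, using the configured file type."""
    return os.path.join(cfg["plot_dir"], f"{basename}.{cfg['output_file_type']}")


# Hypothetical settings, for illustration only.
example_cfg = {
    "work_dir": "/tmp/work",
    "plot_dir": "/tmp/plots",
    "output_file_type": "png",
}
```

Using the same basename for the plot and its data file makes it easy to see which NetCDF file belongs to which figure.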
If a diagnostic script creates plots, it should also save the data used to create those plots to a NetCDF file. If at all possible, there should be one NetCDF file for each plot the diagnostic script creates. There are several reasons why it is useful to have the plotted data available in a NetCDF file:
for interactive visualization of the recipe on a website
for automated regression tests, e.g. checking that the numbers are still the same with newer versions of libraries
If the output data is prohibitively large, diagnostics authors can choose to implement a write_netcdf: false diagnostic script option, so writing the NetCDF files can be disabled from the recipe.
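One possible way to honour such an option is sketched below. It assumes the script settings arrive as a dictionary cfg and that writing is enabled unless the recipe sets write_netcdf to false; the function name and the write_func callable are hypothetical.

```python
import os


def save_plot_data(cfg, basename, write_func):
    """Write plot data to work_dir unless write_netcdf is false.

    write_func is a hypothetical callable that takes a file path and
    writes the NetCDF file, e.g. a thin wrapper around your I/O library.
    Returns the path written, or None if writing was disabled.
    """
    if not cfg.get("write_netcdf", True):
        return None
    path = os.path.join(cfg["work_dir"], basename + ".nc")
    write_func(path)
    return path
```

Defaulting to True keeps the NetCDF output available for regression tests unless a recipe author explicitly opts out.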
When doing a scientific review, please check that the figures and data look as expected from the literature and that appropriate references have been added.
Recording provenance
When ESMValCore (the esmvaltool command) runs a recipe, it will first find all data and run the default preprocessor steps plus any additional preprocessing steps defined in the recipe. Next it will run the diagnostic script defined in the recipe and finally it will store provenance information. Provenance information is stored in the W3C PROV XML format and, provided that the provenance tree is small, also plotted in an SVG file for human inspection.
In addition to provenance information, a caption is also added to the plots.
When contributing a diagnostic, please make sure it records the provenance,
and that no warnings related to provenance are generated when running the recipe.
To allow the ESMValCore to keep track of provenance (e.g. which input files
were used to create what plots by the diagnostic script), it needs the
Information provided by the diagnostic script to ESMValCore.
Note
Provenance is recorded by the esmvaltool command provided by the ESMValCore package. No *_provenance.xml files will be generated when re-running just the diagnostic script with the command that is displayed on the screen during a recipe run, because that will only run the diagnostic script.
Provenance items provided by the recipe
Provenance tags can be added in several places in the recipe. The Recipe section: documentation section provides information about the entire recipe.
For each diagnostic in the recipe, ESMValCore supports the following additional information:
realms
a list of high-level modeling components
themes
a list of themes
Please see the (installed version of the) file esmvaltool/config-references.yml for all available information on each item.
Provenance items provided by the diagnostic script
For each output file produced by the diagnostic script, ESMValCore supports the following additional information:
ancestors
a list of input files used to create the plot
caption
a caption text for the plot
Note that the level of detail is limited: the only valid choices for ancestors are files produced by ancestor tasks.
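A minimal sketch of collecting valid ancestors in a Python diagnostic is given below. It assumes that the settings dictionary cfg has an input_data entry mapping each input file name to a metadata dictionary that contains a short_name; the function name is hypothetical.

```python
def select_ancestors(cfg, short_name):
    """Return the sorted input file names that provide a given variable."""
    return sorted(
        filename
        for filename, metadata in cfg["input_data"].items()
        if metadata.get("short_name") == short_name
    )
```

Because the file names come straight from the input data handed to the diagnostic, every returned entry is a file produced by an ancestor task.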
It is also possible to add more information for the implemented diagnostics using the following items:
authors
a list of authors
references
a list of references, see Adding references below
projects
a list of projects
domains
a list of spatial coverage of the dataset
plot_types
a list of plot types if the diagnostic created a plot, e.g. error bar
statistics
a list of types of the statistic, e.g. anomaly
Arbitrarily named other items are also supported.
Please see the (installed version of the) file esmvaltool/config-references.yml for all available information on each item, and see References configuration file for an introduction.
In this file, the information is written in the form of key: value.
Note that we add the keys to the diagnostics.
The keys will automatically be replaced by their values in the final provenance records.
For example, in the config-references.yml there is a category for types of the plots:
plot_types:
  errorbar: error bar plot
In the diagnostics, we add the key as:
plot_types: [errorbar]
It is also possible to add custom provenance information by adding items to each category in this file.
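The key-to-value replacement described above can be pictured with the following sketch. It is an illustration only and not the ESMValCore implementation; the REFERENCES dictionary stands in for a parsed excerpt of config-references.yml and the function name is hypothetical.

```python
# Hypothetical parsed excerpt of the categories in config-references.yml.
REFERENCES = {
    "plot_types": {"errorbar": "error bar plot"},
}


def expand_tags(record, references=REFERENCES):
    """Replace known keys by their values; leave other items untouched."""
    expanded = {}
    for item, value in record.items():
        if item in references and isinstance(value, list):
            lookup = references[item]
            # An unknown key raises KeyError, which flags a missing entry.
            expanded[item] = [lookup[key] for key in value]
        else:
            expanded[item] = value
    return expanded
```

In this sketch a key that is missing from its category fails loudly, which is one way to notice that a custom entry still has to be added to the file.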
In order to communicate with the diagnostic script, two interfaces have been defined, which are described in the ESMValCore documentation. Note that for Python and NCL diagnostics much more convenient methods are available than directly reading and writing the interface files. For other languages these are not implemented (yet).
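For illustration, writing the provenance interface file directly could look like the sketch below. It assumes, as in the Julia and R examples on this page, that the file is named diagnostic_provenance.yml, is located in run_dir, and maps each output file to its provenance record; the function name is hypothetical. In a Python diagnostic the ProvenanceLogger is the more convenient route.

```python
import json
import os


def write_provenance(run_dir, provenance_records):
    """Write the provenance records to diagnostic_provenance.yml.

    The records are serialized as JSON, which is also valid YAML.
    """
    path = os.path.join(run_dir, "diagnostic_provenance.yml")
    with open(path, "w", encoding="utf-8") as file:
        json.dump(provenance_records, file, indent=4)
    return path
```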
Depending on your preferred programming language for developing a diagnostic, see the instructions and examples below on how to add provenance information:
Recording provenance in a Python diagnostic script
Always use esmvaltool.diag_scripts.shared.run_diagnostic() at the end of your script:
if __name__ == '__main__':
    with run_diagnostic() as config:
        main(config)
And make use of an esmvaltool.diag_scripts.shared.ProvenanceLogger to log provenance:
with ProvenanceLogger(cfg) as provenance_logger:
    provenance_logger.log(diagnostic_file, provenance_record)
The diagnostic_file can be obtained using esmvaltool.diag_scripts.shared.get_diagnostic_filename.
The provenance_record is a dictionary of provenance items, for example:
provenance_record = {
    'ancestors': ancestor_files,
    'authors': [
        'andela_bouwe',
        'righi_mattia',
    ],
    'caption': caption,
    'domains': ['global'],
    'plot_types': ['zonal'],
    'references': [
        'acknow_project',
    ],
    'statistics': ['mean'],
}
Have a look at the example Python diagnostic in esmvaltool/diag_scripts/examples/diagnostic.py for a complete example.
Recording provenance in an NCL diagnostic script
Always call the log_provenance procedure after plotting from your NCL diag_script:
log_provenance(nc-file,plot_file,caption,statistics,domain,plottype,authors,references,input-files)
For example:
log_provenance(ncdf_outfile, \
               map@outfile, \
               "Mean of variable: " + var0, \
               "mean", \
               "global", \
               "geo", \
               (/"righi_mattia", "gottschaldt_klaus-dirk"/), \
               (/"acknow_author"/), \
               metadata_att_as_array(info0, "filename"))
Have a look at the example NCL diagnostic in esmvaltool/diag_scripts/examples/diagnostic.ncl for a complete example.
Recording provenance in a Julia diagnostic script
The provenance information is written in a diagnostic_provenance.yml that will be located in run_dir.
For example, the provenance_records can be stored in a YAML file (the JSON written here is also valid YAML) as:
provenance_file = string(run_dir, "/diagnostic_provenance.yml")
open(provenance_file, "w") do io
    JSON.print(io, provenance_records, 4)
end
The provenance_records can be defined as a dictionary of provenance items.
For example:
provenance_records = Dict()
provenance_record = Dict(
    "ancestors" => [input_file],
    "authors" => ["vonhardenberg_jost", "arnone_enrico"],
    "caption" => "Example diagnostic in Julia",
    "domains" => ["global"],
    "projects" => ["crescendo", "c3s-magic"],
    "references" => ["zhang11wcc"],
    "statistics" => ["other"],
)
provenance_records[output_file] = provenance_record
Have a look at the example Julia diagnostic in esmvaltool/diag_scripts/examples/diagnostic.jl for a complete example.
Recording provenance in an R diagnostic script
The provenance information is written in a diagnostic_provenance.yml that will be located in run_dir.
For example, the provenance_records can be stored in a YAML file as:
provenance_file <- paste0(run_dir, "/", "diagnostic_provenance.yml")
write_yaml(provenance_records, provenance_file)
The provenance_records can be defined as a list of provenance items.
For example:
provenance_records <- list()
provenance_record <- list(
  ancestors = input_filenames,
  authors = list("hunter_alasdair", "perez-zanon_nuria"),
  caption = title,
  projects = list("c3s-magic"),
  statistics = list("other")
)
provenance_records[[output_file]] <- provenance_record
Adding references
Recipes and diagnostic scripts can include references. When a recipe is run, citation information is stored in BibTeX format. Follow the steps below to add a reference to a recipe (or a diagnostic):
make a tag that is representative of the reference entry. For example, righi15gmd shows the last name of the first author, year and journal abbreviation.
add the tag to the references section in the recipe (or the diagnostic script provenance, see Recording provenance).
make a BibTeX file for the reference entry. There are some online tools to convert a DOI to BibTeX format like https://doi2bib.org/
rename the file to the tag, keep the .bibtex extension.
add the file to the folder esmvaltool/references.
Note: the references section in config-references.yml has been replaced by the folder esmvaltool/references.
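The naming convention from the steps above can be sketched as follows. Both function names are hypothetical, and the tag layout (lower-case last name, two-digit year, lower-case journal abbreviation) is inferred from the righi15gmd example rather than from a formal rule.

```python
import os


def make_tag(first_author, year, journal):
    """Build a tag such as 'righi15gmd' from author, year and journal."""
    return f"{first_author.lower()}{year % 100:02d}{journal.lower()}"


def bibtex_path(tag, references_dir="esmvaltool/references"):
    """Return the path where the BibTeX file for a tag should be stored."""
    return os.path.join(references_dir, tag + ".bibtex")
```

For example, make_tag("Righi", 2015, "GMD") gives righi15gmd, and bibtex_path then points at the matching file in esmvaltool/references.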
Testing recipes
To test a recipe, you can run it yourself on your local infrastructure or you can ask the @esmvalbot to run it for you.
To request a run of recipe_xyz.yml, write the following comment below a pull request:
@esmvalbot Please run recipe_xyz.yml
Note that only members of the @ESMValGroup/esmvaltool-developmentteam can request runs. The memory of the @esmvalbot is limited to 16 GB and it only has access to data available at DKRZ.
When reviewing a pull request, at the very least check that a recipe runs without any modifications. For a more thorough check, you might want to try out different datasets or change some settings if the diagnostic scripts support those. A simple tool is available for testing recipes with various settings.
Detailed checklist for reviews
This (non-exhaustive) checklist provides ideas for things to check when reviewing pull requests for new or updated recipes and/or diagnostic scripts.
Technical reviews
Documentation
Check that the scientific documentation of the new diagnostic has been added to the user’s guide:
A file doc/sphinx/source/recipes/recipe_<diagnostic>.rst exists
New documentation is included in doc/sphinx/source/recipes/index.rst
Documentation follows template doc/sphinx/source/recipes/recipe_template.rst.template
Description of configuration options
Description of variables
Valid image files
Resolution of image files (~150 dpi is usually enough; file size should be kept small)
Recipe
Check YAML syntax (with yamllint) and that the new recipe contains:
Documentation: description, authors, maintainer, references, projects
Provenance tags: themes, realms
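Part of this check could be automated along the lines of the sketch below. It assumes the recipe has already been parsed into a dictionary (for example with a YAML parser) that has a top-level documentation section and a diagnostics section; the function name is hypothetical and this is not an existing ESMValTool tool.

```python
REQUIRED_DOCUMENTATION = (
    "description", "authors", "maintainer", "references", "projects",
)
REQUIRED_TAGS = ("themes", "realms")


def missing_entries(recipe):
    """Return a list of entries missing from a parsed recipe."""
    missing = []
    documentation = recipe.get("documentation", {})
    for key in REQUIRED_DOCUMENTATION:
        if not documentation.get(key):
            missing.append(f"documentation.{key}")
    # Provenance tags are checked for every diagnostic in the recipe.
    for name, diagnostic in recipe.get("diagnostics", {}).items():
        for key in REQUIRED_TAGS:
            if not diagnostic.get(key):
                missing.append(f"diagnostics.{name}.{key}")
    return missing
```

An empty result means that all listed entries are present and non-empty; whether their content is descriptive still needs a human reviewer.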
Diagnostic script
Check that the new diagnostic script(s) meet(s) standards. This includes the following items:
In-code documentation (comments, docstrings)
Code quality (e.g. no hardcoded pathnames)
No Codacy errors reported
Re-use of existing functions whenever possible
Provenance implemented
Run recipe
Make sure the new diagnostic(s) work(s) by running the ESMValTool with the recipe.
Check output of diagnostic
After successfully running the new recipe, check that:
NetCDF output has been written
Output contains (some) valid values (e.g. not only nan or zeros)
Provenance information has been written
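The "not only nan or zeros" check can be expressed as a small helper. This is a sketch using only the standard library on a flat sequence of numbers; reading the values from the NetCDF output is left out, and the function name is hypothetical.

```python
import math


def has_valid_values(values):
    """Return True if at least one value is finite and non-zero."""
    return any(math.isfinite(v) and v != 0 for v in values)
```

A field that legitimately consists of zeros would fail this check, so treat a negative result as a prompt to look at the data rather than as proof of an error.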
Check automated tests
Check for errors reported by automated tests
Codacy
CircleCI
Documentation build
Scientific reviews
Documentation added to user’s guide
Check that the scientific documentation of the new diagnostic in doc/sphinx/source/recipes/recipe_<diagnostic>.rst:
Meets scientific documentation standard and
Contains brief description of method
Contains references
Check for typos / broken text
Documentation is complete and written in an understandable language
References are complete
Recipe
Check that new recipe contains valid:
Documentation: description, references
Provenance tags: themes, realms
Diagnostic script
Check that the new diagnostic script(s) meet(s) scientific standards. This can include the following items:
Clear and understandable in-code documentation including brief description of diagnostic
References
Method / equations match reference(s) given
Run recipe
Make sure new diagnostic(s) is working by running the ESMValTool.
Check output of diagnostic
After successfully running the new recipe, check that:
Output contains (some) valid values (e.g. not only nan or zeros)
If applicable, check plots and compare with corresponding plots in the paper(s) cited