Configuration files¶
Overview¶
There are several configuration files in ESMValCore:
config-user.yml
: sets a number of user-specific options like desired graphical output format, root paths to data, etc.;
config-developer.yml
: sets standardized file-naming conventions and data path formats;
and one configuration file which is distributed with ESMValTool:
config-references.yml
: stores information on diagnostic and recipe authors and scientific journal references.
User configuration file¶
The config-user.yml configuration file contains all the global level
information needed by ESMValTool. It can be reused as many times as the user
needs before changing any of the options stored in it. This file is essentially
the gateway between the user and the machine-specific instructions to
esmvaltool. By default, esmvaltool looks for it in the home directory,
inside the .esmvaltool folder.
Users can get a copy of this file with default values by running
esmvaltool config get-config-user --path=${TARGET_FOLDER}
If the option --path is omitted, the file will be created in
${HOME}/.esmvaltool.
The following shows the default settings from the config-user.yml file,
with explanations in a commented line above each option:
# Set the console log level debug, [info], warning, error
# for much more information printed to screen set log_level: debug
log_level: info
# Exit on warning (only for NCL diagnostic scripts)? true/[false]
exit_on_warning: false
# Plot file format? [png]/pdf/ps/eps/epsi
output_file_type: png
# Destination directory where all output will be written
# including log files and performance stats
output_dir: ./esmvaltool_output
# Auxiliary data directory (used for some additional datasets)
# this is where e.g. files can be downloaded to by a download
# script embedded in the diagnostic
auxiliary_data_dir: ./auxiliary_data
# Use netCDF compression true/[false]
compress_netcdf: false
# Save intermediary cubes in the preprocessor true/[false]
# set to true will save the output cube from each preprocessing step
# these files are numbered according to the preprocessing order
save_intermediary_cubes: false
# Remove the preproc dir if all fine
# if this option is set to "true", ALL preprocessor files will be removed
# CAUTION when using: if you need those files, set it to false
remove_preproc_dir: true
# Run at most this many tasks in parallel [null]/1/2/3/4/..
# Set to null to use the number of available CPUs.
# If you run out of memory, try setting max_parallel_tasks to 1 and check the
# amount of memory you need for that by inspecting the file
# run/resource_usage.txt in the output directory. Using the number there you
# can increase the number of parallel tasks again to a reasonable number for
# the amount of memory available in your system.
max_parallel_tasks: null
# Path to custom config-developer file, to customise project configurations.
# See config-developer.yml for an example. Set to None to use the default
config_developer_file: null
# Use a profiling tool for the diagnostic run [false]/true
# A profiler tells you which functions in your code take most time to run.
# For this purpose we use vprof, see below for notes
# Only available for Python diagnostics
profile_diagnostic: false
# Rootpaths to the data from different projects (lists are also possible)
rootpath:
  CMIP5: [~/cmip5_inputpath1, ~/cmip5_inputpath2]
  OBS: ~/obs_inputpath
  default: ~/default_inputpath
# Directory structure for input data: [default]/BADC/DKRZ/ETHZ/etc
# See config-developer.yml for definitions.
drs:
  CMIP5: default
The settings write_plots and write_netcdf used to be available in the
config-user file, but they have been deprecated since ESMValCore v2.2 and
will be removed in v2.4, because only some diagnostic scripts supported them.
For those diagnostic scripts that do support these settings, they can now be
configured in the diagnostic script section of the recipe.
# Auxiliary data directory (used for some additional datasets)
auxiliary_data_dir: ~/auxiliary_data
The auxiliary_data_dir setting is the path where any required
additional auxiliary data files are placed. This is necessary because certain
Python toolkits, such as cartopy, will attempt to download data files at run
time, typically geographic data files such as coastlines or land surface maps.
This can fail if the machine does not have access to the wider internet. This
location allows the user to specify where to find such files if they cannot be
downloaded at run time.
Warning
This setting is not for model or observational datasets, rather it is for data files used in plotting such as coastline descriptions and so on.
The profile_diagnostic setting triggers profiling of Python diagnostics,
which will tell you which functions in the diagnostic took the most time to run.
For this purpose we use vprof.
For each diagnostic script in the recipe, the profiler writes a .json file
that can be used to plot a flame graph of the profiling information by running
vprof --input-file esmvaltool_output/recipe_output/run/diagnostic/script/profile.json
Note that it is also possible to use vprof to understand other resources used while running the diagnostic, including execution time of different code blocks and memory usage.
A detailed explanation of the data finding-related sections of the
config-user.yml (rootpath and drs) is presented in the
Data retrieval section. These settings relate directly to the data
finding capabilities of ESMValTool and are very important for the user to
understand.
Note
You can choose your config-user.yml file at run time, so you could have several of
them available for different purposes: one for a formalised run, another for
debugging, etc. You can even provide any config-user value as a run flag:
--argument_name argument_value
Developer configuration file¶
Most users and diagnostic developers will not need to change this file, but it may be useful to understand its content. It is installed along with ESMValCore and can also be viewed on GitHub: esmvalcore/config-developer.yml. This configuration file describes the file system structure and CMOR tables for several key projects (CMIP6, CMIP5, obs4mips, OBS6, OBS) on several key machines (e.g. BADC, CP4CDS, DKRZ, ETHZ, SMHI, BSC), and for native output data for some models (IPSL, … see Configuring native models and observation data sets). CMIP data is stored as part of the Earth System Grid Federation (ESGF), and the standards for file naming and paths to files are set out by CMOR and DRS. For a detailed description of these standards and their adoption in ESMValCore, we refer the user to the CMIP data section, where we relate these standards to the data retrieval mechanism of ESMValCore.
By default, esmvaltool looks for it in the home directory, inside the ‘.esmvaltool’ folder.
Users can get a copy of this file with default values by running
esmvaltool config get-config-developer --path=${TARGET_FOLDER}
If the option --path is omitted, the file will be created in
${HOME}/.esmvaltool.
Note
Remember to change your config-user file if you want to use a custom config-developer.
Example of the CMIP6 project configuration:
CMIP6:
  input_dir:
    default: '/'
    BADC: '{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{latestversion}'
    DKRZ: '{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{latestversion}'
    ETHZ: '{exp}/{mip}/{short_name}/{dataset}/{ensemble}/{grid}/'
  input_file: '{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc'
  output_file: '{project}_{dataset}_{mip}_{exp}_{ensemble}_{short_name}'
  cmor_type: 'CMIP6'
  cmor_strict: true
Input file paths¶
When looking for input files, the esmvaltool command provided by
ESMValCore replaces the placeholders {item} in input_dir and input_file
with the values supplied in the recipe.
ESMValCore will try to automatically fill in the values for institute, frequency,
and modeling_realm based on the information provided in the CMOR tables
and/or config-developer.yml when reading the recipe. If this fails for some reason,
these values can be provided in the recipe too.
The data directory structure of the CMIP projects is set up differently at each site. As an example, the CMIP6 directory path on BADC would be:
'{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{latestversion}'
The resulting directory path would look something like this:
CMIP/MOHC/HadGEM3-GC31-LL/historical/r1i1p1f3/Omon/tos/gn/latest
Please bear in mind that input_dir can also be a list, for those cases in
which more than one directory layout is needed:
- '{exp}/{ensemble}/original/{mip}/{short_name}/{grid}/{latestversion}'
- '{exp}/{ensemble}/computed/{mip}/{short_name}/{grid}/{latestversion}'
In that case, the resultant directories will be:
historical/r1i1p1f3/original/Omon/tos/gn/latest
historical/r1i1p1f3/computed/Omon/tos/gn/latest
For a more in-depth description of how to configure ESMValCore so it can find your data please see CMIP data.
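The placeholder substitution described above can be illustrated with a minimal Python sketch. This is not ESMValCore's actual implementation: the function names expand_template and expand_input_dirs are hypothetical, and {latestversion} is assumed to have been resolved to a directory name already. It only shows how a template, or a list of templates, turns into concrete directory paths once the facet values from the recipe are known.

```python
import re


def expand_template(template, facets):
    """Replace every {placeholder} in the template with its facet value."""
    def substitute(match):
        key = match.group(1)
        if key not in facets:
            raise KeyError(f"no value supplied for placeholder {{{key}}}")
        return str(facets[key])

    return re.sub(r"\{(\w+)\}", substitute, template)


def expand_input_dirs(input_dir, facets):
    """Accept one template or a list of templates; always return a list."""
    templates = [input_dir] if isinstance(input_dir, str) else list(input_dir)
    return [expand_template(template, facets) for template in templates]


# Facet values taken from the example path in this section.
facets = {
    "activity": "CMIP", "institute": "MOHC", "dataset": "HadGEM3-GC31-LL",
    "exp": "historical", "ensemble": "r1i1p1f3", "mip": "Omon",
    "short_name": "tos", "grid": "gn",
    "latestversion": "latest",  # assumption: version already resolved
}

badc = ("{activity}/{institute}/{dataset}/{exp}/{ensemble}/"
        "{mip}/{short_name}/{grid}/{latestversion}")
listed = [
    "{exp}/{ensemble}/original/{mip}/{short_name}/{grid}/{latestversion}",
    "{exp}/{ensemble}/computed/{mip}/{short_name}/{grid}/{latestversion}",
]

print(expand_input_dirs(badc, facets))
print(expand_input_dirs(listed, facets))
```

Raising an error for a missing placeholder is a design choice of this sketch; it makes an incomplete recipe fail early instead of silently producing a path that cannot exist.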
Preprocessor output files¶
The filename to use for preprocessed data is configured in a similar manner
using output_file. Note that the extension .nc (and if applicable,
a start and end time) will automatically be appended to the filename.
Project CMOR table configuration¶
ESMValCore comes bundled with several CMOR tables, which are stored in the directory esmvalcore/cmor/tables. These are copies of the tables available from PCMDI.
For every project that can be used in the recipe, there are four settings
related to CMOR tables available:
cmor_type
: can be CMIP5 if the CMOR table is in the same format as the CMIP5 table or CMIP6 if the table is in the same format as the CMIP6 table.
cmor_strict
: if this is set to false, the CMOR table will be extended with variables from the esmvalcore/cmor/tables/custom directory and it is possible to use variables with a mip which is different from the MIP table in which they are defined.
cmor_path
: path to the CMOR table. Relative paths are with respect to esmvalcore/cmor/tables. Defaults to the value provided in cmor_type written in lower case.
cmor_default_table_prefix
: prefix that needs to be added to the mip to get the name of the file containing the mip table. Defaults to the value provided in cmor_type.
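The two defaulting rules stated above (cmor_path falls back to cmor_type in lower case, cmor_default_table_prefix falls back to cmor_type) can be sketched as follows. The function name resolve_cmor_settings is hypothetical and the code is an illustration of the documented defaults only, not the code ESMValCore runs.

```python
def resolve_cmor_settings(project_config):
    """Fill in the documented defaults for the CMOR table settings."""
    cmor_type = project_config["cmor_type"]
    resolved = dict(project_config)
    # Documented default: the value of cmor_type written in lower case.
    resolved.setdefault("cmor_path", cmor_type.lower())
    # Documented default: the value of cmor_type.
    resolved.setdefault("cmor_default_table_prefix", cmor_type)
    return resolved


# Settings as they appear in the CMIP6 example of config-developer.yml.
cmip6 = {"cmor_type": "CMIP6", "cmor_strict": True}
print(resolve_cmor_settings(cmip6))

# An explicit cmor_path is kept as given (the path here is made up).
custom = {"cmor_type": "CMIP5", "cmor_path": "my_tables"}
print(resolve_cmor_settings(custom))
```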
Configuring native models and observation data sets¶
ESMValCore can be configured for handling native model output formats
and specific observation data sets without preliminary reformatting. You can
choose to host this new data source either under a dedicated project or under
project native6; when choosing the latter, such a configuration
involves the following steps:
- allowing for ESMValTool to locate the data files:
  - entry native6 of config-developer.yml should be complemented with sub-entries for input_dir and input_file that go under a new key representing the data organization (such as MY_DATA_ORG), and these sub-entries can use an arbitrary list of {placeholders}. Example:
    native6:
      ...
      input_dir:
        default: 'Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}'
        MY_DATA_ORG: '{model}/{exp}/{simulation}/{version}/{type}'
      input_file:
        default: '*.nc'
        MY_DATA_ORG: '{simulation}_*.nc'
      ...
  - if necessary, provide a so-called extra facets file, which helps to cope with e.g. variable naming issues when finding files. See Extra Facets and this example of such a file for IPSL-CM6.
- ensuring that ESMValCore gets the right metadata and data out of your data files: this is described in Fixing data
References configuration file¶
The esmvaltool/config-references.yml file contains the list of ESMValTool diagnostic and recipe authors, references and projects. Each author, project and reference referred to in the documentation section of a recipe needs to be in this file in the relevant section.
For instance, the recipe file recipe_ocean_example.yml contains the
following documentation section:
documentation:
  authors:
    - demo_le
  maintainer:
    - demo_le
  references:
    - demora2018gmd
  projects:
    - ukesm
These four items here are named people, references and projects listed in the
config-references.yml file.
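The requirement that every tag used in a recipe's documentation section be defined in config-references.yml can be checked with a short Python sketch. The layout assumed here (top-level authors, references and projects sections keyed by tag, with maintainers looked up among the authors) is an assumption for illustration, as are the function name missing_tags and the placeholder entry values.

```python
def missing_tags(documentation, references_config):
    """Return (recipe key, tag) pairs that the references config lacks."""
    # Assumption: maintainers are looked up in the authors section.
    section_for = {
        "authors": "authors",
        "maintainer": "authors",
        "references": "references",
        "projects": "projects",
    }
    missing = []
    for key, section in section_for.items():
        known = references_config.get(section, {})
        for tag in documentation.get(key, []):
            if tag not in known:
                missing.append((key, tag))
    return missing


# Documentation section of recipe_ocean_example.yml, as shown above.
documentation = {
    "authors": ["demo_le"],
    "maintainer": ["demo_le"],
    "references": ["demora2018gmd"],
    "projects": ["ukesm"],
}

# Stand-in for a parsed config-references.yml; the values are placeholders,
# and the ukesm project entry is deliberately left out.
references_config = {
    "authors": {"demo_le": "placeholder"},
    "references": {"demora2018gmd": "placeholder"},
    "projects": {},
}

print(missing_tags(documentation, references_config))
```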
Extra Facets¶
Sometimes it is useful to provide extra information for the loading of data, particularly in the case of native model data, or observational or other data, that generally follows the established standards, but is not part of the big supported projects like CMIP, CORDEX, obs4MIPs.
To support this, we provide the extra facets facilities. Facets are the key-value pairs described in Recipe section: datasets. Extra facets allow for the addition of more details per project, dataset, mip table, and variable name.
More precisely, one can provide this information in an extra yaml file, named {project}-something.yml, where {project} corresponds to the project as used by ESMValTool in Recipe section: datasets and “something” is arbitrary.
Format of the extra facets files¶
The extra facets are given in a yaml file, whose file name identifies the project. Inside the file there is a hierarchy of nested dictionaries with the following levels. At the top there is the dataset facet, followed by the mip table, and finally the short_name. The leaf dictionary placed here gives the extra facets that will be made available to data finder and the fix infrastructure. The following example illustrates the concept.
ERA5:
  Amon:
    tas: {source_var_name: "t2m", cds_var_name: "2m_temperature"}
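Once such a file is parsed, looking up the extra facets for one variable amounts to walking the three levels of the hierarchy. A minimal sketch, assuming the file has already been loaded into nested dictionaries and that keys are matched exactly; the function name get_extra_facets is hypothetical:

```python
def get_extra_facets(extra_facets, dataset, mip, short_name):
    """Walk dataset -> mip -> short_name and return a copy of the leaf dict."""
    leaf = extra_facets.get(dataset, {}).get(mip, {}).get(short_name, {})
    return dict(leaf)


# The example file from above, as nested dictionaries.
extra_facets = {
    "ERA5": {
        "Amon": {
            "tas": {"source_var_name": "t2m",
                    "cds_var_name": "2m_temperature"},
        },
    },
}

print(get_extra_facets(extra_facets, "ERA5", "Amon", "tas"))
# A combination without an entry simply yields no extra facets.
print(get_extra_facets(extra_facets, "ERA5", "Amon", "pr"))
```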
Location of the extra facets files¶
Extra facets files can be placed in several different places. When we use them to support a particular use-case within the ESMValTool project, they will be provided in the sub-folder extra_facets inside the package esmvalcore._config. If they are used from the user side, they can be either placed in ~/.esmvaltool/extra_facets or in any other directory of the user's choosing. In that case this directory must be added to the config-user.yml file under the extra_facets_dir setting, which can take a single directory or a list of directories.
The order in which the directories are searched is:
1. The internal directory esmvalcore._config/extra_facets
2. The default user directory ~/.esmvaltool/extra_facets
3. The custom user directories in the order in which they are given in config-user.yml.
The extra facets files within each of these directories are processed in lexicographical order according to their file name.
In all cases it is allowed to supersede information from earlier files in later files. This makes it possible for the user to effectively override even internal default facets, for example to deal with local particularities in the data handling.
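The search order and the override behaviour can be sketched in Python as below. This is an illustration under assumptions, not ESMValCore's own code: the function names are hypothetical, the files are taken to be already parsed into dictionaries, and the sketch assumes that later files override earlier ones key by key at every nesting level.

```python
from pathlib import Path


def ordered_facet_files(project, directories):
    """List {project}-*.yml files, directory by directory, sorted by name."""
    files = []
    for directory in directories:
        folder = Path(directory).expanduser()
        files.extend(sorted(folder.glob(f"{project}-*.yml")))
    return files


def deep_merge(base, update):
    """Return a new dict in which values from update supersede base."""
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Internal default and a user override, as already-parsed dictionaries.
internal = {"ERA5": {"Amon": {"tas": {"source_var_name": "t2m",
                                      "cds_var_name": "2m_temperature"}}}}
user = {"ERA5": {"Amon": {"tas": {"source_var_name": "T2M"}}}}

merged = deep_merge(internal, user)
print(merged)
```

Passing the directories to ordered_facet_files in the order internal, default user, custom user reproduces the search order listed above; merging the parsed files in that sequence lets a user file change a single facet while leaving the remaining internal defaults in place.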
Use of extra facets¶
For extra facets to be useful, the information that they provide must be applied. There are fundamentally two places where this comes into play: one is the data finder, the other is the fixes.