Configuration#
Overview#
Similar to Dask, ESMValCore provides a single configuration object that consists of one nested dictionary.
Note
In v2.12.0, a redesign process of ESMValTool/Core’s configuration started. Its main aim is to simplify the configuration by moving from many different configuration files for individual components to one configuration object that consists of a single nested dictionary (similar to Dask’s configuration). This change will not be implemented in one large pull request but rather in a step-by-step procedure. Thus, the configuration might appear inconsistent until this redesign is finished. A detailed plan for this new configuration is outlined in Issue #2371.
Specify configuration for esmvaltool command line tool#
When running recipes via the command line, configuration options can be specified via YAML files and command line arguments.
YAML files#
Configuration options can be specified via YAML files (i.e., *.yaml and *.yml).
A file could look like this (for example, located at ~/.config/esmvaltool/config.yml):
output_dir: ~/esmvaltool_output
search_esgf: when_missing
download_dir: ~/downloaded_data
These files can live in any of the following locations:

- The directory specified via the --config_dir command line argument.
- The user configuration directory: by default ~/.config/esmvaltool, but this can be changed with the ESMVALTOOL_CONFIG_DIR environment variable. If ~/.config/esmvaltool does not exist, it will be silently ignored.
ESMValCore searches for all YAML files within each of these directories and merges them together using dask.config.collect().
This properly considers nested objects; see dask.config.update() for details.
Preference follows the order in the list above (i.e., the directory specified via the command line argument is preferred over the user configuration directory).
Within a directory, files are sorted alphabetically, and later files (e.g., z.yml) take precedence over earlier files (e.g., a.yml).
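The nested-merge behavior can be illustrated with a small pure-Python sketch (the actual merging is done by dask.config.collect(); the file contents below are hypothetical examples):

```python
from copy import deepcopy

def merge(base, new):
    """Recursively merge ``new`` into ``base``: later values win on
    conflicts, but nested dictionaries are combined key by key."""
    result = deepcopy(base)
    for key, value in new.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result

# Two hypothetical configuration files from the same directory:
a_yml = {"output_dir": "~/esmvaltool_output", "logging": {"log_level": "info"}}
z_yml = {"logging": {"log_progress_interval": "10s"}}

# z.yml sorts after a.yml, so its values take precedence, while the
# nested "logging" sections are merged rather than replaced.
config = merge(a_yml, z_yml)
print(config["logging"])  # {'log_level': 'info', 'log_progress_interval': '10s'}
```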
Warning
ESMValCore will read all YAML files in these configuration directories.
Thus, other YAML files in these directories that are not valid configuration files (like the old config-developer.yml files) will lead to errors.
Make sure to move such files to a different directory.
To get a copy of the default configuration file, you can run
esmvaltool config get_config_user --path=/target/file.yml
If the option --path is omitted, the file will be copied to ~/.config/esmvaltool/config-user.yml.
Command line arguments#
All configuration options can also be given as command line arguments to the esmvaltool executable.
Example:
esmvaltool run --search_esgf=when_missing --max_parallel_tasks=2 /path/to/recipe.yml
Options given via command line arguments will always take precedence over options specified via YAML files.
Specify/access configuration for Python API#
When running recipes with the experimental Python API, configuration options can be specified and accessed via the CFG object.
For example:
>>> from esmvalcore.config import CFG
>>> CFG['output_dir'] = '~/esmvaltool_output'
>>> CFG['output_dir']
PosixPath('/home/user/esmvaltool_output')
This will also consider YAML configuration files in the user configuration directory (by default ~/.config/esmvaltool, but this can be changed with the ESMVALTOOL_CONFIG_DIR environment variable).
More information about this can be found here.
Top level configuration options#
Note: the following entries use Python syntax.
For example, Python's None is YAML's null, Python's True is YAML's true, and Python's False is YAML's false.
| Option | Description | Type | Default value |
|---|---|---|---|
| `auxiliary_data_dir` | Directory where auxiliary data is stored. [1] | str | `~/auxiliary_data` |
| `check_level` | Sensitivity of the CMOR check (`debug`, `strict`, `default`, `relaxed`, `ignore`). | str | `default` |
| `compress_netcdf` | Use netCDF compression. | bool | `False` |
| `config_developer_file` | Path to custom Developer configuration file. | str | `None` |
| `dask` | Dask configuration, see Dask configuration. | dict | See Default options |
| `diagnostics` | Only run the selected diagnostics from the recipe, see Running. | list | `None` |
| `download_dir` | Directory where downloaded data will be stored. [4] | str | `~/climate_data` |
| `drs` | Directory structure for input data. [2] | dict | `{CMIP3: ESGF, CMIP5: ESGF, CMIP6: ESGF, CORDEX: ESGF, obs4MIPs: ESGF}` |
| `exit_on_warning` | Exit on warning (only used in NCL diagnostic scripts). | bool | `False` |
| `extra_facets_dir` | Additional custom directory for Extra Facets. | str or list of str | `[]` |
| `log_level` | Log level of the console (`debug`, `info`, `warning`, `error`). | str | `info` |
| `logging` | Logging configuration, see Logging configuration. | dict | See Logging configuration |
| `max_datasets` | Maximum number of datasets to use, see Running. | int | `None` |
| `max_parallel_tasks` | Maximum number of parallel processes, see Task priority. [5] | int | `None` |
| `max_years` | Maximum number of years to use, see Running. | int | `None` |
| `output_dir` | Directory where all output will be written, see Output. | str | `~/esmvaltool_output` |
| `output_file_type` | Plot file type. | str | `png` |
| `profile_diagnostic` | Use a profiling tool for the diagnostic run. [3] | bool | `False` |
| `remove_preproc_dir` | Remove the `preproc` directory if the run was successful. | bool | `True` |
| `resume_from` | Resume previous run(s) by using preprocessor output files from these output directories, see Running. | list | `[]` |
| `rootpath` | Rootpaths to the data from different projects. [2] | dict | `{default: ~/climate_data}` |
| `run_diagnostic` | Run diagnostic scripts, see Running. | bool | `True` |
| `save_intermediary_cubes` | Save intermediary cubes from the preprocessor, see also Preprocessed datasets. | bool | `False` |
| `search_esgf` | Automatic data download from ESGF (`never`, `when_missing`, `always`). | str | `never` |
| `skip_nonexistent` | Skip non-existent datasets, see Running. | bool | `False` |
Dask configuration#
Configure Dask in the dask section.
The preprocessor functions and many of the Python diagnostics in ESMValTool make use of the Iris library to work with the data.
In Iris, data can be either real or lazy.
Lazy data is represented by Dask arrays.
Dask arrays consist of many small numpy arrays (called chunks) and, if possible, computations are run on those small arrays in parallel.
In order to figure out what needs to be computed when, Dask makes use of a 'scheduler'.
The default (thread-based) scheduler in Dask is rather basic: it can only run on a single computer and it may not always find the optimal task scheduling solution, resulting in excessive memory use when using, e.g., the esmvalcore.preprocessor.multi_model_statistics() preprocessor function.
Therefore it is recommended that you take a moment to configure the Dask distributed scheduler.
A Dask scheduler and the 'workers' running the actual computations are collectively called a 'Dask cluster'.
Dask profiles#
Because some recipes require more computational resources than others, ESMValCore provides the option to define “Dask profiles”. These profiles can be used to update the Dask user configuration per recipe run. The Dask profile can be selected in a YAML configuration file via
dask:
use: <NAME_OF_PROFILE>
or alternatively in the command line via
esmvaltool run --dask='{"use": "<NAME_OF_PROFILE>"}' recipe_example.yml
Available predefined Dask profiles:

- local_threaded (selected by default): use the threaded scheduler without any further options.
- local_distributed: use a local distributed scheduler without any further options.
- debug: use the synchronous Dask scheduler for debugging purposes. Best used with max_parallel_tasks: 1.
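For instance, a minimal configuration selecting the predefined debug profile could look like this (combining it with max_parallel_tasks: 1 follows the advice above):

```yaml
dask:
  use: debug  # synchronous scheduler: everything runs in the main process
max_parallel_tasks: 1
```

With this setup, breakpoints and tracebacks in preprocessor code stay readable because no work is dispatched to threads or worker processes.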
Dask distributed scheduler configuration#
Here, some examples are provided of how to use a custom Dask distributed scheduler. Extensive documentation on setting up Dask clusters is available here.
Note
If not all preprocessor functions support lazy data, computational performance may be best with the threaded scheduler. See Issue #674 for progress on making all preprocessor functions lazy.
Personal computer
Create a distributed.LocalCluster on the computer running ESMValCore using all available resources:
dask:
use: local_cluster # use "local_cluster" defined below
profiles:
local_cluster:
cluster:
type: distributed.LocalCluster
This should work well for most personal computers.
Note
If running this configuration on a shared node of an HPC cluster, Dask will try to use as many resources as it can find available, and this may lead to overcrowding the node by a single user (you)!
Shared computer
Create a distributed.LocalCluster on the computer running ESMValCore, with 2 workers with 2 threads / 4 GiB of memory each (8 GiB in total):
dask:
use: local_cluster # use "local_cluster" defined below
profiles:
local_cluster:
cluster:
type: distributed.LocalCluster
n_workers: 2
threads_per_worker: 2
memory_limit: 4GiB
This should work well for shared computers.
Computer cluster
Create a Dask distributed cluster on the Levante supercomputer using the Dask-Jobqueue package:
dask:
use: slurm_cluster # use "slurm_cluster" defined below
profiles:
slurm_cluster:
cluster:
type: dask_jobqueue.SLURMCluster
queue: shared
account: <YOUR_SLURM_ACCOUNT>
cores: 8
memory: 7680MiB
processes: 2
interface: ib0
local_directory: "/scratch/b/<YOUR_DKRZ_ACCOUNT>/dask-tmp"
n_workers: 24
This will start 24 workers with cores / processes = 4 threads each, resulting in n_workers / processes = 12 Slurm jobs, where each Slurm job will request 8 CPU cores and 7680 MiB of memory and start processes = 2 workers.
This example will use the fast Infiniband network connection (called ib0 on Levante) for communication between workers running on different nodes.
It is important to set the right location for temporary storage; in this case the /scratch space is used.
It is also possible to use environment variables to configure the temporary storage location, if your cluster provides these.
A configuration like this should work well for larger computations where it is advantageous to use multiple nodes in a compute cluster. See Deploying Dask Clusters on High Performance Computers for more information.
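The arithmetic behind the example above can be made explicit (the values are copied from the configuration; the variable names below are local to this sketch, not ESMValCore options):

```python
# Values from the SLURMCluster example above.
cores = 8       # CPU cores requested per Slurm job
processes = 2   # Dask workers started per Slurm job
n_workers = 24  # total number of Dask workers

threads_per_worker = cores // processes  # threads available to each worker
slurm_jobs = n_workers // processes      # Slurm jobs needed for all workers
total_cores = slurm_jobs * cores         # CPU cores requested in total

print(threads_per_worker, slurm_jobs, total_cores)  # 4 12 96
```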
Externally managed Dask cluster
To use an externally managed cluster, specify a scheduler_address for the selected profile.
Such a cluster can, e.g., be started using the Dask JupyterLab extension:
dask:
use: external # Use the `external` profile defined below
profiles:
external:
scheduler_address: "tcp://127.0.0.1:43605"
See here for an example of how to configure this on a remote system.
For debugging purposes, it can be useful to start the cluster outside of ESMValCore, because then the Dask dashboard remains available after ESMValCore has finished running.
Advice on choosing performant configurations
The threads within a single worker can access the same memory locations, so they may freely pass chunks around, whereas communicating a chunk between workers requires copying it, which is (a bit) slower. It is therefore beneficial for performance to have multiple threads per worker. However, due to a limitation of the CPython implementation (known as the Global Interpreter Lock or GIL), only a single thread in a worker can execute Python code at a time (this limitation does not apply to compiled code called by Python code, e.g. numpy), so the best performing configurations will typically not use much more than 10 threads per worker.
Due to limitations of the NetCDF library (it is not thread-safe), only one of the threads in a worker can read or write to a NetCDF file at a time. Therefore, it may be beneficial to use fewer threads per worker if the computation is very simple and the runtime is determined by the speed with which the data can be read from and/or written to disk.
Custom Dask threaded scheduler configuration#
The Dask threaded scheduler can be a good choice for recipes using a small amount of data or when running a recipe where not all preprocessor functions are lazy yet (see Issue #674 for the current status).
To avoid running out of memory, it is important to set the number of workers (threads) used by Dask to run its computations to a reasonable number. By default, the number of CPU cores in the machine will be used, but this may be too many on shared machines or laptops with a large number of CPU cores compared to the amount of memory they have available.
Typically, Dask requires about 2 GiB of RAM per worker, but this may be more depending on the computation.
To set the number of workers used by the Dask threaded scheduler, use the following configuration:
dask:
use: local_threaded # This can be omitted
profiles:
local_threaded:
num_workers: 4
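A reasonable number of workers can be estimated from the available memory, following the ~2 GiB per worker rule of thumb above (this helper is purely illustrative, not part of ESMValCore):

```python
import os

def suggest_num_workers(available_gib, per_worker_gib=2.0):
    """Suggest a worker (thread) count for the Dask threaded scheduler:
    the CPU count, capped by the ~2 GiB of RAM each worker may need."""
    by_memory = max(1, int(available_gib // per_worker_gib))
    return max(1, min(os.cpu_count() or 1, by_memory))

# A laptop with many CPU cores but only 8 GiB of RAM: memory is the limit,
# so at most 4 workers are suggested regardless of the core count.
print(suggest_num_workers(available_gib=8))
```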
Default options#
By default, the following Dask configuration is used:
dask:
use: local_threaded # use the `local_threaded` profile defined below
profiles:
local_threaded:
scheduler: threads
local_distributed:
cluster:
type: distributed.LocalCluster
debug:
scheduler: synchronous
All available options#
Option |
Description |
Type |
Default value |
---|---|---|---|
|
Different Dask profiles that can be
selected via the |
See Default options |
|
|
Dask profile that is used; must be
defined in the option |
|
Options for Dask profiles#
Option |
Description |
Type |
Default value |
---|---|---|---|
|
Keyword arguments to initialize a Dask
distributed cluster. Needs the option
|
If omitted, use externally managed
cluster if |
|
|
Scheduler address of an externally
managed cluster. Will be passed to
|
If omitted, use a Dask distributed
cluster if |
|
All other options |
Passed as keyword arguments to
|
Any |
No defaults. |
Logging configuration#
Configure what information is logged and how it is presented in the logging section.
Note
Not all logging configuration is available here yet, see Issue #2596.
Configuration file example:
logging:
log_progress_interval: 10s
will log progress of Dask computations every 10 seconds instead of showing a progress bar.
Command line example:
esmvaltool run --logging='{"log_progress_interval": "1m"}' recipe_example.yml
will log progress of Dask computations every minute instead of showing a progress bar.
Available options:

| Option | Description | Type | Default value |
|---|---|---|---|
| `log_progress_interval` | When running computations with Dask, log progress every `log_progress_interval` instead of showing a progress bar. | str (e.g., `10s`, `1m`) or number of seconds | 0 |
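Interval strings like "10s" or "1m" boil down to a number of seconds; a minimal parser sketch (the exact parsing ESMValCore performs may differ, and the accepted suffixes here are an assumption for illustration):

```python
import re

_UNITS = {"": 1, "s": 1, "m": 60, "h": 3600}

def interval_to_seconds(value):
    """Convert an interval like "10s", "1m", or a plain number to seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    match = re.fullmatch(r"\s*([0-9.]+)\s*([smh]?)\s*", value)
    if not match:
        raise ValueError(f"Cannot parse interval: {value!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit]

print(interval_to_seconds("10s"), interval_to_seconds("1m"))  # 10.0 60.0
```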
ESGF configuration#
The esmvaltool run command can automatically download the files required to run a recipe from ESGF for the projects CMIP3, CMIP5, CMIP6, CORDEX, and obs4MIPs.
The downloaded files will be stored in the directory specified via the configuration option download_dir.
To enable automatic downloads from ESGF, use the configuration option search_esgf: when_missing or search_esgf: always.
Note
When running a recipe that uses many or large datasets on a machine that does not have any data available locally, the amount of data that will be downloaded can be in the range of a few hundred gigabytes to a few terabytes. See Obtaining input data for advice on getting access to machines with large datasets already available.
A log message will be displayed with the total amount of data that will be downloaded before starting the download.
If you see that this is more than you would like to download, stop the tool by pressing the Ctrl and C keys on your keyboard simultaneously several times, edit the recipe so it contains fewer datasets, and try again.
Configuration file#
An optional configuration file can be created for configuring how the tool uses esgf-pyclient to find and download data.
The name of this file is ~/.esmvaltool/esgf-pyclient.yml.
Search#
Any arguments to pyesgf.search.connection.SearchConnection can be provided in the section search_connection, for example:
search_connection:
expire_after: 2592000 # the number of seconds in a month
to keep cached search results for a month.
The default settings are:
urls:
- 'https://esgf.ceda.ac.uk/esg-search'
- 'https://esgf-node.llnl.gov/esg-search'
- 'https://esgf-data.dkrz.de/esg-search'
- 'https://esgf-node.ipsl.upmc.fr/esg-search'
- 'https://esg-dn1.nsc.liu.se/esg-search'
- 'https://esgf.nci.org.au/esg-search'
- 'https://esgf.nccs.nasa.gov/esg-search'
- 'https://esgdata.gfdl.noaa.gov/esg-search'
distrib: true
timeout: 120 # seconds
cache: '~/.esmvaltool/cache/pyesgf-search-results'
expire_after: 86400 # cache expires after 1 day
Note that by default the tool will try the ESGF index nodes in the order provided in the configuration file and use the first one that is online. Some ESGF index nodes may return search results faster than others, so you may be able to speed up the search for files by experimenting with placing different index nodes at the top of the list.
If you experience errors while searching, it sometimes helps to delete the cached results.
Download statistics#
The tool will maintain statistics of how fast data can be downloaded from what host in the file ~/.esmvaltool/cache/esgf-hosts.yml and automatically select hosts that are faster. There is no need to manually edit this file, though it can be useful to delete it if you move your computer to a location that is very different from the place where you previously downloaded data. An entry in the file might look like this:
esgf2.dkrz.de:
duration (s): 8
error: false
size (bytes): 69067460
speed (MB/s): 7.9
The tool only uses the duration and size to determine the download speed; the speed shown in the file is not used.
If error is set to true, the most recent download request to that host failed, and the tool will automatically try this host only as a last resort.
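The relationship between the recorded fields and the ranking behavior can be sketched as follows (the field names mirror the esgf-hosts.yml example above; the extra hostnames and the exact ranking logic are illustrative assumptions, not ESMValCore internals):

```python
# Recorded statistics per host, as in esgf-hosts.yml. The second and third
# hosts are hypothetical examples.
hosts = {
    "esgf2.dkrz.de": {"duration (s)": 8, "size (bytes)": 69067460, "error": False},
    "slow.example.org": {"duration (s)": 20, "size (bytes)": 10000000, "error": False},
    "broken.example.org": {"duration (s)": 5, "size (bytes)": 50000000, "error": True},
}

def speed_mb_s(stats):
    """Download speed in MB/s, derived from size and duration."""
    return stats["size (bytes)"] / 1e6 / stats["duration (s)"]

# Prefer fast hosts; hosts whose last download failed come last.
ranked = sorted(hosts, key=lambda h: (hosts[h]["error"], -speed_mb_s(hosts[h])))
print(ranked)
```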
Developer configuration file#
Most users and diagnostic developers will not need to change this file, but it may be useful to understand its content. It will be installed along with ESMValCore and can also be viewed on GitHub: esmvalcore/config-developer.yml. This configuration file describes the file system structure and CMOR tables for several key projects (CMIP6, CMIP5, obs4MIPs, OBS6, OBS) on several key machines (e.g. BADC, CP4CDS, DKRZ, ETHZ, SMHI, BSC), and for native output data for some models (ICON, IPSL, … see Configuring datasets in native format). CMIP data is stored as part of the Earth System Grid Federation (ESGF) and the standards for file naming and paths to files are set out by CMOR and DRS. For a detailed description of these standards and their adoption in ESMValCore, we refer the user to CMIP data section where we relate these standards to the data retrieval mechanism of the ESMValCore.
Users can get a copy of this file with default values by running
esmvaltool config get_config_developer --path=${TARGET_FOLDER}
If the option --path is omitted, the file will be created in ~/.esmvaltool.
Note
Remember to change the configuration option config_developer_file if you want to use a custom config developer file.
Warning
For now, make sure that the custom config-developer.yml is not saved in the ESMValTool/Core configuration directories (see YAML files for details).
This will change in the future due to the redesign of ESMValTool/Core's configuration.
Example of the CMIP6 project configuration:
CMIP6:
input_dir:
default: '/'
BADC: '{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}'
DKRZ: '{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}'
ETHZ: '{exp}/{mip}/{short_name}/{dataset}/{ensemble}/{grid}/'
input_file: '{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc'
output_file: '{project}_{dataset}_{mip}_{exp}_{ensemble}_{short_name}'
cmor_type: 'CMIP6'
cmor_strict: true
Input file paths#
When looking for input files, the esmvaltool command provided by ESMValCore replaces the placeholders {item} in input_dir and input_file with the values supplied in the recipe.
ESMValCore will try to automatically fill in the values for institute, frequency, and modeling_realm based on the information provided in the CMOR tables and/or extra facets when reading the recipe.
If this fails for some reason, these values can be provided in the recipe too.
The data directory structure of the CMIP projects is set up differently at each site. As an example, the CMIP6 directory path on BADC would be:
'{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}'
The resulting directory path would look something like this:
CMIP/MOHC/HadGEM3-GC31-LL/historical/r1i1p1f3/Omon/tos/gn/latest
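The placeholder substitution boils down to string formatting with the facets from the recipe; a minimal illustration using the BADC template and the facets behind the path above:

```python
# The BADC input_dir template from the CMIP6 example above.
input_dir = ("{activity}/{institute}/{dataset}/{exp}/{ensemble}/"
             "{mip}/{short_name}/{grid}/{version}")

# Facets as they would be resolved from a recipe entry.
facets = {
    "activity": "CMIP", "institute": "MOHC", "dataset": "HadGEM3-GC31-LL",
    "exp": "historical", "ensemble": "r1i1p1f3", "mip": "Omon",
    "short_name": "tos", "grid": "gn", "version": "latest",
}

path = input_dir.format(**facets)
print(path)  # CMIP/MOHC/HadGEM3-GC31-LL/historical/r1i1p1f3/Omon/tos/gn/latest
```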
Please bear in mind that input_dir can also be a list for those cases in which it may be needed:
- '{exp}/{ensemble}/original/{mip}/{short_name}/{grid}/{version}'
- '{exp}/{ensemble}/computed/{mip}/{short_name}/{grid}/{version}'
In that case, the resulting directories will be:
historical/r1i1p1f3/original/Omon/tos/gn/latest
historical/r1i1p1f3/computed/Omon/tos/gn/latest
For a more in-depth description of how to configure ESMValCore so it can find your data please see CMIP data.
Preprocessor output files#
The filename to use for preprocessed data is configured in a similar manner using output_file.
Note that the extension .nc (and, if applicable, a start and end time) will automatically be appended to the filename.
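As a sketch, the final filename is the formatted output_file template plus the appended parts (the exact time-range formatting used here is an assumption for illustration):

```python
# The output_file template from the CMIP6 example above.
output_file = "{project}_{dataset}_{mip}_{exp}_{ensemble}_{short_name}"

facets = {"project": "CMIP6", "dataset": "HadGEM3-GC31-LL", "mip": "Omon",
          "exp": "historical", "ensemble": "r1i1p1f3", "short_name": "tos"}
start, end = "2000", "2014"

# Time range and ".nc" extension are appended automatically by ESMValCore.
filename = output_file.format(**facets) + f"_{start}-{end}.nc"
print(filename)  # CMIP6_HadGEM3-GC31-LL_Omon_historical_r1i1p1f3_tos_2000-2014.nc
```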
Project CMOR table configuration#
ESMValCore comes bundled with several CMOR tables, which are stored in the directory esmvalcore/cmor/tables. These are copies of the tables available from PCMDI.
For every project that can be used in the recipe, there are four settings related to CMOR tables available:

- cmor_type: can be CMIP5 if the CMOR table is in the same format as the CMIP5 table or CMIP6 if the table is in the same format as the CMIP6 table.
- cmor_strict: if this is set to false, the CMOR table will be extended with variables from the Custom CMOR tables (by default loaded from the esmvalcore/cmor/tables/custom directory) and it is possible to use variables with a mip which is different from the MIP table in which they are defined. Note that this option is always enabled for derived variables.
- cmor_path: path to the CMOR table. Relative paths are with respect to esmvalcore/cmor/tables. Defaults to the value provided in cmor_type written in lower case.
- cmor_default_table_prefix: prefix that needs to be added to the mip to get the name of the file containing the mip table. Defaults to the value provided in cmor_type.
Custom CMOR tables#
As mentioned in the previous section, the CMOR tables of projects that use cmor_strict: false will be extended with custom CMOR tables.
For derived variables (the ones with derive: true in the recipe), the custom CMOR tables will always be considered.
By default, these custom tables are loaded from esmvalcore/cmor/tables/custom.
However, by using the special project custom in the config-developer.yml file with the option cmor_path, a custom location for these custom CMOR tables can be specified.
In this case, the default custom tables are extended with those entries from the custom location (in case of duplication, the tables from the custom location take precedence).
Example:
custom:
cmor_path: ~/my/own/custom_tables
This path can be given as a relative path (relative to esmvalcore/cmor/tables) or as an absolute path. Other options given for this special table will be ignored.
Custom tables in this directory need to follow the naming convention CMOR_{short_name}.dat and need to be given in CMIP5 format.
Example for the file CMOR_asr.dat:
SOURCE: CMIP5
!============
variable_entry: asr
!============
modeling_realm: atmos
!----------------------------------
! Variable attributes:
!----------------------------------
standard_name:
units: W m-2
cell_methods: time: mean
cell_measures: area: areacella
long_name: Absorbed shortwave radiation
!----------------------------------
! Additional variable information:
!----------------------------------
dimensions: longitude latitude time
type: real
positive: down
!----------------------------------
!
It is also possible to use a special coordinates file CMOR_coordinates.dat, which will extend the entries from the default one (esmvalcore/cmor/tables/custom/CMOR_coordinates.dat).
Filter preprocessor warnings#
It is possible to ignore specific warnings of the preprocessor for a given project.
This is particularly useful for native datasets, which do not follow the CMOR standard by default and consequently produce a lot of warnings when handled by Iris.
This can be configured in the config-developer.yml file for some steps of the preprocessing chain.
Currently, the only supported preprocessor step is load.
Here is an example of how to ignore specific warnings during the preprocessor step load for all datasets of project EMAC (taken from the default config-developer.yml file):
ignore_warnings:
load:
- {message: 'Missing CF-netCDF formula term variable .*, referenced by netCDF variable .*', module: iris}
- {message: 'Ignored formula of unrecognised type: .*', module: iris}
The keyword arguments specified in the list items are passed directly to warnings.filterwarnings() in addition to action=ignore (which may be overwritten in config-developer.yml).
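The effect of such a filter list can be demonstrated with the standard library alone (the filter entries mirror the YAML list above; how ESMValCore applies them internally may differ in detail):

```python
import warnings

# Filter entries as they appear in the config-developer.yml example above.
ignore_warnings = [
    {"message": "Missing CF-netCDF formula term variable .*, "
                "referenced by netCDF variable .*",
     "module": "iris"},
    {"message": "Ignored formula of unrecognised type: .*", "module": "iris"},
]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Apply each entry with action="ignore", as described in the text.
    for kwargs in ignore_warnings:
        warnings.filterwarnings("ignore", **kwargs)
    # A warning matching the first filter is suppressed...
    warnings.warn_explicit(
        "Missing CF-netCDF formula term variable ps, "
        "referenced by netCDF variable ta",
        UserWarning, filename="netcdf.py", lineno=1,
        module="iris.fileformats.netcdf")
    # ...while an unrelated warning still gets through.
    warnings.warn("unrelated", UserWarning)

print([str(w.message) for w in caught])  # ['unrelated']
```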
Configuring datasets in native format#
ESMValCore can be configured to handle native model output formats and specific reanalysis/observation datasets without preliminary reformatting.
These datasets can be hosted either under the native6 project (mostly native reanalysis/observational datasets) or under a dedicated project, e.g., ICON (mostly native models).
Example:
native6:
cmor_strict: false
input_dir:
default: 'Tier{tier}/{dataset}/{version}/{frequency}/{short_name}'
input_file:
default: '*.nc'
output_file: '{project}_{dataset}_{type}_{version}_{mip}_{short_name}'
cmor_type: 'CMIP6'
cmor_default_table_prefix: 'CMIP6_'
ICON:
cmor_strict: false
input_dir:
default:
- '{exp}'
- '{exp}/outdata'
- '{exp}/output'
input_file:
default: '{exp}_{var_type}*.nc'
output_file: '{project}_{dataset}_{exp}_{var_type}_{mip}_{short_name}'
cmor_type: 'CMIP6'
cmor_default_table_prefix: 'CMIP6_'
A detailed description on how to add support for further native datasets is given here.
Hint
When using native datasets, it might be helpful to specify a custom location for the Custom CMOR tables.
This allows reading arbitrary variables from native datasets.
Note that this requires the option cmor_strict: false in the project configuration used for the native model output.
References configuration file#
The esmvaltool/config-references.yml file contains the list of ESMValTool diagnostic and recipe authors, references and projects. Each author, project and reference referred to in the documentation section of a recipe needs to be in this file in the relevant section.
For instance, the recipe recipe_ocean_example.yml contains the following documentation section:
documentation:
authors:
- demo_le
maintainer:
- demo_le
references:
- demora2018gmd
projects:
- ukesm
The four items here refer to the people, references, and projects listed in the config-references.yml file.
Extra Facets#
It can be useful to automatically add extra key-value pairs to variables or datasets in the recipe. These key-value pairs can be used for finding data or for providing extra information to the functions that fix data before passing it on to the preprocessor.
To support this, we provide the extra facets facilities. Facets are the key-value pairs described in Recipe section: datasets. Extra facets allow for the addition of more details per project, dataset, mip table, and variable name.
More precisely, one can provide this information in an extra YAML file, named {project}-something.yml, where {project} corresponds to the project as used by ESMValCore in Recipe section: datasets and "something" is arbitrary.
Format of the extra facets files#
The extra facets are given in a YAML file whose file name identifies the project. Inside the file there is a hierarchy of nested dictionaries with the following levels: at the top the dataset facet, followed by the mip table, and finally the short_name. The leaf dictionary placed here gives the extra facets that will be made available to the data finder and the fix infrastructure. The following example illustrates the concept.
ERA5:
Amon:
tas: {source_var_name: "t2m", cds_var_name: "2m_temperature"}
The three levels of keys in this mapping can contain Unix shell-style wildcards. The special characters used in shell-style wildcards are:
| Pattern | Meaning |
|---|---|
| `*` | matches everything |
| `?` | matches any single character |
| `[seq]` | matches any character in seq |
| `[!seq]` | matches any character not in seq |

where seq can either be a range of characters or just a bunch of characters, for example [A-C] matches the characters A, B, and C, while [AC] matches the characters A and C.
For example, this is used to automatically add product: output1 to any variable of any CMIP5 dataset that does not yet have a product key:
'*':
'*':
'*': {product: output1}
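These are the same Unix shell-style wildcards implemented by Python's fnmatch module, so their behavior can be checked directly (the facet values below are examples, not special to ESMValCore):

```python
from fnmatch import fnmatchcase

# '*' matches any dataset name.
print(fnmatchcase("HadGEM3-GC31-LL", "*"))  # True
# '?' matches exactly one character.
print(fnmatchcase("tas", "ta?"))            # True
# '[AC]mon' matches 'Amon' and 'Cmon', but not 'Lmon'.
print(fnmatchcase("Amon", "[AC]mon"))       # True
print(fnmatchcase("Lmon", "[AC]mon"))       # False
# '[!AC]mon' matches any mip table NOT starting with 'A' or 'C'.
print(fnmatchcase("Omon", "[!AC]mon"))      # True
```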
Location of the extra facets files#
Extra facets files can be placed in several different places. When we use them to support a particular use case within the ESMValCore project, they are provided in the sub-folder extra_facets inside the package esmvalcore.config. If they are used from the user side, they can be placed either in ~/.esmvaltool/extra_facets or in any other directory of the user's choosing. In that case, the configuration option extra_facets_dir must be set, which can take a single directory or a list of directories.

The order in which the directories are searched is:

1. The internal directory esmvalcore.config/extra_facets
2. The default user directory ~/.esmvaltool/extra_facets
3. The custom user directories given by the configuration option extra_facets_dir

The extra facets files within each of these directories are processed in lexicographical order according to their file name.
In all cases it is allowed to supersede information from earlier files in later files. This makes it possible for the user to effectively override even internal default facets, for example to deal with local particularities in the data handling.
Use of extra facets#
For extra facets to be useful, the information that they provide must be applied. There are fundamentally two places where this comes into play: one is the data finder, the other is the fixes.