Preprocessor functions#

All preprocessor functions are designed to preserve floating point data types. For example, input data of type float32 will give output of float32. However, other data types may change, e.g., data of type int may give output of type float64.

esmvalcore.preprocessor.DEFAULT_ORDER: tuple[str, ...] = ('fix_file', 'load', 'fix_metadata', 'concatenate', 'cmor_check_metadata', 'clip_timerange', 'fix_data', 'cmor_check_data', 'add_supplementary_variables', 'derive', 'align_metadata', 'extract_time', 'extract_season', 'extract_month', 'resample_hours', 'resample_time', 'extract_levels', 'extract_surface_from_atm', 'weighting_landsea_fraction', 'mask_landsea', 'mask_glaciated', 'mask_landseaice', 'regrid', 'extract_coordinate_points', 'extract_point', 'extract_location', 'mask_multimodel', 'mask_fillvalues', 'mask_above_threshold', 'mask_below_threshold', 'mask_inside_range', 'mask_outside_range', 'clip', 'rolling_window_statistics', 'cumulative_sum', 'extract_region', 'extract_shape', 'extract_volume', 'extract_trajectory', 'extract_transect', 'detrend', 'extract_named_regions', 'axis_statistics', 'depth_integration', 'area_statistics', 'volume_statistics', 'local_solar_time', 'amplitude', 'zonal_statistics', 'meridional_statistics', 'accumulate_coordinate', 'hourly_statistics', 'daily_statistics', 'monthly_statistics', 'seasonal_statistics', 'annual_statistics', 'decadal_statistics', 'climate_statistics', 'anomalies', 'regrid_time', 'timeseries_filter', 'linear_trend', 'linear_trend_stderr', 'convert_units', 'histogram', 'ensemble_statistics', 'multi_model_statistics', 'bias', 'distance_metric', 'remove_supplementary_variables', 'save')#: By default, preprocessor functions are applied in this order.

Preprocessor module.

Functions:

`accumulate_coordinate`(cube, coordinate)	Weight data using the bounds from a given coordinate.
`add_supplementary_variables`(cube, ...)	Add ancillary variables and/or cell measures to cube (in-place).
`align_metadata`(cube, target_project, ...[, ...])	Set cube metadata to entries from a specific target project.
`amplitude`(cube, coords)	Calculate amplitude of cycles by aggregating over coordinates.
`annual_statistics`(cube[, operator, ...])	Compute annual statistics.
`anomalies`(cube, period[, reference, ...])	Compute anomalies using a mean with the specified granularity.
`area_statistics`(cube, operator[, normalize])	Apply a statistical operator in the horizontal plane.
`axis_statistics`(cube, axis, operator[, ...])	Perform statistics along a given axis.
`bias`(products[, reference, bias_type, ...])	Calculate biases relative to a reference dataset.
`climate_statistics`(cube[, operator, period, ...])	Compute climate statistics with the specified granularity.
`clip`(cube[, minimum, maximum])	Clip values at a specified minimum and/or maximum value.
`clip_timerange`(cube, timerange)	Extract time range with a resolution up to seconds.
`cmor_check_data`(cube, cmor_table, mip, ...)	Check if data conforms to variable's CMOR definition.
`cmor_check_metadata`(cube, cmor_table, mip, ...)	Check if metadata conforms to variable's CMOR definition.
`concatenate`(cubes[, check_level])	Concatenate all cubes after fixing metadata.
`convert_units`(cube, units)	Convert the units of a cube to new ones (in-place).
`cumulative_sum`(cube, coord[, weights, method])	Calculate cumulative sum of the elements along a given coordinate.
`daily_statistics`(cube[, operator, ...])	Compute daily statistics.
`decadal_statistics`(cube[, operator, ...])	Compute decadal statistics.
`depth_integration`(cube)	Determine the total sum over the vertical component.
`derive`(cubes, short_name, long_name, units)	Derive variable.
`detrend`(cube[, dimension, method])	Detrend data along a given dimension.
`distance_metric`(products, metric[, ...])	Calculate distance metrics.
`ensemble_statistics`(products, statistics, ...)	Compute ensemble statistics.
`extract_coordinate_points`(cube, definition, ...)	Extract points from any coordinate with interpolation.
`extract_levels`(cube, levels, scheme[, ...])	Perform vertical interpolation.
`extract_location`(cube, location, scheme)	Extract a point using a location name, with interpolation.
`extract_month`(cube, month)	Slice cube to get only the data belonging to a specific month.
`extract_named_regions`(cube, regions)	Extract a specific named region.
`extract_point`(cube, latitude, longitude, scheme)	Extract a point, with interpolation.
`extract_region`(cube, start_longitude, ...)	Extract a region from a cube.
`extract_season`(cube, season)	Slice cube to get only the data belonging to a specific season.
`extract_shape`(cube, shapefile[, method, ...])	Extract a region defined by a shapefile using masking.
`extract_surface_from_atm`(cube)	Extract surface from 3D atmospheric variable based on surface pressure.
`extract_time`(cube, start_year, start_month, ...)	Extract a time range from a cube.
`extract_trajectory`(cube, latitudes, longitudes)	Extract data along a trajectory.
`extract_transect`(cube[, latitude, longitude])	Extract data along a line of constant latitude or longitude.
`extract_volume`(cube, z_min, z_max[, ...])	Subset a cube based on a range of values in the z-coordinate.
`fix_data`(cube, short_name, project, dataset, mip)	Fix cube data if fixes are required.
`fix_file`(file, short_name, project, dataset, ...)	Fix files before loading them into a `CubeList`.
`fix_metadata`(cubes, short_name, project, ...)	Fix cube metadata if fixes are required.
`histogram`(cube[, coords, bins, bin_range, ...])	Calculate histogram.
`hourly_statistics`(cube, hours[, operator, ...])	Compute hourly statistics.
`linear_trend`(cube[, coordinate])	Calculate linear trend of data along a given coordinate.
`linear_trend_stderr`(cube[, coordinate])	Calculate standard error of linear trend along a given coordinate.
`load`(file[, ignore_warnings, backend_kwargs])	Load Iris cubes.
`local_solar_time`(cube)	Convert UTC time coordinate to local solar time (LST).
`mask_above_threshold`(cube, threshold)	Mask above a specific threshold value.
`mask_below_threshold`(cube, threshold)	Mask below a specific threshold value.
`mask_fillvalues`(products, threshold_fraction)	Compute and apply a multi-dataset fillvalues mask.
`mask_glaciated`(cube[, mask_out])	Mask out glaciated areas.
`mask_inside_range`(cube, minimum, maximum)	Mask inside a specific threshold range.
`mask_landsea`(cube, mask_out)	Mask out either land mass or sea (oceans, seas and lakes).
`mask_landseaice`(cube, mask_out)	Mask out either landsea (combined) or ice.
`mask_multimodel`(products)	Apply common mask to all datasets (using logical OR).
`mask_outside_range`(cube, minimum, maximum)	Mask outside a specific threshold range.
`meridional_statistics`(cube, operator[, ...])	Compute meridional statistics.
`monthly_statistics`(cube[, operator, ...])	Compute monthly statistics.
`multi_model_statistics`(products, span, ...)	Compute multi-model statistics.
`regrid`(cube, target_grid, scheme[, ...])	Perform horizontal regridding.
`regrid_time`(cube, frequency[, calendar, units])	Align time coordinate for cubes.
`remove_supplementary_variables`(cube)	Remove supplementary variables from cube (in-place).
`resample_hours`(cube, interval[, offset, ...])	Convert x-hourly data to y-hourly data.
`resample_time`(cube[, month, day, hour])	Change frequency of data by resampling it.
`rolling_window_statistics`(cube, coordinate, ...)	Compute rolling-window statistics over a coordinate.
`save`(cubes, filename[, optimize_access, ...])	Save iris cubes to file.
`seasonal_statistics`(cube[, operator, ...])	Compute seasonal statistics.
`timeseries_filter`(cube, window, span[, ...])	Apply a timeseries filter.
`volume_statistics`(cube, operator[, normalize])	Apply a statistical operation over a volume.
`weighting_landsea_fraction`(cube, area_type)	Weight fields using land or sea fraction.
`zonal_statistics`(cube, operator[, normalize])	Compute zonal statistics.

esmvalcore.preprocessor.accumulate_coordinate(cube: Cube, coordinate: str | DimCoord | AuxCoord) → Cube[source]#

Weight data using the bounds from a given coordinate.

The resulting cube will then have units given by cube_units * coordinate_units.

Parameters:

cube (Cube) – Data cube for the flux.
coordinate (str | DimCoord | AuxCoord) – Name of the coordinate that will be used as weights.

Returns:

Cube with the aggregated data.

Return type:

iris.cube.Cube

Raises:

ValueError – If the coordinate is not found in the cube.
NotImplementedError – If the coordinate is multidimensional.

esmvalcore.preprocessor.add_supplementary_variables(cube: Cube, supplementary_cubes: Iterable[Cube]) → Cube[source]#

Add ancillary variables and/or cell measures to cube (in-place).

Parameters:

cube (Cube) – Cube to add to.
supplementary_cubes (Iterable[Cube]) – Iterable of cubes containing the supplementary variables.

Returns:

Cube with added ancillary variables and/or cell measures.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.align_metadata(cube: Cube, target_project: str, target_mip: str, target_short_name: str, target_branding_suffix: str | None = None, strict: bool = True) → Cube[source]#

Set cube metadata to entries from a specific target project.

This is useful to align variable metadata of different projects prior to performing multi-model operations (e.g., multi_model_statistics()). For example, standard names differ for some variables between CMIP5 and CMIP6 which would prevent the calculation of multi-model statistics between CMIP5 and CMIP6 data.

Parameters:

cube (Cube) – Input cube.
target_project (str) – Project from which target metadata is read.
target_mip (str) – MIP table from which target metadata is read.
target_short_name (str) – Variable short name from which target metadata is read.
target_branding_suffix (str | None) – Branding suffix from which target metadata is read.
strict (bool) – If True, raise an error if desired metadata cannot be read for variable target_short_name of MIP table target_mip and project target_project. If False, no error is raised.

Returns:

Cube with updated metadata.

Return type:

iris.cube.Cube

Raises:

KeyError – Invalid target_project given.
ValueError – If strict=True: Variable target_short_name not available for MIP table target_mip of project target_project.

esmvalcore.preprocessor.amplitude(cube, coords)[source]#

Calculate amplitude of cycles by aggregating over coordinates.

Note

The amplitude is calculated as peak-to-peak amplitude (difference between maximum and minimum value of the signal). Other amplitude types are currently not supported.

Parameters:

cube (iris.cube.Cube) – Input data.
coords (str or list of str) – Coordinates over which is aggregated. For example, use 'year' to extract the annual cycle amplitude for each year in the data or ['day_of_year', 'year'] to extract the diurnal cycle amplitude for each individual day in the data. If the coordinates are not found in cube, try to add it via iris.coord_categorisation (at the moment, this only works for the temporal coordinates day_of_month, day_of_year, hour, month, month_fullname, month_number, season, season_number, season_year, weekday, weekday_fullname, weekday_number or year.

Returns:

Amplitudes.

Return type:

iris.cube.Cube

Raises:

iris.exceptions.CoordinateNotFoundError – A coordinate is not found in cube and cannot be added via iris.coord_categorisation.

esmvalcore.preprocessor.annual_statistics(cube: Cube, operator: str = 'mean', keep_group_coordinates: bool = False, **operator_kwargs: Any) → Cube[source]#

Compute annual statistics.

Note that this function does not weight the annual mean if uneven time periods are present. Ie, all data inside the year are treated equally.

Parameters:

cube (Cube) – Input cube.
operator (str) – The operation. Used to determine the iris.analysis.Aggregator object used to calculate the statistics. Allowed options are given in this table.
keep_group_coordinates (bool) – If True, keep temporal group coordinates (i.e., year) in the result. If False, remove them.
**operator_kwargs (Any) – Optional keyword arguments for the iris.analysis.Aggregator object defined by operator.

Returns:

Annual statistics cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.anomalies(cube: Cube, period: str, reference: dict | None = None, standardize: bool = False, relative: bool = False, seasons: Iterable[str] = ('DJF', 'MAM', 'JJA', 'SON')) → Cube[source]#

Compute anomalies using a mean with the specified granularity.

Computes anomalies based on hourly, daily, monthly, seasonal or yearly means for the full available period.

Parameters:

cube (Cube) – Input cube.
period (str) – Period to compute the statistic over. Available periods: full, season, seasonal, monthly, month, mon, daily, day, hourly, hour, hr.
reference (optional) – Period of time to use a reference, as needed for the extract_time() preprocessor function. If None, all available data is used as a reference.
relative (optional) – If True relative anomalies are calculated.
standardize (optional) – If True standardized anomalies are calculated.
seasons (optional) – Seasons to use if needed. Defaults to (‘DJF’, ‘MAM’, ‘JJA’, ‘SON’).

Returns:

Anomalies cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.area_statistics(cube: Cube, operator: str, normalize: Literal['subtract', 'divide'] | None = None, **operator_kwargs: Any) → Cube[source]#

Apply a statistical operator in the horizontal plane.

We assume that the horizontal directions are [‘longitude’, ‘latitude’].

This table shows a list of supported operators. All operators that support weights are by default weighted with the grid cell areas. Note that for area-weighted sums, the units of the resulting cube will be multiplied by m \(^2\).

Parameters:

cube (Cube) – Input cube. The input cube should have a iris.coords.CellMeasure named 'cell_area', unless it has regular 1D latitude and longitude coordinates so the cell areas can be computed using iris.analysis.cartography.area_weights().
operator (str) – The operation. Used to determine the iris.analysis.Aggregator object used to calculate the statistics. Allowed options are given in this table.
normalize (Literal['subtract', 'divide'] | None) – If given, do not return the statistics cube itself, but rather, the input cube, normalized with the statistics cube. Can either be subtract (statistics cube is subtracted from the input cube) or divide (input cube is divided by the statistics cube).
**operator_kwargs (Any) – Optional keyword arguments for the iris.analysis.Aggregator object defined by operator.

Returns:

Collapsed cube.

Return type:

iris.cube.Cube

Raises:

iris.exceptions.CoordinateMultiDimError – Cube has irregular or unstructured grid but supplementary variable cell_area is not available.

esmvalcore.preprocessor.axis_statistics(cube: Cube, axis: str, operator: str, normalize: Literal['subtract', 'divide'] | None = None, **operator_kwargs: Any) → Cube[source]#

Perform statistics along a given axis.

Operates over an axis direction.

Note

The mean, sum and rms operations are weighted by the corresponding coordinate bounds by default. For sum, the units of the resulting cube will be multiplied by corresponding coordinate units.

Parameters:

cube (Cube) – Input cube.
axis (str) – Direction over where to apply the operator. Possible values are x, y, z, t.
operator (str) – The operation. Used to determine the iris.analysis.Aggregator object used to calculate the statistics. Allowed options are given in this table.
normalize (Literal['subtract', 'divide'] | None) – If given, do not return the statistics cube itself, but rather, the input cube, normalized with the statistics cube. Can either be subtract (statistics cube is subtracted from the input cube) or divide (input cube is divided by the statistics cube).
**operator_kwargs (Any) – Optional keyword arguments for the iris.analysis.Aggregator object defined by operator.

Returns:

Collapsed cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.bias(products: set[PreprocessorFile] | Iterable[Cube], reference: Cube | None = None, bias_type: BiasType = 'absolute', denominator_mask_threshold: float = 0.001, keep_reference_dataset: bool = False) → set[PreprocessorFile] | CubeList[source]#

Calculate biases relative to a reference dataset.

The reference dataset needs to be broadcastable to all input products. This supports iris’ rich broadcasting abilities. To ensure this, the preprocessors esmvalcore.preprocessor.regrid() and/or esmvalcore.preprocessor.regrid_time() might be helpful.

Notes

The reference dataset can be specified with the reference argument. If reference is None, exactly one input dataset in the products set needs to have the facet reference_for_bias: true defined in the recipe. Please do not specify the option reference when using this preprocessor function in a recipe.

Parameters:

products (set[PreprocessorFile] | Iterable[Cube]) – Input datasets/cubes for which the bias is calculated relative to a reference dataset/cube.
reference (Cube | None) – Cube which is used as reference for the bias calculation. If None, products needs to be a set of ~esmvalcore.preprocessor.PreprocessorFile objects and exactly one dataset in products needs the facet reference_for_bias: true. Do not specify this argument in a recipe.
bias_type (BiasType) – Bias type that is calculated. Must be one of 'absolute' (dataset - ref) or 'relative' ((dataset - ref) / ref).
denominator_mask_threshold (float) – Threshold to mask values close to zero in the denominator (i.e., the reference dataset) during the calculation of relative biases. All values in the reference dataset with absolute value less than the given threshold are masked out. This setting is ignored when bias_type is set to 'absolute'. Please note that for some variables with very small absolute values (e.g., carbon cycle fluxes, which are usually \(< 10^{-6}\) kg m \(^{-2}\) s \(^{-1}\)) it is absolutely essential to change the default value in order to get reasonable results.
keep_reference_dataset (bool) – If True, keep the reference dataset in the output. If False, drop the reference dataset. Ignored if reference is given.

Returns:

Output datasets/cubes. Will be a set of PreprocessorFile objects if products is also one, a CubeList otherwise.

Return type:

set[PreprocessorFile] | CubeList

Raises:

ValueError – Not exactly one input datasets contains the facet reference_for_bias: true if reference=None; reference=None and the input products are given as iterable of Cube objects; bias_type is not one of 'absolute' or 'relative'.

esmvalcore.preprocessor.climate_statistics(cube: Cube, operator: str = 'mean', period: str = 'full', seasons: Iterable[str] = ('DJF', 'MAM', 'JJA', 'SON'), **operator_kwargs: Any) → Cube[source]#

Compute climate statistics with the specified granularity.

Computes statistics for the whole dataset. It is possible to get them for the full period or with the data grouped by hour, day, month or season.

Note

The mean, sum and rms operations over the full period are weighted by the time coordinate, i.e., the length of the time intervals. For sum, the units of the resulting cube will be multiplied by corresponding time units (e.g., days).

If a period other than full is used, time points will be put into bins, which may shift existing time points. For example, for period=hourly, a time point at 01:30h will be moved to the corresponding full hour (01:00h).

Parameters:

cube (Cube) – Input cube.
operator (str) – The operation. Used to determine the iris.analysis.Aggregator object used to calculate the statistics. Allowed options are given in this table.
period (str) – Period to compute the statistic over. Available periods: full, season, seasonal, monthly, month, mon, daily, day, hourly, hour, hr.
seasons (Iterable[str]) – Seasons to use if needed. Defaults to (‘DJF’, ‘MAM’, ‘JJA’, ‘SON’).
**operator_kwargs (Any) – Optional keyword arguments for the iris.analysis.Aggregator object defined by operator.

Returns:

Climate statistics cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.clip(cube: Cube, minimum: float | None = None, maximum: float | None = None) → Cube[source]#

Clip values at a specified minimum and/or maximum value.

Values lower than minimum are set to minimum and values higher than maximum are set to maximum.

Parameters:

cube (Cube) – Input cube.
minimum (float | None) – Lower threshold to be applied on input cube data.
maximum (float | None) – Upper threshold to be applied on input cube data.

Returns:

Clipped cube.

Return type:

iris.cube.Cube

Raises:

ValueError – minimum and maximum are set to None.

esmvalcore.preprocessor.clip_timerange(cube: Cube, timerange: str) → Cube[source]#

Extract time range with a resolution up to seconds.

Parameters:

cube (Cube) – Input cube.
timerange (str) – Time range in ISO 8601 format.

Returns:

Sliced cube.

Return type:

iris.cube.Cube

Raises:

ValueError – Time ranges are outside the cube’s time limits.
isodate.isoerror.ISO8601Error – Start/end times can not be parsed by isodate.

esmvalcore.preprocessor.cmor_check_data(cube: Cube, cmor_table: str, mip: str, short_name: str, *, branding_suffix: str | None = None, frequency: str | None = None, check_level: CheckLevels = CheckLevels.DEFAULT) → Cube[source]#

Check if data conforms to variable’s CMOR definition.

Parameters:

cube (Cube) – Data cube to check.
cmor_table (str) – CMOR definitions to use (i.e., the variable’s project).
mip (str) – Variable’s MIP.
short_name (str) – Variable’s short name
branding_suffix (str | None) – A suffix that will be appended to short_name when looking up the variable in the CMOR table. Used by the CMIP7 project.
frequency (str | None) – Data frequency. If not given, use the one from the CMOR table of the variable.
check_level (CheckLevels) – Level of strictness of the checks.

Returns:

Checked cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.cmor_check_metadata(cube: Cube, cmor_table: str, mip: str, short_name: str, *, branding_suffix: str | None = None, frequency: str | None = None, check_level: CheckLevels = CheckLevels.DEFAULT) → Cube[source]#

Check if metadata conforms to variable’s CMOR definition.

None of the checks at this step will force the cube to load the data.

Parameters:

cube (Cube) – Data cube to check.
cmor_table (str) – CMOR definitions to use (i.e., the variable’s project).
mip (str) – Variable’s MIP.
short_name (str) – Variable’s short name.
branding_suffix (str | None) – A suffix that will be appended to short_name when looking up the variable in the CMOR table. Used by the CMIP7 project.
frequency (str | None) – Data frequency. If not given, use the one from the CMOR table of the variable.
check_level (CheckLevels) – Level of strictness of the checks.

Returns:

Checked cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.concatenate(cubes: Sequence[Cube], check_level: CheckLevels = CheckLevels.DEFAULT) → Cube[source]#

Concatenate all cubes after fixing metadata.

Parameters:

cubes (iterable of iris.cube.Cube) – Data cubes to be concatenated
check_level (CheckLevels) – Level of strictness of the checks in the concatenation.

Returns:

cube – Resulting concatenated cube.

Return type:

iris.cube.Cube

Raises:

ValueError – Concatenation was not possible.

esmvalcore.preprocessor.convert_units(cube: Cube, units: str | Unit) → Cube[source]#

Convert the units of a cube to new ones (in-place).

Note

Allows special unit conversions which transforms one quantity to another (physically related) quantity, which may also change the input cube’s standard_name. These quantities are identified via their standard_name and their units (units convertible to the ones defined are also supported). For example, this enables conversions between precipitation fluxes measured in kg m-2 s-1 and precipitation rates measured in mm day-1 (and vice versa).

Currently, the following special conversions are supported:

precipitation_flux (kg m-2 s-1) – lwe_precipitation_rate (mm day-1)
water_evaporation_flux (kg m-2 s-1) – lwe_water_evaporation_rate (mm day-1)
water_potential_evaporation_flux (kg m-2 s-1) – None (mm day-1)
equivalent_thickness_at_stp_of_atmosphere_ozone_content (m) – equivalent_thickness_at_stp_of_atmosphere_ozone_content (DU)
surface_air_pressure (Pa) – atmosphere_mass_of_air_per_unit_area (kg m-2)

Names in the list correspond to standard_names of the input data. Conversions are allowed from each quantity to any other quantity given in a bullet point. The corresponding target quantity is inferred from the desired target units. In addition, any other units convertible to the ones given are also supported (e.g., instead of mm day-1, m s-1 is also supported).

Note that for precipitation and evaporation variables, a water density of 1000 kg m-3 is assumed.

Parameters:

cube (Cube) – Input cube (modified in place).
units (str | Unit) – New units.

Returns:

Converted cube. Just returned for convenience; input cube is modified in place.

Return type:

iris.cube.Cube

Raises:

iris.exceptions.UnitConversionError – Old units are unknown.
ValueError – Old units are not convertible to new units.

esmvalcore.preprocessor.cumulative_sum(cube: Cube, coord: Coord | str, weights: np.ndarray | da.Array | bool | None = None, method: Literal['sequential', 'blelloch'] = 'sequential') → Cube[source]#

Calculate cumulative sum of the elements along a given coordinate.

Parameters:

cube (Cube) – Input cube.
coord (Coord | str) – Coordinate over which the cumulative sum is calculated. Must be 0D or 1D.
weights (np.ndarray | da.Array | bool | None) – Weights for the calculation of the cumulative sum. Each element in the data is multiplied by the corresponding weight before summing. Can be an array of the same shape as the input data, False or None (no weighting), or True (calculate the weights from the coordinate bounds; only works if each coordinate point has exactly 2 bounds).
method (Literal['sequential', 'blelloch']) – Method used to perform the cumulative sum. Only relevant if the cube has lazy data. See dask.array.cumsum() for details.

Returns:

Cube of cumulative sum. Has same dimensions and coordinates of the input cube.

Return type:

Cube

Raises:

iris.exceptions.CoordinateMultiDimError – coord is not 0D or 1D.
iris.exceptions.CoordinateNotFoundError – coord is not found in cube.

esmvalcore.preprocessor.daily_statistics(cube: Cube, operator: str = 'mean', keep_group_coordinates: bool = False, **operator_kwargs: Any) → Cube[source]#

Compute daily statistics.

Chunks time in daily periods and computes statistics over them;

Parameters:

cube (Cube) – Input cube.
operator (str) – The operation. Used to determine the iris.analysis.Aggregator object used to calculate the statistics. Allowed options are given in this table.
keep_group_coordinates (bool) – If True, keep temporal group coordinates (i.e., day_of_year, year) in the result. If False, remove them.
**operator_kwargs (Any) – Optional keyword arguments for the iris.analysis.Aggregator object defined by operator.

Returns:

Daily statistics cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.decadal_statistics(cube: Cube, operator: str = 'mean', keep_group_coordinates: bool = False, **operator_kwargs: Any) → Cube[source]#

Compute decadal statistics.

Note that this function does not weight the decadal mean if uneven time periods are present. Ie, all data inside the decade are treated equally.

Parameters:

cube (Cube) – Input cube.
operator (str) – The operation. Used to determine the iris.analysis.Aggregator object used to calculate the statistics. Allowed options are given in this table.
keep_group_coordinates (bool) – If True, keep temporal group coordinates (i.e., decade) in the result. If False, remove them.
**operator_kwargs (Any) – Optional keyword arguments for the iris.analysis.Aggregator object defined by operator.

Returns:

Decadal statistics cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.depth_integration(cube: Cube) → Cube[source]#

Determine the total sum over the vertical component.

Requires a 3D cube. The z-coordinate integration is calculated by taking the sum in the z direction of the cell contents multiplied by the cell thickness. The units of the resulting cube are multiplied by the z-coordinate units.

Parameters:: cube (Cube) – Input cube.
Returns:: Collapsed cube.
Return type:: iris.cube.Cube

esmvalcore.preprocessor.derive(cubes: CubeList, short_name: str, long_name: str, units: str | Unit, standard_name: str | None = None) → Cube[source]#

Derive variable.

Parameters:

cubes (CubeList) – Includes all the needed variables for derivation defined in get_required().
short_name (str) – short_name
long_name (str) – long_name
units (str | Unit) – units
standard_name (str | None) – standard_name

Returns:

The new derived variable.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.detrend(cube, dimension='time', method='linear')[source]#

Detrend data along a given dimension.

Parameters:

cube (iris.cube.Cube) – input cube.
dimension (str) – Dimension to detrend
method (str) – Method to detrend. Available: linear, constant. See documentation of ‘scipy.signal.detrend’ for details

Returns:

Detrended cube

Return type:

iris.cube.Cube

esmvalcore.preprocessor.distance_metric(products: set[PreprocessorFile] | Iterable[Cube], metric: MetricType, reference: Cube | None = None, coords: Iterable[Coord] | Iterable[str] | None = None, keep_reference_dataset: bool = True, **kwargs: Any) → set[PreprocessorFile] | CubeList[source]#

Calculate distance metrics.

All input datasets need to have identical dimensional coordinates. This can for example be ensured with the preprocessors esmvalcore.preprocessor.regrid() and/or esmvalcore.preprocessor.regrid_time().

Notes

This preprocessor requires a reference dataset, which can be specified with the reference argument. If reference is None, exactly one input dataset in the products set needs to have the facet reference_for_metric: true defined in the recipe. Please do not specify the option reference when using this preprocessor function in a recipe.

Parameters:

products (set[PreprocessorFile] | Iterable[Cube]) – Input datasets/cubes for which the distance metric is calculated relative to a reference dataset/cube.
metric (MetricType) –
Distance metric that is calculated. Must be one of
- 'rmse': Unweighted root mean square error
- 'weighted_rmse': Weighted root mean square error
- 'pearsonr': Unweighted Pearson correlation coefficient
- 'weighted_pearsonr': Weighted Pearson correlation coefficient
- 'emd': Unweighted Earth mover’s distance
- 'weighted_emd': Weighted Earth mover’s distance
The Earth mover’s distance is also known as first Wasserstein metric W₁.

A detailed description of these metrics can be found here.

Note

Metrics starting with weighted_ will calculate weighted distance metrics if possible. Currently, the following coords (or any combinations that include them) will trigger weighting: time (will use lengths of time intervals as weights) and latitude (will use cell area weights). Time weights are always calculated from the input data. Area weights can be given as supplementary variables to the recipe (areacella or areacello, see Defining supplementary variables (ancillary variables and cell measures)) or calculated from the input data (this only works for regular grids). By default, NO supplementary variables will be used; they need to be explicitly requested in the recipe.
reference (Cube | None) – Cube which is used as reference for the distance metric calculation. If None, products needs to be a set of PreprocessorFile objects and exactly one dataset in products needs the facet reference_for_metric: true. Do not specify this argument in a recipe.
coords (Iterable[Coord] | Iterable[str] | None) – Coordinates over which the distance metric is calculated. If None, calculate the metric over all coordinates, which results in a scalar cube.
keep_reference_dataset (bool) – If True, also calculate the distance of the reference dataset with itself. If False, drop the reference dataset.
**kwargs (Any) –
Additional options for the metric calculation. The following keyword arguments are supported:
- rmse and weighted_rmse: none.
- pearsonr and weighted_pearsonr: mdtol, common_mask (all keyword arguments are passed to iris.analysis.stats.pearsonr(), see that link for more details on these arguments). Note: in contrast to pearsonr(), common_mask=True by default.
- emd and weighted_emd: n_bins = number of bins used to create discrete probability distribition of data before calculating the EMD (int, default: 100).

Returns:

Output datasets/cubes. Will be a set of PreprocessorFile objects if products is also one, a CubeList otherwise.

Return type:

set of esmvalcore.preprocessor.PreprocessorFile or iris.cube.CubeList

Raises:

ValueError – Shape and coordinates of products and reference data does not match; not exactly one input datasets contains the facet reference_for_metric: true if reference=None`; ``reference=None and the input products are given as iterable of Cube objects; an invalid metric has been given.
iris.exceptions.CoordinateNotFoundError – longitude is not found in cube if a weighted metric shall be calculated, latitude is in coords, and no cell_area is given as Defining supplementary variables (ancillary variables and cell measures).

esmvalcore.preprocessor.ensemble_statistics(products: set[PreprocessorFile] | Iterable[Cube], statistics: list[str | dict], output_products: dict[str, PreprocessorFile] | None, span: str = 'overlap', ignore_scalar_coords: bool = False) → dict | set[source]#

Compute ensemble statistics.

An ensemble grouping is performed on the input products (using the ensemble facet of input datasets). The statistics are then computed calling esmvalcore.preprocessor.multi_model_statistics() with appropriate groups.

Parameters:

products (set[PreprocessorFile] | Iterable[Cube]) – Cubes (or products) over which the statistics will be computed.
statistics (list[str | dict]) – Statistical operations to be computed, e.g., ['mean', 'median']. For some statistics like percentiles, it is also possible to pass additional keyword arguments, e.g., [{'operator': 'percentile', 'percent': 20}]. All supported options are are given in this table.
output_products (dict[str, PreprocessorFile] | None) – For internal use only. A dict with statistics names as keys and preprocessorfiles as values. If products are passed as input, the statistics cubes will be assigned to these output products.
span (str) – Overlap or full; if overlap, statitstics are computed on common time- span; if full, statistics are computed on full time spans, ignoring missing data.
ignore_scalar_coords (bool) – If True, remove any scalar coordinate in the input datasets before merging the input cubes into the multi-dataset cube. The resulting multi-dataset cube will have no scalar coordinates (the actual input datasets will remain unchanged). If False, scalar coordinates will remain in the input datasets, which might lead to merge conflicts in case the input datasets have different scalar coordinates.

Returns:

A dict of cubes or set of output_products depending on the type of products.

Return type:

dict | set

esmvalcore.preprocessor.extract_coordinate_points(cube: Cube, definition: dict[str, ArrayLike], scheme: NamedPointInterpolationScheme) → Cube[source]#

Extract points from any coordinate with interpolation.

Multiple points can also be extracted, by supplying an array of coordinates. The resulting point cube will match the respective coordinates to those of the input coordinates. If the input coordinate is a scalar, the dimension will be a scalar in the output cube.

Parameters:

cube (Cube) – The source cube to extract a point from.
definition (dict[str, ArrayLike]) – The coordinate - values pairs to extract
scheme (NamedPointInterpolationScheme) – The interpolation scheme. ‘linear’ or ‘nearest’. No default.

Returns:

Returns a cube with the extracted point(s), and with adjusted latitude and longitude coordinates (see above). If desired point outside values for at least one coordinate, this cube will have fully masked data.

Return type:

iris.cube.Cube

Raises:

ValueError: – If the interpolation scheme is not provided or is not recognised.

esmvalcore.preprocessor.extract_levels(cube: Cube, levels: ArrayLike | Array, scheme: Literal['linear', 'nearest', 'linear_extrapolate', 'nearest_extrapolate'], coordinate: str | None = None, rtol: float = 1e-07, atol: float | None = None) → Cube[source]#

Perform vertical interpolation.

Parameters:

cube (Cube) – The source cube to be vertically interpolated.
levels (ArrayLike | Array) – One or more target levels for the vertical interpolation. Assumed to be in the same S.I. units of the source cube vertical dimension coordinate. If the requested levels are sufficiently close to the levels of the cube, cube slicing will take place instead of interpolation.
scheme (Literal['linear', 'nearest', 'linear_extrapolate', 'nearest_extrapolate']) – The vertical interpolation scheme to use. Choose from ‘linear’, ‘nearest’, ‘linear_extrapolate’, ‘nearest_extrapolate’.
coordinate (str | None) – The coordinate to interpolate. If specified, pressure levels (if present) can be converted to height levels and vice versa using the US standard atmosphere. E.g. ‘coordinate = altitude’ will convert existing pressure levels (air_pressure) to height levels (altitude); ‘coordinate = air_pressure’ will convert existing height levels (altitude) to pressure levels (air_pressure).
rtol (float) – Relative tolerance for comparing the levels in cube to the requested levels. If the levels are sufficiently close, the requested levels will be assigned to the cube and no interpolation will take place.
atol (float | None) – Absolute tolerance for comparing the levels in cube to the requested levels. If the levels are sufficiently close, the requested levels will be assigned to the cube and no interpolation will take place. By default, atol will be set to 10^-7 times the mean value of the levels on the cube.

Returns:

A cube with the requested vertical levels.

Return type:

iris.cube.Cube

See also

regrid: Perform horizontal regridding.

esmvalcore.preprocessor.extract_location(cube: Cube, location: str, scheme: Literal['linear', 'nearest']) → Cube[source]#

Extract a point using a location name, with interpolation.

Extracts a single location point from a cube, according to the interpolation scheme scheme.

The function just retrieves the coordinates of the location and then calls the extract_point preprocessor.

It can be used to locate cities and villages, but also mountains or other geographical locations.

Note

The geolocator needs a working internet connection.

Parameters:

cube (Cube) – The source cube to extract a point from.
location (str) – The reference location. Examples: ‘mount everest’, ‘romania’,’new york, usa’
scheme (Literal['linear', 'nearest']) – The interpolation scheme. ‘linear’ or ‘nearest’. No default.

Returns:

Returns a cube with the extracted point, and with adjusted latitude and longitude coordinates.

Return type:

iris.cube.Cube

Raises:

ValueError: – If location is not supplied as a preprocessor parameter.
ValueError: – If scheme is not supplied as a preprocessor parameter.
ValueError: – If given location cannot be found by the geolocator.

esmvalcore.preprocessor.extract_month(cube: Cube, month: int) → Cube[source]#

Slice cube to get only the data belonging to a specific month.

Parameters:

cube (Cube) – Original data
month (int) – Month to extract as a number from 1 to 12.

Returns:

Cube for specified month.

Return type:

iris.cube.Cube

Raises:

ValueError – Requested month is not present in the cube.

esmvalcore.preprocessor.extract_named_regions(cube: Cube, regions: str | Iterable[str]) → Cube[source]#

Extract a specific named region.

The region coordinate exist in certain CMIP datasets. This preprocessor allows a specific named regions to be extracted.

Parameters:

cube (Cube) – Input cube.
regions (str | Iterable[str]) – A region or list of regions to extract.

Returns:

Smaller cube.

Return type:

iris.cube.Cube

Raises:

ValueError – regions is not list or tuple or set.
ValueError – region not included in cube.

esmvalcore.preprocessor.extract_point(cube: Cube, latitude: ArrayLike, longitude: ArrayLike, scheme: NamedPointInterpolationScheme) → Cube[source]#

Extract a point, with interpolation.

Extracts a single latitude/longitude point from a cube, according to the interpolation scheme scheme.

Multiple points can also be extracted, by supplying an array of latitude and/or longitude coordinates. The resulting point cube will match the respective latitude and longitude coordinate to those of the input coordinates. If the input coordinate is a scalar, the dimension will be missing in the output cube (that is, it will be a scalar).

If the point to be extracted has at least one of the coordinate point values outside the interval of the cube’s same coordinate values, then no extrapolation will be performed, and the resulting extracted cube will have fully masked data.

Parameters:

cube (Cube) – The source cube to extract a point from.
latitude (ArrayLike) – The latitude of the point.
longitude (ArrayLike) – The longitude of the point.
scheme (NamedPointInterpolationScheme) – The interpolation scheme. ‘linear’ or ‘nearest’. No default.

Returns:

Return type:

iris.cube.Cube

Raises:

ValueError: – If the interpolation scheme is None or unrecognized.

Examples

With a cube that has the coordinates

latitude: [1, 2, 3, 4]
longitude: [1, 2, 3, 4]
data values: [[[1, 2, 3, 4], [5, 6, …], […], […],
… ]]]

>>> point = extract_point(cube, 2.5, 2.5, 'linear')
>>> point.data
array([ 8.5, 24.5, 40.5, 56.5])

Extraction of multiple points at once, with a nearest matching scheme. The values for 0.1 will result in masked values, since this lies outside the cube grid.

>>> point = extract_point(cube, [1.4, 2.1], [0.1, 1.1],
...                       'nearest')
>>> point.data.shape
(4, 2, 2)
>>> # x, y, z indices of masked values
>>> np.where(~point.data.mask)
(array([0, 0, 1, 1, 2, 2, 3, 3]), array([0, 1, 0, 1, 0, 1, 0, 1]),
array([1, 1, 1, 1, 1, 1, 1, 1]))
>>> point.data[~point.data.mask].data
array([ 1,  5, 17, 21, 33, 37, 49, 53])

esmvalcore.preprocessor.extract_region(cube: Cube, start_longitude: float, end_longitude: float, start_latitude: float, end_latitude: float) → Cube[source]#

Extract a region from a cube.

Function that subsets a cube on a box (start_longitude, end_longitude, start_latitude, end_latitude).

Parameters:

cube (Cube) – Input data cube.
start_longitude (float) – Western boundary longitude.
end_longitude (float) – Eastern boundary longitude.
start_latitude (float) – Southern Boundary latitude.
end_latitude (float) – Northern Boundary Latitude.

Returns:

Smaller cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.extract_season(cube: Cube, season: str) → Cube[source]#

Slice cube to get only the data belonging to a specific season.

Parameters:

cube (Cube) – Original data
season (str) – Season to extract. Available: DJF, MAM, JJA, SON and all sequentially correct combinations: e.g. JJAS

Returns:

data cube for specified season.

Return type:

iris.cube.Cube

Raises:

ValueError – Requested season is not present in the cube.

esmvalcore.preprocessor.extract_shape(cube: Cube, shapefile: str | Path, method: str = 'contains', crop: bool = True, decomposed: bool = False, ids: list | dict | None = None) → Cube[source]#

Extract a region defined by a shapefile using masking.

Note that this function does not work for shapes crossing the prime meridian or poles.

Parameters:

cube (Cube) – Input cube.
shapefile (str | Path) –
A shapefile defining the region(s) to extract. Also accepts the following strings to load special shapefiles:
- 'ar6': IPCC WG1 reference regions (v4) used in Assessment Report 6 (https://doi.org/10.5281/zenodo.5176260). Should be used in combination with a dict for the argument ids, e.g., ids={'Acronym': ['GIC', 'WNA']}.
method (str) – Select all points contained by the shape or select a single representative point. Choose either ‘contains’ or ‘representative’. If ‘contains’ is used, but not a single grid point is contained by the shape, a representative point will be selected.
crop (bool) – In addition to masking, crop the resulting cube using extract_region(). Data on irregular grids will not be cropped.
decomposed (bool) – If set to True, the output cube will have an additional dimension shape_id describing the requested regions.
ids (list | dict | None) –
Shapes to be read from the shapefile. Can be given as:
- list: IDs are assigned from the attributes name, NAME, Name, id, or ID (in that priority order; the first one available is used). If none of these attributes are available in the shapefile, assume that the given ids correspond to the reading order of the individual shapes, e.g., ids=[0, 2] corresponds to the first and third shape read from the shapefile. Note: An empty list is interpreted as ids=None.
- dict: IDs (dictionary value; list of str) are assigned from attribute given as dictionary key (str). Only dictionaries with length 1 are supported. Example: ids={'Acronym': ['GIC', 'WNA']} for shapefile='ar6'.
- None: select all available shapes from the shapefile.

Returns:

Cube containing the extracted region.

Return type:

iris.cube.Cube

See also

extract_region: Extract a region from a cube.

esmvalcore.preprocessor.extract_surface_from_atm(cube: Cube) → Cube[source]#

Extract surface from 3D atmospheric variable based on surface pressure.

Parameters:: cube (Cube) – Input cube. Needs AncillaryVariable surface_air_pressure.
Returns:: Collapsed cube.
Return type:: iris.cube.Cube

esmvalcore.preprocessor.extract_time(cube: Cube, start_year: int | None, start_month: int, start_day: int, end_year: int | None, end_month: int, end_day: int) → Cube[source]#

Extract a time range from a cube.

Given a time range passed in as a series of years, months and days, it returns a time-extracted cube with data only within the specified time range.

Parameters:

cube (Cube) – Input cube.
start_year (int | None) – Start year. If None, the date ranges (start_month-start_day to end_month-end_day) are selected in each year. For example, ranges Feb 3 - Apr 6 in each year are selected if start_year=None, start_month=2, start_day=3, end_year=None, end_month=4, end_day=6. If start_year is None, end_year has to be None too.
start_month (int) – Start month.
start_day (int) – Start day.
end_year (int | None) – End year. If None, the date ranges (start_month-start_day to end_month-end_day) are selected in each year. For example, ranges Feb 3 - Apr 6 in each year are selected if start_year=None, start_month=2, start_day=3, end_year=None, end_month=4, end_day=6. If end_year is None, start_year has to be None too.
end_month (int) – End month.
end_day (int) – End day.

Returns:

Sliced cube.

Return type:

iris.cube.Cube

Raises:

ValueError – Time ranges are outside the cube time limits.

esmvalcore.preprocessor.extract_trajectory(cube: Cube, latitudes: Sequence[float], longitudes: Sequence[float], number_points: int = 2) → Cube[source]#

Extract data along a trajectory.

latitudes and longitudes are the pairs of coordinates for two points. number_points is the number of points between the two points.

This version uses the expensive interpolate method, but it may be necceasiry for irregular grids.

If only two latitude and longitude coordinates are given, extract_trajectory will produce a cube will extrapolate along a line between those two points, and will add number_points points between the two corners.

If more than two points are provided, then extract_trajectory will produce a cube which has extrapolated the data of the cube to those points, and number_points is not needed.

Parameters:

cube (Cube) – Input cube.
latitudes (Sequence[float]) – Latitude coordinates.
longitudes (Sequence[float]) – Longitude coordinates.
number_points (optional) – Number of points to extrapolate.

Returns:

Collapsed cube.

Return type:

iris.cube.Cube

Raises:

ValueError – Latitude and longitude have different dimensions.

esmvalcore.preprocessor.extract_transect(cube: Cube, latitude: None | float | Iterable[float] = None, longitude: None | float | Iterable[float] = None) → Cube[source]#

Extract data along a line of constant latitude or longitude.

Both arguments, latitude and longitude, are treated identically. Either argument can be a single float, or a pair of floats, or can be left empty. The single float indicates the latitude or longitude along which the transect should be extracted. A pair of floats indicate the range that the transect should be extracted along the secondairy axis.

For instance ‘extract_transect(cube, longitude=-28)’ will produce a transect along 28 West.

Also, ‘extract_transect(cube, longitude=-28, latitude=[-50, 50])’ will produce a transect along 28 West between 50 south and 50 North.

This function is not yet implemented for irregular arrays - instead try the extract_trajectory function, but note that it is currently very slow. Alternatively, use the regrid preprocessor to regrid along a regular grid and then extract the transect.

Parameters:

cube (Cube) – Input cube.
latitude (optional) – Transect latitude or range.
longitude (optional) – Transect longitude or range.

Returns:

Collapsed cube.

Return type:

iris.cube.Cube

Raises:

ValueError – Slice extraction not implemented for irregular grids.
ValueError – Latitude and longitude are both floats or lists; not allowed to slice on both axes at the same time.

esmvalcore.preprocessor.extract_volume(cube: Cube, z_min: float, z_max: float, interval_bounds: str = 'open', nearest_value: bool = False) → Cube[source]#

Subset a cube based on a range of values in the z-coordinate.

Function that subsets a cube on a box of (z_min, z_max), (z_min, z_max], [z_min, z_max) or [z_min, z_max] Note that this requires the requested z-coordinate range to be the same sign as the iris cube. ie, if the cube has z-coordinate as negative, then z_min and z_max need to be negative numbers. If nearest_value is set to False, the extraction will be performed with the given z_min and z_max values. If nearest_value is set to True, the cube extraction will be performed taking into account the z_coord values that are closest to the z_min and z_max values.

Parameters:

cube (Cube) – Input cube.
z_min (float) – Minimum depth to extract.
z_max (float) – Maximum depth to extract.
interval_bounds (str) – Sets left bound of the interval to either ‘open’, ‘closed’, ‘left_closed’ or ‘right_closed’.
nearest_value (bool) – Extracts considering the nearest value of z-coord to z_min and z_max.

Returns:

z-coord extracted cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.fix_data(cube: Cube, short_name: str, project: str, dataset: str, mip: str, frequency: str | None = None, session: Session | None = None, **extra_facets: Any) → Cube[source]#

Fix cube data if fixes are required.

This method assumes that metadata is already fixed and checked.

This method collects all the relevant fixes (including generic ones) for a given variable and applies them.

Parameters:

cube (Cube) – Cube to fix.
short_name (str) – Variable’s short name.
project (str) – Project of the dataset.
dataset (str) – Name of the dataset.
mip (str) – Variable’s MIP.
frequency (str | None) – Variable’s data frequency, if available.
session (Session | None) – Current session which includes configuration and directory information.
**extra_facets (Any) – Extra facets. For details, see Extra Facets.

Returns:

Fixed cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.fix_file(file: Path, short_name: str, project: str, dataset: str, mip: str, output_dir: Path, add_unique_suffix: bool = False, session: Session | None = None, frequency: str | None = None, **extra_facets: Any) → Path | Sequence[Cube][source]#

Fix files before loading them into a CubeList.

This is mainly intended to fix errors that prevent loading the data with Iris (e.g., those related to missing_value or _FillValue) or operations that are more efficient with other packages (e.g., loading files with lots of variables is much faster with Xarray than Iris).

Parameters:

file (Path) – Path to the original file. Original files are not overwritten.
short_name (str) – Variable’s short name.
project (str) – Project of the dataset.
dataset (str) – Name of the dataset.
mip (str) – Variable’s MIP.
output_dir (Path) – Output directory for fixed files.
add_unique_suffix (bool) – Adds a unique suffix to output_dir for thread safety.
session (Session | None) – Current session which includes configuration and directory information.
frequency (str | None) – Variable’s data frequency, if available.
**extra_facets (Any) – Extra facets. For details, see Extra Facets.

Returns:

Fixed data or a path to them.

Return type:

Path | Sequence[Cube]

esmvalcore.preprocessor.fix_metadata(cubes: Sequence[Cube], short_name: str, project: str, dataset: str, mip: str, frequency: str | None = None, session: Session | None = None, **extra_facets: Any) → CubeList[source]#

Fix cube metadata if fixes are required.

This method collects all the relevant fixes (including generic ones) for a given variable and applies them.

Parameters:

cubes (Sequence[Cube]) – Cubes to fix.
short_name (str) – Variable’s short name.
project (str) – Project of the dataset.
dataset (str) – Name of the dataset.
mip (str) – Variable’s MIP.
frequency (str | None) – Variable’s data frequency, if available.
session (Session | None) – Current session which includes configuration and directory information.
**extra_facets (Any) – Extra facets. For details, see Extra Facets.

Returns:

Fixed cubes.

Return type:

iris.cube.CubeList

Calculate histogram.

Very similar to numpy.histogram(), but calculates histogram only over the given coordinates.

Handles lazy data and masked data.

Parameters:

cube (Cube) – Input cube.
coords (Iterable[Coord] | Iterable[str] | None) – Coordinates over which the histogram is calculated. If None, calculate the histogram over all coordinates, which results in a scalar cube.
bins (int | Sequence[float]) – If bins is an int, it defines the number of equal-width bins in the given bin_range. If bins is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge, allowing for non-uniform bin widths.
bin_range (tuple[float, float] | None) – The lower and upper range of the bins. If None, bin_range is simply (cube.core_data().min(), cube.core_data().max()). Values outside the range are ignored. The first element of the range must be less than or equal to the second. bin_range affects the automatic bin computation as well if bins is an int (see description for bins above).
weights (np.ndarray | da.Array | bool | None) – Weights for the histogram calculation. Each value in the input data only contributes its associated weight towards the bin count (instead of 1). Weights are normalized before entering the calculation if normalization is 'integral' or 'sum'. Can be an array of the same shape as the input data, False or None (no weighting), or True. In the latter case, weighting will depend on coords, and the following coordinates will trigger weighting: time (will use lengths of time intervals as weights) and latitude (will use cell area weights). Time weights are always calculated from the input data. Area weights can be given as supplementary variables to the recipe (areacella or areacello, see Defining supplementary variables (ancillary variables and cell measures)) or calculated from the input data (this only works for regular grids). By default, NO supplementary variables will be used; they need to be explicitly requested in the recipe.
normalization (Literal['sum', 'integral'] | None) – If None, the result will contain the number of samples in each bin. If 'integral', the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. If 'sum', the result is the value of the probability mass function at the bin, normalized such that the sum over the range is 1. Normalization will be applied across coords, not the entire cube.

Returns:

Histogram cube. The shape of this cube will be (x1, x2, …, n_bins), where xi are the dimensions of the input cube not appearing in coords and n_bins is the number of bins.

Return type:

Cube

Raises:

TypeError – Invalid bin type given
ValueError – Invalid normalization or bin_range given or bin_range is None and data is fully masked.
iris.exceptions.CoordinateNotFoundError – A given coordinate of coords is not found in cube.
iris.exceptions.CoordinateNotFoundError – longitude is not found in cube if weights=True, latitude is in coords, and no cell_area is given as Defining supplementary variables (ancillary variables and cell measures).

esmvalcore.preprocessor.hourly_statistics(cube: Cube, hours: int, operator: str = 'mean', keep_group_coordinates: bool = False, **operator_kwargs: Any) → Cube[source]#

Compute hourly statistics.

Chunks time in x hours periods and computes statistics over them.

Parameters:

cube (Cube) – Input cube.
hours (int) – Number of hours per period. Must be a divisor of 24, i.e., (1, 2, 3, 4, 6, 8, 12).
operator (str) – The operation. Used to determine the iris.analysis.Aggregator object used to calculate the statistics. Allowed options are given in this table.
keep_group_coordinates (bool) – If True, keep temporal group coordinates (i.e., hour_group, day_of_year, year) in the result. If False, remove them.
**operator_kwargs (Any) – Optional keyword arguments for the iris.analysis.Aggregator object defined by operator.

Returns:

Hourly statistics cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.linear_trend(cube, coordinate='time')[source]#

Calculate linear trend of data along a given coordinate.

The linear trend is defined as the slope of an ordinary linear regression.

Parameters:

cube (iris.cube.Cube) – Input data.
coordinate (str, optional (default: 'time')) – Dimensional coordinate over which the trend is calculated.

Returns:

Trends.

Return type:

iris.cube.Cube

Raises:

iris.exceptions.CoordinateNotFoundError – The dimensional coordinate with the name coordinate is not found in cube.

esmvalcore.preprocessor.linear_trend_stderr(cube, coordinate='time')[source]#

Calculate standard error of linear trend along a given coordinate.

This gives the standard error (not confidence intervals!) of the trend defined as the standard error of the estimated slope of an ordinary linear regression.

Parameters:

cube (iris.cube.Cube) – Input data.
coordinate (str, optional (default: 'time')) – Dimensional coordinate over which the standard error of the trend is calculated.

Returns:

Standard errors of trends.

Return type:

iris.cube.Cube

Raises:

iris.exceptions.CoordinateNotFoundError – The dimensional coordinate with the name coordinate is not found in cube.

Load Iris cubes.

Parameters:

file (str | Path | DataElement | Cube | CubeList | Dataset | NcData) – File to be loaded. If file is already a loaded dataset, return it as a CubeList. File as Path object could be a Zarr store.
ignore_warnings (list[dict[str, Any]] | None) – Keyword arguments passed to warnings.filterwarnings() used to ignore warnings issued by iris.load_raw(). Each list element corresponds to one call to warnings.filterwarnings().
backend_kwargs (dict[str, Any] | None) – Dict to hold info needed by storage backend e.g. to access a PRIVATE S3 bucket containing object stores (e.g. netCDF4 files); needed by fsspec and its extensions e.g. s3fs, so most of the times this will include storage_options. Note that Zarr files are opened via http extension of fsspec, so no need for storage_options in that case (ie anon/anon). Currently only used in Zarr file opening.

Returns:

Loaded cubes.

Return type:

iris.cube.CubeList

Raises:

ValueError – Cubes are empty.
TypeError – Invalid type for file.

esmvalcore.preprocessor.local_solar_time(cube: Cube) → Cube[source]#

Convert UTC time coordinate to local solar time (LST).

This preprocessor transforms input data with a UTC-based time coordinate to a local solar time (LST) coordinate. In LST, 12:00 noon is defined as the moment when the sun reaches its highest point in the sky. Thus, LST is mainly determined by longitude of a location. LST is particularly suited to analyze diurnal cycles across larger regions of the globe, which would be phase-shifted against each other when using UTC time.

To transform data from UTC to LST, this function shifts data along the time axis based on the longitude. In addition to the cube’s data, this function also considers auxiliary coordinates, cell measures and ancillary variables that span both the time and longitude dimension.

Note

This preprocessor preserves the temporal frequency of the input data. For example, hourly input data will be transformed into hourly output data. For this, a location’s exact LST will be put into corresponding bins defined by the bounds of the input time coordinate (in this example, the bin size is 1 hour). If time bounds are not given or cannot be approximated (only one time step is given), a bin size of 1 hour is assumed.

LST is approximated as UTC_time + 12*longitude/180, where longitude is assumed to be in [-180, 180] (this function automatically calculates the correct format for the longitude). This is only an approximation since the exact LST also depends on the day of year due to the eccentricity of Earth’s orbit (see equation of time). However, since the corresponding error is ~15 min at most, this is ignored here, as most climate model data has a courser temporal resolution and the time scale for diurnal evolution of meteorological phenomena is usually in the order of hours, not minutes.

Parameters:

cube (Cube) – Input cube. Needs a 1D monotonically increasing dimensional coordinate time (assumed to refer to UTC time) and a 1D coordinate longitude.

Returns:

Transformed cube of same shape as input cube with an LST coordinate instead of UTC time.

Return type:

Cube

Raises:

iris.exceptions.CoordinateNotFoundError – Input cube does not contain valid time and/or longitude coordinate.
iris.exceptions.CoordinateMultiDimError – Input cube has multidimensional longitude coordinate.
ValueError – time coordinate of input cube is not monotonically increasing.

esmvalcore.preprocessor.mask_above_threshold(cube, threshold)[source]#

Mask above a specific threshold value.

Takes a value ‘threshold’ and masks off anything that is above it in the cube data. Values equal to the threshold are not masked.

Parameters:

cube (iris.cube.Cube) – iris cube to be thresholded.
threshold (float) – threshold to be applied on input cube data.

Returns:

thresholded cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.mask_below_threshold(cube, threshold)[source]#

Mask below a specific threshold value.

Takes a value ‘threshold’ and masks off anything that is below it in the cube data. Values equal to the threshold are not masked.

Parameters:

cube (iris.cube.Cube) – iris cube to be thresholded
threshold (float) – threshold to be applied on input cube data.

Returns:

thresholded cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.mask_fillvalues(products: Sequence[PreprocessorFile], threshold_fraction: float, min_value: float | None = None, time_window: int = 1) → Sequence[PreprocessorFile][source]#

Compute and apply a multi-dataset fillvalues mask.

Construct the mask that fills a certain time window with missing values if the number of values in that specific window is less than a given fractional threshold. This function is the extension of _get_fillvalues_mask and performs the combination of missing values masks from each model (of multimodels) into a single fillvalues mask to be applied to each model.

Parameters:

products (Sequence[PreprocessorFile]) – data products to be masked.
threshold_fraction (float) – fractional threshold to be used as argument for Aggregator. Must be between 0 and 1.
min_value (float | None) – minimum value threshold; default None If default, no thresholding applied so the full mask will be selected.
time_window (int) – time window to compute missing data counts; default set to 1.

Returns:

Masked iris cubes.

Raises:

NotImplementedError – Implementation missing for data with higher dimensionality than 4.

Return type:

Sequence[PreprocessorFile]

esmvalcore.preprocessor.mask_glaciated(cube: Cube, mask_out: str = 'glaciated') → Cube[source]#

Mask out glaciated areas.

It applies a Natural Earth mask. Note that for computational reasons only the 10 largest polygons are used for masking.

Parameters:

cube (iris.cube.Cube) – data cube to be masked.
mask_out (str) – “glaciated” to mask out glaciated areas

Returns:

Returns the masked iris cube.

Return type:

iris.cube.Cube

Raises:

ValueError – Error raised if masking on irregular grids is attempted or if mask_out has a wrong value.

esmvalcore.preprocessor.mask_inside_range(cube, minimum, maximum)[source]#

Mask inside a specific threshold range.

Takes a MINIMUM and a MAXIMUM value for the range, and masks off anything that’s between the two in the cube data.

Parameters:

cube (iris.cube.Cube) – iris cube to be thresholded
minimum (float) – lower threshold to be applied on input cube data.
maximum (float) – upper threshold to be applied on input cube data.

Returns:

thresholded cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.mask_landsea(cube: Cube, mask_out: Literal['land', 'sea']) → Cube[source]#

Mask out either land mass or sea (oceans, seas and lakes).

It uses dedicated ancillary variables (sftlf or sftof) or, in their absence, it applies a Natural Earth mask (land or ocean contours). Note that the Natural Earth masks have different resolutions: 10m for land, and 50m for seas. These are more than enough for masking climate model data.

Parameters:

cube (Cube) – Data cube to be masked. If the cube has an iris.coords.AncillaryVariable with standard name 'land_area_fraction' or 'sea_area_fraction' that will be used. If both are present, only the ‘land_area_fraction’ will be used. If the ancillary variable is not available, the mask will be calculated from Natural Earth shapefiles.
mask_out (Literal['land', 'sea']) – Either 'land' to mask out land mass or 'sea' to mask out seas.

Returns:

Returns the masked iris cube.

Return type:

iris.cube.Cube

Raises:

ValueError – Error raised if masking on irregular grids is attempted without an ancillary variable. Irregular grids are not currently supported for masking with Natural Earth shapefile masks.

esmvalcore.preprocessor.mask_landseaice(cube: Cube, mask_out: Literal['landsea', 'ice']) → Cube[source]#

Mask out either landsea (combined) or ice.

Function that masks out either landsea (land and seas) or ice (Antarctica, Greenland and some glaciers).

It uses dedicated ancillary variables (sftgif).

Parameters:

cube (Cube) – Data cube to be masked. It should have an iris.coords.AncillaryVariable with standard name 'land_ice_area_fraction'.
mask_out (str) – Either 'landsea' to mask out land and oceans or 'ice' to mask out ice.

Returns:

Returns the masked iris cube with either land or ice masked out.

Return type:

iris.cube.Cube

Raises:

ValueError – Error raised if landsea-ice mask not found as an ancillary variable.

esmvalcore.preprocessor.mask_multimodel(products)[source]#

Apply common mask to all datasets (using logical OR).

Parameters:

products (iris.cube.CubeList or list of PreprocessorFile) – Data products/cubes to be masked.

Returns:

Masked data products/cubes.

Return type:

iris.cube.CubeList or list of PreprocessorFile

Raises:

ValueError – Datasets have different shapes.
TypeError – Invalid input data.

esmvalcore.preprocessor.mask_outside_range(cube, minimum, maximum)[source]#

Mask outside a specific threshold range.

Takes a MINIMUM and a MAXIMUM value for the range, and masks off anything that’s outside the two in the cube data.

Parameters:

cube (iris.cube.Cube) – iris cube to be thresholded
minimum (float) – lower threshold to be applied on input cube data.
maximum (float) – upper threshold to be applied on input cube data.

Returns:

thresholded cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.meridional_statistics(cube: Cube, operator: str, normalize: Literal['subtract', 'divide'] | None = None, **operator_kwargs: Any) → Cube[source]#

Compute meridional statistics.

Parameters:

cube (Cube) – Input cube.
operator (str) – The operation. Used to determine the iris.analysis.Aggregator object used to calculate the statistics. Allowed options are given in this table.
normalize (Literal['subtract', 'divide'] | None) – If given, do not return the statistics cube itself, but rather, the input cube, normalized with the statistics cube. Can either be subtract (statistics cube is subtracted from the input cube) or divide (input cube is divided by the statistics cube).
**operator_kwargs (Any) – Optional keyword arguments for the iris.analysis.Aggregator object defined by operator.

Returns:

Meridional statistics cube.

Return type:

iris.cube.Cube

Raises:

ValueError – Error raised if computation on irregular grids is attempted. Zonal statistics not yet implemented for irregular grids.

esmvalcore.preprocessor.monthly_statistics(cube: Cube, operator: str = 'mean', keep_group_coordinates: bool = False, **operator_kwargs: Any) → Cube[source]#

Compute monthly statistics.

Chunks time in monthly periods and computes statistics over them;

Parameters:

cube (Cube) – Input cube.
operator (str) – The operation. Used to determine the iris.analysis.Aggregator object used to calculate the statistics. Allowed options are given in this table.
keep_group_coordinates (bool) – If True, keep temporal group coordinates (i.e., month_number, year) in the result. If False, remove them.
**operator_kwargs (Any) – Optional keyword arguments for the iris.analysis.Aggregator object defined by operator.

Returns:

Monthly statistics cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.multi_model_statistics(products: set[PreprocessorFile] | Iterable[Cube], span: str, statistics: list[str | dict], output_products: dict[str, PreprocessorFile] | None = None, groupby: tuple | None = None, keep_input_datasets: bool = True, ignore_scalar_coords: bool = False) → dict | set[source]#

Compute multi-model statistics.

This function computes multi-model statistics on a list of products, which can be instances of Cube or PreprocessorFile. The latter is used internally by ESMValCore to store workflow and provenance information, and this option should typically be ignored.

Cubes must have consistent shapes apart from a potential time dimension. There are two options to combine time coordinates of different lengths, see the span argument.

Desired statistics need to be given as a list, e.g., statistics: ['mean', 'median']. For some statistics like percentiles, it is also possible to pass additional keyword arguments, for example statistics: [{'operator': 'percentile', 'percent': 20}]. A full list of supported statistics is available in the section on Statistical preprocessors.

This function can handle cubes with differing metadata:

Cubes with identical name() and units will get identical values for standard_name, long_name, and var_name (which will be arbitrarily set to the first encountered value if different cubes have different values for them).
attributes: Differing attributes are deleted, see iris.util.equalise_attributes().
cell_methods: All cell methods are deleted prior to combining cubes.
cell_measures(): All cell measures are deleted prior to combining cubes, see esmvalcore.preprocessor.remove_fx_variables().
ancillary_variables(): All ancillary variables are deleted prior to combining cubes, see esmvalcore.preprocessor.remove_fx_variables().
coords(): Exactly identical coordinates are preserved. For coordinates with equal name() and units, names are equalized, attributes deleted and circular is set to False. For all other coordinates, long_name is removed, attributes deleted and circular is set to False. Scalar coordinates can be removed if desired by the option ignore_scalar_coords=True. Please note that some special scalar coordinates which are expected to differ across cubes (ancillary coordinates for derived coordinates like p0 and ptop) are always removed.

Notes

Some of the operators in iris.analysis require additional arguments. Except for percentiles, these operators are currently not supported.

Lazy operation is supported for all statistics, except median.

Parameters:

products (set[PreprocessorFile] | Iterable[Cube]) – Cubes (or products) over which the statistics will be computed.
span (str) – Overlap or full; if overlap, statitstics are computed on common time- span; if full, statistics are computed on full time spans, ignoring missing data. This option is ignored if input cubes do not have time dimensions.
statistics (list[str | dict]) – Statistical operations to be computed, e.g., ['mean', 'median']. For some statistics like percentiles, it is also possible to pass additional keyword arguments, e.g., [{'operator': 'percentile', 'percent': 20}]. All supported options are are given in this table.
output_products (dict[str, PreprocessorFile] | None) – For internal use only. A dict with statistics names as keys and preprocessorfiles as values. If products are passed as input, the statistics cubes will be assigned to these output products.
groupby (tuple | None) – Group products by a given tag or attribute, e.g., (‘project’, ‘dataset’, ‘tag1’). This is ignored if products is a list of cubes.
keep_input_datasets (bool) – If True, the output will include the input datasets. If False, only the computed statistics will be returned.
ignore_scalar_coords (bool) – If True, remove any scalar coordinate in the input datasets before merging the input cubes into the multi-dataset cube. The resulting multi-dataset cube will have no scalar coordinates (the actual input datasets will remain unchanged). If False, scalar coordinates will remain in the input datasets, which might lead to merge conflicts in case the input datasets have different scalar coordinates.

Returns:

A dict of cubes or set of output_products depending on the type of products.

Return type:

dict | set

Raises:

ValueError – If span is neither overlap nor full, or if input type is neither cubes nor products.

esmvalcore.preprocessor.regrid(cube: Cube, target_grid: Cube | Dataset | Path | str | dict, scheme: NamedHorizontalScheme | dict, lat_offset: bool = True, lon_offset: bool = True, cache_weights: bool = False, use_src_coords: Iterable[str] = ('latitude', 'longitude')) → Cube[source]#

Perform horizontal regridding.

Note that the target grid can be a Cube, a Dataset, a path to a cube (Path or str), a grid spec (str) in the form of MxN degrees, or a dict specifying the target grid.

For the latter, the target_grid should be a dict with the following keys:

start_longitude: longitude at the center of the first grid cell.
end_longitude: longitude at the center of the last grid cell.
step_longitude: constant longitude distance between grid cell centers.
start_latitude: latitude at the center of the first grid cell.
end_latitude: latitude at the center of the last grid cell.
step_latitude: constant latitude distance between grid cell centers.

Parameters:

cube (Cube) – The source cube to be regridded.
target_grid (Cube | Dataset | Path | str | dict) – The (location of a) cube that specifies the target or reference grid for the regridding operation. Alternatively, a Dataset can be provided. Alternatively, a CORDEX domain name may be provided, for example EUR-11. Any domain name recognized by the cordex package can be used. Alternatively, a string cell specification may be provided, of the form MxN, which specifies the extent of the cell, longitude by latitude (degrees) for a global, regular target grid. Alternatively, a dictionary with a regional target grid may be specified (see above).
scheme (NamedHorizontalScheme | dict) – The regridding scheme to perform. If the source grid is structured (i.e., rectilinear or curvilinear), can be one of the built-in schemes linear, nearest, area_weighted. If the source grid is unstructured, can be one of the built-in schemes linear, nearest. Alternatively, a dict that specifies generic regridding can be given (see below).
lat_offset (bool) – Offset the grid centers of the latitude coordinate w.r.t. the pole by half a grid step. This argument is ignored if target_grid is a cube or file.
lon_offset (bool) – Offset the grid centers of the longitude coordinate w.r.t. Greenwich meridian by half a grid step. This argument is ignored if target_grid is a cube or file.
cache_weights (bool) – If True, cache regridding weights for later usage. This can speed up the regridding of different datasets with similar source and target grids massively, but may take up a lot of memory for extremely high-resolution data. This option is ignored for schemes that do not support weights caching. More details on this are given in the section on Reusing regridding weights. To clear the cache, use esmvalcore.preprocessor.regrid.cache_clear().
use_src_coords (Iterable[str]) – If there are multiple horizontal coordinates available in the source cube, only use horizontal coordinates with these standard names.

Returns:

Regridded cube.

Return type:

iris.cube.Cube

See also

extract_levels: Perform vertical regridding.

Notes

This preprocessor allows for the use of arbitrary Iris regridding schemes, that is anything that can be passed as a scheme to iris.cube.Cube.regrid() is possible. This enables the use of further parameters for existing schemes, as well as the use of more advanced schemes for example for unstructured grids. To use this functionality, a dictionary must be passed for the scheme with a mandatory entry of reference in the form specified for the object reference of the entry point data model, i.e. importable.module:object.attr. This is used as a factory for the scheme. Any further entries in the dictionary are passed as keyword arguments to the factory.

For example, to use the familiar iris.analysis.Linear regridding scheme with a custom extrapolation mode, use

my_preprocessor:
  regrid:
    target: 1x1
    scheme:
      reference: iris.analysis:Linear
      extrapolation_mode: nanmask

To use the area weighted regridder available in esmf_regrid.schemes.ESMFAreaWeighted use

my_preprocessor:
  regrid:
    target: 1x1
    scheme:
      reference: esmf_regrid.schemes:ESMFAreaWeighted

esmvalcore.preprocessor.regrid_time(cube: Cube, frequency: str, calendar: str | None = None, units: str = 'days since 1850-01-01 00:00:00') → Cube[source]#

Align time coordinate for cubes.

Sets datetimes to common values:

Decadal data (e.g., frequency='dec'): 1 January 00:00:00 for the given year. Example: 1 January 2005 00:00:00 for given year 2005 (decade 2000-2010).
Yearly data (e.g., frequency='yr'): 1 July 00:00:00 for each year. Example: 1 July 1993 00:00:00 for the year 1993.
Monthly data (e.g., frequency='mon'): 15th day 00:00:00 for each month. Example: 15 October 1993 00:00:00 for the month October 1993.
Daily data (e.g., frequency='day'): 12:00:00 for each day. Example: 14 March 1996 12:00:00 for the day 14 March 1996.
n-hourly data where n is a divisor of 24 (e.g., frequency='3hr'): center of each time interval. Example: 03:00:00 for interval 00:00:00-06:00:00 (6-hourly data), 16:30:00 for interval 15:00:00-18:00:00 (3-hourly data), or 09:30:00 for interval 09:00:00-10:00:00 (hourly data).

The corresponding time bounds will be set according to the rules described in esmvalcore.cmor.fixes.get_time_bounds(). The data type of the new time coordinate will be set to float64 (CMOR default for coordinates). Potential auxiliary time coordinates (e.g., day_of_year) are also changed if present.

This function does not alter the data in any way.

Note

By default, this will not change the calendar of the input time coordinate. For decadal, yearly, and monthly data, it is possible to change the calendar using the calendar argument. Be aware that changing the calendar might introduce (small) errors to your data, especially for extensive quantities (those that depend on the period length).

Parameters:

cube (Cube) – Input cube. This input cube will not be modified.
frequency (str) –
Data frequency. Allowed are
- Decadal data (frequency must include dec, e.g., dec)
- Yearly data (frequency must include yr, e.g., yrPt)
- Monthly data (frequency must include mon, e.g., monC)
- Daily data (frequency must include day, e.g., day)
- n-hourly data, where n must be a divisor of 24 (frequency must include nhr, e.g., 6hrPt)
calendar (str | None) – If given, transform the calendar to the one specified (examples: standard, 365_day, etc.). This only works for decadal, yearly and monthly data, and will raise an error for other frequencies. If not set, the calendar will not be changed.
units (str) – Reference time units used if the calendar of the data is changed. Ignored if calendar is not set.

Returns:

Cube with converted time coordinate.

Return type:

iris.cube.Cube

Raises:

NotImplementedError – An invalid frequency is given or calendar is set for a non-supported frequency.

esmvalcore.preprocessor.remove_supplementary_variables(cube: Cube) → Cube[source]#

Remove supplementary variables from cube (in-place).

Strip cell measures or ancillary variables from the cube.

Parameters:: cube (Cube) – Iris cube with data and cell measures or ancillary variables.
Returns:: Cube without cell measures or ancillary variables.
Return type:: iris.cube.Cube

esmvalcore.preprocessor.resample_hours(cube: Cube, interval: int, offset: int = 0, interpolate: Literal['nearest', 'linear'] | None = None) → Cube[source]#

Convert x-hourly data to y-hourly data.

This is intended to be used with instantaneous data.

Examples

resample_hours(cube, interval=6): Six-hourly intervals at 0:00, 6:00, 12:00, 18:00.
resample_hours(cube, interval=6, offset=3): Six-hourly intervals at 3:00, 9:00, 15:00, 21:00.
resample_hours(cube, interval=12, offset=6): Twelve-hourly intervals at 6:00, 18:00.

Parameters:

cube (Cube) – Input cube.
interval (int) – The period (hours) of the desired output data.
offset (int) – The first hour of the desired output data.
interpolate (Literal['nearest', 'linear'] | None) – If interpolate is None (default), convert x-hourly data to y-hourly (y > x) by eliminating extra time steps. If interpolate is ‘nearest’ or ‘linear’, use nearest-neighbor or bilinear interpolation to convert general x-hourly data to general y-hourly data.

Returns:

Cube with the new frequency.

Return type:

iris.cube.Cube

Raises:

ValueError: – interval is not a divisor of 24; invalid interpolate given; or input data does not contain any target hour (if interpolate is None).

esmvalcore.preprocessor.resample_time(cube: Cube, month: int | None = None, day: int | None = None, hour: int | None = None) → Cube[source]#

Change frequency of data by resampling it.

Converts data from one frequency to another by extracting the timesteps that match the provided month, day and/or hour. This is meant to be used with instantaneous values when computing statistics is not desired.

For example:

resample_time(cube, hour=6): Daily values taken at 6:00.
resample_time(cube, day=15, hour=6): Monthly values taken at 15th 6:00.
resample_time(cube, month=6): Yearly values, taking in June
resample_time(cube, month=6, day=1): Yearly values, taking 1st June

The condition must yield only one value per interval: the last two samples above will produce yearly data, but the first one is meant to be used to sample from monthly output and the second one will work better with daily.

Parameters:

cube (Cube) – Input cube.
month (optional) – Month to extract.
day (optional) – Day to extract.
hour (optional) – Hour to extract.

Returns:

Cube with the new frequency.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.rolling_window_statistics(cube: Cube, coordinate: str, operator: str, window_length: int, **operator_kwargs: Any) → Cube[source]#

Compute rolling-window statistics over a coordinate.

Parameters:

cube (Cube) – Input cube.
coordinate (str) – Coordinate over which the rolling-window statistics is calculated.
operator (str) – The operation. Used to determine the iris.analysis.Aggregator object used to calculate the statistics. Allowed options are given in this table.
**operator_kwargs (Any) – Optional keyword arguments for the iris.analysis.Aggregator object defined by operator.
window_length (int) – Size of the window to use.

Returns:

Rolling-window statistics cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.save(cubes: Sequence[Cube], filename: Path | str, optimize_access: str = '', compress: bool = False, alias: str = '', compute: bool = True, **kwargs: Any) → Delayed | None[source]#

Save iris cubes to file.

Parameters:

cubes (Sequence[Cube]) – Data cubes to be saved
filename (Path | str) – Name of target file
optimize_access (str) –
Set internal NetCDF chunking to favour a reading scheme

Values can be map or timeseries, which improve performance when reading the file one map or time series at a time. Users can also provide a coordinate or a list of coordinates. In that case the better performance will be avhieved by loading all the values in that coordinate at a time
compress (bool) – Use NetCDF internal compression.
alias (str) – Var name to use when saving instead of the one in the cube.
compute (bool, default=True) –
Default is True, meaning complete the file immediately, and return None.

When False, create the output file but don’t write any lazy array content to its variables, such as lazy cube data or aux-coord points and bounds. Instead return a dask.delayed.Delayed which, when computed, will stream all the lazy content via dask.store(), to complete the file. Several such data saves can be performed in parallel, by passing a list of them into a dask.compute() call.
**kwargs – See iris.fileformats.netcdf.saver.save() for additional keyword arguments.

Returns:

A delayed object that can be used to save the data in the cube.

Return type:

dask.delayed.Delayed or None

Raises:

ValueError – cubes is empty.

esmvalcore.preprocessor.seasonal_statistics(cube: Cube, operator: str = 'mean', seasons: Iterable[str] = ('DJF', 'MAM', 'JJA', 'SON'), keep_group_coordinates: bool = False, **operator_kwargs: Any) → Cube[source]#

Compute seasonal statistics.

Chunks time seasons and computes statistics over them.

Parameters:

cube (Cube) – Input cube.
operator (str) – The operation. Used to determine the iris.analysis.Aggregator object used to calculate the statistics. Allowed options are given in this table.
seasons (Iterable[str]) – Seasons to build. Available: (‘DJF’, ‘MAM’, ‘JJA’, SON’) (default) and all sequentially correct combinations holding every month of a year: e.g. (‘JJAS’,’ONDJFMAM’), or less in case of prior season extraction.
keep_group_coordinates (bool) – If True, keep temporal group coordinates (i.e., clim_season, season_year) in the result. If False, remove them.
**operator_kwargs (Any) – Optional keyword arguments for the iris.analysis.Aggregator object defined by operator.

Returns:

Seasonal statistic cube.

Return type:

iris.cube.Cube

esmvalcore.preprocessor.timeseries_filter(cube: Cube, window: int, span: int, filter_type: str = 'lowpass', filter_stats: str = 'sum', **operator_kwargs: Any) → Cube[source]#

Apply a timeseries filter.

Method borrowed from iris example

Apply each filter using the rolling_window method used with the weights keyword argument. A weighted sum is required because the magnitude of the weights are just as important as their relative sizes.

See also the iris rolling window iris.cube.Cube.rolling_window.

Parameters:

cube (Cube) – Input cube.
window (int) – The length of the filter window (in units of cube time coordinate).
span (int) – Number of months/days (depending on data frequency) on which weights should be computed e.g. 2-yearly: span = 24 (2 x 12 months). Span should have same units as cube time coordinate.
filter_type (optional) – Type of filter to be applied; default ‘lowpass’. Available types: ‘lowpass’.
filter_stats (optional) – Type of statistic to aggregate on the rolling window; default: sum. Used to determine the iris.analysis.Aggregator object used for aggregation. Allowed options are given in this table.
**operator_kwargs – Optional keyword arguments for the iris.analysis.Aggregator object defined by filter_stats.

Returns:

Cube time-filtered using ‘rolling_window’.

Return type:

iris.cube.Cube

Raises:

iris.exceptions.CoordinateNotFoundError: – Cube does not have time coordinate.
NotImplementedError: – filter_type is not implemented.

esmvalcore.preprocessor.volume_statistics(cube: Cube, operator: str, normalize: Literal['subtract', 'divide'] | None = None, **operator_kwargs: Any) → Cube[source]#

Apply a statistical operation over a volume.

The volume average is weighted according to the cell volume.

Parameters:

cube (Cube) – Input cube. The input cube should have a iris.coords.CellMeasure named 'ocean_volume', unless it has a iris.coords.CellMeasure named 'cell_area' or regular 1D latitude and longitude coordinates so the cell areas can be computed using iris.analysis.cartography.area_weights(). The volume will be computed from the area multiplied by the thickness, computed from the bounds of the vertical coordinate. In that case, vertical coordinate units should be convertible to meters.
operator (str) – The operation. Used to determine the iris.analysis.Aggregator object used to calculate the statistics. Currently, only mean is allowed.
normalize (Literal['subtract', 'divide'] | None) – If given, do not return the statistics cube itself, but rather, the input cube, normalized with the statistics cube. Can either be subtract (statistics cube is subtracted from the input cube) or divide (input cube is divided by the statistics cube).
**operator_kwargs (Any) – Optional keyword arguments for the iris.analysis.Aggregator object defined by operator.

Return type:

Cube

Note

This preprocessor has been designed for oceanic variables, but it might be applicable to atmospheric data as well.

Returns:

Collapsed cube.

Return type:

iris.cube.Cube

Parameters:

cube (Cube)
operator (str)
normalize (Literal['subtract', 'divide'] | None)
operator_kwargs (Any)

esmvalcore.preprocessor.weighting_landsea_fraction(cube, area_type)[source]#

Weight fields using land or sea fraction.

This preprocessor function weights a field with its corresponding land or sea area fraction (value between 0 and 1). The application of this is important for most carbon cycle variables (and other land-surface outputs), which are e.g. reported in units of kgC m-2. This actually refers to ‘per square meter of land/sea’ and NOT ‘per square meter of gridbox’. So in order to integrate these globally or regionally one has to both area-weight the quantity but also weight by the land/sea fraction.

Parameters:

cube (iris.cube.Cube) – Data cube to be weighted. It should have an iris.coords.AncillaryVariable with standard name 'land_area_fraction' or 'sea_area_fraction'. If both are present, only the 'land_area_fraction' will be used.
area_type (str) – Use land ('land') or sea ('sea') fraction for weighting.

Returns:

Land/sea fraction weighted cube.

Return type:

iris.cube.Cube

Raises:

TypeError – area_type is not 'land' or 'sea'.
ValueError – Land/sea fraction variables sftlf or sftof not found.

esmvalcore.preprocessor.zonal_statistics(cube: Cube, operator: str, normalize: Literal['subtract', 'divide'] | None = None, **operator_kwargs: Any) → Cube[source]#

Compute zonal statistics.

Parameters:

cube (Cube) – Input cube.
operator (str) – The operation. Used to determine the iris.analysis.Aggregator object used to calculate the statistics. Allowed options are given in this table.
normalize (Literal['subtract', 'divide'] | None) – If given, do not return the statistics cube itself, but rather, the input cube, normalized with the statistics cube. Can either be subtract (statistics cube is subtracted from the input cube) or divide (input cube is divided by the statistics cube).
**operator_kwargs (Any) – Optional keyword arguments for the iris.analysis.Aggregator object defined by operator.

Returns:

Zonal statistics cube or input cube normalized by statistics cube (see normalize).

Return type:

iris.cube.Cube

Raises:

ValueError – Error raised if computation on irregular grids is attempted. Zonal statistics not yet implemented for irregular grids.