Postprocessing functionalities
Simple postprocessing of MLR model output.
Description
This diagnostic performs postprocessing operations for MLR model output (mean and error).
Project
CRESCENDO
Notes
Prior to postprocessing, this diagnostic groups input datasets according to
tag
and prediction_name
. For each group, accepts datasets with three
different var_type
s:
prediction_output
: Exactly one necessary, refers to the mean prediction and serves as reference dataset (regarding shape).prediction_output_error
: Arbitrary number of error datasets. If not given, error calculation is skipped. May be squared errors (marked by the attributesquared
) or not. In addition, a single covariance dataset can be specified (short_name
ending with_cov
).prediction_input
: Dataset used to estimate covariance structure of the mean prediction (i.e. matrix of Pearson correlation coefficients) for error estimation. At most one dataset allowed. Ignored when noprediction_output_error
is given. This is only possible when (1) the shape of theprediction_input
dataset is identical to the shape of theprediction_output_error
datasets, (2) the number of dimensions of theprediction_input
dataset is higher than the number of dimensions of theprediction_output_error
datasets and they have identical trailing (rightmost) dimensions or (3) the number of dimensions of theprediction_input
dataset is higher than the number of dimensions ofprediction_output_error
datasets and all dimensions of theprediction_output_error
datasets are mapped to a corresponding dimension of theprediction_input
using thecov_estimate_dim_map
option (e.g. whenprediction_input
has shape(10, 5, 100, 20)
andprediction_output_error
has shape(5, 20)
, you can usecov_estimate_dim_map: [1, 3]
to map the dimensions ofprediction_output_error
to dimension 1 and 3 ofprediction_input
).
All data with other var_type
s is ignored (feature
, label
, etc.).
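The dimension mapping from case (3) above can be illustrated with a small NumPy sketch using the shapes from the example. The variable names are illustrative only, not part of the diagnostic:

```python
import numpy as np

# Hypothetical data with the shapes from the example in the notes above.
prediction_input = np.random.rand(10, 5, 100, 20)
prediction_output_error = np.random.rand(5, 20)

# cov_estimate_dim_map: [1, 3] maps error dimension 0 to input dimension 1
# and error dimension 1 to input dimension 3.
cov_estimate_dim_map = [1, 3]

# The mapped dimensions of prediction_input must match the shape of the
# error datasets exactly.
mapped_shape = tuple(prediction_input.shape[i] for i in cov_estimate_dim_map)
assert mapped_shape == prediction_output_error.shape  # (5, 20)
```

In this setup the remaining dimensions of prediction_input (here 0 and 2) provide the samples from which the correlation structure is estimated.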
Real error calculation (using a covariance dataset given as
prediction_output_error) and error estimation (using a prediction_input
dataset to estimate the covariance structure) are only possible if the mean
prediction cube is collapsed completely during postprocessing, i.e. if all
coordinates are listed for either mean or sum.
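To see why a complete collapse is needed, note that propagating a full covariance matrix C through a collapse yields a single scalar variance, e.g. wᵀCw for a mean with weights w = 1/n. A minimal sketch with a toy covariance matrix (not the diagnostic's actual code):

```python
import numpy as np

# Toy covariance matrix for n = 4 correlated grid cells: variance 1.0 on the
# diagonal, covariance 0.5 everywhere else (values are made up).
n = 4
cov = np.full((n, n), 0.5) + 0.5 * np.eye(n)

# Variance of the mean of correlated quantities: w^T C w with w = 1/n.
w = np.full(n, 1.0 / n)
var_mean = w @ cov @ w   # 0.625 for this toy matrix
std_mean = np.sqrt(var_mean)
```

Ignoring the off-diagonal terms here would give 1.0 / n = 0.25 instead of 0.625, which is why the covariance information matters for the collapsed error.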
Configuration options in recipe
- add_var_from_cov: bool, optional (default: True)
  Calculate variances from the covariance matrix (diagonal elements) and add
  them to the (squared) error datasets. Set to False if the variance is
  already given separately in the prediction output.
- area_weighted: bool, optional (default: True)
  Calculate weighted averages/sums when collapsing over latitude and/or
  longitude coordinates using grid cell areas (calculated from grid cell
  bounds). Only possible for datasets on regular grids that contain latitude
  and longitude coordinates.
- convert_units_to: str, optional
  Convert units of the input data.
- cov_estimate_dim_map: list of int, optional
  Map dimensions of the prediction_output_error datasets to corresponding
  dimensions of prediction_input used for estimating the covariance. Only
  relevant if both dataset types are given. See the notes above for more
  information.
- ignore: list of dict, optional
  Ignore specific datasets by specifying multiple dicts of metadata.
- landsea_fraction_weighted: str, optional
  When given, calculate weighted averages/sums when collapsing over latitude
  and/or longitude coordinates using land/sea fractions (calculated using
  Natural Earth masks). Only possible if the dataset contains latitude and
  longitude coordinates. Must be one of 'land', 'sea'.
- mean: list of str, optional
  Perform a mean over the given coordinates.
- pattern: str, optional
  Pattern matched against ancestor file names.
- sum: list of str, optional
  Perform a sum over the given coordinates.
- time_weighted: bool, optional (default: True)
  Calculate weighted averages/sums over time (using time bounds).
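For orientation, these options might be combined in a recipe's script section as in the following sketch. The diagnostic name, target units, and coordinate choices are assumptions for illustration and must be adapted to your own recipe:

```yaml
postprocess_mlr:                      # hypothetical diagnostic name
  description: Postprocess MLR model output.
  scripts:
    postprocess:
      script: mlr/postprocess.py
      convert_units_to: 'Gt yr-1'     # assumed target units
      area_weighted: true
      time_weighted: true
      mean:                           # collapse all coordinates so that
        - time                        # real error calculation is possible
        - latitude
        - longitude
```

Listing every coordinate under mean (or sum) here satisfies the complete-collapse requirement described in the notes above.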