Auxiliary functions for emergent constraints scripts

Convenience functions for emergent constraints diagnostics.

Functions:

`cdf`(data, pdf)	Calculate cumulative distribution function for a 1-dimensional PDF.
`check_metadata`(metadata[, allowed_var_types])	Check metadata.
`combine_groups`(groups)	Combine `list` of groups to a single `str`.
`constraint_info_array`(x_data, y_data, ...[, ...])	Get array with all relevant parameters of emergent constraint.
`create_simple_scatterplot`(x_data, y_data, ...)	Create simple scatterplot of an emergent relationship (without saving).
`export_csv`(data_frame, attributes, basename, cfg)	Export CSV file.
`get_caption`(attributes, feature, label[, group])	Construct caption from plotting attributes for (feature, label) pair.
`get_colors`(cfg[, groups])	Get color palette.
`get_constraint`(x_data, y_data, obs_mean, obs_std)	Get constraint on target variable.
`get_constraint_from_df`(training_data, ...[, ...])	Get constraint on target variable from `pandas.DataFrame`.
`get_groups`(training_data[, add_combined_group])	Extract groups from training data.
`get_input_data`(cfg)	Extract input data.
`get_input_files`(cfg[, patterns, ignore_patterns])	Get input files.
`get_provenance_record`(attributes, tags, **kwargs)	Get provenance record.
`get_xy_data_without_nans`(data_frame, ...)	Get (X, Y) data for `(feature, label)` combination without NaNs.
`pandas_object_to_cube`(pandas_object[, ...])	Convert pandas object to `iris.cube.Cube`.
`plot_individual_scatterplots`(training_data, ...)	Plot individual scatterplots for the different groups.
`plot_merged_scatterplots`(training_data, ...)	Plot merged scatterplots (all groups in one plot).
`plot_target_distributions`(training_data, ...)	Plot distributions of target variable for every feature.
`regression_line`(x_data, y_data[, n_points])	Return x and y coordinates of the regression line (mean and error).
`set_plot_appearance`(axes, attributes, **kwargs)	Set appearance of a plot.
`standard_prediction_error`(x_data, y_data)	Return a function to calculate standard prediction error.
`target_pdf`(x_data, y_data, obs_mean, obs_std)	Calculate probability density function (PDF) for target variable.

esmvaltool.diag_scripts.emergent_constraints.cdf(data, pdf)

Calculate cumulative distribution function for a 1-dimensional PDF.

Parameters:

data (numpy.ndarray) – Data points (1D array).
pdf (numpy.ndarray) – Corresponding probability density function (PDF).

Returns:

Corresponding cumulative distribution function (CDF).

Return type:

numpy.ndarray
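
To illustrate what a CDF computed from sampled PDF values looks like, here is a minimal sketch that uses cumulative trapezoidal integration. The helper name `cdf_sketch` is hypothetical, and the integration rule is an assumption; the module may use a different numerical scheme.

```python
import numpy as np


def cdf_sketch(data, pdf):
    """Cumulative trapezoidal integral of ``pdf`` over ``data`` (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    # Area of each trapezoid between neighbouring data points.
    areas = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(data)
    # The CDF is 0 at the first data point and accumulates the areas.
    return np.concatenate(([0.0], np.cumsum(areas)))


# Uniform PDF on [0, 1]: the CDF should rise linearly from 0 to 1.
x_values = np.linspace(0.0, 1.0, 11)
cdf_values = cdf_sketch(x_values, np.ones_like(x_values))
```

The result has the same shape as the input, so it can be plotted or searched against the same `data` array.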

esmvaltool.diag_scripts.emergent_constraints.check_metadata(metadata, allowed_var_types=None)

Check that the metadata contains the necessary keys and a valid var_type.

Parameters:

metadata (dict) – Metadata to check.
allowed_var_types (list of str, optional) – Allowed var_types, defaults to ALLOWED_VAR_TYPES.

Raises:

KeyError – Metadata does not contain necessary keys 'var_type' and 'tag'.
ValueError – Got invalid value for key 'var_type'.
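
The documented checks can be sketched as follows. This is not the module's implementation: `check_metadata_sketch` and `EXAMPLE_ALLOWED_VAR_TYPES` are hypothetical names, and the example values merely stand in for the real `ALLOWED_VAR_TYPES`.

```python
# Hypothetical stand-in; the real ALLOWED_VAR_TYPES is defined in the module.
EXAMPLE_ALLOWED_VAR_TYPES = ["feature", "label"]


def check_metadata_sketch(metadata, allowed_var_types=None):
    """Raise KeyError or ValueError for invalid metadata (hypothetical helper)."""
    if allowed_var_types is None:
        allowed_var_types = EXAMPLE_ALLOWED_VAR_TYPES
    # Both keys must be present before the value of 'var_type' is inspected.
    for key in ("var_type", "tag"):
        if key not in metadata:
            raise KeyError(f"Necessary key '{key}' not given in metadata")
    if metadata["var_type"] not in allowed_var_types:
        raise ValueError(
            f"Expected one of {allowed_var_types} for 'var_type', "
            f"got '{metadata['var_type']}'"
        )


# Valid metadata passes silently.
check_metadata_sketch({"var_type": "feature", "tag": "example_tag"})
```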

esmvaltool.diag_scripts.emergent_constraints.combine_groups(groups)

Combine list of groups to a single str.

Parameters:

groups (list of str) – List of group names.

Returns:

Combined str.

Return type:

str

esmvaltool.diag_scripts.emergent_constraints.constraint_info_array(x_data, y_data, obs_mean, obs_std, n_points=1000, necessary_p_value=None)

Get array with all relevant parameters of emergent constraint.

Parameters:

x_data (numpy.ndarray) – X data of the emergent constraint.
y_data (numpy.ndarray) – Y data of the emergent constraint.
obs_mean (float) – Mean of observational data.
obs_std (float) – Standard deviation of observational data.
n_points (int, optional (default: 1000)) – Number of sampled points for PDF of target variable.
necessary_p_value (float, optional) – If given, replace constrained mean and standard deviation with unconstrained values when p-value of emergent relationship is greater than the given necessary p-value.

Returns:

Array of shape (8,) with the elements:

Constrained mean of target variable.
Constrained standard deviation of target variable.
Unconstrained mean of target variable.
Unconstrained standard deviation of target variable.
Slope of emergent relationship.
Intercept of emergent relationship.
Correlation coefficient r of emergent relationship.
p-value of emergent relationship.

Return type:

numpy.ndarray
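
Because the result is a plain array, it can help to attach names to the eight positions. The sketch below does that in the order documented above; `FIELD_NAMES` and `label_constraint_info` are hypothetical names, and the input array holds placeholder numbers rather than real results.

```python
import numpy as np

# Names follow the documented order of the returned elements (hypothetical labels).
FIELD_NAMES = (
    "constrained_mean",
    "constrained_std",
    "unconstrained_mean",
    "unconstrained_std",
    "slope",
    "intercept",
    "rvalue",
    "pvalue",
)


def label_constraint_info(info_array):
    """Map an array of shape (8,) to a dict keyed by field name."""
    info_array = np.asarray(info_array, dtype=float)
    if info_array.shape != (len(FIELD_NAMES),):
        raise ValueError(f"Expected shape (8,), got {info_array.shape}")
    return dict(zip(FIELD_NAMES, info_array.tolist()))


# Placeholder values 0.0 ... 7.0, used only to show the mapping.
labelled = label_constraint_info(np.arange(8.0))
```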

esmvaltool.diag_scripts.emergent_constraints.create_simple_scatterplot(x_data, y_data, obs_mean, obs_std)

Create simple scatterplot of an emergent relationship (without saving).

Parameters:

x_data (numpy.ndarray) – X data of the emergent constraint.
y_data (numpy.ndarray) – Y data of the emergent constraint.
obs_mean (float) – Mean of observational data.
obs_std (float) – Standard deviation of observational data.

esmvaltool.diag_scripts.emergent_constraints.export_csv(data_frame, attributes, basename, cfg, tags=None)

Export CSV file.

Parameters:

data_frame (pandas.DataFrame) – Data to export.
attributes (dict) – Plot attributes for the different features and the label data. Used to retrieve provenance information.
basename (str) – Basename for the name of the file.
cfg (dict) – Recipe configuration.
tags (iterable of str, optional) – Tags for which provenance information should be retrieved (using attributes). If not specified, use (last level of) columns of the given data_frame.

Returns:

Path to the new CSV file.

Return type:

str

esmvaltool.diag_scripts.emergent_constraints.get_caption(attributes, feature, label, group=None)

Construct caption from plotting attributes for (feature, label) pair.

Parameters:

attributes (dict) – Plot attributes.
feature (str) – Feature.
label (str) – Label.
group (str, optional) – Group.

Returns:

Caption.

Return type:

str

Raises:

KeyError – attributes does not include necessary keys.

esmvaltool.diag_scripts.emergent_constraints.get_colors(cfg, groups=None)

Get color palette.

Parameters:

cfg (dict) – Recipe configuration.
groups (list, optional) – Used to check whether a color for the combined group has to be added.

Returns:

List of colors that can be used with matplotlib.

Return type:

list

esmvaltool.diag_scripts.emergent_constraints.get_constraint(x_data, y_data, obs_mean, obs_std, confidence_level=0.66)

Get constraint on target variable.

Parameters:

x_data (numpy.ndarray) – X data of the emergent constraint.
y_data (numpy.ndarray) – Y data of the emergent constraint.
obs_mean (float) – Mean of observational data.
obs_std (float) – Standard deviation of observational data.
confidence_level (float, optional (default: 0.66)) – Confidence level to estimate the range of the target variable.

Returns:

Lower confidence limit, best estimate and upper confidence limit of target variable.

Return type:

tuple of float
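
One way to derive such a triple from a sampled PDF of the target variable is to take the PDF-weighted mean as best estimate and read the limits off the CDF at (1 - confidence_level) / 2 and (1 + confidence_level) / 2. The sketch below follows that idea; `confidence_range_sketch` is a hypothetical helper, and whether the module uses exactly this procedure is an assumption.

```python
import numpy as np


def confidence_range_sketch(y_values, y_pdf, confidence_level=0.66):
    """Return (lower limit, best estimate, upper limit) from a sampled PDF."""
    y_values = np.asarray(y_values, dtype=float)
    y_pdf = np.asarray(y_pdf, dtype=float)
    # Best estimate: mean of the target variable weighted by the PDF.
    best = np.sum(y_values * y_pdf) / np.sum(y_pdf)
    # CDF via cumulative trapezoidal integration, normalized to end at 1.
    areas = 0.5 * (y_pdf[1:] + y_pdf[:-1]) * np.diff(y_values)
    y_cdf = np.concatenate(([0.0], np.cumsum(areas)))
    y_cdf /= y_cdf[-1]
    # Keep the grid points inside the central confidence interval.
    lower_p = (1.0 - confidence_level) / 2.0
    upper_p = (1.0 + confidence_level) / 2.0
    inside = y_values[(y_cdf >= lower_p) & (y_cdf <= upper_p)]
    return inside.min(), best, inside.max()


# Standard normal PDF sampled on a symmetric grid.
y_values = np.linspace(-6.0, 6.0, 2001)
y_pdf = np.exp(-0.5 * y_values**2) / np.sqrt(2.0 * np.pi)
lower, best, upper = confidence_range_sketch(y_values, y_pdf)
```

The limits are restricted to grid points, so their resolution depends on how finely the PDF is sampled.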

esmvaltool.diag_scripts.emergent_constraints.get_constraint_from_df(training_data, pred_input_data, confidence_level=0.66)

Get constraint on target variable from pandas.DataFrame.

Parameters:

training_data (pandas.DataFrame) – Training data (features, label).
pred_input_data (pandas.DataFrame) – Prediction input data (mean and error).
confidence_level (float, optional (default: 0.66)) – Confidence level to estimate the range of the target variable.

Returns:

Lower confidence limit, best estimate and upper confidence limit of target variable.

Return type:

tuple of float

esmvaltool.diag_scripts.emergent_constraints.get_groups(training_data, add_combined_group=False)

Extract groups from training data.

Parameters:

training_data (pandas.DataFrame) – Training data (features, label).
add_combined_group (bool, optional (default: False)) – Add combined group of all other groups at the beginning of the returned list.

Returns:

Groups.

Return type:

list of str

esmvaltool.diag_scripts.emergent_constraints.get_input_data(cfg)

Extract input data.

Return training data, prediction input data and corresponding attributes.

Parameters:

cfg (dict) – Recipe configuration.

Returns:

A tuple containing the training data (pandas.DataFrame), the prediction input data (pandas.DataFrame) and the corresponding attributes (dict).

Return type:

tuple

esmvaltool.diag_scripts.emergent_constraints.get_input_files(cfg, patterns=None, ignore_patterns=None)

Get input files.

Parameters:

cfg (dict) – Recipe configuration.
patterns (list of str, optional) – Use only ancestor files that match these patterns as input files.
ignore_patterns (list of str, optional) – Ignore input files that match these patterns.

Returns:

Input files.

Return type:

list of str

esmvaltool.diag_scripts.emergent_constraints.get_provenance_record(attributes, tags, **kwargs)

Get provenance record.

Parameters:

attributes (dict) – Plot attributes. All provenance keys need to start with 'provenance_'.
tags (list of str) – Tags used to retrieve data from the attributes dict, i.e. features and/or label.
**kwargs (Keyword arguments) – Additional key:value pairs directly passed to the provenance record dict. All values may include the format strings {feature} and {label}.

Returns:

Provenance record.

Return type:

dict

esmvaltool.diag_scripts.emergent_constraints.get_xy_data_without_nans(data_frame, feature, label)

Get (X, Y) data for (feature, label) combination without NaNs.

Parameters:

data_frame (pandas.DataFrame) – Training data.
feature (str) – Name of the feature data.
label (str) – Name of the label data.

Returns:

Tuple containing a pandas.DataFrame for the X axis (feature) and a pandas.DataFrame for the Y axis (label) without missing values.

Return type:

tuple

esmvaltool.diag_scripts.emergent_constraints.pandas_object_to_cube(pandas_object, index_droplevel=None, columns_droplevel=None, **kwargs)

Convert pandas object to iris.cube.Cube.

Parameters:

pandas_object (pandas.DataFrame or pandas.Series) – Data to convert.
index_droplevel (int or list of int, optional) – Drop levels of index if not None.
columns_droplevel (int or list of int, optional) – Drop levels of columns if not None. Can only be used if pandas_object is a pandas.DataFrame.
**kwargs (Keyword arguments) – Keyword arguments used for the cube metadata, e.g. standard_name, var_name, etc.

Returns:

Data cube.

Return type:

iris.cube.Cube

Raises:

TypeError – columns_droplevel is used when pandas_object is not a pandas.DataFrame.

esmvaltool.diag_scripts.emergent_constraints.plot_individual_scatterplots(training_data, pred_input_data, attributes, basename, cfg)

Plot individual scatterplots for the different groups.

Plot scatterplots for all pairs of (feature, label) data (separate plot for each group).

Parameters:

training_data (pandas.DataFrame) – Training data (features, label).
pred_input_data (pandas.DataFrame) – Prediction input data (mean and error).
attributes (dict) – Plot attributes for the different features and the label data.
basename (str) – Basename for the name of the file.
cfg (dict) – Recipe configuration.

esmvaltool.diag_scripts.emergent_constraints.plot_merged_scatterplots(training_data, pred_input_data, attributes, basename, cfg)

Plot merged scatterplots (all groups in one plot).

Plot scatterplots for all pairs of (feature, label) data (all groups in one plot).

Parameters:

training_data (pandas.DataFrame) – Training data (features, label).
pred_input_data (pandas.DataFrame) – Prediction input data (mean and error).
attributes (dict) – Plot attributes for the different features and the label data.
basename (str) – Basename for the name of the file.
cfg (dict) – Recipe configuration.

esmvaltool.diag_scripts.emergent_constraints.plot_target_distributions(training_data, pred_input_data, attributes, basename, cfg)

Plot distributions of target variable for every feature.

Parameters:

training_data (pandas.DataFrame) – Training data (features, label).
pred_input_data (pandas.DataFrame) – Prediction input data (mean and error).
attributes (dict) – Plot attributes for the different features and the label data.
basename (str) – Basename for the name of the file.
cfg (dict) – Recipe configuration.

esmvaltool.diag_scripts.emergent_constraints.regression_line(x_data, y_data, n_points=1000)

Return x and y coordinates of the regression line (mean and error).

Parameters:

x_data (numpy.ndarray) – X data used to fit the linear regression.
y_data (numpy.ndarray) – Y data used to fit the linear regression.
n_points (int, optional (default: 1000)) – Number of points for the regression lines.

Returns:

numpy.ndarrays for the keys 'x', 'y', 'y_minus_err', 'y_plus_err', 'slope', 'intercept', 'pvalue' and 'rvalue'.

Return type:

dict
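
As a rough illustration of the returned structure, the sketch below fits a straight line with NumPy and samples it at `n_points` positions. It is not the module's implementation: `regression_line_sketch` is a hypothetical name, the sampled X range is an assumption, and the error bands and p-value are left out to keep the example NumPy-only.

```python
import numpy as np


def regression_line_sketch(x_data, y_data, n_points=1000):
    """Fit y = slope * x + intercept and sample the line (hypothetical helper)."""
    x_data = np.asarray(x_data, dtype=float)
    y_data = np.asarray(y_data, dtype=float)
    # Least-squares fit of a first-degree polynomial.
    slope, intercept = np.polyfit(x_data, y_data, 1)
    # Pearson correlation coefficient of the two samples.
    rvalue = np.corrcoef(x_data, y_data)[0, 1]
    # Assumption: sample the line over the range spanned by the X data.
    x_lin = np.linspace(x_data.min(), x_data.max(), n_points)
    return {
        "x": x_lin,
        "y": slope * x_lin + intercept,
        "slope": slope,
        "intercept": intercept,
        "rvalue": rvalue,
    }


# Points that lie exactly on y = 2x + 1.
line = regression_line_sketch([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], n_points=5)
```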

esmvaltool.diag_scripts.emergent_constraints.set_plot_appearance(axes, attributes, **kwargs)

Set appearance of a plot.

Parameters:

axes (matplotlib.axes.Axes) – Matplotlib Axes object which contains the plot.
attributes (dict) – Plot attributes.
**kwargs (Keyword arguments) – Keyword arguments of the form plot_option=tag where plot_option is something like plot_title, plot_xlabel, plot_xlim, etc. and tag a key for the plot attributes dict that describes which attributes should be considered for that plot_option.

esmvaltool.diag_scripts.emergent_constraints.standard_prediction_error(x_data, y_data)

Return a function to calculate standard prediction error.

The standard prediction error of a linear regression is the error when predicting a data point which was not used to fit the regression line in the first place.

Parameters:

x_data (numpy.ndarray) – X data used to fit the linear regression.
y_data (numpy.ndarray) – Y data used to fit the linear regression.

Returns:

Function that takes a float as single argument (representing the X value of a new data point) and returns the standard prediction error for that point.

Return type:

callable
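
The textbook expression for the standard prediction error of a simple linear regression is s * sqrt(1 + 1/N + (x - mean(x))^2 / sum((x_i - mean(x))^2)), where s^2 is the residual variance with N - 2 degrees of freedom. The sketch below returns a function that evaluates it; `standard_prediction_error_sketch` is a hypothetical name, and that the module uses this exact expression is an assumption.

```python
import numpy as np


def standard_prediction_error_sketch(x_data, y_data):
    """Return a function x -> standard prediction error (hypothetical helper)."""
    x_data = np.asarray(x_data, dtype=float)
    y_data = np.asarray(y_data, dtype=float)
    n_data = x_data.size
    slope, intercept = np.polyfit(x_data, y_data, 1)
    residuals = y_data - (slope * x_data + intercept)
    # Residual variance; two degrees of freedom are used by slope and intercept.
    s_squared = np.sum(residuals**2) / (n_data - 2)
    x_mean = x_data.mean()
    sum_sq_dev = np.sum((x_data - x_mean) ** 2)

    def spe(x_new):
        # The error grows with the distance of x_new from the mean of the X data.
        return np.sqrt(
            s_squared * (1.0 + 1.0 / n_data + (x_new - x_mean) ** 2 / sum_sq_dev)
        )

    return spe


spe = standard_prediction_error_sketch(
    [0.0, 1.0, 2.0, 3.0, 4.0], [0.1, 0.9, 2.2, 2.8, 4.0]
)
```

Because the returned function only uses NumPy operations, it accepts a single float as well as an array of X values.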

esmvaltool.diag_scripts.emergent_constraints.target_pdf(x_data, y_data, obs_mean, obs_std, n_points=1000, necessary_p_value=None)

Calculate probability density function (PDF) for target variable.

Parameters:

x_data (numpy.ndarray) – X data of the emergent constraint.
y_data (numpy.ndarray) – Y data of the emergent constraint.
obs_mean (float) – Mean of observational data.
obs_std (float) – Standard deviation of observational data.
n_points (int, optional (default: 1000)) – Number of sampled points for PDF of target variable.
necessary_p_value (float, optional) – If given, return unconstrained PDF (using Gaussian distribution with unconstrained mean and standard deviation) when p-value of emergent relationship is greater than the given necessary p-value.

Returns:

x and y values for the PDF.

Return type:

tuple of numpy.ndarray
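
A common way to obtain such a PDF is to combine a Gaussian observational PDF P(x) with a Gaussian conditional PDF P(y|x) centred on the regression line, and to integrate their product over x. The sketch below implements that formulation numerically. All names are hypothetical, the grid limits are assumptions, `necessary_p_value` is not handled, and the module's actual algorithm may differ.

```python
import numpy as np


def gaussian(values, mean, std):
    """Gaussian probability density."""
    return np.exp(-0.5 * ((values - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def target_pdf_sketch(x_data, y_data, obs_mean, obs_std, n_points=1000):
    """Return (y values, PDF of y) for a linear emergent relationship."""
    x_data = np.asarray(x_data, dtype=float)
    y_data = np.asarray(y_data, dtype=float)
    n_data = x_data.size
    slope, intercept = np.polyfit(x_data, y_data, 1)
    residuals = y_data - (slope * x_data + intercept)
    s_squared = np.sum(residuals**2) / (n_data - 2)
    x_mean = x_data.mean()
    sum_sq_dev = np.sum((x_data - x_mean) ** 2)

    # Assumed grids: +/- 5 sigma around the observation, extended range for y.
    x_lin = np.linspace(obs_mean - 5.0 * obs_std, obs_mean + 5.0 * obs_std, n_points)
    y_range = y_data.max() - y_data.min()
    y_lin = np.linspace(y_data.min() - y_range, y_data.max() + y_range, n_points)

    obs_pdf = gaussian(x_lin, obs_mean, obs_std)
    # Standard prediction error of the regression at every x grid point.
    spe = np.sqrt(
        s_squared * (1.0 + 1.0 / n_data + (x_lin - x_mean) ** 2 / sum_sq_dev)
    )
    y_pred = slope * x_lin + intercept
    # conditional[i, j] = P(y_i | x_j), built by broadcasting.
    conditional = gaussian(y_lin[:, np.newaxis], y_pred, spe)
    # Integrate P(y | x) * P(x) over x with a simple Riemann sum.
    delta_x = x_lin[1] - x_lin[0]
    y_pdf = np.sum(conditional * obs_pdf, axis=1) * delta_x
    return y_lin, y_pdf


# Synthetic data scattered around y = 2x, with an observation at x = 3.5.
x_example = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_example = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
y_grid, y_density = target_pdf_sketch(x_example, y_example, 3.5, 0.3, n_points=400)
```

The intermediate array has shape (n_points, n_points), so memory use grows quadratically with `n_points`.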