Auxiliary functions for emergent constraints scripts

Convenience functions for emergent constraints diagnostics.

Functions:

cdf(data, pdf)

Calculate cumulative distribution function for a 1-dimensional PDF.

check_metadata(metadata[, allowed_var_types])

Check metadata.

combine_groups(groups)

Combine list of groups to a single str.

constraint_info_array(x_data, y_data, ...[, ...])

Get array with all relevant parameters of emergent constraint.

create_simple_scatterplot(x_data, y_data, ...)

Create simple scatterplot of an emergent relationship (without saving).

export_csv(data_frame, attributes, basename, cfg)

Export CSV file.

get_caption(attributes, feature, label[, group])

Construct caption from plotting attributes for (feature, label) pair.

get_colors(cfg[, groups])

Get color palette.

get_constraint(x_data, y_data, obs_mean, obs_std)

Get constraint on target variable.

get_constraint_from_df(training_data, ...[, ...])

Get constraint on target variable from pandas.DataFrame.

get_groups(training_data[, add_combined_group])

Extract groups from training data.

get_input_data(cfg)

Extract input data.

get_input_files(cfg[, patterns, ignore_patterns])

Get input files.

get_provenance_record(attributes, tags, **kwargs)

Get provenance record.

get_xy_data_without_nans(data_frame, ...)

Get (X, Y) data for (feature, label) combination without nans.

pandas_object_to_cube(pandas_object[, ...])

Convert pandas object to iris.cube.Cube.

plot_individual_scatterplots(training_data, ...)

Plot individual scatterplots for the different groups.

plot_merged_scatterplots(training_data, ...)

Plot merged scatterplots (all groups in one plot).

plot_target_distributions(training_data, ...)

Plot distributions of target variable for every feature.

regression_line(x_data, y_data[, n_points])

Return x and y coordinates of the regression line (mean and error).

set_plot_appearance(axes, attributes, **kwargs)

Set appearance of a plot.

standard_prediction_error(x_data, y_data)

Return a function to calculate standard prediction error.

target_pdf(x_data, y_data, obs_mean, obs_std)

Calculate probability density function (PDF) for target variable.

esmvaltool.diag_scripts.emergent_constraints.cdf(data, pdf)

Calculate cumulative distribution function for a 1-dimensional PDF.

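Parameters
  • data (numpy.ndarray) – Data points (1D array).

  • pdf (numpy.ndarray) – Corresponding probability density function (PDF).

Returns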

Corresponding cumulative distribution function (CDF).

Return type

numpy.ndarray
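
As an illustration of what a CDF for a sampled 1-dimensional PDF looks like, here is a minimal sketch that integrates the PDF cumulatively with the trapezoidal rule. The name cdf_sketch and the choice of integration rule are assumptions made for this example; the library function may integrate differently. Plain Python sequences are used to keep the sketch dependency-free.

```python
def cdf_sketch(data, pdf):
    """Approximate the CDF of a sampled 1D PDF (hypothetical helper).

    Uses cumulative trapezoidal integration, which is an assumption of
    this sketch and not necessarily what the library does.
    """
    if len(data) != len(pdf):
        raise ValueError("data and pdf must have the same length")
    cumulative = [0.0]
    for idx in range(1, len(data)):
        width = data[idx] - data[idx - 1]
        # Area of the trapezoid between two neighbouring sample points
        area = 0.5 * (pdf[idx] + pdf[idx - 1]) * width
        cumulative.append(cumulative[-1] + area)
    return cumulative


# Uniform PDF on [0, 1]: the CDF should rise linearly from 0 to 1
points = [0.0, 0.25, 0.5, 0.75, 1.0]
density = [1.0, 1.0, 1.0, 1.0, 1.0]
uniform_cdf = cdf_sketch(points, density)
```

The last element of the result approximates the total probability mass, so it is close to 1 only if the sampled range covers essentially the whole distribution.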

esmvaltool.diag_scripts.emergent_constraints.check_metadata(metadata, allowed_var_types=None)

Check metadata.

Parameters
  • metadata (dict) – Metadata to check.

  • allowed_var_types (list of str, optional) – Allowed var_types, defaults to ALLOWED_VAR_TYPES.

Raises
  • KeyError – Metadata does not contain necessary keys 'var_type' and 'tag'.

  • ValueError – Got invalid value for key 'var_type'.
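
The two documented failure modes can be pictured with the following minimal sketch. It is not the library code: the name check_metadata_sketch and the default list ALLOWED_VAR_TYPES_SKETCH are hypothetical, and the real ALLOWED_VAR_TYPES may contain different values.

```python
# Hypothetical default; the real ALLOWED_VAR_TYPES may differ.
ALLOWED_VAR_TYPES_SKETCH = ["feature", "label"]


def check_metadata_sketch(metadata, allowed_var_types=None):
    """Validate a metadata dict in the way the documentation describes."""
    if allowed_var_types is None:
        allowed_var_types = ALLOWED_VAR_TYPES_SKETCH
    # Both keys are documented as necessary
    for key in ("var_type", "tag"):
        if key not in metadata:
            raise KeyError(f"Metadata is missing necessary key '{key}'")
    if metadata["var_type"] not in allowed_var_types:
        raise ValueError(
            f"Got invalid var_type '{metadata['var_type']}', "
            f"expected one of {allowed_var_types}"
        )


# Passes silently; the tag value is a made-up example
check_metadata_sketch({"var_type": "feature", "tag": "example_tag"})
```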

esmvaltool.diag_scripts.emergent_constraints.combine_groups(groups)

Combine list of groups to a single str.

Parameters

groups (list of str) – List of group names.

Returns

Combined str.

Return type

str

esmvaltool.diag_scripts.emergent_constraints.constraint_info_array(x_data, y_data, obs_mean, obs_std, n_points=1000, necessary_p_value=None)

Get array with all relevant parameters of emergent constraint.

Parameters
  • x_data (numpy.ndarray) – X data of the emergent constraint.

  • y_data (numpy.ndarray) – Y data of the emergent constraint.

  • obs_mean (float) – Mean of observational data.

  • obs_std (float) – Standard deviation of observational data.

  • n_points (int, optional (default: 1000)) – Number of sampled points for PDF of target variable.

  • necessary_p_value (float, optional) – If given, replace constrained mean and standard deviation with unconstrained values when p-value of emergent relationship is greater than the given necessary p-value.

Returns

Array of shape (8,) with the elements:
  1. Constrained mean of target variable.

  2. Constrained standard deviation of target variable.

  3. Unconstrained mean of target variable.

  4. Unconstrained standard deviation of target variable.

  5. Slope of emergent relationship.

  6. Intercept of emergent relationship.

  7. Correlation coefficient r of emergent relationship.

  8. p-value of emergent relationship.

Return type

numpy.ndarray
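
Because the result is a positional array, it can be convenient to attach names to the eight elements. The sketch below does that using the order listed above. The helper label_constraint_info and the field names are hypothetical, and the numbers in the example are placeholders, not real results.

```python
# Field names follow the documented element order; the names themselves
# are hypothetical and chosen only for this sketch.
CONSTRAINT_INFO_FIELDS = (
    "constrained_mean",
    "constrained_std",
    "unconstrained_mean",
    "unconstrained_std",
    "slope",
    "intercept",
    "rvalue",
    "pvalue",
)


def label_constraint_info(info):
    """Map an array of shape (8,) to a dict keyed by field name."""
    values = list(info)
    if len(values) != len(CONSTRAINT_INFO_FIELDS):
        raise ValueError(
            f"Expected {len(CONSTRAINT_INFO_FIELDS)} elements, "
            f"got {len(values)}"
        )
    return dict(zip(CONSTRAINT_INFO_FIELDS, values))


# Placeholder values for illustration only
example_info = label_constraint_info(
    [3.0, 0.5, 3.2, 0.8, 1.1, -0.2, 0.9, 0.001])
```

Any sequence of length 8 works as input, so the array returned by constraint_info_array could be passed in directly.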

esmvaltool.diag_scripts.emergent_constraints.create_simple_scatterplot(x_data, y_data, obs_mean, obs_std)

Create simple scatterplot of an emergent relationship (without saving).

Parameters
  • x_data (numpy.ndarray) – X data of the emergent constraint.

  • y_data (numpy.ndarray) – Y data of the emergent constraint.

  • obs_mean (float) – Mean of observational data.

  • obs_std (float) – Standard deviation of observational data.

esmvaltool.diag_scripts.emergent_constraints.export_csv(data_frame, attributes, basename, cfg, tags=None)

Export CSV file.

Parameters
  • data_frame (pandas.DataFrame) – Data to export.

  • attributes (dict) – Plot attributes for the different features and the label data. Used to retrieve provenance information.

  • basename (str) – Basename for the name of the file.

  • cfg (dict) – Recipe configuration.

  • tags (iterable of str, optional) – Tags for which provenance information should be retrieved (using attributes). If not specified, use (last level of) columns of the given data_frame.

Returns

Path to the new CSV file.

Return type

str

esmvaltool.diag_scripts.emergent_constraints.get_caption(attributes, feature, label, group=None)

Construct caption from plotting attributes for (feature, label) pair.

Parameters
  • attributes (dict) – Plot attributes.

  • feature (str) – Feature.

  • label (str) – Label.

  • group (str, optional) – Group.

Returns

Caption.

Return type

str

Raises

KeyError – attributes does not include necessary keys.

esmvaltool.diag_scripts.emergent_constraints.get_colors(cfg, groups=None)

Get color palette.

Parameters
  • cfg (dict) – Recipe configuration.

  • groups (list, optional) – Used to check whether a color for the combined group has to be added.

Returns

List of colors that can be used for matplotlib.

Return type

list

esmvaltool.diag_scripts.emergent_constraints.get_constraint(x_data, y_data, obs_mean, obs_std, confidence_level=0.66)

Get constraint on target variable.

Parameters
  • x_data (numpy.ndarray) – X data of the emergent constraint.

  • y_data (numpy.ndarray) – Y data of the emergent constraint.

  • obs_mean (float) – Mean of observational data.

  • obs_std (float) – Standard deviation of observational data.

  • confidence_level (float, optional (default: 0.66)) – Confidence level to estimate the range of the target variable.

Returns

Lower confidence limit, best estimate and upper confidence limit of target variable.

Return type

tuple of float
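
To show how a confidence level can translate into a lower limit, best estimate and upper limit, here is a minimal sketch that works on an already sampled PDF of the target variable. It assumes a central interval read off the CDF and a PDF-weighted mean as best estimate; both are assumptions of this example, as is the name constraint_from_pdf_sketch. The library function starts from x_data, y_data and the observation instead.

```python
import math


def constraint_from_pdf_sketch(values, pdf, confidence_level=0.66):
    """Return (lower, best, upper) for a sampled PDF (hypothetical helper)."""
    # Cumulative trapezoidal integration, normalised so the CDF ends at 1
    cumulative = [0.0]
    for idx in range(1, len(values)):
        width = values[idx] - values[idx - 1]
        cumulative.append(
            cumulative[-1] + 0.5 * (pdf[idx] + pdf[idx - 1]) * width)
    total = cumulative[-1]
    cumulative = [value / total for value in cumulative]

    def quantile(prob):
        """Linearly interpolate the value at which the CDF reaches prob."""
        for idx in range(1, len(values)):
            if cumulative[idx] >= prob:
                low, high = cumulative[idx - 1], cumulative[idx]
                if high == low:
                    return values[idx]
                frac = (prob - low) / (high - low)
                return values[idx - 1] + frac * (values[idx] - values[idx - 1])
        return values[-1]

    # PDF-weighted mean; assumes an evenly spaced grid of values
    best = sum(v * p for v, p in zip(values, pdf)) / sum(pdf)
    lower = quantile((1.0 - confidence_level) / 2.0)
    upper = quantile((1.0 + confidence_level) / 2.0)
    return lower, best, upper


# Gaussian test PDF with mean 3.0 and standard deviation 0.5
grid = [0.5 + idx * 0.0025 for idx in range(2001)]
gauss = [math.exp(-0.5 * ((y - 3.0) / 0.5) ** 2) for y in grid]
lower, best, upper = constraint_from_pdf_sketch(grid, gauss)
```

For a Gaussian, a central 66 % interval spans roughly 0.95 standard deviations on either side of the mean, which is what the example reproduces.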

esmvaltool.diag_scripts.emergent_constraints.get_constraint_from_df(training_data, pred_input_data, confidence_level=0.66)

Get constraint on target variable from pandas.DataFrame.

Parameters
  • training_data (pandas.DataFrame) – Training data (features, label).

  • pred_input_data (pandas.DataFrame) – Prediction input data (mean and error).

  • confidence_level (float, optional (default: 0.66)) – Confidence level to estimate the range of the target variable.

Returns

Lower confidence limit, best estimate and upper confidence limit of target variable.

Return type

tuple of float

esmvaltool.diag_scripts.emergent_constraints.get_groups(training_data, add_combined_group=False)

Extract groups from training data.

Parameters
  • training_data (pandas.DataFrame) – Training data (features, label).

  • add_combined_group (bool, optional (default: False)) – Add combined group of all other groups at the beginning of the returned list.

Returns

Groups.

Return type

list of str

esmvaltool.diag_scripts.emergent_constraints.get_input_data(cfg)

Extract input data.

Return training data, prediction input data and corresponding attributes.

Parameters

cfg (dict) – Recipe configuration.

Returns

A tuple containing the training data (pandas.DataFrame), the prediction input data (pandas.DataFrame) and the corresponding attributes (dict).

Return type

tuple

esmvaltool.diag_scripts.emergent_constraints.get_input_files(cfg, patterns=None, ignore_patterns=None)

Get input files.

Parameters
  • cfg (dict) – Recipe configuration.

  • patterns (list of str, optional) – Use only ancestor files that match these patterns as input files.

  • ignore_patterns (list of str, optional) – Ignore input files that match these patterns.

Returns

Input files.

Return type

list of str
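
The interplay of patterns and ignore_patterns can be sketched as a two-step filter. The documentation does not state which pattern syntax is used, so the shell-style matching via fnmatch below is an assumption, as are the helper name filter_files_sketch and the file names.

```python
from fnmatch import fnmatch


def filter_files_sketch(files, patterns=None, ignore_patterns=None):
    """Keep files matching any pattern, then drop ignored ones.

    Shell-style wildcards are an assumption of this sketch.
    """
    selected = list(files)
    if patterns is not None:
        selected = [
            path for path in selected
            if any(fnmatch(path, pattern) for pattern in patterns)
        ]
    if ignore_patterns is not None:
        selected = [
            path for path in selected
            if not any(fnmatch(path, pattern) for pattern in ignore_patterns)
        ]
    return selected


# Made-up file names for illustration
candidates = ["work/ecs.nc", "work/ecs_regression.nc", "work/notes.txt"]
kept = filter_files_sketch(
    candidates, patterns=["*.nc"], ignore_patterns=["*_regression.nc"])
```

Leaving both arguments at None returns every candidate unchanged, mirroring the optional nature of the two parameters.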

esmvaltool.diag_scripts.emergent_constraints.get_provenance_record(attributes, tags, **kwargs)

Get provenance record.

Parameters
  • attributes (dict) – Plot attributes. All provenance keys need to start with 'provenance_'.

  • tags (list of str) – Tags used to retrieve data from the attributes dict, i.e. features and/or label.

  • **kwargs (Keyword arguments) – Additional key:value pairs directly passed to the provenance record dict. All values may include the format strings {feature} and {label}.

Returns

Provenance record.

Return type

dict

esmvaltool.diag_scripts.emergent_constraints.get_xy_data_without_nans(data_frame, feature, label)

Get (X, Y) data for (feature, label) combination without nans.

Parameters
  • data_frame (pandas.DataFrame) – Training data.

  • feature (str) – Name of the feature data.

  • label (str) – Name of the label data.

Returns

Tuple containing a pandas.DataFrame for the X axis (feature) and a pandas.DataFrame for the Y axis (label) without missing values.

Return type

tuple

esmvaltool.diag_scripts.emergent_constraints.pandas_object_to_cube(pandas_object, index_droplevel=None, columns_droplevel=None, **kwargs)

Convert pandas object to iris.cube.Cube.

Parameters
  • pandas_object (pandas.DataFrame or pandas.Series) – Data to convert.

  • index_droplevel (int or list of int, optional) – Drop levels of index if not None.

  • columns_droplevel (int or list of int, optional) – Drop levels of columns if not None. Can only be used if pandas_object is a pandas.DataFrame.

  • **kwargs (Keyword arguments) – Keyword arguments used for the cube metadata, e.g. standard_name, var_name, etc.

Returns

Data cube.

Return type

iris.cube.Cube

Raises

TypeError – columns_droplevel is used when pandas_object is not a pandas.DataFrame.

esmvaltool.diag_scripts.emergent_constraints.plot_individual_scatterplots(training_data, pred_input_data, attributes, basename, cfg)

Plot individual scatterplots for the different groups.

Plot scatterplots for all pairs of (feature, label) data (separate plot for each group).

Parameters
  • training_data (pandas.DataFrame) – Training data (features, label).

  • pred_input_data (pandas.DataFrame) – Prediction input data (mean and error).

  • attributes (dict) – Plot attributes for the different features and the label data.

  • basename (str) – Basename for the name of the file.

  • cfg (dict) – Recipe configuration.

esmvaltool.diag_scripts.emergent_constraints.plot_merged_scatterplots(training_data, pred_input_data, attributes, basename, cfg)

Plot merged scatterplots (all groups in one plot).

Plot scatterplots for all pairs of (feature, label) data (all groups in one plot).

Parameters
  • training_data (pandas.DataFrame) – Training data (features, label).

  • pred_input_data (pandas.DataFrame) – Prediction input data (mean and error).

  • attributes (dict) – Plot attributes for the different features and the label data.

  • basename (str) – Basename for the name of the file.

  • cfg (dict) – Recipe configuration.

esmvaltool.diag_scripts.emergent_constraints.plot_target_distributions(training_data, pred_input_data, attributes, basename, cfg)

Plot distributions of target variable for every feature.

Parameters
  • training_data (pandas.DataFrame) – Training data (features, label).

  • pred_input_data (pandas.DataFrame) – Prediction input data (mean and error).

  • attributes (dict) – Plot attributes for the different features and the label data.

  • basename (str) – Basename for the name of the file.

  • cfg (dict) – Recipe configuration.

esmvaltool.diag_scripts.emergent_constraints.regression_line(x_data, y_data, n_points=1000)

Return x and y coordinates of the regression line (mean and error).

Parameters
  • x_data (numpy.ndarray) – X data used to fit the linear regression.

  • y_data (numpy.ndarray) – Y data used to fit the linear regression.

  • n_points (int, optional (default: 1000)) – Number of points for the regression lines.

Returns

numpy.ndarrays for the keys 'x', 'y', 'y_minus_err', 'y_plus_err', 'slope', 'intercept', 'pvalue' and 'rvalue'.

Return type

dict

esmvaltool.diag_scripts.emergent_constraints.set_plot_appearance(axes, attributes, **kwargs)

Set appearance of a plot.

Parameters
  • axes (matplotlib.axes.Axes) – Matplotlib Axes object which contains the plot.

  • attributes (dict) – Plot attributes.

  • **kwargs (Keyword arguments) – Keyword arguments of the form plot_option=tag where plot_option is something like plot_title, plot_xlabel, plot_xlim, etc. and tag a key for the plot attributes dict that describes which attributes should be considered for that plot_option.

esmvaltool.diag_scripts.emergent_constraints.standard_prediction_error(x_data, y_data)

Return a function to calculate standard prediction error.

The standard prediction error of a linear regression is the error when predicting a data point which was not used to fit the regression line in the first place.

Parameters
  • x_data (numpy.ndarray) – X data used to fit the linear regression.

  • y_data (numpy.ndarray) – Y data used to fit the linear regression.

Returns

Function that takes a float as single argument (representing the X value of a new data point) and returns the standard prediction error for that point.

Return type

callable
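
The idea of returning a callable can be sketched with the textbook formula for the standard prediction error of a simple linear regression: the standard error of the estimate s times sqrt(1 + 1/n + (x_new - x_mean)^2 / sum((x_i - x_mean)^2)), where s uses n - 2 degrees of freedom. Whether the library uses exactly this expression is an assumption; the name standard_prediction_error_sketch and the sample data are hypothetical.

```python
import math


def standard_prediction_error_sketch(x_data, y_data):
    """Return a function giving the prediction error at a new X value."""
    n_data = len(x_data)
    if n_data < 3:
        raise ValueError("Need at least 3 points (n - 2 degrees of freedom)")
    x_mean = sum(x_data) / n_data
    y_mean = sum(y_data) / n_data
    ssx = sum((x - x_mean) ** 2 for x in x_data)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_data, y_data))
    # Ordinary least-squares fit
    slope = sxy / ssx
    intercept = y_mean - slope * x_mean
    residual_ss = sum(
        (y - (slope * x + intercept)) ** 2 for x, y in zip(x_data, y_data))
    # Standard error of the estimate
    see = math.sqrt(residual_ss / (n_data - 2))

    def prediction_error(x_new):
        """Standard prediction error at x_new."""
        return see * math.sqrt(
            1.0 + 1.0 / n_data + (x_new - x_mean) ** 2 / ssx)

    return prediction_error


# Made-up data scattered around a straight line
spe = standard_prediction_error_sketch(
    [0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.1, 2.9])
error_at_mean = spe(1.5)
error_at_edge = spe(3.0)
```

The error is smallest at the mean of the X data and grows with distance from it, which is why extrapolating far beyond the fitted range gives wide uncertainty bands.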

esmvaltool.diag_scripts.emergent_constraints.target_pdf(x_data, y_data, obs_mean, obs_std, n_points=1000, necessary_p_value=None)

Calculate probability density function (PDF) for target variable.

Parameters
  • x_data (numpy.ndarray) – X data of the emergent constraint.

  • y_data (numpy.ndarray) – Y data of the emergent constraint.

  • obs_mean (float) – Mean of observational data.

  • obs_std (float) – Standard deviation of observational data.

  • n_points (int, optional (default: 1000)) – Number of sampled points for PDF of target variable.

  • necessary_p_value (float, optional) – If given, return unconstrained PDF (using Gaussian distribution with unconstrained mean and standard deviation) when p-value of emergent relationship is greater than the given necessary p-value.

Returns

x and y values for the PDF.

Return type

tuple of numpy.ndarray
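
The unconstrained fallback mentioned for necessary_p_value can be pictured as a Gaussian built from the mean and standard deviation of the Y data. The sketch below makes several assumptions that the documentation does not confirm: the sample standard deviation is used, the PDF is evaluated on an evenly spaced grid spanning n_std standard deviations around the mean, and the helper is called unconstrained_pdf_sketch.

```python
import math
import statistics


def unconstrained_pdf_sketch(y_data, n_points=1000, n_std=4.0):
    """Return (y_values, pdf) of a Gaussian fitted to y_data.

    Grid range (n_std) and sample standard deviation are assumptions.
    """
    mean = statistics.mean(y_data)
    std = statistics.stdev(y_data)
    start = mean - n_std * std
    step = 2.0 * n_std * std / (n_points - 1)
    y_values = [start + idx * step for idx in range(n_points)]
    norm = 1.0 / (std * math.sqrt(2.0 * math.pi))
    pdf = [
        norm * math.exp(-0.5 * ((y - mean) / std) ** 2) for y in y_values
    ]
    return y_values, pdf


# Made-up target values with mean 3.0 and sample standard deviation 1.0
y_values, pdf = unconstrained_pdf_sketch([2.0, 3.0, 4.0], n_points=1001)
```

Like target_pdf, the sketch returns the x and y values of the PDF as a pair, so the result could be plotted or integrated in the same way.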