Auxiliary functions for emergent constraints scripts
Convenience functions for emergent constraints diagnostics.
Functions:
- cdf – Calculate cumulative distribution function for a 1-dimensional PDF.
- check_metadata – Check metadata.
- constraint_info_array – Get array with all relevant parameters of emergent constraint.
- create_simple_scatterplot – Create simple scatterplot of an emergent relationship (without saving).
- export_csv – Export CSV file.
- Construct caption from plotting attributes for (feature, label) pair.
- get_colors – Get color palette.
- get_constraint – Get constraint on target variable.
- get_constraint_from_df – Get constraint on target variable from pandas.DataFrame.
- get_groups – Extract groups from training data.
- get_input_data – Extract input data.
- get_input_files – Get input files.
- get_provenance_record – Get provenance record.
- get_xy_data_without_nans – Get (X, Y) data for (feature, label) combination without NaNs.
- pandas_object_to_cube – Convert pandas object to iris.cube.Cube.
- plot_individual_scatterplots – Plot individual scatterplots for the different groups.
- plot_merged_scatterplots – Plot merged scatterplots (all groups in one plot).
- plot_target_distributions – Plot distributions of target variable for every feature.
- regression_line – Return x and y coordinates of the regression line (mean and error).
- set_plot_appearance – Set appearance of a plot.
- standard_prediction_error – Return a function to calculate standard prediction error.
- target_pdf – Calculate probability density function (PDF) for target variable.
- esmvaltool.diag_scripts.emergent_constraints.cdf(data, pdf)
Calculate cumulative distribution function for a 1-dimensional PDF.
- Parameters:
data (numpy.ndarray) – Data points (1D array).
pdf (numpy.ndarray) – Corresponding probability density function (PDF).
- Returns:
Corresponding cumulative distribution function (CDF).
- Return type:
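As an illustration of what turning a sampled PDF into a CDF involves, here is a minimal sketch that accumulates the area under the PDF with the trapezoidal rule. The helper name `cumulative_trapezoid` and the use of plain Python lists are assumptions made for this example. The integration rule used by `cdf` itself is not stated on this page and may differ.

```python
def cumulative_trapezoid(data, pdf):
    """Accumulate the area under a sampled PDF (illustrative helper only)."""
    cumulative = [0.0]
    for i in range(1, len(data)):
        width = data[i] - data[i - 1]
        # Area of the trapezoid between two neighbouring sample points.
        cumulative.append(cumulative[-1] + 0.5 * (pdf[i] + pdf[i - 1]) * width)
    return cumulative


# Uniform density on [0, 2]: the CDF rises linearly from 0 to 1.
data = [0.0, 0.5, 1.0, 1.5, 2.0]
pdf = [0.5] * len(data)
cdf_values = cumulative_trapezoid(data, pdf)
```

With a properly normalized PDF the last element of the result should be close to 1.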
- esmvaltool.diag_scripts.emergent_constraints.check_metadata(metadata, allowed_var_types=None)
Check metadata.
- Parameters:
- Raises:
KeyError – Metadata does not contain necessary keys 'var_type' and 'tag'.
ValueError – Got invalid value for key 'var_type'.
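The documented error behaviour can be pictured with a small stand-in. This is a sketch only: `check_metadata_sketch` is a hypothetical function, the example values `'feature'` and `'label'` are illustrative, and treating `allowed_var_types=None` as "skip the value check" is an assumption rather than the documented behaviour.

```python
def check_metadata_sketch(metadata, allowed_var_types=None):
    """Hypothetical stand-in that mirrors the documented exceptions."""
    for key in ("var_type", "tag"):
        if key not in metadata:
            raise KeyError(f"Metadata does not contain necessary key '{key}'")
    # Assumption: None means that no restriction is applied in this sketch.
    if allowed_var_types is not None:
        if metadata["var_type"] not in allowed_var_types:
            raise ValueError(
                f"Got invalid value '{metadata['var_type']}' for key 'var_type'"
            )


# Passes silently because both keys exist and the value is allowed.
check_metadata_sketch(
    {"var_type": "feature", "tag": "example_tag"},
    allowed_var_types=["feature", "label"],
)
```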
- esmvaltool.diag_scripts.emergent_constraints.constraint_info_array(x_data, y_data, obs_mean, obs_std, n_points=1000, necessary_p_value=None)
Get array with all relevant parameters of emergent constraint.
- Parameters:
x_data (numpy.ndarray) – X data of the emergent constraint.
y_data (numpy.ndarray) – Y data of the emergent constraint.
obs_mean (float) – Mean of observational data.
obs_std (float) – Standard deviation of observational data.
n_points (int, optional (default: 1000)) – Number of sampled points for PDF of target variable.
necessary_p_value (float, optional) – If given, replace constrained mean and standard deviation with unconstrained values when p-value of emergent relationship is greater than the given necessary p-value.
- Returns:
- Array of shape (8,) with the elements:
Constrained mean of target variable.
Constrained standard deviation of target variable.
Unconstrained mean of target variable.
Unconstrained standard deviation of target variable.
Slope of emergent relationship.
Intercept of emergent relationship.
Correlation coefficient r of emergent relationship.
p-value of emergent relationship.
- Return type:
- esmvaltool.diag_scripts.emergent_constraints.create_simple_scatterplot(x_data, y_data, obs_mean, obs_std)
Create simple scatterplot of an emergent relationship (without saving).
- Parameters:
x_data (numpy.ndarray) – X data of the emergent constraint.
y_data (numpy.ndarray) – Y data of the emergent constraint.
obs_mean (float) – Mean of observational data.
obs_std (float) – Standard deviation of observational data.
- esmvaltool.diag_scripts.emergent_constraints.export_csv(data_frame, attributes, basename, cfg, tags=None)
Export CSV file.
- Parameters:
data_frame (pandas.DataFrame) – Data to export.
attributes (dict) – Plot attributes for the different features and the label data. Used to retrieve provenance information.
basename (str) – Basename for the name of the file.
cfg (dict) – Recipe configuration.
tags (iterable of str, optional) – Tags for which provenance information should be retrieved (using attributes). If not specified, use (last level of) columns of the given data_frame.
- Returns:
Path to the new CSV file.
- Return type:
Construct caption from plotting attributes for (feature, label) pair.
- esmvaltool.diag_scripts.emergent_constraints.get_colors(cfg, groups=None)
Get color palette.
- Parameters:
- Returns:
List of colors that can be used for matplotlib.
- Return type:
- esmvaltool.diag_scripts.emergent_constraints.get_constraint(x_data, y_data, obs_mean, obs_std, confidence_level=0.66)
Get constraint on target variable.
- Parameters:
x_data (numpy.ndarray) – X data of the emergent constraint.
y_data (numpy.ndarray) – Y data of the emergent constraint.
obs_mean (float) – Mean of observational data.
obs_std (float) – Standard deviation of observational data.
confidence_level (float, optional (default: 0.66)) – Confidence level to estimate the range of the target variable.
- Returns:
Lower confidence limit, best estimate and upper confidence limit of target variable.
- Return type:
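To make the role of `confidence_level` concrete, here is a minimal sketch of how lower and upper limits can be read off a sampled CDF. It assumes a central interval, so that a level of 0.66 would span roughly the 17th to 83rd percentile. The helper name `confidence_limits` is hypothetical, and the exact procedure used by `get_constraint` is not described on this page.

```python
def confidence_limits(values, cdf_values, confidence_level=0.66):
    """Return the outermost grid values whose CDF lies in the central interval."""
    lower_quantile = (1.0 - confidence_level) / 2.0
    upper_quantile = 1.0 - lower_quantile
    inside = [
        value
        for value, cumulative in zip(values, cdf_values)
        if lower_quantile <= cumulative <= upper_quantile
    ]
    return min(inside), max(inside)


# Toy CDF of a uniform distribution on [0, 1], sampled on a coarse grid.
values = [0.0, 0.25, 0.5, 0.75, 1.0]
cdf_values = [0.0, 0.25, 0.5, 0.75, 1.0]
lower, upper = confidence_limits(values, cdf_values, confidence_level=0.5)
```

On a fine grid the two limits approach the true quantiles. On a coarse grid, as here, they snap to the nearest sampled points inside the interval.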
- esmvaltool.diag_scripts.emergent_constraints.get_constraint_from_df(training_data, pred_input_data, confidence_level=0.66)
Get constraint on target variable from pandas.DataFrame.
- Parameters:
training_data (pandas.DataFrame) – Training data (features, label).
pred_input_data (pandas.DataFrame) – Prediction input data (mean and error).
confidence_level (float, optional (default: 0.66)) – Confidence level to estimate the range of the target variable.
- Returns:
Lower confidence limit, best estimate and upper confidence limit of target variable.
- Return type:
- esmvaltool.diag_scripts.emergent_constraints.get_groups(training_data, add_combined_group=False)
Extract groups from training data.
- Parameters:
training_data (pandas.DataFrame) – Training data (features, label).
add_combined_group (bool, optional (default: False)) – Add combined group of all other groups at the beginning of the returned list.
- Returns:
Groups.
- Return type:
- esmvaltool.diag_scripts.emergent_constraints.get_input_data(cfg)
Extract input data.
Return training data, prediction input data and corresponding attributes.
- Parameters:
cfg (dict) – Recipe configuration.
- Returns:
A tuple containing the training data (pandas.DataFrame), the prediction input data (pandas.DataFrame) and the corresponding attributes (dict).
- Return type:
- esmvaltool.diag_scripts.emergent_constraints.get_input_files(cfg, patterns=None, ignore_patterns=None)
Get input files.
- esmvaltool.diag_scripts.emergent_constraints.get_provenance_record(attributes, tags, **kwargs)
Get provenance record.
- Parameters:
attributes (dict) – Plot attributes. All provenance keys need to start with 'provenance_'.
tags (list of str) – Tags used to retrieve data from the attributes dict, i.e. features and/or label.
**kwargs (Keyword arguments) – Additional key:value pairs directly passed to the provenance record dict. All values may include the format strings {feature} and {label}.
- Returns:
Provenance record.
- Return type:
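The `{feature}` and `{label}` placeholders mentioned for `**kwargs` behave like ordinary Python format fields. The following sketch shows only that substitution mechanism. The key names, the template text and the names `feature_a` and `target` are hypothetical, and how `get_provenance_record` derives the substituted names from `attributes` and `tags` is not shown here.

```python
# Hypothetical keyword arguments; the key names are illustrative only.
record_templates = {
    "caption": "Emergent relationship between {feature} and {label}.",
    "plot_type": "scatter",
}

# Hypothetical names standing in for whatever the real function resolves.
feature = "feature_a"
label = "target"

# Values without placeholders pass through str.format unchanged.
record = {
    key: value.format(feature=feature, label=label)
    for key, value in record_templates.items()
}
```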
- esmvaltool.diag_scripts.emergent_constraints.get_xy_data_without_nans(data_frame, feature, label)
Get (X, Y) data for (feature, label) combination without NaNs.
- Parameters:
data_frame (pandas.DataFrame) – Training data.
feature (str) – Name of the feature data.
label (str) – Name of the label data.
- Returns:
Tuple containing a pandas.DataFrame for the X axis (feature) and a pandas.DataFrame for the Y axis (label) without missing values.
- Return type:
- esmvaltool.diag_scripts.emergent_constraints.pandas_object_to_cube(pandas_object, index_droplevel=None, columns_droplevel=None, **kwargs)
Convert pandas object to iris.cube.Cube.
- Parameters:
pandas_object (pandas.DataFrame or pandas.Series) – Data to convert.
index_droplevel (int or list of int, optional) – Drop levels of index if not None.
columns_droplevel (int or list of int, optional) – Drop levels of columns if not None. Can only be used if pandas_object is a pandas.DataFrame.
**kwargs (Keyword arguments) – Keyword arguments used for the cube metadata, e.g. standard_name, var_name, etc.
- Returns:
Data cube.
- Return type:
- Raises:
TypeError – columns_droplevel is used when pandas_object is not a pandas.DataFrame.
- esmvaltool.diag_scripts.emergent_constraints.plot_individual_scatterplots(training_data, pred_input_data, attributes, basename, cfg)
Plot individual scatterplots for the different groups.
Plot scatterplots for all pairs of (feature, label) data (separate plot for each group).
- Parameters:
training_data (pandas.DataFrame) – Training data (features, label).
pred_input_data (pandas.DataFrame) – Prediction input data (mean and error).
attributes (dict) – Plot attributes for the different features and the label data.
basename (str) – Basename for the name of the file.
cfg (dict) – Recipe configuration.
- esmvaltool.diag_scripts.emergent_constraints.plot_merged_scatterplots(training_data, pred_input_data, attributes, basename, cfg)
Plot merged scatterplots (all groups in one plot).
Plot scatterplots for all pairs of (feature, label) data (all groups in one plot).
- Parameters:
training_data (pandas.DataFrame) – Training data (features, label).
pred_input_data (pandas.DataFrame) – Prediction input data (mean and error).
attributes (dict) – Plot attributes for the different features and the label data.
basename (str) – Basename for the name of the file.
cfg (dict) – Recipe configuration.
- esmvaltool.diag_scripts.emergent_constraints.plot_target_distributions(training_data, pred_input_data, attributes, basename, cfg)
Plot distributions of target variable for every feature.
- Parameters:
training_data (pandas.DataFrame) – Training data (features, label).
pred_input_data (pandas.DataFrame) – Prediction input data (mean and error).
attributes (dict) – Plot attributes for the different features and the label data.
basename (str) – Basename for the name of the file.
cfg (dict) – Recipe configuration.
- esmvaltool.diag_scripts.emergent_constraints.regression_line(x_data, y_data, n_points=1000)
Return x and y coordinates of the regression line (mean and error).
- Parameters:
x_data (numpy.ndarray) – X data used to fit the linear regression.
y_data (numpy.ndarray) – Y data used to fit the linear regression.
n_points (int, optional (default: 1000)) – Number of points for the regression lines.
- Returns:
numpy.ndarrays for the keys 'x', 'y', 'y_minus_err', 'y_plus_err', 'slope', 'intercept', 'pvalue' and 'rvalue'.
- Return type:
- esmvaltool.diag_scripts.emergent_constraints.set_plot_appearance(axes, attributes, **kwargs)
Set appearance of a plot.
- Parameters:
axes (matplotlib.axes.Axes) – Matplotlib Axes object which contains the plot.
attributes (dict) – Plot attributes.
**kwargs (Keyword arguments) – Keyword arguments of the form plot_option=tag where plot_option is something like plot_title, plot_xlabel, plot_xlim, etc. and tag a key for the plot attributes dict that describes which attributes should be considered for that plot_option.
- esmvaltool.diag_scripts.emergent_constraints.standard_prediction_error(x_data, y_data)
Return a function to calculate standard prediction error.
The standard prediction error of a linear regression is the error when predicting a data point which was not used to fit the regression line in the first place.
- Parameters:
x_data (numpy.ndarray) – X data used to fit the linear regression.
y_data (numpy.ndarray) – Y data used to fit the linear regression.
- Returns:
Function that takes a float as single argument (representing the X value of a new data point) and returns the standard prediction error for that.
- Return type:
callable
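The "function that returns a function" pattern described above can be sketched with the textbook prediction-error formula for simple linear regression, sqrt(MSE * (1 + 1/n + (x - mean(x))^2 / SSX)), where MSE is the residual mean square with n - 2 degrees of freedom. The name `standard_prediction_error_sketch` is hypothetical and plain lists stand in for numpy arrays. That the library uses exactly this formula is an assumption.

```python
import math


def standard_prediction_error_sketch(x_data, y_data):
    """Fit a least-squares line and return a callable giving the prediction error."""
    n = len(x_data)
    x_mean = sum(x_data) / n
    y_mean = sum(y_data) / n
    ssx = sum((x - x_mean) ** 2 for x in x_data)
    slope = sum(
        (x - x_mean) * (y - y_mean) for x, y in zip(x_data, y_data)
    ) / ssx
    intercept = y_mean - slope * x_mean
    # Residual mean square with n - 2 degrees of freedom.
    mse = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(x_data, y_data)
    ) / (n - 2)

    def prediction_error(x_new):
        return math.sqrt(mse * (1.0 + 1.0 / n + (x_new - x_mean) ** 2 / ssx))

    return prediction_error


x_data = [0.0, 1.0, 2.0, 3.0]
y_data = [0.1, 0.9, 2.1, 2.9]
error_at = standard_prediction_error_sketch(x_data, y_data)
error_near_mean = error_at(1.5)
error_far_away = error_at(10.0)
```

The error is smallest at the mean of the X data and grows as the new point moves away from it, which is why the error band around a regression line widens toward the edges.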
- esmvaltool.diag_scripts.emergent_constraints.target_pdf(x_data, y_data, obs_mean, obs_std, n_points=1000, necessary_p_value=None)
Calculate probability density function (PDF) for target variable.
- Parameters:
x_data (numpy.ndarray) – X data of the emergent constraint.
y_data (numpy.ndarray) – Y data of the emergent constraint.
obs_mean (float) – Mean of observational data.
obs_std (float) – Standard deviation of observational data.
n_points (int, optional (default: 1000)) – Number of sampled points for PDF of target variable.
necessary_p_value (float, optional) – If given, return unconstrained PDF (using Gaussian distribution with unconstrained mean and standard deviation) when p-value of emergent relationship is greater than the given necessary p-value.
- Returns:
x and y values for the PDF.
- Return type:
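One common way to build such a PDF, sketched here under explicit assumptions, is to combine a Gaussian for the observation, N(obs_mean, obs_std), with a Gaussian around the regression line whose width is the standard prediction error, and to integrate over x: P(y) = ∫ P(y|x) P(x) dx. Whether `target_pdf` does exactly this is not stated on this page. The name `target_pdf_sketch`, the grid limits of five standard deviations and the simple rectangle-rule integration are all choices made for this example, and the `necessary_p_value` fallback is omitted.

```python
import math


def gaussian(value, mean, std):
    """Density of a normal distribution."""
    return math.exp(-0.5 * ((value - mean) / std) ** 2) / (
        std * math.sqrt(2.0 * math.pi)
    )


def target_pdf_sketch(x_data, y_data, obs_mean, obs_std, n_points=200):
    """Return (y grid, PDF values) for the constrained target variable."""
    n = len(x_data)
    x_mean = sum(x_data) / n
    y_mean = sum(y_data) / n
    ssx = sum((x - x_mean) ** 2 for x in x_data)
    slope = sum(
        (x - x_mean) * (y - y_mean) for x, y in zip(x_data, y_data)
    ) / ssx
    intercept = y_mean - slope * x_mean
    mse = sum(
        (y - intercept - slope * x) ** 2 for x, y in zip(x_data, y_data)
    ) / (n - 2)

    def pred_error(x_val):
        return math.sqrt(mse * (1.0 + 1.0 / n + (x_val - x_mean) ** 2 / ssx))

    # Assumption: sample x within five observational standard deviations.
    x_grid = [
        obs_mean + obs_std * (-5.0 + 10.0 * i / (n_points - 1))
        for i in range(n_points)
    ]
    step = x_grid[1] - x_grid[0]
    weights = [gaussian(x, obs_mean, obs_std) * step for x in x_grid]

    # Assumption: the y grid covers the predicted range plus five error widths.
    y_ends = [intercept + slope * x_grid[0], intercept + slope * x_grid[-1]]
    pad = 5.0 * max(pred_error(x_grid[0]), pred_error(x_grid[-1]))
    y_min, y_max = min(y_ends) - pad, max(y_ends) + pad
    y_grid = [
        y_min + (y_max - y_min) * i / (n_points - 1) for i in range(n_points)
    ]

    # Rectangle-rule approximation of the integral over x for every y.
    pdf = [
        sum(
            weight * gaussian(y, intercept + slope * x, pred_error(x))
            for weight, x in zip(weights, x_grid)
        )
        for y in y_grid
    ]
    return y_grid, pdf


x_data = [0.0, 1.0, 2.0, 3.0]
y_data = [0.1, 0.9, 2.1, 2.9]
y_values, pdf_values = target_pdf_sketch(x_data, y_data, obs_mean=1.5, obs_std=0.3)
```

The resulting PDF is broader than the prediction error alone because it also carries the observational uncertainty, scaled by the slope of the regression line.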