MLR main diagnostic
Main diagnostic script to create MLR models.
Description
This diagnostic script creates Machine Learning Regression (MLR) models which
use inter-model relations between process-based predictors (usually from the
past/present climate) and a target variable (usually a projection of the future
climate) to get a constrained prediction of the target variable. It provides an
interface for using MLR models (subclasses of
esmvaltool.diag_scripts.mlr.models.MLRModel).
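To make the idea of an inter-model relation concrete, the following minimal sketch fits a straight line across a made-up ensemble and evaluates it at a made-up observed predictor value. It is not how esmvaltool.diag_scripts.mlr.models.MLRModel is implemented: the data, the helper fit_line and all variable names are assumptions chosen for illustration, and a plain least-squares line stands in for whichever MLR model is selected.

```python
# Hypothetical ensemble: one (predictor, target) pair per climate model.
# The numbers are invented for illustration and are not real model output.
predictor = [1.0, 2.0, 3.0, 4.0]  # process-based predictor (past/present climate)
target = [3.0, 5.0, 7.0, 9.0]     # target variable (projection of future climate)


def fit_line(x_values, y_values):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x_values)
    x_mean = sum(x_values) / n
    y_mean = sum(y_values) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_values, y_values))
    s_xx = sum((x - x_mean) ** 2 for x in x_values)
    slope = s_xy / s_xx
    return slope, y_mean - slope * x_mean


# Learn the inter-model relation from the ensemble.
slope, intercept = fit_line(predictor, target)

# Hypothetical observed value of the predictor (again invented).
observed_predictor = 3.5

# Plain multi-model mean, ignoring the predictor.
unconstrained = sum(target) / len(target)
# Prediction that uses the learned relation and the observed predictor.
constrained = slope * observed_predictor + intercept
```

In this toy setting the constrained value differs from the plain multi-model mean because the observed predictor lies away from the ensemble mean of the predictor.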
Project
CRESCENDO
Configuration options in recipe
- efecv_kwargs: dict, optional
If specified, use these additional keyword arguments to perform an exhaustive feature elimination using cross-validation. May not be used together with grid_search_cv_param_grid or rfecv_kwargs.
- grid_search_cv_kwargs: dict, optional
Keyword arguments for the grid search cross-validation, see https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html.
- grid_search_cv_param_grid: dict or list of dict, optional
If specified, perform an exhaustive parameter search using cross-validation instead of simply calling esmvaltool.diag_scripts.mlr.models.MLRModel.fit(). Contains parameters (keys) and ranges (values) for the exhaustive parameter search. Parameters have to be given for each step of the pipeline, separated by two underscores, i.e. s__p is the parameter p for step s. May not be used together with efecv_kwargs or rfecv_kwargs.
- group_metadata: str, optional
Group input data by an attribute. For every group element (set of datasets), an individual MLR model is calculated. Only affects feature and label datasets. May be used together with the option pseudo_reality.
- ignore: list of dict, optional
Ignore specific datasets by specifying multiple dicts of metadata.
- mlr_model_type: str
MLR model type. The given model has to be defined in esmvaltool.diag_scripts.mlr.models.
- only_predict: bool, optional (default: False)
If True, only use esmvaltool.diag_scripts.mlr.models.MLRModel.predict() and do not create any other output (CSV files, plots, etc.).
- pattern: str, optional
Pattern matched against ancestor file names.
- plot_partial_dependences: bool, optional (default: False)
Plot partial dependence of every feature in MLR model (computationally expensive).
- predict_kwargs: dict, optional
Optional keyword arguments for the final regressor’s predict() function.
- pseudo_reality: list of str, optional
List of dataset attributes which are used to group input data for a pseudo-reality test (also known as model-as-truth or perfect-model setup). For every element of the group, a single MLR model is fitted on all data except for that of the specified group element. This group element is then used as additional prediction_input and prediction_reference. This allows a direct assessment of the predictive power of the MLR model by comparing the MLR prediction output with the true labels (similar to splitting the input data into a training and a test set, except that the data is not divided randomly but by specific datasets, e.g. the different climate models). May be used together with the option group_metadata.
- rfecv_kwargs: dict, optional
If specified, use these additional keyword arguments to perform a recursive feature elimination using cross-validation, see https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFECV.html. May not be used together with efecv_kwargs or grid_search_cv_param_grid.
- save_mlr_model_error: str or int, optional
Additionally save the estimated squared MLR model error. This error represents the uncertainty of the prediction caused by the MLR model itself and not by errors in the prediction input data (errors in the prediction input data are taken into account by including datasets with var_type set to prediction_input_error and setting save_propagated_errors to True). If the option is set to 'test', the (constant) error is estimated as RMSEP (root mean square error of prediction) using a (hold-out) test data set. This is only possible if test data is available, i.e. the option test_size is not set to False during class initialization. If the option is set to 'logo', the (constant) error is estimated as RMSEP using leave-one-group-out cross-validation using the group_attributes. This is only possible if group_datasets_by_attributes is given. If the option is set to an integer n (!= 0), the (constant) error is estimated as RMSEP using n-fold cross-validation.
- save_lime_importance: bool, optional (default: False)
Additionally save local feature importance given by LIME (Local Interpretable Model-agnostic Explanations).
- save_propagated_errors: bool, optional (default: False)
Additionally save propagated errors from prediction_input_error datasets.
- select_metadata: dict, optional
Pre-select input data by specifying (key, value) pairs. Affects all datasets regardless of var_type.
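The s__p naming convention of grid_search_cv_param_grid, and the rule that efecv_kwargs, grid_search_cv_param_grid and rfecv_kwargs exclude each other, can be illustrated with a small standalone sketch. Everything in it is an assumption made for illustration: the step name regressor, the parameter names alpha and fit_intercept, and the helpers split_param_name, expand_param_grid and check_exclusive are hypothetical and are not part of the diagnostic script. The sketch also handles only a single dict, not a list of dicts.

```python
from itertools import product

# Hypothetical step and parameter names; the real names depend on the pipeline
# of the chosen MLR model.
param_grid = {
    'regressor__alpha': [0.1, 1.0],
    'regressor__fit_intercept': [True, False],
}


def split_param_name(name):
    """Split 's__p' into the step name s and the parameter name p."""
    step, separator, param = name.partition('__')
    if not (step and separator and param):
        raise ValueError(f"Expected '<step>__<parameter>', got '{name}'")
    return step, param


def expand_param_grid(grid):
    """Return every parameter combination of a single grid as a list of dicts."""
    names = sorted(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]


def check_exclusive(options):
    """Raise if more than one of the mutually exclusive options is given."""
    exclusive = ('efecv_kwargs', 'grid_search_cv_param_grid', 'rfecv_kwargs')
    given = [name for name in exclusive if options.get(name) is not None]
    if len(given) > 1:
        raise ValueError(f"Options {given} may not be used together")


# (step, parameter) pairs addressed by the grid.
steps = [split_param_name(name) for name in sorted(param_grid)]
# All parameter combinations an exhaustive search would have to evaluate.
candidates = expand_param_grid(param_grid)
# A configuration with only one of the three options passes the check.
check_exclusive({'grid_search_cv_param_grid': param_grid})
```

The number of candidates is the product of the lengths of all value lists, which is why an exhaustive search becomes expensive quickly as parameters are added.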
Additional optional parameters are the optional parameters of
esmvaltool.diag_scripts.mlr.models.MLRModel, or the optional parameters of
esmvaltool.diag_scripts.mlr.mmm if mlr_model_type='mmm'.
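The hold-out logic described for pseudo_reality, and the RMSEP mentioned under save_mlr_model_error, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the attribute name dataset, the helper names, and the use of a simple mean of the training labels as a stand-in for a fitted MLR model are all hypothetical and do not reflect the actual implementation.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical input: one record per data point, tagged with the climate model
# ('dataset') it comes from. The label values are invented for illustration.
records = [
    {'dataset': 'A', 'label': 1.0},
    {'dataset': 'A', 'label': 3.0},
    {'dataset': 'B', 'label': 2.0},
    {'dataset': 'C', 'label': 4.0},
    {'dataset': 'C', 'label': 6.0},
]


def group_by(items, attribute):
    """Group records by the value of one metadata attribute."""
    groups = defaultdict(list)
    for item in items:
        groups[item[attribute]].append(item)
    return dict(groups)


def leave_one_group_out(items, attribute):
    """Yield (held-out group name, training records, held-out records)."""
    groups = group_by(items, attribute)
    for held_out in groups:
        train = [item for name, members in groups.items()
                 if name != held_out for item in members]
        yield held_out, train, groups[held_out]


held_out_names = []
squared_errors = []
for name, train, test in leave_one_group_out(records, 'dataset'):
    held_out_names.append(name)
    # Stand-in for fitting and predicting with an MLR model: the prediction is
    # simply the mean of the training labels.
    prediction = sum(item['label'] for item in train) / len(train)
    # Compare the prediction with the "true" labels of the held-out group.
    squared_errors.extend((prediction - item['label']) ** 2 for item in test)

# Root mean square error of prediction over all held-out points.
rmsep = sqrt(sum(squared_errors) / len(squared_errors))
# Squared error, the quantity save_mlr_model_error is described as saving.
squared_model_error = rmsep ** 2
```

Because every group is held out exactly once, each data point contributes one prediction error that was made by a model that never saw it during fitting.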