Ridge Regression
Ridge Regression model.
Use mlr_model_type: ridge to use this MLR model in the recipe.
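Ridge regression fits linear coefficients w by minimizing the penalized least-squares objective ||y - Xw||^2 + alpha * ||w||^2, where alpha >= 0 controls how strongly the coefficients are shrunk towards zero. The following minimal NumPy sketch solves this objective in closed form for illustration only. It omits the intercept and does not reproduce this class's implementation. That the class delegates fitting to scikit-learn's Ridge estimator is an assumption here.

```python
import numpy as np


def ridge_coefficients(x, y, alpha=1.0):
    """Minimal ridge sketch: minimize ||y - X w||^2 + alpha * ||w||^2 (no intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = x.shape[1]
    # Normal equations with an L2 penalty: (X^T X + alpha * I) w = X^T y.
    # np.linalg.solve avoids forming an explicit matrix inverse.
    lhs = x.T @ x + alpha * np.eye(n_features)
    rhs = x.T @ y
    return np.linalg.solve(lhs, rhs)


# One feature with y = 2 * x: alpha = 0 recovers ordinary least squares,
# while a positive alpha shrinks the coefficient towards zero.
x_demo = [[1.0], [2.0], [3.0]]
y_demo = [2.0, 4.0, 6.0]
print(ridge_coefficients(x_demo, y_demo, alpha=0.0))   # [2.]
print(ridge_coefficients(x_demo, y_demo, alpha=14.0))  # [1.]
```

Here X^T X = 14 and X^T y = 28, so the coefficient is 28 / (14 + alpha).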
Classes:
RidgeModel(input_datasets, **kwargs) – Ridge Regression model.
- class esmvaltool.diag_scripts.mlr.models.ridge.RidgeModel(input_datasets, **kwargs)
Bases: LinearModel
Ridge Regression model.
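Attributes:
categorical_features – Categorical features.
data – Input data of the MLR model.
features – Features of the input data.
features_after_preprocessing – Features of the input data after preprocessing.
features_types – Types of the features.
features_units – Units of the features.
fit_kwargs – Keyword arguments for fit().
group_attributes – Group attributes of the input data.
label – Label of the input data.
label_units – Units of the label.
mlr_model_type – MLR model type.
numerical_features – Numerical features.
parameters – Parameters of the complete MLR model pipeline.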
Methods:
create(mlr_model_type, *args, **kwargs) – Create desired MLR model subclass (factory method).
efecv(**kwargs) – Perform exhaustive feature elimination using cross-validation.
export_prediction_data([filename]) – Export all prediction data contained in self._data.
export_training_data([filename]) – Export all training data contained in self._data.
fit() – Fit MLR model.
get_ancestors([label, features, ...]) – Return ancestor files.
get_data_frame(data_type[, impute_nans]) – Return data frame of specified type.
get_x_array(data_type[, impute_nans]) – Return x data of specific type.
get_y_array(data_type[, impute_nans]) – Return y data of specific type.
grid_search_cv(param_grid, **kwargs) – Perform exhaustive parameter search using cross-validation.
plot_1d_model([filename, n_points]) – Plot a line plot that represents the MLR model.
plot_coefs([filename]) – Plot linear coefficients of models.
plot_feature_importance([filename, color_coded]) – Plot feature importance given by linear coefficients.
plot_partial_dependences([filename]) – Plot partial dependences for every feature.
plot_prediction_errors([filename]) – Plot predicted vs. true values.
plot_residuals([filename]) – Plot residuals of training and test (if available) data.
plot_residuals_distribution([filename]) – Plot distribution of residuals of training and test data (KDE).
plot_residuals_histogram([filename]) – Plot histogram of residuals of training and test data.
plot_scatterplots([filename]) – Plot scatterplots label vs. feature for every feature.
predict([save_mlr_model_error, ...]) – Perform prediction using the MLR model(s) and write *.nc files.
print_correlation_matrices() – Print correlation matrices for all datasets.
print_regression_metrics([logo]) – Print all available regression metrics for training data.
register_mlr_model(mlr_model_type) – Add MLR model (subclass of this class) (decorator).
reset_pipeline() – Reset regressor pipeline.
rfecv(**kwargs) – Perform recursive feature elimination using cross-validation.
test_normality_of_residuals() – Perform Shapiro-Wilk test for normality of residuals.
update_parameters(**params) – Update parameters of the whole pipeline.
- property categorical_features
Categorical features.
- classmethod create(mlr_model_type, *args, **kwargs)
Create desired MLR model subclass (factory method).
- efecv(**kwargs)
Perform exhaustive feature elimination using cross-validation.
- Parameters
**kwargs (keyword arguments, optional) – Additional options for esmvaltool.diag_scripts.mlr.custom_sklearn.cross_val_score_weighted().
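Exhaustive feature elimination evaluates every non-empty subset of the features with cross-validation and keeps the best-scoring one, so for n features it needs 2^n - 1 fits per fold. The sketch below illustrates the idea with scikit-learn. It swaps in the unweighted sklearn.model_selection.cross_val_score for the weighted variant named above, and the helper exhaustive_feature_selection is hypothetical.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score


def exhaustive_feature_selection(estimator, x, y, cv=5):
    """Hypothetical helper: return (best_subset, best_score) over all feature subsets."""
    n_features = x.shape[1]
    best_subset, best_score = None, -np.inf
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            # Mean cross-validated score (R^2 by default for regressors)
            score = cross_val_score(estimator, x[:, list(subset)], y, cv=cv).mean()
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


# Synthetic data: the label depends on features 0 and 1 only; feature 2 is noise.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(100, 3))
y_demo = 3.0 * x_demo[:, 0] - 2.0 * x_demo[:, 1]
subset, score = exhaustive_feature_selection(Ridge(alpha=1.0), x_demo, y_demo)
print(subset, round(score, 3))
```

The exponential number of subsets is why an exhaustive search is only practical for a small number of features.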
- export_prediction_data(filename=None)
Export all prediction data contained in self._data.
- Parameters
filename (str, optional (default: '{data_type}_{pred_name}.csv')) – Name of the exported files.
- export_training_data(filename=None)
Export all training data contained in self._data.
- Parameters
filename (str, optional (default: '{data_type}.csv')) – Name of the exported files.
- property features
Features of the input data.
- property features_after_preprocessing
Features of the input data after preprocessing.
- property features_types
Types of the features.
- property features_units
Units of the features.
- fit()
Fit MLR model.
Note
Specifying keyword arguments for this function is not allowed here since features_after_preprocessing might be altered by that. Use the keyword argument fit_kwargs during class initialization instead.
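Fixing fit options when the model is constructed, instead of accepting them in fit(), can be sketched with a scikit-learn Pipeline. This sketch assumes the options are forwarded with the step__parameter routing that Pipeline.fit accepts. The class FitKwargsSketch and the step names "scaler" and "regressor" are hypothetical and do not describe this class's internals.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class FitKwargsSketch:
    """Hypothetical wrapper: fit options are fixed at construction time."""

    def __init__(self, fit_kwargs=None):
        self.fit_kwargs = dict(fit_kwargs or {})
        self.pipeline = Pipeline([
            ("scaler", StandardScaler()),
            ("regressor", Ridge(alpha=1.0)),
        ])

    def fit(self, x, y):
        # No keyword arguments here: the options stored at construction
        # are forwarded unchanged to Pipeline.fit.
        self.pipeline.fit(x, y, **self.fit_kwargs)
        return self


rng = np.random.default_rng(0)
x_demo = rng.normal(size=(50, 2))
y_demo = x_demo @ np.array([1.5, -0.5])
weights = np.ones(50)
# "regressor__sample_weight" routes sample_weight to the Ridge step only.
model = FitKwargsSketch(fit_kwargs={"regressor__sample_weight": weights}).fit(x_demo, y_demo)
print(model.pipeline.named_steps["regressor"].coef_)
```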
- get_ancestors(label=True, features=None, prediction_names=None, prediction_reference=False)
Return ancestor files.
- Parameters
label (bool, optional (default: True)) – Return label files.
features (list of str, optional (default: None)) – Features for which files should be returned. If None, return files for all features.
prediction_names (list of str, optional (default: None)) – Prediction names for which files should be returned. If None, return files for all prediction names.
prediction_reference (bool, optional (default: False)) – Return prediction_reference files if available for given prediction_names.
- Returns
Ancestor files.
- Return type
list of str
- Raises
ValueError – Invalid feature or prediction_name given.
- get_data_frame(data_type, impute_nans=False)
Return data frame of specified type.
- Returns
Desired data.
- Raises
TypeError – data_type is invalid or data does not exist (e.g. test data is not set).
- get_x_array(data_type, impute_nans=False)
Return x data of specific type.
- Returns
Desired data.
- Raises
TypeError – data_type is invalid or data does not exist (e.g. test data is not set).
- get_y_array(data_type, impute_nans=False)
Return y data of specific type.
- Returns
Desired data.
- Raises
TypeError – data_type is invalid or data does not exist (e.g. test data is not set).
- grid_search_cv(param_grid, **kwargs)
Perform exhaustive parameter search using cross-validation.
- Parameters
param_grid (dict or list of dict) – Parameter names (keys) and ranges (values) for the search. They have to be given for each step of the pipeline separated by two underscores, i.e. s__p is the parameter p for step s.
**kwargs (keyword arguments, optional) – Additional options for sklearn.model_selection.GridSearchCV.
- Raises
ValueError – Final regressor does not supply the attributes best_estimator_ or best_params_.
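The s__p convention is the one scikit-learn uses for pipelines. The following plain scikit-learn illustration searches a grid over the ridge penalty. The step names "scaler" and "regressor" are assumptions and not necessarily the step names used inside this class.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical two-step pipeline; parameter keys are "<step>__<parameter>".
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", Ridge()),
])
param_grid = {"regressor__alpha": [0.01, 0.1, 1.0, 10.0]}

rng = np.random.default_rng(0)
x_demo = rng.normal(size=(80, 3))
y_demo = x_demo @ np.array([2.0, 0.0, -1.0]) + 0.1 * rng.normal(size=80)

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(x_demo, y_demo)
# After fitting, GridSearchCV exposes best_params_ and best_estimator_.
print(search.best_params_)
```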
- property group_attributes
Group attributes of the input data.
- property numerical_features
Numerical features.
- plot_1d_model(filename=None, n_points=1000)
Plot a line plot that represents the MLR model.
Note
This only works for a model with a single feature.
- Raises
sklearn.exceptions.NotFittedError – MLR model is not fitted.
ValueError – MLR model is built from more than 1 feature.
- plot_coefs(filename=None)
Plot linear coefficients of models.
Note
The features plotted here are not necessarily the real input features, but the ones after preprocessing.
- Parameters
filename (str, optional (default: 'coefs')) – Name of the plot file.
- plot_feature_importance(filename=None, color_coded=True)
Plot feature importance given by linear coefficients.
Note
The features plotted here are not necessarily the real input features, but the ones after preprocessing.
- plot_partial_dependences(filename=None)
Plot partial dependences for every feature.
- Parameters
filename (str, optional (default: 'partial_dependece_{feature}')) – Name of the plot file.
- Raises
sklearn.exceptions.NotFittedError – MLR model is not fitted.
- plot_prediction_errors(filename=None)
Plot predicted vs. true values.
- Parameters
filename (str, optional (default: 'prediction_errors')) – Name of the plot file.
- Raises
sklearn.exceptions.NotFittedError – MLR model is not fitted.
- plot_residuals(filename=None)
Plot residuals of training and test (if available) data.
- Parameters
filename (str, optional (default: 'residuals')) – Name of the plot file.
- Raises
sklearn.exceptions.NotFittedError – MLR model is not fitted.
- plot_residuals_distribution(filename=None)
Plot distribution of residuals of training and test data (KDE).
- Parameters
filename (str, optional (default: 'residuals_distribution')) – Name of the plot file.
- Raises
sklearn.exceptions.NotFittedError – MLR model is not fitted.
- plot_residuals_histogram(filename=None)
Plot histogram of residuals of training and test data.
- Parameters
filename (str, optional (default: 'residuals_histogram')) – Name of the plot file.
- Raises
sklearn.exceptions.NotFittedError – MLR model is not fitted.
- plot_scatterplots(filename=None)
Plot scatterplots label vs. feature for every feature.
- Parameters
filename (str, optional (default: 'scatterplot_{feature}')) – Name of the plot file.
- Raises
sklearn.exceptions.NotFittedError – MLR model is not fitted.
- predict(save_mlr_model_error=None, save_lime_importance=False, save_propagated_errors=False, **kwargs)
Perform prediction using the MLR model(s) and write *.nc files.
- Parameters
save_mlr_model_error (str or int, optional) – Additionally save the estimated squared MLR model error. This error represents the uncertainty of the prediction caused by the MLR model itself and not by errors in the prediction input data (errors in the input data are considered by including datasets with var_type set to prediction_input_error and setting save_propagated_errors to True). If the option is set to 'test', the (constant) error is estimated as RMSEP (root mean squared error of prediction) using a (hold-out) test data set. This is only possible if test data is available, i.e. the option test_size is not set to False during class initialization. If the option is set to 'logo', the (constant) error is estimated as RMSEP using leave-one-group-out cross-validation based on the group_attributes. This is only possible if group_datasets_by_attributes is given. If the option is set to an integer n (!= 0), the (constant) error is estimated as RMSEP using n-fold cross-validation.
save_lime_importance (bool, optional (default: False)) – Additionally save local feature importance given by LIME (Local Interpretable Model-agnostic Explanations).
save_propagated_errors (bool, optional (default: False)) – Additionally save propagated errors from prediction_input_error datasets. Only possible when these are available.
**kwargs (keyword arguments, optional) – Additional options for the predict() function of the final regressor.
- Raises
RuntimeError – return_var and return_cov are both set to True.
sklearn.exceptions.NotFittedError – MLR model is not fitted.
ValueError – An invalid value for save_mlr_model_error is given.
ValueError – save_propagated_errors is True and no prediction_input_error data is available.
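For an integer n, the constant model error is estimated as RMSEP with n-fold cross-validation. The following scikit-learn sketch computes that estimate. It assumes RMSEP is the root of the mean squared error over all held-out predictions; how the class actually aggregates folds is not stated here. The helper rmsep_kfold is hypothetical. Since the option saves the squared error, the matching value would be rmsep ** 2.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_predict


def rmsep_kfold(estimator, x, y, n_folds=5):
    """Hypothetical helper: root mean squared error of prediction from n-fold CV."""
    cv = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    # Each sample is predicted by a model that did not see it during fitting.
    y_pred = cross_val_predict(estimator, x, y, cv=cv)
    return np.sqrt(mean_squared_error(y, y_pred))


rng = np.random.default_rng(1)
x_demo = rng.normal(size=(120, 2))
y_demo = x_demo @ np.array([1.0, 2.0]) + 0.5 * rng.normal(size=120)
rmsep = rmsep_kfold(Ridge(alpha=1.0), x_demo, y_demo, n_folds=5)
print(rmsep, rmsep ** 2)  # RMSEP and the corresponding squared model error
```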
- print_correlation_matrices()
Print correlation matrices for all datasets.
- print_regression_metrics(logo=False)
Print all available regression metrics for training data.
- Parameters
logo (bool, optional (default: False)) – Print regression metrics using sklearn.model_selection.LeaveOneGroupOut cross-validation. Only possible when group_datasets_by_attributes was given during class initialization.
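Leave-one-group-out cross-validation holds out all samples that share one group label at a time, so each fold measures how well the model generalizes to an unseen group. The following scikit-learn sketch computes per-group regression metrics. The group labels are made up for illustration, and mapping group_attributes to integer groups in this way is an assumption.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(2)
x_demo = rng.normal(size=(60, 2))
y_demo = x_demo @ np.array([0.5, -1.5]) + 0.1 * rng.normal(size=60)
# Hypothetical group labels: three groups of 20 samples each.
groups = np.repeat([0, 1, 2], 20)

logo = LeaveOneGroupOut()
print("number of folds:", logo.get_n_splits(x_demo, y_demo, groups))  # 3
# One score per held-out group.
r2_scores = cross_val_score(
    Ridge(alpha=1.0), x_demo, y_demo, groups=groups, cv=logo, scoring="r2")
rmse_scores = -cross_val_score(
    Ridge(alpha=1.0), x_demo, y_demo, groups=groups, cv=logo,
    scoring="neg_root_mean_squared_error")
print(r2_scores, rmse_scores)
```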
- classmethod register_mlr_model(mlr_model_type)
Add MLR model (subclass of this class) (decorator).
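create() and register_mlr_model() together describe a registry-based factory: subclasses register themselves under a type string, and create() instantiates the subclass registered for that string. The following pure-Python sketch shows the pattern. All names (MLRModelSketch, _MODELS, RidgeSketch) are hypothetical, and the ValueError for unknown types is an assumption that may differ from the real method.

```python
class MLRModelSketch:
    """Hypothetical base class holding a registry of model subclasses."""

    _MODELS = {}

    @classmethod
    def register_mlr_model(cls, mlr_model_type):
        """Return a class decorator that registers a subclass under mlr_model_type."""
        def decorator(subclass):
            cls._MODELS[mlr_model_type] = subclass
            return subclass
        return decorator

    @classmethod
    def create(cls, mlr_model_type, *args, **kwargs):
        """Factory method: instantiate the subclass registered for mlr_model_type."""
        if mlr_model_type not in cls._MODELS:
            # Error type is an assumption made for this sketch.
            raise ValueError(
                f"MLR model type '{mlr_model_type}' not found, "
                f"available: {sorted(cls._MODELS)}")
        return cls._MODELS[mlr_model_type](*args, **kwargs)

    def __init__(self, input_datasets, **kwargs):
        self.input_datasets = input_datasets
        self.kwargs = kwargs


@MLRModelSketch.register_mlr_model("ridge")
class RidgeSketch(MLRModelSketch):
    """Hypothetical subclass registered under the type string 'ridge'."""


model = MLRModelSketch.create("ridge", [])
print(type(model).__name__)  # RidgeSketch
```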
- reset_pipeline()
Reset regressor pipeline.
- rfecv(**kwargs)
Perform recursive feature elimination using cross-validation.
Note
This only works for final estimators that provide information about feature importance either through a coef_ attribute or through a feature_importances_ attribute.
- Parameters
**kwargs (keyword arguments, optional) – Additional options for sklearn.feature_selection.RFECV.
- Raises
RuntimeError – Final estimator does not provide coef_ or feature_importances_ attribute.
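The following plain scikit-learn example shows recursive feature elimination. RFECV repeatedly drops the feature with the smallest absolute coef_ of the fitted ridge model and uses cross-validation to choose how many features to keep. The synthetic data are made up for the example.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import Ridge

# Synthetic data: only features 0 and 1 influence the label.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(100, 4))
y_demo = 3.0 * x_demo[:, 0] - 2.0 * x_demo[:, 1]

# step=1 removes one feature per iteration; Ridge exposes coef_, as required.
selector = RFECV(Ridge(alpha=1.0), step=1, cv=5)
selector.fit(x_demo, y_demo)
print(selector.n_features_, selector.support_)
```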
- test_normality_of_residuals()
Perform Shapiro-Wilk test for normality of residuals.
- Raises
sklearn.exceptions.NotFittedError – MLR model is not fitted.
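The Shapiro-Wilk test checks the null hypothesis that a sample was drawn from a normal distribution. A small p-value is evidence that the residuals are not normally distributed. The following sketch applies scipy.stats.shapiro to the residuals of a fitted ridge model. The data and the 0.05 threshold are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
x_demo = rng.normal(size=(200, 2))
y_demo = x_demo @ np.array([1.0, -1.0]) + rng.normal(scale=0.3, size=200)

model = Ridge(alpha=1.0).fit(x_demo, y_demo)
residuals = y_demo - model.predict(x_demo)

# shapiro returns the W statistic and the p-value.
statistic, p_value = stats.shapiro(residuals)
print(f"W = {statistic:.3f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Residuals deviate significantly from normality at the 5% level.")
else:
    print("No significant deviation from normality at the 5% level.")
```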
- update_parameters(**params)
Update parameters of the whole pipeline.
Note
Parameter names have to be given for each step of the pipeline separated by two underscores, i.e. s__p is the parameter p for step s.
- Parameters
**params (keyword arguments, optional) – Parameters for the pipeline which should be updated.
- Raises
ValueError – Invalid parameter for pipeline given.
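Updating parameters with the s__p convention corresponds to scikit-learn's Pipeline.set_params, which also rejects unknown parameter names with a ValueError. The following example uses a hypothetical two-step pipeline; the step names are assumptions.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", Ridge(alpha=1.0)),
])

# Parameter p of step s is addressed as "s__p".
pipeline.set_params(regressor__alpha=10.0, scaler__with_mean=False)
print(pipeline.get_params()["regressor__alpha"])  # 10.0

try:
    pipeline.set_params(regressor__not_a_parameter=1)
except ValueError as exc:
    print("rejected:", exc)
```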