Discovering data

This notebook shows how to find out what data is available locally as well as on ESGF. It also shows how to download the data from ESGF.

[1]:
from esmvalcore.config import CFG
from esmvalcore.dataset import Dataset, datasets_to_recipe
from esmvalcore.esgf import download
import yaml

Configure ESMValCore so that it always searches ESGF for data:

[2]:
CFG['search_esgf'] = 'always'

We define a dataset template to search for all CMIP6 datasets that provide surface air temperature (tas) at monthly resolution for the historical experiment. Note that ESMValCore uses its own facet names so that naming is uniform across the different CMIP phases and other projects. The mapping to the facet names used on ESGF can be found in esmvalcore.esgf.facets.FACETS.

[3]:
dataset_template = Dataset(
    short_name='tas',
    mip='Amon',
    project='CMIP6',
    exp='historical',
    dataset='*',
    institute='*',
    ensemble='*',
    grid='*',
)
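To make the role of the '*' wildcards concrete, here is a minimal, self-contained sketch of how a template with wildcard facets can be matched against concrete facet values. It uses plain dictionaries and the standard-library fnmatch module. The helper matches_template is hypothetical and the second candidate record is made up for illustration; this is not how ESMValCore implements its search.

```python
from fnmatch import fnmatchcase

# Template facets, copied from the cell above.
template = {
    "short_name": "tas",
    "mip": "Amon",
    "project": "CMIP6",
    "exp": "historical",
    "dataset": "*",
    "institute": "*",
    "ensemble": "*",
    "grid": "*",
}


def matches_template(facets, template):
    """Return True if every template facet matches the concrete facets.

    Hypothetical helper for illustration only; ESMValCore does its own
    matching against local files and ESGF search results.
    """
    return all(
        key in facets and fnmatchcase(str(facets[key]), pattern)
        for key, pattern in template.items()
    )


candidates = [
    # Facets of a dataset that appears in the search results of this notebook.
    {"short_name": "tas", "mip": "Amon", "project": "CMIP6",
     "exp": "historical", "dataset": "TaiESM1", "institute": "AS-RCEC",
     "ensemble": "r1i1p1f1", "grid": "gn"},
    # Made-up record with a different variable; it should be rejected.
    {"short_name": "pr", "mip": "Amon", "project": "CMIP6",
     "exp": "historical", "dataset": "TaiESM1", "institute": "AS-RCEC",
     "ensemble": "r1i1p1f1", "grid": "gn"},
]

matching = [c for c in candidates if matches_template(c, template)]
print(f"{len(matching)} of {len(candidates)} candidates match the template")
```

Facets given as a fixed value must match exactly, while facets set to '*' accept any value, which is why the template can expand into many concrete datasets.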

Next, we use the Dataset.from_files method to build a list of datasets from the available files. This may take a while, because searching ESGF for many files is slow. Search results are cached for a configurable duration, so subsequent searches will be faster.

[4]:
datasets = list(dataset_template.from_files())
print(f"Found {len(datasets)} datasets, showing the first 10:")
datasets[:10]
Found 778 datasets, showing the first 10:
[4]:
[Dataset:
 {'dataset': 'TaiESM1',
  'project': 'CMIP6',
  'mip': 'Amon',
  'short_name': 'tas',
  'ensemble': 'r1i1p1f1',
  'exp': 'historical',
  'grid': 'gn',
  'institute': 'AS-RCEC'},
 Dataset:
 {'dataset': 'TaiESM1',
  'project': 'CMIP6',
  'mip': 'Amon',
  'short_name': 'tas',
  'ensemble': 'r2i1p1f1',
  'exp': 'historical',
  'grid': 'gn',
  'institute': 'AS-RCEC'},
 Dataset:
 {'dataset': 'AWI-CM-1-1-MR',
  'project': 'CMIP6',
  'mip': 'Amon',
  'short_name': 'tas',
  'ensemble': 'r1i1p1f1',
  'exp': 'historical',
  'grid': 'gn',
  'institute': 'AWI'},
 Dataset:
 {'dataset': 'AWI-CM-1-1-MR',
  'project': 'CMIP6',
  'mip': 'Amon',
  'short_name': 'tas',
  'ensemble': 'r2i1p1f1',
  'exp': 'historical',
  'grid': 'gn',
  'institute': 'AWI'},
 Dataset:
 {'dataset': 'AWI-CM-1-1-MR',
  'project': 'CMIP6',
  'mip': 'Amon',
  'short_name': 'tas',
  'ensemble': 'r3i1p1f1',
  'exp': 'historical',
  'grid': 'gn',
  'institute': 'AWI'},
 Dataset:
 {'dataset': 'AWI-CM-1-1-MR',
  'project': 'CMIP6',
  'mip': 'Amon',
  'short_name': 'tas',
  'ensemble': 'r4i1p1f1',
  'exp': 'historical',
  'grid': 'gn',
  'institute': 'AWI'},
 Dataset:
 {'dataset': 'AWI-CM-1-1-MR',
  'project': 'CMIP6',
  'mip': 'Amon',
  'short_name': 'tas',
  'ensemble': 'r5i1p1f1',
  'exp': 'historical',
  'grid': 'gn',
  'institute': 'AWI'},
 Dataset:
 {'dataset': 'AWI-ESM-1-1-LR',
  'project': 'CMIP6',
  'mip': 'Amon',
  'short_name': 'tas',
  'ensemble': 'r1i1p1f1',
  'exp': 'historical',
  'grid': 'gn',
  'institute': 'AWI'},
 Dataset:
 {'dataset': 'BCC-CSM2-MR',
  'project': 'CMIP6',
  'mip': 'Amon',
  'short_name': 'tas',
  'ensemble': 'r1i1p1f1',
  'exp': 'historical',
  'grid': 'gn',
  'institute': 'BCC'},
 Dataset:
 {'dataset': 'BCC-CSM2-MR',
  'project': 'CMIP6',
  'mip': 'Amon',
  'short_name': 'tas',
  'ensemble': 'r2i1p1f1',
  'exp': 'historical',
  'grid': 'gn',
  'institute': 'BCC'}]
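With several hundred results, a summary per model is often easier to read than the raw list. The sketch below counts ensemble members per model using (dataset, ensemble) pairs copied from the ten results shown above, so it runs without access to ESGF. With real results, the same values should be available from the facets attribute of each Dataset object; treat that as an assumption to check against your ESMValCore version.

```python
from collections import Counter

# (dataset, ensemble) pairs copied from the ten results shown above.
found = [
    ("TaiESM1", "r1i1p1f1"),
    ("TaiESM1", "r2i1p1f1"),
    ("AWI-CM-1-1-MR", "r1i1p1f1"),
    ("AWI-CM-1-1-MR", "r2i1p1f1"),
    ("AWI-CM-1-1-MR", "r3i1p1f1"),
    ("AWI-CM-1-1-MR", "r4i1p1f1"),
    ("AWI-CM-1-1-MR", "r5i1p1f1"),
    ("AWI-ESM-1-1-LR", "r1i1p1f1"),
    ("BCC-CSM2-MR", "r1i1p1f1"),
    ("BCC-CSM2-MR", "r2i1p1f1"),
]

# Count how many ensemble members were found for each model.
members_per_model = Counter(model for model, _ensemble in found)

for model, n_members in sorted(members_per_model.items()):
    print(f"{model}: {n_members} ensemble member(s)")
```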

Let’s look at the first dataset in more detail. We can print the facets describing the dataset:

[5]:
dataset = datasets[0]
dataset
[5]:
Dataset:
{'dataset': 'TaiESM1',
 'project': 'CMIP6',
 'mip': 'Amon',
 'short_name': 'tas',
 'ensemble': 'r1i1p1f1',
 'exp': 'historical',
 'grid': 'gn',
 'institute': 'AS-RCEC'}

and see what files are available:

[6]:
dataset.files
[6]:
[ESGFFile:CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/Amon/tas/gn/v20200623/tas_Amon_TaiESM1_historical_r1i1p1f1_gn_185001-201412.nc on hosts ['esgf-data1.llnl.gov', 'esgf.ceda.ac.uk', 'esgf.rcec.sinica.edu.tw', 'esgf3.dkrz.de', 'esgf-data04.diasjp.net', 'esgf.nci.org.au', 'esgf3.dkrz.de']]
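The file name itself already tells you which period the file covers. As a rough sketch, the name shown above can be split into its underscore-separated parts, following the CMIP6 file naming convention (variable, table, model, experiment, member, grid, time range). The key file_timerange is a hypothetical label chosen here, and time-invariant files, which have no time range in their name, are not handled.

```python
# File name copied from the output above.
filename = "tas_Amon_TaiESM1_historical_r1i1p1f1_gn_185001-201412.nc"

# Labels for the underscore-separated parts, using ESMValCore facet names
# where they exist; "file_timerange" is a hypothetical label for this sketch.
keys = ("short_name", "mip", "dataset", "exp", "ensemble", "grid",
        "file_timerange")

stem = filename.rsplit(".", 1)[0]  # drop the ".nc" extension
parts = dict(zip(keys, stem.split("_")))

# The time range is "<start>-<end>", here given as YYYYMM for monthly data.
start, end = parts["file_timerange"].split("-")
start_year, end_year = int(start[:4]), int(end[:4])

print(f"{parts['dataset']} {parts['short_name']}: {start_year} to {end_year}")
```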

A single file can be downloaded using its download method:

[7]:
dataset.files[0].download(CFG['download_dir'])
[7]:
LocalFile('~/climate_data/CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/Amon/tas/gn/v20200623/tas_Amon_TaiESM1_historical_r1i1p1f1_gn_185001-201412.nc')

For downloading many files, the esmvalcore.esgf.download function is recommended because it downloads the files in parallel. ESMValCore will try to guess the fastest host and download from there. If that host is not available for some reason, it will automatically fall back to the next host.

[8]:
download(dataset.files, CFG['download_dir'])
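The fallback behaviour described above can be pictured as a loop over candidate hosts. The following sketch is not ESMValCore's implementation: download_with_fallback and fake_fetch are hypothetical names, the host order is taken as given rather than measured, and the failing host is simulated.

```python
def download_with_fallback(hosts, fetch):
    """Try each host in order and return the first successful result.

    Hypothetical helper: `fetch` is any callable that takes a host name
    and either returns a result or raises OSError.
    """
    errors = {}
    for host in hosts:
        try:
            return fetch(host)
        except OSError as exc:
            # Remember why this host failed and move on to the next one.
            errors[host] = exc
    raise OSError(f"All hosts failed: {errors}")


def fake_fetch(host):
    """Stand-in for a real download; pretends the first host is offline."""
    if host == "esgf-data1.llnl.gov":
        raise ConnectionError("host unavailable (simulated)")
    return f"downloaded from {host}"


# Host names taken from the file listing earlier in this notebook.
hosts = ["esgf-data1.llnl.gov", "esgf.ceda.ac.uk", "esgf3.dkrz.de"]
result = download_with_fallback(hosts, fake_fetch)
print(result)
```

Collecting the per-host errors rather than discarding them makes it possible to report why every host failed if none of them succeeds.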