{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "bd168fbd-f5e8-4b32-906f-5c658b9758a0", "metadata": {}, "source": [ "# Discovering data\n", "\n", "This notebook shows how to find out what data is available locally as well as on ESGF. It also shows how to download the data from ESGF." ] }, { "cell_type": "code", "execution_count": 1, "id": "f0ccfe7f-c535-4606-99ce-be24960aece1", "metadata": {}, "outputs": [], "source": [ "from esmvalcore.config import CFG\n", "from esmvalcore.dataset import Dataset, datasets_to_recipe\n", "from esmvalcore.esgf import download\n", "import yaml" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f4374495-19c4-4c3b-9fac-d929a5e595ad", "metadata": {}, "source": [ "Configure ESMValCore so it always searches the ESGF for data" ] }, { "cell_type": "code", "execution_count": 2, "id": "5d2711ea-6738-4a82-97b1-bc7d1212098a", "metadata": {}, "outputs": [], "source": [ "CFG['search_esgf'] = 'always'" ] }, { "attachments": {}, "cell_type": "markdown", "id": "aea7a272-7d26-44d9-8766-379379e5d152", "metadata": {}, "source": [ "We define a dataset template to search for all CMIP6 datasets that provide surface air temperature (tas) on a monthly resolution for the historical experiment. Note that ESMValCore uses its own names for the facets for a more uniform naming across different CMIP phases and other projects. The mapping to the facet names used on ESGF can be found in [esmvalcore.esgf.facets.FACETS](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/api/esmvalcore.esgf.html#esmvalcore.esgf.facets.FACETS)." ] }, { "cell_type": "code", "execution_count": 3, "id": "23c26e29-ea87-40d7-a962-85a06fc77221", "metadata": {}, "outputs": [], "source": [ "dataset_template = Dataset(\n", " short_name='tas',\n", " mip='Amon',\n", " project='CMIP6',\n", " exp='historical',\n", " dataset='*',\n", " institute='*',\n", " ensemble='*',\n", " grid='*',\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "baf29fbb-eed5-47bd-8805-c27ad34b0539", "metadata": {}, "source": [ "Next, we use the `Dataset.from_files` method to build a list of datasets from the available files. This may take a while as searching the ESGF for many files is a bit slow. Because the search results are cached for a [configurable duration](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/configure.html#esgf-configuration), subsequent searches will be faster." ] }, { "cell_type": "code", "execution_count": 4, "id": "d657320b-25c7-48f3-bfe1-5f3b94d7b789", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found 778 datasets, showing the first 10:\n" ] }, { "data": { "text/plain": [ "[Dataset:\n", " {'dataset': 'TaiESM1',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'tas',\n", " 'ensemble': 'r1i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'AS-RCEC'},\n", " Dataset:\n", " {'dataset': 'TaiESM1',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'tas',\n", " 'ensemble': 'r2i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'AS-RCEC'},\n", " Dataset:\n", " {'dataset': 'AWI-CM-1-1-MR',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'tas',\n", " 'ensemble': 'r1i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'AWI'},\n", " Dataset:\n", " {'dataset': 'AWI-CM-1-1-MR',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'tas',\n", " 'ensemble': 'r2i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'AWI'},\n", " Dataset:\n", " {'dataset': 'AWI-CM-1-1-MR',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'tas',\n", " 'ensemble': 'r3i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'AWI'},\n", " Dataset:\n", " {'dataset': 'AWI-CM-1-1-MR',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'tas',\n", " 'ensemble': 'r4i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'AWI'},\n", " Dataset:\n", " {'dataset': 'AWI-CM-1-1-MR',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'tas',\n", " 'ensemble': 'r5i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'AWI'},\n", " Dataset:\n", " {'dataset': 'AWI-ESM-1-1-LR',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'tas',\n", " 'ensemble': 'r1i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'AWI'},\n", " Dataset:\n", " {'dataset': 'BCC-CSM2-MR',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'tas',\n", " 'ensemble': 'r1i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'BCC'},\n", " Dataset:\n", " {'dataset': 'BCC-CSM2-MR',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'tas',\n", " 'ensemble': 'r2i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'BCC'}]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datasets = list(dataset_template.from_files())\n", "print(f\"Found {len(datasets)} datasets, showing the first 10:\")\n", "datasets[:10]" ] }, { "cell_type": "markdown", "id": "3f88a30e-9dcd-431d-b469-3efd367795de", "metadata": {}, "source": [ "Let's look at the first dataset in more detail. We can print the facets describing the dataset:" ] }, { "cell_type": "code", "execution_count": 5, "id": "8216affd-fa1a-4499-abd5-7ec836d14fd6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Dataset:\n", "{'dataset': 'TaiESM1',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'tas',\n", " 'ensemble': 'r1i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'AS-RCEC'}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset = datasets[0]\n", "dataset" ] }, { "cell_type": "markdown", "id": "4792d842-0476-48b4-97ae-eca16da09c42", "metadata": {}, "source": [ "and see what files are available:" ] }, { "cell_type": "code", "execution_count": 6, "id": "2cb3a047-bbaf-415d-bc0b-44bf473d858d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[ESGFFile:CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/Amon/tas/gn/v20200623/tas_Amon_TaiESM1_historical_r1i1p1f1_gn_185001-201412.nc on hosts ['esgf-data1.llnl.gov', 'esgf.ceda.ac.uk', 'esgf.rcec.sinica.edu.tw', 'esgf3.dkrz.de', 'esgf-data04.diasjp.net', 'esgf.nci.org.au', 'esgf3.dkrz.de']]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.files" ] }, { "cell_type": "markdown", "id": "60d88a34-c886-4b9d-a9e9-a9d18fa97917", "metadata": {}, "source": [ "A single file can be downloaded using its `download` method:" ] }, { "cell_type": "code", "execution_count": 7, "id": "cdb2b043-c2f4-4cd2-b2d3-d75bb28571a2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LocalFile('~/climate_data/CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/Amon/tas/gn/v20200623/tas_Amon_TaiESM1_historical_r1i1p1f1_gn_185001-201412.nc')" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.files[0].download(CFG['download_dir'])" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3821b594-3797-497b-a51d-1798d5b2fc80", "metadata": {}, "source": [ "For downloading many files, the [esmvalcore.esgf.download](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/api/esmvalcore.esgf.html#esmvalcore.esgf.download) function is recommended because it will download the files in parallel. The ESMValCore will try to guess the fastest host and download from there. If it is not available for some reason, it will automatically fall back to the next host." ] }, { "cell_type": "code", "execution_count": 8, "id": "9676ff81-232e-4ff8-b784-686f0d06c469", "metadata": {}, "outputs": [], "source": [ "download(dataset.files, CFG['download_dir'])" ] } ], "metadata": { "kernelspec": { "display_name": "esm", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" }, "vscode": { "interpreter": { "hash": "17e81e49408864327be43d3caebcb8eca32ff92a01becb15aa27be73c37f0517" } } }, "nbformat": 4, "nbformat_minor": 5 }