From fc3af3c473678b385c0b538b86363fcd2b09bb86 Mon Sep 17 00:00:00 2001 From: Paul Heubel Date: Tue, 16 Jul 2024 13:19:09 +0200 Subject: [PATCH] FIX #71. Removed W2D5 files from student/ and instructor/ dirs of W2D4. --- .../instructor/W2D5_DaySummary.ipynb | 41 - .../instructor/W2D5_Intro.ipynb | 187 ---- .../instructor/W2D5_Outro.ipynb | 56 -- .../instructor/W2D5_Tutorial1.ipynb | 829 ---------------- .../instructor/W2D5_Tutorial2.ipynb | 895 ------------------ .../instructor/W2D5_Tutorial3.ipynb | 656 ------------- .../instructor/W2D5_Tutorial4.ipynb | 678 ------------- .../instructor/W2D5_Tutorial5.ipynb | 748 --------------- .../instructor/W2D5_Tutorial6.ipynb | 292 ------ .../student/W2D5_DaySummary.ipynb | 41 - .../student/W2D5_Intro.ipynb | 187 ---- .../student/W2D5_Outro.ipynb | 56 -- .../student/W2D5_Tutorial1.ipynb | 789 --------------- .../student/W2D5_Tutorial2.ipynb | 843 ----------------- .../student/W2D5_Tutorial3.ipynb | 612 ------------ .../student/W2D5_Tutorial4.ipynb | 635 ------------- .../student/W2D5_Tutorial5.ipynb | 700 -------------- .../student/W2D5_Tutorial6.ipynb | 292 ------ 18 files changed, 8537 deletions(-) delete mode 100644 tutorials/W2D4_AIandClimateChange/instructor/W2D5_DaySummary.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/instructor/W2D5_Intro.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/instructor/W2D5_Outro.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial1.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial2.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial3.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial4.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial5.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial6.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/student/W2D5_DaySummary.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/student/W2D5_Intro.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/student/W2D5_Outro.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial1.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial2.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial3.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial4.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial5.ipynb delete mode 100644 tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial6.ipynb diff --git a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_DaySummary.ipynb b/tutorials/W2D4_AIandClimateChange/instructor/W2D5_DaySummary.ipynb deleted file mode 100644 index cf3e5fe89..000000000 --- a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_DaySummary.ipynb +++ /dev/null @@ -1,41 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "760f5fe4", - "metadata": {}, - "source": [ - "# Day Summary" - ] - }, - { - "cell_type": "markdown", - "id": "cbf64b85", - "metadata": {}, - "source": [ - "In this day, you learned how to explore data, ways to build and train regression models, the basics of random forest models and artificial neural networks, methods for assessing feature importance, and the nuances of how to evaluate a model's performance. 
" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.8" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Intro.ipynb b/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Intro.ipynb deleted file mode 100644 index 8800fba0a..000000000 --- a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Intro.ipynb +++ /dev/null @@ -1,187 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "8b35183e", - "metadata": {}, - "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ClimateMatchAcademy/course-content/blob/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/instructor/W2D5_Intro.ipynb)   \"Open\n" - ] - }, - { - "cell_type": "markdown", - "id": "ce6deecc", - "metadata": { - "execution": {} - }, - "source": [ - "# Intro\n" - ] - }, - { - "cell_type": "markdown", - "id": "652bed63", - "metadata": { - "execution": {} - }, - "source": [ - "## Overview\n" - ] - }, - { - "cell_type": "markdown", - "id": "d3c95a91", - "metadata": { - "execution": {} - }, - "source": [ - "Today's materials will provide an overview of data science and machine learning and how these topics can be applied to topics related to climate science and climate change. Particularly, we will explore two real world data sets that represent the impact of climate change on health and agriculture and learn how to model machine learning models that can predict output values and categorize data.\n" - ] - }, - { - "cell_type": "markdown", - "id": "e16287a1", - "metadata": { - "execution": {} - }, - "source": [ - "## Video 1: Climate Change Impacts on the SDGs and the Role of AI\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "9e032d93", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @markdown\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == \"Bilibili\":\n", - " src = f\"https://player.bilibili.com/player.html?bvid={id}&page={page}\"\n", - " elif source == \"Osf\":\n", - " src = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render\"\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = []\n", - " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == \"Youtube\":\n", - " video = YouTubeVideo(\n", - " id=video_ids[i][1], width=W, height=H, fs=fs, rel=0\n", - " )\n", - " print(f\"Video available at https://youtube.com/watch?v={video.id}\")\n", - " else:\n", - " video = PlayVideo(\n", - " id=video_ids[i][1],\n", - " source=video_ids[i][0],\n", - " width=W,\n", - " height=H,\n", - " fs=fs,\n", - " autoplay=False,\n", - " )\n", - " if video_ids[i][0] == \"Bilibili\":\n", - " print(\n", 
- " f\"Video available at https://www.bilibili.com/video/{video.id}\"\n", - " )\n", - " elif video_ids[i][0] == \"Osf\":\n", - " print(f\"Video available at https://osf.io/{video.id}\")\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "\n", - "video_ids = [(\"Youtube\", \"cEBv2yhKrtk\"), (\"Bilibili\", \"BV1Du41157AM\")]\n", - "tab_contents = display_videos(video_ids, W=730, H=410)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "markdown", - "id": "3dfb0d40", - "metadata": { - "execution": {} - }, - "source": [ - "## Slides\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "ccaea697", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @markdown\n", - "from ipywidgets import widgets\n", - "from IPython.display import IFrame\n", - "\n", - "link_id = \"rqst6\"\n", - "\n", - "download_link = f\"https://osf.io/download/{link_id}/\"\n", - "render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n", - "# @markdown\n", - "out = widgets.Output()\n", - "with out:\n", - " print(f\"If you want to download the slides: {download_link}\")\n", - " display(IFrame(src=f\"{render_link}\", width=730, height=410))\n", - "display(out)\n" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "include_colab_link": true, - "name": "W2D5_Intro", - "toc_visible": true - }, - "kernel": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.8" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Outro.ipynb b/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Outro.ipynb deleted file mode 100644 index de88cde04..000000000 --- a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Outro.ipynb +++ /dev/null @@ -1,56 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "cbe66527", - "metadata": { - "execution": {} - }, - "source": [ - "# Outro" - ] - }, - { - "cell_type": "markdown", - "id": "223a5d62", - "metadata": { - "execution": {} - }, - "source": [ - "The tools learned on this day are widely applicable to many climate topics, some of which were already covered in this course but also many more topic areas of importance to climate change mitigation and adaptation. As discussed, however, care most always be taken when interpreting and using machine learning models in the real world." 
- ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "include_colab_link": true, - "name": "W2D5_Outro", - "toc_visible": true - }, - "kernel": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.12" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial1.ipynb b/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial1.ipynb deleted file mode 100644 index bc663aa53..000000000 --- a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial1.ipynb +++ /dev/null @@ -1,829 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/W2D5_Tutorial1.ipynb)   \"Open" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial 1: ClimateBench Dataset and How Machine Learning Can Help\n", - "\n", - "**Week 2, Day 5, AI and Climate Change**\n", - "\n", - "__Content creators:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Content reviewers:__ Mujeeb Abdulfatai, Nkongho Ayuketang Arreyndip, Jeffrey N. A. Aryee, Paul Heubel, Jenna Pearson, Abel Shibu\n", - "\n", - "__Content editors:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Production editors:__ Konstantine Tsafatinos\n", - "\n", - "**Our 2024 Sponsors:** CMIP, NFDI4Earth" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "editable": true, - "execution": {}, - "slideshow": { - "slide_type": "" - }, - "tags": [] - }, - "source": [ - "# Tutorial Objectives\n", - "\n", - "*Estimated timing of tutorial:* 25 minutes\n", - "\n", - "Today, you will work on a total of 6 short tutorials. In Tutorial 1, you delve into the fundamentals, including discussions on climate model emulators and the ClimateBench dataset. You gain insights into Earth System Models (ESMs) and Shared Socioeconomic Pathways (SSPs), alongside practical visualization techniques for ClimateBench features. Tutorial 2 expands on these foundations, exploring decision trees, hyperparameters, and random forest models. You learn to evaluate regression models, focusing on the coefficient of determination (R$^2$), and gain hands-on experience implementing models using `scikit-learn`. Tutorial 3 shifts focus to mitigating overfitting in machine learning models. Here, you learn the importance of model generalization and acquire practical skills for splitting data into training and test sets. In Tutorial 4, you refine your understanding of model robustness, with emphasis on within-distribution generalization and testing model performance on similar data. Tutorial 5 challenges you to test our models on various types of out-of-distribution data, while also exploring the role of climate model emulators in climate science research. 
Finally, Tutorial 6 concludes the series by discussing practical applications of AI and machine learning in addressing climate change-related challenges, and introducing available resources and tools in the field of climate change AI.\n", - "\n", - "In this tutorial, you will\n", - "* Learn about the basics of data science and machine learning.\n", - "* Define “climate model emulators”.\n", - "* Introduce the ClimateBench dataset.\n", - "* Visualize features from this dataset.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Setup" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {}, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "# imports\n", - "import matplotlib.pyplot as plt # For plotting graphs\n", - "import pandas as pd # For data manipulation\n", - "import xarray as xr # For multidimensional data manipulation\n", - "import seaborn as sns # For advanced visualizations\n", - "import cartopy.crs as ccrs # for geospatial visualizations" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Figure Settings\n", - "import ipywidgets as widgets # interactive display\n", - "\n", - "%config InlineBackend.figure_format = 'retina'\n", - "plt.style.use(\n", - " \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\"\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Set random seed\n", - "\n", - "# @markdown Executing `set_seed(seed=seed)` you are setting the seed\n", - "\n", - "# Call `set_seed` function in the exercises to ensure reproducibility.\n", - "import random\n", - "import numpy as np\n", - "\n", - "def set_seed(seed=None):\n", - " if seed is None:\n", - " seed = np.random.choice(2 ** 32)\n", - " random.seed(seed)\n", - " np.random.seed(seed)\n", - " print(f'Random seed {seed} has been set.')\n", - "\n", - "# Set a global seed value for reproducibility\n", - "random_state = 42 # change 42 with any number you like\n", - "\n", - "set_seed(seed=random_state)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Video 1: Machine Learning on ClimateBench data\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == 'Bilibili':\n", - " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", - " elif source == 'Osf':\n", - " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = []\n", - " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == 'Youtube':\n", - " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", - " height=H, fs=fs, rel=0)\n", - " print(f'Video available at 
https://youtube.com/watch?v={video.id}')\n", - " else:\n", - " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", - " height=H, fs=fs, autoplay=False)\n", - " if video_ids[i][0] == 'Bilibili':\n", - " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", - " elif video_ids[i][0] == 'Osf':\n", - " print(f'Video available at https://osf.io/{video.id}')\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "# curriculum or production team will provide these ids\n", - "video_ids = [('Youtube', 'k1jrcheoWP8'), ('Bilibili', ''), ('Osf', '')]\n", - "tab_contents = display_videos(video_ids, W=854, H=480)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "editable": true, - "execution": {}, - "slideshow": { - "slide_type": "" - }, - "tags": [] - }, - "outputs": [], - "source": [ - "# @title Tutorial slides\n", - "\n", - "# @markdown\n", - "from ipywidgets import widgets\n", - "from IPython.display import IFrame\n", - "\n", - "link_id = \"4k3jd\"\n", - "\n", - "download_link = f\"https://osf.io/download/{link_id}/\"\n", - "render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n", - "# @markdown\n", - "out = widgets.Output()\n", - "with out:\n", - " print(f\"If you want to download the slides: {download_link}\")\n", - " display(IFrame(src=f\"{render_link}\", width=730, height=410))\n", - "display(out)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Section 1: ClimateBench Dataset and How Machine Learning Can Help\n", - "\n", - "Section Objectives:\n", - "* Understand how machine learning can be helpful generally\n", - "* Understand the climate model data we will be working with\n", - "* Understand the concept of a climate model emulator\n", - "* Learn how to explore the dataset\n", - "\n", - "\n", - "\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.1: About the ClimateBench dataset" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "The ClimateBench dataset offers a comprehensive collection of hypothetical climate data derived from sophisticated computer simulations (specifically, the NorESM2 model, available via CIMP6). It includes information on key climate variables such as temperature, precipitation, and diurnal temperature range. These values are collected by running simulations that represent the different Shared Socioeconomic Pathways (SSPs). Each pathway is associated with a different projected emissions profile over time. This data thus provides insights into how these climate variables may change in the future due to different emission scenarios. By utilizing this dataset, researchers can develop predictive models to better understand and anticipate the impacts of climate change, ultimately aiding in the development of effective mitigation strategies. 
Specifically, this data set is well-formatted for training *machine learning models*, which is exactly what you will do here.\n", - "\n", - "A brief overview of the ClimateBench dataset is provided below; for additional details, please refer to the full paper -\n", - "\n", - "[ClimateBench v1.0: A Benchmark for Data-Driven Climate Projections](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954) " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Spatial Resolution:\n", - "The simulations are conducted on a grid with a spatial resolution of approximately 2°, allowing for analysis of regional climate patterns and phenomena." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Variables:\n", - "The dataset includes four main variables defined for each point on the grid:\n", - "1. **Temperature (TAS)**: Represents the annual mean surface air temperature.\n", - "2. **Diurnal Temperature Range (DTR)**: Reflects the difference between the maximum and minimum temperatures within a day averaged annually.\n", - "3. **Precipitation (PR)**: Indicates the annual total precipitation.\n", - "4. **90th Percentile of Precipitation (PR90)**: Captures extreme precipitation events by identifying the 90th percentile of daily precipitation values. \n", - " " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### ScenarioMIP Simulations:\n", - "The dataset incorporates ScenarioMIP simulations, exploring various future emission pathways under different socio-economic scenarios. Each scenario is defined by a set of annual emissions values over future years. We will look at 5 different scenarios in total here." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Emissions Inputs:\n", - "Emissions scenarios are defined according to the following four types of emissions:\n", - "- Carbon dioxide (CO2) concentrations.\n", - "- Methane (CH4) concentrations.\n", - "- Sulfur dioxide (SO2) emissions, a precursor to sulfate aerosols.\n", - "- Black carbon (BC) emissions.\n", - "\n", - "Note: In the ClimateBench dataset, sulfur dioxide and black carbon emissions are provided as a spatial map over grid locations, but we will just look at global totals here." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Model Specifications:\n", - "- Simulation Model: the NorESM2 model is run in its low atmosphere-medium ocean resolution (LM) configuration.\n", - "- Model Components: Fully coupled earth system including the atmosphere, land, ocean, ice, and biogeochemistry components.\n", - "- Ensemble Averaging: Target variables are averaged over three ensemble members to mitigate internal variability contributions.\n", - "\n", - "By leveraging the ClimateBench dataset, researchers gain insights into climate dynamics, enabling the development and evaluation of predictive models crucial for understanding and addressing climate change challenges." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "

[Figure: W2D5_Tutorial1_climatebench_Scenario]
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "For simplicity's sake, we'll utilize a **condensed version of the ClimateBench dataset**. As mentioned above, we will be looking at only 5 scenarios ('SSPs', listed above as \"experiments\"), and all emissions will be given as global annual averages for the years 2015 to 2050. Furthermore, we will include climate variables for each spatial location (as defined by latitude and longitude for a restricted region) for the year 2015. The target for our model prediction will be temperature in the year 2050 for each spatial location." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.2: Load the Dataset (Condensed Version)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "We will use `pandas` to interact with the data, which is shared in the `.csv` format. First, let us load the environmental data into a pandas dataframe and print its contents." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "#Load Dataset\n", - "url_Climatebench_train_val = \"https://osf.io/y2pq7/download\"\n", - "training_data = pd.read_csv(url_Climatebench_train_val)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.3: Explore Data Structure" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Next, we will quickly explore the size of the data, check for missing data, and understand column names" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "print(training_data.shape)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "This tells us we have 3240 rows and 152 columns.\n", - "\n", - "Let's look at what these rows and columns mean:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "training_data" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Each row represents a combination of spatial location and scenario. The scenario can be found in the 'scenario' column while the location is given in the 'lat' and 'lon' columns. Climate variables for 2015 are given in the following columns and tas_FINAL represents the temperature in 2050. After these columns, we get the annual global emissions values for each of the 4 emissions types included in ClimateBench, starting in 2015 and ending in 2050." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "**Handle Missing Values (if necessary)**:\n", - "\n", - "We cannot train a machine learning model if there are values missing anywhere in this dataset. Therefore, we will check for missing values using `training_data.isnull().sum()`, which sums the number of 'null' or missing values. \n", - "If missing values exist, we can consider imputation techniques (e.g., [`fillna`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html), [`interpolate`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html)) based on the nature of the data and the specific column." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "training_data.isnull().sum()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Here, there are no missing values as the sum of all [`isnull()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isnull.html) values is zero for all columns. So we are good to go!" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.4: Visualize the data\n", - "In this section, we'll utilize visualization techniques to explore the dataset, uncovering underlying patterns and distributions of the variables. Visualizations are instrumental in making informed decisions and conducting comprehensive data analysis." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "**Spatial Distribution of Temperature and Precipitation:** \n", - "Plotting the spatial distribution of temperature can reveal geographical patterns and hotspots. We will use the temperature at 2015, the starting point of our simulation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Create a xarray dataset from the pandas dataframe\n", - "# for convenient plotting with cartopy afterwards\n", - "ds = xr.Dataset({'tas_2015': ('points', training_data['tas_2015'])},\n", - " coords={'lon': ('points', training_data['lon']),\n", - " 'lat': ('points', training_data['lat'])}\n", - " )\n", - "ds" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# create geoaxes\n", - "ax = plt.axes(projection=ccrs.PlateCarree())\n", - "\n", - "# add coastlines\n", - "ax.coastlines()\n", - "\n", - "# plot the data\n", - "p = ax.scatter(ds['lon'], ds['lat'], c=ds['tas_2015'], cmap='coolwarm', transform=ccrs.PlateCarree())\n", - "\n", - "# add a colorbar\n", - "cbar = plt.colorbar(p, orientation='vertical')\n", - "cbar.set_label('Temperature (K)')\n", - "\n", - "# add a grid and labels\n", - "ax.gridlines(draw_labels={\"bottom\": \"x\", \"left\": \"y\"})\n", - "\n", - "# add title\n", - "plt.title('Spatial Distribution of\\nAnnual Mean Temperature anomalies (2015)\\n')\n", - "\n", - "# add a caption with adjusted y-coordinate to create space\n", - "caption_text = 'The anomalies are calculated by subtracting the annual means of the pre-industrial scenario from \\nthe annual means of the respective scenario.'\n", - "plt.figtext(0.5, -0.03, caption_text, ha='center', fontsize=10) # Adjusted y-coordinate to create space" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "We can see there are clear spatial variations in 2015 temperatures. Note the range of latitude and longitude values, this dataset does not cover the entire globe. In fact, it covers roughly the geographical region represented below:\n", - "\n", - "

[Figure: W2D5_Tutorial1_map]
\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Now use the same plotting code to make a plot of the spatial distribution of total precipitation:" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Coding Exercise 1.4: Plotting Spatial Distribution of Total Precipitation\n", - "\n", - "In this exercise, you will complete the code to plot the spatial distribution of total precipitation. Use the provided plotting code as a template and replace the ellipses with appropriate values.\n", - "\n", - "*Note that you have the necessary libraries already imported* (`xarray`, `matplotlib.pyplot`, `cartopy.crs` *and* `pandas`)." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "execution": {} - }, - "source": [ - "```python\n", - "def plot_spatial_distribution(data, col_name, c_label):\n", - " \"\"\"\n", - " Plot the spatial distribution of a variable of interest.\n", - "\n", - " Args:\n", - " data (DataFrame): DataFrame containing latitude, longitude, and data of interest.\n", - " col_name (str): Name of the column containing data of interest.\n", - " c_label (str): Label to describe quantity and unit for the colorbar labeling.\n", - "\n", - " Returns:\n", - " None\n", - " \"\"\"\n", - " # create a xarray dataset from the pandas dataframe\n", - " # for convenient plotting with cartopy afterwards\n", - " ds = xr.Dataset({col_name: ('points', data[col_name])},\n", - " coords={'lon': ('points', data['lon']),\n", - " 'lat': ('points', data['lat'])}\n", - " )\n", - "\n", - " # create geoaxes\n", - " ax = plt.axes(projection=ccrs.PlateCarree())\n", - "\n", - " # add coastlines\n", - " ax.coastlines()\n", - "\n", - " # plot the data\n", - " p = ax.scatter(..., ... ,... 
, cmap='coolwarm', transform=ccrs.PlateCarree())\n", - "\n", - " # add a colorbar\n", - " cbar = plt.colorbar(p, orientation='vertical')\n", - " cbar.set_label(c_label)\n", - "\n", - " # add a grid and labels\n", - " ax.gridlines(draw_labels={\"bottom\": \"x\", \"left\": \"y\"})\n", - "\n", - " # add title\n", - " plt.title('Spatial Distribution of\\n Annual Mean Anomalies\\n')\n", - " plt.show()\n", - "\n", - "# test your function along precipitation data\n", - "_ = ...\n", - "\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# to_remove solution\n", - "\n", - "def plot_spatial_distribution(data, col_name, c_label):\n", - " \"\"\"\n", - " Plot the spatial distribution of a variable of interest.\n", - "\n", - " Args:\n", - " data (DataFrame): DataFrame containing latitude, longitude, and data of interest.\n", - " col_name (str): Name of the column containing data of interest.\n", - " c_label (str): Label to describe quantity and unit for the colorbar labeling.\n", - "\n", - " Returns:\n", - " None\n", - " \"\"\"\n", - " # create a xarray dataset from the pandas dataframe\n", - " # for convenient plotting with cartopy afterwards\n", - " ds = xr.Dataset({col_name: ('points', data[col_name])},\n", - " coords={'lon': ('points', data['lon']),\n", - " 'lat': ('points', data['lat'])}\n", - " )\n", - "\n", - " # create geoaxes\n", - " ax = plt.axes(projection=ccrs.PlateCarree())\n", - "\n", - " # add coastlines\n", - " ax.coastlines()\n", - "\n", - " # plot the data\n", - " p = ax.scatter(ds['lon'], ds['lat'], c=ds[col_name], cmap='coolwarm', transform=ccrs.PlateCarree())\n", - "\n", - " # add a colorbar\n", - " cbar = plt.colorbar(p, orientation='vertical')\n", - " cbar.set_label(c_label)\n", - "\n", - " # add a grid and labels\n", - " ax.gridlines(draw_labels={\"bottom\": \"x\", \"left\": \"y\"})\n", - "\n", - " # add title\n", - " plt.title('Spatial Distribution of\\n Annual Mean Anomalies\\n')\n", - " plt.show()\n", - "\n", - "# test your function along precipitation data\n", - "_ = plot_spatial_distribution(training_data, 'pr_2015', 'Precipitation (mm)')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "**Time Series Plot of Emissions Scenarios:**\n", - "\n", - "\n", - "We will plot the time series of each of the four emissions scenarios in this dataset (we will get to the fifth one later). Each row in the dataset with the same 'scenario' label has the same emissions values over time. So we will only use the data from the first spatial location for each scenario." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Run this cell to plot the Time Series Plot of Emissions Scenarios:\n", - "# Don't worry about understanding this code! 
It's to set up the plot.\n", - "\n", - "# Set Seaborn style\n", - "sns.set_style(\"whitegrid\")\n", - "\n", - "# Extract emissions data for each scenario\n", - "CO2_data = training_data.filter(regex='CO2_\\d+')\n", - "SO2_data = training_data.filter(regex='SO2_\\d+')\n", - "CH4_data = training_data.filter(regex='CH4_\\d+')\n", - "BC_data = training_data.filter(regex='BC_\\d+')\n", - "\n", - "# Define the four scenarios\n", - "scenarios = ['ssp585', 'ssp370-lowNTCF','ssp126', 'ssp370',]\n", - "\n", - "# Create subplots for each emission gas\n", - "fig, axs = plt.subplots(4, 1, figsize=(8, 15), sharex=True)\n", - "\n", - "# Define units for each emission\n", - "units = {'CO2': 'GtCO2', 'CH4': 'GtCH4 / year', 'SO2': 'TgSO2 / year', 'BC': 'TgBC / year'}\n", - "\n", - "# Plot emissions data for each emission gas with enhanced styling\n", - "for i, (data, emission) in enumerate(zip([CO2_data, CH4_data, SO2_data,BC_data], ['CO2', 'CH4', 'SO2','BC'])):\n", - " # Plot each scenario for the current emission gas\n", - " for scenario in scenarios:\n", - " scenario_data = data[training_data['scenario'] == scenario]\n", - " axs[i].plot(range(2015, 2051), scenario_data.mean(axis=0), label=scenario)\n", - "\n", - " # Set ylabel and title for the current emission gas\n", - " axs[i].set_ylabel(f'{emission} Emissions ({units[emission]})', fontsize=12)\n", - " axs[i].set_title(f'{emission} Emissions', fontsize=14)\n", - " axs[i].legend()\n", - "\n", - "# Set common xlabel\n", - "plt.xlabel('Time (years)')\n", - "\n", - "# Adjust layout\n", - "plt.tight_layout()\n", - "\n", - "# Show legends\n", - "plt.legend()\n", - "\n", - "# Remove spines from all subplots\n", - "for ax in axs:\n", - " ax.spines['top'].set_visible(False)\n", - " ax.spines['right'].set_visible(False)\n", - "\n", - "# Customize ticks\n", - "plt.xticks()\n", - "plt.yticks()\n", - "\n", - "# Show the plot\n", - "plt.grid(True, linestyle='--')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "This last plot displays the global mean emissions contained in the ClimateBench dataset over the years 2015 to 2050 for four atmospheric constituents that are important for defining the forcing (cumulative anthropogenic carbon dioxide CO$_2$, methane CH$_4$, sulfur dioxide SO$_2$, black carbon BC). Each line represents a different emission scenario, which shows us trends and variations in emissions over time. The 'ssp370-lowNTCF' refers to a variation of the ssp370 scenario which includes lower emissions of near-term climate forcers (NTCFs) such as aerosol (but not methane). \n", - "These emission scenarios are used in the following tutorials as features/predictors for our prediction of the temperature in 2050.\n", - "\n", - "All time series are derived from NorESM2 ScenarioMIP simulations available. Please read the paper of [Watson-Parris et al. (2022)](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954) for a more detailed explanation of the ClimateBench dataset." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Summary\n", - "\n", - "In this tutorial, you acquainted yourself with the ClimateBench dataset and explored how machine learning contributes to climate analysis. We defined the versatility of machine learning and its role in predicting climate variables. By delving into the ClimateBench dataset, we highlight its accessibility in providing climate model data. 
We emphasize the importance of data visualization and engage in practical exercises to explore the dataset.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Resources" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "* [ClimateBench v1.0: A Benchmark for Data-Driven Climate Projections](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954) " - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "include_colab_link": true, - "name": "W2D5_Tutorial1", - "provenance": [], - "toc_visible": true - }, - "kernel": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.19" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial2.ipynb b/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial2.ipynb deleted file mode 100644 index 04b93e17f..000000000 --- a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial2.ipynb +++ /dev/null @@ -1,895 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/W2D5_Tutorial2.ipynb)   \"Open" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial 2: Building and Training Random Forest Models\n", - "\n", - "**Week 2, Day 5, AI and Climate Change**\n", - "\n", - "__Content creators:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Content reviewers:__ Mujeeb Abdulfatai, Nkongho Ayuketang Arreyndip, Jeffrey N. A. 
Aryee, Paul Heubel, Jenna Pearson, Abel Shibu\n", - "\n", - "__Content editors:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Production editors:__ Konstantine Tsafatinos\n", - "\n", - "**Our 2024 Sponsors:** CMIP, NFDI4Earth" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial Objectives\n", - "\n", - "*Estimated timing of tutorial:* 35 minutes\n", - "\n", - "In this tutorial, you will \n", - "* Learn about decision trees and hyperparameters\n", - "* Learn about random forest models\n", - "* Understand how regression models are evaluated (R$^2$)\n", - "* Familiarize yourself with the scikit-learn package\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Setup" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {}, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "# imports\n", - "import matplotlib.pyplot as plt # For plotting graphs\n", - "import pandas as pd # For data manipulation\n", - "import ipywidgets as widgets # interactive display\n", - "from sklearn.ensemble import RandomForestRegressor # For Random Forest Regression\n", - "from sklearn.tree import DecisionTreeRegressor # For Decision Tree Regression\n", - "from sklearn.tree import plot_tree # For plotting decision trees" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Figure Settings\n", - "import ipywidgets as widgets # interactive display\n", - "\n", - "%config InlineBackend.figure_format = 'retina'\n", - "plt.style.use(\n", - " \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\"\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Set random seed\n", - "\n", - "# @markdown Executing `set_seed(seed=seed)` you are setting the seed\n", - "\n", - "# Call `set_seed` function in the exercises to ensure reproducibility.\n", - "import random\n", - "import numpy as np\n", - "\n", - "def set_seed(seed=None):\n", - " if seed is None:\n", - " seed = np.random.choice(2 ** 32)\n", - " random.seed(seed)\n", - " np.random.seed(seed)\n", - " print(f'Random seed {seed} has been set.')\n", - "\n", - "# Set a global seed value for reproducibility\n", - "random_state = 42 # change 42 with any number you like\n", - "\n", - "set_seed(seed=random_state)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Video 1: Building and training Random Forest Models\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == 'Bilibili':\n", - " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", - " elif source == 'Osf':\n", - " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = 
[]\n", - " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == 'Youtube':\n", - " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", - " height=H, fs=fs, rel=0)\n", - " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", - " else:\n", - " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", - " height=H, fs=fs, autoplay=False)\n", - " if video_ids[i][0] == 'Bilibili':\n", - " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", - " elif video_ids[i][0] == 'Osf':\n", - " print(f'Video available at https://osf.io/{video.id}')\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "# curriculum or production team will provide these ids\n", - "video_ids = [('Youtube', 'st_1ygEGQTQ'), ('Bilibili', ''), ('Osf', '')]\n", - "tab_contents = display_videos(video_ids, W=854, H=480)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Tutorial slides\n", - "\n", - "# @markdown\n", - "from ipywidgets import widgets\n", - "from IPython.display import IFrame\n", - "\n", - "link_id = \"kyv6w\"\n", - "\n", - "download_link = f\"https://osf.io/download/{link_id}/\"\n", - "render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n", - "# @markdown\n", - "out = widgets.Output()\n", - "with out:\n", - " print(f\"If you want to download the slides: {download_link}\")\n", - " display(IFrame(src=f\"{render_link}\", width=730, height=410))\n", - "display(out)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Section 1: Preparing the Data for Model Training" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "In this video, we learned about:\n", - "\n", - "1. Using regression for prediction tasks, like the one we have.\n", - "2. The conceptual understanding of decision trees and their regression capabilities.\n", - "3. Random forests as an ensemble of decision trees.\n", - "4. Training our model\n", - "4. Measuring model performance.\n", - "5. Utilizing the scikit-learn toolbox for regression tasks.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.1: Loading the data\n", - "\n", - "Remember from the previous tutorial how we loaded the `training_data`?\n", - "Let's again load the data here for this tutorial." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "#Load Dataset\n", - "url_Climatebench_train_val = \"https://osf.io/y2pq7/download\"\n", - "training_data = pd.read_csv(url_Climatebench_train_val)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Next, we will prepare the data to train a model to predict temperature anomalies in 2050. 
Let's also remind ourselves of what the data contains:" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.2: Preparing the data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Check column names (assuming a pandas DataFrame)\n", - "print(\"Column names:\")\n", - "print(training_data.columns.tolist()) # List all column names" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "First, we will drop the `scenario` column from the data as it is just a label, but will not be passed into the model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "training_data.pop('scenario')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "As we can see, scenario is no longer in the dataset:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "print(\"Column names:\")\n", - "print(training_data.columns.tolist()) # List all column names" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Next, we need to pull out our target variable (that is, the variable we want our model to predict). Here that is `tas_FINAL`, the temperature anomaly in 2050. The anomalies in every case are calculated by subtracting the annual means of the pre-industrial scenario from the annual means of the respective scenario of interest." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "target = training_data.pop('tas_FINAL')\n", - "target" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "*Note: we will need to repeat these preprocessing steps anytime we load this (or other) data.*" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Section 2: Fit Decision Tree and Random Forest" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Now we can train our models. As mentioned in the video, Decision Trees and Random Forest Models can both do regression. Specifically:\n", - "\n", - "***Decision Tree Regression***: \n", - "* Decision trees recursively partition the feature space into regions based on feature values to predict the target variable.\n", - "* Each leaf node represents a prediction.\n", - "* Single trees can be prone to capturing noise in the data (not what we want!). \n", - "\n", - "***Random Forest Regression***: \n", - "* An ensemble method that combines multiple decision trees to improve predictive performance.\n", - "* Each tree is trained on a random subset of the data.\n", - "* Aggregates predictions of individual trees to improve performance.\n", - "* Typically more robust/doesn't capture noise.\n", - "\n", - "We will see an example of both here.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "First, let's train a single decision tree to try to predict 2050 temperature anomalies using 2015 temperature anomalies and emissions data. We can control the depth of our decision tree (which is the maximum number of splits performed), which we will set to 20 here." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 2.0: Scikit-learn\n", - "\n", - "In this and coming sub-sections, we will utilize [Scikit-learn](https://scikit-learn.org/stable/), commonly referred to as `sklearn`, a renowned Python library extensively employed for machine learning endeavors. It provides a comprehensive array of functions and tools tailored for various machine learning tasks. Specifically, we will concentrate on the [`DecisionTreeRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html) and [`RandomForestRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) modules offered by Scikit-learn." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 2.1: Training the Decision Tree and Analyzing the Results" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# instantiate the model:\n", - "dt_regressor = DecisionTreeRegressor(random_state=random_state,max_depth=20)\n", - "\n", - "# fit/train the model with the data:\n", - "dt_regressor.fit(training_data, target) #pass in the model inputs and the target it should predict" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "We've trained our first model! Now let's see how well it performs. As discussed in the video, we will use the coefficient of determination (also known as the R-squared value, $R^2$) as the measure of how well the model is doing.\n", - "\n", - "We can get this value by calling the `score` function and providing the data we want the score calculated on. Here we will evaluate the model on the same data it was trained on." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "
\n", - " Learn more about the R-Squared value and Coefficient of determination \n", - "\n", - "\n", - " The **R-squared** value indicates the proportion of the variance in the target variable that is predicted from the model.\n", - "\n", - "Specifically, the ***coefficient of determination*** is calculated using the formula:\n", - "\n", - "$$\n", - "\\color{#3182CE}{R^2} = 1 - \\frac{\\color{#DC3912}{SS_{\\text{residual}}}}{\\color{#FF9900}{SS_{\\text{total}}}}\n", - "$$\n", - "\n", - "where:\n", - "- $\\color{#FF9900}{SS_{\\text{total}}}$ represents the total sum of squares, calculated as the sum of squared differences between the target variable $\\color{#2CA02C}{y}$ and its mean $\\color{#2CA02C}{\\bar{y}}$:\n", - "\n", - "$$\n", - "\\color{#FF9900}{SS_{\\text{total}}} = \\sum_{i=1}^{n} (\\color{#2CA02C}{y_i} - \\color{#2CA02C}{\\bar{y}})^2\n", - "$$\n", - "\n", - "- $\\color{#DC3912}{SS_{\\text{residual}}}$ denotes the residual sum of squares, computed as the sum of squared differences between the observed target values $\\color{#2CA02C}{y}$ and the predicted values $\\color{#FF5733}{\\hat{y}}$ provided by the model:\n", - "\n", - "$$\n", - "\\color{#DC3912}{SS_{\\text{residual}}} = \\sum_{i=1}^{n} (\\color{#2CA02C}{y_i} - \\color{#FF5733}{\\hat{y}_i})^2\n", - "$$\n", - "\n", - "The $\\color{#3182CE}{R^2}$ score thus quantifies the proportion of variance in the target variable that is predictable from the independent variables in the model.\n", - "\n", - "This value ranges from 0 to 1, where 1 indicates a perfect fit, meaning the model explains all the variability in the target variable.\n", - "
\n", - "\n", - "---" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "dt_regressor.score(training_data, target)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "\n", - "Now, let's create a scatter plot to compare the true temperature anomaly values in 2050 to those predicted by the model:\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Scatter Plot: Predicted vs. True Temperatures for Decision Tree\n", - "\n", - "# Get predicted values\n", - "predicted = dt_regressor.predict(training_data)\n", - "\n", - "# Create scatter plot\n", - "plt.scatter(predicted, target, color='b', label='Comparison of Predicted and True Temperatures')\n", - "plt.plot([0, 4], [0, 4], color='r', label='Ideal Line') # Add a diagonal line for reference\n", - "plt.xlabel('Predicted Temperatures (K)')\n", - "plt.ylabel('True Temperatures (K)')\n", - "plt.title('Annual mean temperature anomaly', fontsize=14)\n", - "\n", - "# Add a caption with adjusted y-coordinate to create space\n", - "caption_text = 'The anomalies are calculated by subtracting the annual means of the pre-industrial scenario from \\nthe annual means of the respective scenario.'\n", - "plt.figtext(0.5, -0.03, caption_text, ha='center', fontsize=10) # Adjusted y-coordinate to create space\n", - "\n", - "plt.legend()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "
\n", - " What can we conclude from this score and the scatter plot?
\n", - "First, pause and think by yourself. Then, compare it with the information provided here:\n", - "
\n", - "\n", - "As we can see, the model achieves a high score of ~0.9984 on the training data. This indicates that the model can explain approximately 99.84% of the variance in the target variable based on the features in the training dataset. Such a high score suggests that the model fits the training data very well and can effectively capture the underlying patterns or relationships between the features and the target variable. We can see the close alignment between the true value and the value predicted by the model in the plot.\n", - "\n", - "However, it's essential to note that achieving a high score on the training data does not guarantee the model's performance on unseen data (i.e., the test or validation datasets). We will explore this more in the next tutorial.\n", - "
\n", - "\n", - "---" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Interactive Demo 2.1: Variation in Performance with depth | Visualizing Decision Trees and Scatter plot\n", - "\n", - "In this interactive demo, we'll visualize decision trees using a widget. This widget enables interactive exploration of decision trees by adjusting two parameters: \n", - "`max_depth` controls the tree's complexity during training, while `dt_vis_depth` determines the depth of the tree to visualize. It dynamically trains a decision tree regressor based on `max_depth`, evaluates its performance with a scatter plot, and visualizes the tree structure up to `dt_vis_depth` using the plot_tree function. \n", - "This allows users to balance model complexity and interpretability, gaining insights into how different depths affect predictive accuracy and tree structure." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @markdown Make sure you execute this cell to enable the widget!\n", - "# Don't worry about understanding this code! It's to set up an interactive plot.\n", - "\n", - "# Function to train decision tree and display scatter plot\n", - "def train_and_plot(max_depth, visualize_depth):\n", - " global dt_regressor, training_data\n", - "\n", - " # Instantiate and train the decision tree regressor\n", - " dt_regressor = DecisionTreeRegressor(max_depth=max_depth)\n", - " dt_regressor.fit(training_data, target)\n", - "\n", - " # Calculate and print the score\n", - " score = dt_regressor.score(training_data, target)\n", - " print(f\"Model Score: {score}\")\n", - " print(f\"Please wait for ~{visualize_depth+visualize_depth/2} sec for the figure to render\")\n", - " # Generate scatter plot: Predicted vs. True Temperatures\n", - " predicted = dt_regressor.predict(training_data)\n", - " fig, axes = plt.subplots(1, 2, figsize=(15+pow(1.3,visualize_depth), 6+pow(1.2,visualize_depth)), gridspec_kw={'width_ratios': [1, 1+visualize_depth/4]})\n", - "\n", - " # Scatter plot\n", - " axes[0].scatter(predicted, target, color='blue', alpha=0.7, label='Comparison of Predicted and True Temperatures', edgecolors='black')\n", - " axes[0].plot([min(target), max(target)], [min(target), max(target)], color='red', linestyle='--', label='Ideal Prediction Line')\n", - " axes[0].set_xlabel('Predicted Temperature (K)', fontsize=12)\n", - " axes[0].set_ylabel('True Temperature (K)', fontsize=12)\n", - " axes[0].set_title('Annual mean temperature anomaly', fontsize=14)\n", - " axes[0].legend()\n", - " axes[0].grid(True)\n", - "\n", - " # Decision tree visualization\n", - " plot_tree(dt_regressor, feature_names=training_data.columns, filled=True, fontsize=8, max_depth=visualize_depth, ax=axes[1])\n", - " axes[1].set_title(f'Decision Tree Visualization (Train_max_depth = {max_depth}, dt_visualize_depth = {visualize_depth})')\n", - "\n", - " plt.tight_layout()\n", - " plt.show()\n", - "\n", - "# Interactive widget to control max_depth\n", - "@widgets.interact(max_depth=(1, 31, 1), dt_vis_depth=(1, 10, 1))\n", - "def visualize_tree_with_max_depth(max_depth=20, dt_vis_depth=3):\n", - " train_and_plot(max_depth, dt_vis_depth)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Interactive Demo 2.1 Discussion\n", - "\n", - "1. How does changing the max_depth parameter affect the decision tree's predictive accuracy and complexity? 
\n", - "\n", - "2. What insights can be gained by visualizing the decision tree at different depths (dt_vis_depth)?\n", - "\n", - "3. What patterns or trends do you observe in the residuals (differences between predicted and true temperatures) on the scatter plot? How can these insights guide adjustments to improve the model's predictive accuracy?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# to_remove explanation\n", - "\n", - "\"\"\"\n", - "Discussion:\n", - "1. Adjusting the `max_depth` parameter influences the complexity of the decision tree model.\n", - "Increasing `max_depth` may lead to a more complex model that can capture intricate patterns in the training data,\n", - "potentially resulting in higher predictive accuracy. (However, as we will discuss in the next tutorial, this can also increase the risk of overfitting,\n", - "where the model learns noise in the training data instead of true patterns, leading to poor generalization to unseen data.)\n", - "\n", - "2. Visualizing the decision tree at different depths (`dt_vis_depth`) provides insights into the hierarchy of features\n", - "and decision-making process within the model. Lower depths reveal high-level splits that capture broader patterns in the data,\n", - "while higher depths expose finer details and nuances. By adjusting `dt_vis_depth`,\n", - "users can focus on specific branches of the tree, uncovering key decision points and feature interactions.\n", - "This exploration helps in understanding how the model makes predictions and identifying influential features in the dataset.\n", - "\n", - "3. By examining the scatter plot, we can identify any consistent patterns or trends in the residuals, indicating potential systematic errors\n", - "or biases in the model's predictions. These observations can inform adjustments to the model, such as incorporating additional features\n", - "or refining existing ones, to enhance its accuracy. Identifying outliers\n", - "or clusters of residuals also highlights areas where the model may struggle to generalize,\n", - "suggesting targeted improvements for better performance.\n", - "\"\"\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 2.2: Training the Random forest and Analyzing the Results" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Now we will train an ensemble of decisions trees, known as a random forest. For this we can use the built-in `RandomForestRegressor` from the [sklearn.ensemble.RandomForestRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html), which we have already imported." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "rf_regressor = RandomForestRegressor(random_state=random_state)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "The line of code creates a random forest regressor object named `rf_regressor`. This regressor is configured to use a specified `random_state` parameter, ensuring that the random number generation process within the algorithm is consistent across different runs. This helps maintain reproducibility in our experiments and ensures consistent results." 
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "execution": {}
- },
- "source": [
- "Now you will train the model on the data and calculate its score on that same data. Create a plot like the one above in order to visually inspect its performance."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "execution": {}
- },
- "source": [
- "### Coding Exercise 2.2: Model Training and Performance Visualization of Random Forest\n",
- "\n",
- "In this exercise, you will train a random forest regressor model on your data and evaluate its performance by calculating its score on the same data. Additionally, you will create a scatter plot to visually inspect its performance."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "execution": {}
- },
- "source": [
- "```python\n",
- "def fit_and_visualize_rf(training_data, target):\n",
- " \"\"\"Fit a random forest regressor to the training data and visualize the results.\n",
- "\n",
- " Args:\n",
- " training_data (array-like): Input data for training the model.\n",
- " target (array-like): Target variable for training the model.\n",
- "\n",
- " Returns:\n",
- " None\n",
- " \"\"\"\n",
- " #################################################\n",
- " ## TODO for students: Fit the random forest regressor and visualize the results ##\n",
- " # Remove the following line of code once you have completed the exercise:\n",
- " raise NotImplementedError(\"Student exercise: Fit the random forest regressor and visualize the results.\")\n",
- " #################################################\n",
- "\n",
- " # fit the random forest regressor to the training data\n",
- " _ = ...\n",
- "\n",
- " # print the R-squared score of the model\n",
- " print('...')\n",
- "\n",
- " # predict the target variable using the trained model\n",
- " predicted = rf_regressor.predict(training_data)\n",
- "\n",
- " # Create scatter plot\n",
- " plt.scatter(predicted,target,color='b',label='Comparison of Predicted and True Temperatures')\n",
- " plt.plot([0,4],[0,4],color='r', label='Ideal Line') # add a diagonal line for reference\n",
- " plt.xlabel('Predicted Temperatures (K)')\n",
- " plt.ylabel('True Temperatures (K)')\n",
- " plt.legend()\n",
- " plt.title('Annual mean temperature anomaly')\n",
- " # add a caption with adjusted y-coordinate to create space\n",
- " caption_text = 'The anomalies are calculated by subtracting the annual means of the pre-industrial scenario from \\nthe annual means of the respective scenario.'\n",
- " plt.figtext(0.5, -0.03, caption_text, ha='center', fontsize=10) # adjusted y-coordinate to create space\n",
- "\n",
- "# test your function\n",
- "_ = fit_and_visualize_rf(training_data, target)\n",
- "\n",
- "```"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "execution": {}
- },
- "outputs": [],
- "source": [
- "# to_remove solution\n",
- "\n",
- "def fit_and_visualize_rf(training_data, target):\n",
- " \"\"\"Fit a random forest regressor to the training data and visualize the results.\n",
- "\n",
- " Args:\n",
- " training_data (array-like): Input data for training the model.\n",
- " target (array-like): Target variable for training the model.\n",
- "\n",
- " Returns:\n",
- " None\n",
- " \"\"\"\n",
- "\n",
- " # fit the random forest regressor to the training data\n",
- " _ = rf_regressor.fit(training_data, target)\n",
- "\n",
- " # print the R-squared score of the model\n",
- " print(rf_regressor.score(training_data, target))\n",
- "\n",
- " # predict the target variable 
using the trained model\n", - " predicted = rf_regressor.predict(training_data)\n", - "\n", - " # Create scatter plot\n", - " plt.scatter(predicted,target,color='b',label='Comparison of Predicted and True Temperatures')\n", - " plt.plot([0,4],[0,4],color='r', label='Ideal Line') # add a diagonal line for reference\n", - " plt.xlabel('Predicted Temperatures (K)')\n", - " plt.ylabel('True Temperatures (K)')\n", - " plt.legend()\n", - " plt.title('Annual mean temperature anomaly')\n", - " # add a caption with adjusted y-coordinate to create space\n", - " caption_text = 'The anomalies are calculated by subtracting the annual means of the pre-industrial scenario from \\nthe annual means of the respective scenario.'\n", - " plt.figtext(0.5, -0.03, caption_text, ha='center', fontsize=10) # adjusted y-coordinate to create space\n", - "\n", - "# test your function\n", - "_ = fit_and_visualize_rf(training_data, target)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "It seems like our models are performing very well! Let's think a bit more in the next tutorial about what else we should do to evaluate our models...\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Summary\n", - "\n", - "Estimated timing of tutorial: 35 minutes\n", - "\n", - "In this tutorial, we delved into Random Forest Models and their application in climate prediction. We gained an understanding of regression and how Random Forests combine decision trees to improve predictive accuracy. Through practical exercises, we learned how to evaluate model performance and implement Random Forests using tools like scikit-learn.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Resources" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "* [ClimateBench v1.0: A Benchmark for Data-Driven Climate Projections](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954) " - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "include_colab_link": true, - "name": "W2D5_Tutorial2", - "provenance": [], - "toc_visible": true - }, - "kernel": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.19" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial3.ipynb b/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial3.ipynb deleted file mode 100644 index 5760b60a1..000000000 --- a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial3.ipynb +++ /dev/null @@ -1,656 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/W2D5_Tutorial3.ipynb)   \"Open" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial 3: Testing Model Generalization\n", - "\n", - "**Week 2, 
Day 5, AI and Climate Change**\n", - "\n", - "__Content creators:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Content reviewers:__ Mujeeb Abdulfatai, Nkongho Ayuketang Arreyndip, Jeffrey N. A. Aryee, Paul Heubel, Jenna Pearson, Abel Shibu\n", - "\n", - "__Content editors:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Production editors:__ Konstantine Tsafatinos\n", - "\n", - "**Our 2024 Sponsors:** CMIP, NFDI4Earth" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial Objectives\n", - "\n", - "*Estimated timing of tutorial:* 25 minutes\n", - "\n", - "In this tutorial, you will\n", - "* Understand the problem of overfitting\n", - "* Understand generalization\n", - "* Learn to split data into train and test data\n", - "* Evaluate trained models on held-out test data\n", - "* Think about the relationship between model capacity and overfitting\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Setup" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {}, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "# imports:\n", - "\n", - "import pandas as pd # For data manipulation\n", - "from sklearn.model_selection import train_test_split # For splitting dataset into train and test sets\n", - "from sklearn.ensemble import RandomForestRegressor # For Random Forest Regression\n", - "from sklearn.tree import DecisionTreeRegressor # For Decision Tree Regression" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Figure Settings\n", - "import ipywidgets as widgets # interactive display\n", - "import matplotlib.pyplot as plt\n", - "\n", - "%config InlineBackend.figure_format = 'retina'\n", - "plt.style.use(\n", - " \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\"\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Set random seed\n", - "\n", - "# @markdown Executing `set_seed(seed=seed)` you are setting the seed\n", - "\n", - "# Call `set_seed` function in the exercises to ensure reproducibility.\n", - "import random\n", - "import numpy as np\n", - "\n", - "def set_seed(seed=None):\n", - " if seed is None:\n", - " seed = np.random.choice(2 ** 32)\n", - " random.seed(seed)\n", - " np.random.seed(seed)\n", - " print(f'Random seed {seed} has been set.')\n", - "\n", - "# Set a global seed value for reproducibility\n", - "random_state = 42 # change 42 with any number you like\n", - "\n", - "set_seed(seed=random_state)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Video 1: Testing model generalization\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == 'Bilibili':\n", - " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", - " elif source == 'Osf':\n", - " src = 
f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = []\n", - " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == 'Youtube':\n", - " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", - " height=H, fs=fs, rel=0)\n", - " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", - " else:\n", - " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", - " height=H, fs=fs, autoplay=False)\n", - " if video_ids[i][0] == 'Bilibili':\n", - " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", - " elif video_ids[i][0] == 'Osf':\n", - " print(f'Video available at https://osf.io/{video.id}')\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "\n", - "video_ids = [('Youtube', 'gPM64fog-dc'), ('Bilibili', ''), ('Osf', '')]\n", - "tab_contents = display_videos(video_ids, W=854, H=480)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Tutorial slides\n", - "\n", - "# @markdown\n", - "from ipywidgets import widgets\n", - "from IPython.display import IFrame\n", - "\n", - "link_id = \"t48yb\"\n", - "\n", - "download_link = f\"https://osf.io/download/{link_id}/\"\n", - "render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n", - "# @markdown\n", - "out = widgets.Output()\n", - "with out:\n", - " print(f\"If you want to download the slides: {download_link}\")\n", - " display(IFrame(src=f\"{render_link}\", width=730, height=410))\n", - "display(out)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Section 1: Model generalization" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "As discussed in the video, machine learning models can *overfit*. This means they essentially memorize the data points they were trained on. This makes them perform very well on those data points, but when they are presented with data they weren't trained on their predictions are not very good. Therefore, we need to evaluate our models according to how well they perform on data they weren't trained on.\n", - "\n", - "To do this, we will split the data into training and testing sets. The training set will be used to train the model, while the testing set will be used to evaluate how well the model performs on unseen data. This helps us ensure that our model can generalize well to new data and avoid overfitting.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.1: Load and Prepare the Data\n", - "\n", - "As we've learned in the previous tutorial, here we load our dataset and prepare it by removing unnecessary columns and extracting the target variable `tas_FINAL`, representing temperature anomalies in 2050. 
The anomalies in every case are calculated by subtracting the annual means of the pre-industrial scenario from the annual means of the respective scenario of interest."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "execution": {}
- },
- "outputs": [],
- "source": [
- "# Load and Prepare the Data\n",
- "url_Climatebench_train_val = \"https://osf.io/y2pq7/download\" # Dataset URL\n",
- "training_data = pd.read_csv(url_Climatebench_train_val) # Load the training data from the provided URL\n",
- "training_data.pop('scenario') # drop the `scenario` column from the data as it is just a label, but will not be passed into the model.\n",
- "target = training_data.pop('tas_FINAL') # Extract the target variable 'tas_FINAL' which we aim to predict"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "execution": {}
- },
- "source": [
- "## Section 1.2: Data Splitting for Training and Testing\n",
- "\n",
- "Our next step is to prepare the dataset for model training and evaluation. To do this, we'll use Scikit-learn's `train_test_split` function, which splits a dataset into training and testing subsets. We already imported it in the Setup section:\n",
- "\n",
- "```python\n",
- "from sklearn.model_selection import train_test_split \n",
- "```\n",
- "\n",
- "We will randomly allocate 20% of the data for testing and reserve the remaining 80% for model training. This ensures that the model is evaluated on unseen data, which is crucial for assessing its real-world performance.\n",
- "\n",
- "Let's now split the dataset and move on to model training and evaluation."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "execution": {}
- },
- "outputs": [],
- "source": [
- "# Split the data into training and testing sets\n",
- "X_train, X_test, y_train, y_test = train_test_split(\n",
- " training_data, target, test_size=0.2, random_state=1\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "execution": {}
- },
- "source": [
- "We have now separated the input features (now called `X`) and the target variable (now called `y`) into a training set (`X_train`, `y_train`) and a test set (`X_test`, `y_test`)."
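As a quick sanity check of the split, here is a sketch using only the variables defined above: the row counts of the two subsets should add up to the full dataset, with roughly 20% held out.

```python
# Verify the 80/20 split: counts should add up, with ~20% held out for testing.
print(len(training_data), "=", len(X_train), "+", len(X_test))
print(f"test fraction: {len(X_test) / len(training_data):.2f}")  # ~0.20
```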
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "execution": {}
- },
- "source": [
- "## Section 1.3: Train a decision tree model on the training data and evaluate it\n",
- "\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "execution": {}
- },
- "outputs": [],
- "source": [
- "# Training the model on the training data\n",
- "dt_regressor = DecisionTreeRegressor(random_state=random_state, max_depth=20)\n",
- "dt_regressor.fit(X_train, y_train)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "execution": {}
- },
- "source": [
- "Now we will evaluate the model on both the training and test data."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "execution": {}
- },
- "outputs": [],
- "source": [
- "print('Performance on training data:', dt_regressor.score(X_train, y_train))\n",
- "print('Performance on test data :', dt_regressor.score(X_test, y_test))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "execution": {}
- },
- "source": [
- "We can see here that our model is overfitting: it is performing much better on the data it was trained on than on held-out test data."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "execution": {}
- },
- "source": [
- "## Section 1.4: Train a random forest model on the training data and evaluate it\n",
- "\n",
- "Use what you know to train a random forest model on the training data and evaluate it on both the training and test data.\n",
- "We have already imported `RandomForestRegressor` in the Setup section via\n",
- "```python\n",
- "from sklearn.ensemble import RandomForestRegressor \n",
- "```\n",
- "\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "execution": {}
- },
- "source": [
- "```python\n",
- "def train_random_forest_model(X_train, y_train, X_test, y_test, random_state):\n",
- " \"\"\"Train a Random Forest model and evaluate its performance.\n",
- "\n",
- " Args:\n",
- " X_train (ndarray): Training features.\n",
- " y_train (ndarray): Training labels.\n",
- " X_test (ndarray): Test features.\n",
- " y_test (ndarray): Test labels.\n",
- " random_state (int): Random seed for reproducibility.\n",
- "\n",
- " Returns:\n",
- " RandomForestRegressor: Trained Random Forest regressor model.\n",
- " \"\"\"\n",
- " #################################################\n",
- " ## TODO for students: Train a random forest model on the training data and evaluate it ##\n",
- " # Implement training a RandomForestRegressor model using X_train and y_train\n",
- " # Then, evaluate its performance on both training and test data using .score() method\n",
- " # Print out the performance on training and test data\n",
- " # Please remove the following line of code once you have completed the exercise:\n",
- " raise NotImplementedError(\"Student exercise: Implement the training and evaluation process.\")\n",
- " #################################################\n",
- "\n",
- " # Train the model on the training data\n",
- " rf_regressor = RandomForestRegressor(random_state=random_state)\n",
- "\n",
- " # fit the model\n",
- " _ = rf_regressor.fit(..., ...)\n",
- "\n",
- " print('Performance on training data :', rf_regressor.score(..., y_train))\n",
- " print('Performance on test data :', rf_regressor.score(X_test, ...))\n",
- "\n",
- " return rf_regressor\n",
- "\n",
- "# test the function\n",
- "rf_model = ...\n",
- "\n",
- "```"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "execution": {}
- },
- 
"outputs": [], - "source": [ - "# to_remove solution\n", - "\n", - "def train_random_forest_model(X_train, y_train, X_test, y_test, random_state):\n", - " \"\"\"Train a Random Forest model and evaluate its performance.\n", - "\n", - " Args:\n", - " X_train (ndarray): Training features.\n", - " y_train (ndarray): Training labels.\n", - " X_test (ndarray): Test features.\n", - " y_test (ndarray): Test labels.\n", - " random_state (int): Random seed for reproducibility.\n", - "\n", - " Returns:\n", - " RandomForestRegressor: Trained Random Forest regressor model.\n", - " \"\"\"\n", - "\n", - " # train the model on the training data\n", - " rf_regressor = RandomForestRegressor(random_state=random_state)\n", - "\n", - " # fit the model\n", - " _ = rf_regressor.fit(X_train, y_train)\n", - "\n", - " print('Performance on training data :', rf_regressor.score(X_train, y_train))\n", - " print('Performance on test data :', rf_regressor.score(X_test, y_test))\n", - "\n", - " return rf_regressor\n", - "\n", - "# test the function\n", - "rf_model = train_random_forest_model(X_train, y_train, X_test, y_test, random_state=42)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Question 1.4: Overfitting - Decision Tree vs Random Forest\n", - "\n", - "1. Does the random forest model overfit less than a single decision tree?\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# to_remove explanation\n", - "\n", - "\"\"\"\n", - "1. The difference between performance on training and test data is less for the random forest model therefore it overfits less.\n", - "This is consistent with what we learned about the benefit of using ensemble models.\n", - "\"\"\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.5: Explore Parameters of the Random Forest Model\n", - "\n", - "In the previous tutorial, you saw how we can control the depth of a single decision tree. \n", - "We can also control the depth of the decision trees used in our random forest model by passing a `max_depth` argument. We can also control the number of trees in the random forest model by setting `n_estimator`.\n", - "\n", - "Intuitively, these variables control the *capacity* of the model. Capacity loosely refers to the number of trainable parameters in the model. The more trees and the deeper they are, the more free parameters the model has to capture the training data. If the model has too low of capacity, it won't be powerful enough to capture complex relationships between the input features and the target variable. If it has too many parameters that it can move around, however, it may end up memorizing every single training point and therefore overfit.\n", - "\n", - "Use the sliders below to experiment with different values of `n_estimator` and `max_depth` and see how they impact performance on training and test data." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Interactive Demo 1.5: Performance of the Random Forest Regression\n", - "In this activity, you can adjust the sliders for `n_estimators` and `max_depth` to observe their effect on model performance:\n", - "\n", - "* `n_estimators`: Controls the number of trees in the Random Forest. \n", - "* `max_depth`: Sets the maximum depth of each tree. 
\n", - "After adjusting the sliders, the code fits a new Random Forest model and prints the training and testing scores, showing how changes in these parameters impact model performance." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Use the slider to change the values of 'n_estimators' and 'max_depth' and observe the effect on performance.\n", - "# @markdown Make sure you execute this cell to enable the widget!\n", - "\n", - "# Function to train random forest and display scatter plot\n", - "def train_rf_and_plot(X_tr, y_train, X_test, y_test, max_depth, n_estim):\n", - " global rf_regressor, X_train\n", - "\n", - " # Instantiate and train the decision tree regressor\n", - " rf_regressor = RandomForestRegressor(n_estimators=n_estim, max_depth=max_depth)\n", - " rf_regressor.fit(X_tr, y_train)\n", - "\n", - " # Calculate and print the scores\n", - " score_train = rf_regressor.score(X_tr, y_train)\n", - " score_test = rf_regressor.score(X_test, y_test)\n", - " print(f\"\\n\\tTraining Score: {score_train}\")\n", - " print(f\"\\tTesting Score : {score_test}\\n\")\n", - "\n", - " # Generate scatter plot: Predicted vs. True Temperatures\n", - " predicted = rf_regressor.predict(X_tr)\n", - "\n", - " fig, ax = plt.subplots()\n", - "\n", - " # Scatter plot\n", - " ax.scatter(predicted, y_train, color='blue', alpha=0.7, label='Comparison of Predicted and True Temperatures', edgecolors='black')\n", - " ax.plot([min(y_train), max(y_train)], [min(y_train), max(y_train)], color='red', linestyle='--', label='Ideal Prediction Line')\n", - " ax.set_xlabel('Predicted Temperature (K)')\n", - " ax.set_ylabel('True Temperature (K)')\n", - " ax.set_title('Annual mean temperature anomaly')\n", - " # add a caption\n", - " caption_text = 'The anomalies are calculated by subtracting the annual means of the pre-industrial scenario from \\nthe annual means of the respective scenario.'\n", - " plt.figtext(0.5, -0.03, caption_text, ha='center', fontsize=10) # Adjusted y-coordinate to create space\n", - " ax.legend()\n", - " ax.grid(True)\n", - "\n", - " plt.tight_layout()\n", - " plt.show()\n", - "\n", - "\n", - "# Interactive widget to control max_depth and n_estimators\n", - "@widgets.interact(max_depth=(1, 41, 1), n_estimators=(10,100,5))\n", - "def visualize_scores_with_max_depth(max_depth=20, n_estimators=50):\n", - " train_rf_and_plot(X_train, y_train, X_test, y_test, max_depth, n_estimators)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Interactive Demo 1.5: Discussion\n", - "\n", - "1. Did you observe any trends in how the performance changes? \n", - "2. Try to explain in you own words the concepts of capacity and overfitting and how they relate.\n", - "3. In addition to model capacity, what else could be changed to prevent overfitting?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# to_remove explanation\n", - "\n", - "\"\"\"\n", - "1. Observations: Adjusting `n_estimators` and `max_depth` may cause fluctuations in model performance. Increasing `n_estimators` initially improves performance, but too many trees may lead to overfitting. Similarly, increasing `max_depth` initially enhances performance by capturing complex patterns, but excessively deep trees may result in overfitting.\n", - "\n", - "2. 
Capacity and Overfitting: Capacity refers to a model's ability to capture complex patterns, while overfitting occurs when a model learns noise instead of true patterns. Increasing capacity, like using more trees or deeper trees, can lead to overfitting.\n", - "\n", - "3. Preventing Overfitting: Apart from adjusting model capacity, we could also consider training on a larger dataset. Machine learning techniques like regularization techniques, cross-validation, feature selection, and ensemble methods help prevent overfitting. These approaches ensure the model generalizes well to unseen data by balancing complexity and performance.\n", - "\n", - "\"\"\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Summary\n", - "\n", - "In this tutorial, we delved into the importance of training and testing sets in constructing robust machine learning models. Understanding the concept of overfitting and the necessity of using separate test sets for model assessment were pivotal. Through practical exercises, we acquired hands-on proficiency in data partitioning, model training, and performance evaluation.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Resources\n", - "\n", - "* [ClimateBench v1.0: A Benchmark for Data-Driven Climate Projections](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954) \n" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "include_colab_link": true, - "name": "W2D5_Tutorial3", - "provenance": [], - "toc_visible": true - }, - "kernel": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.19" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial4.ipynb b/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial4.ipynb deleted file mode 100644 index b0e4b1def..000000000 --- a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial4.ipynb +++ /dev/null @@ -1,678 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/W2D5_Tutorial4.ipynb)   \"Open" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial 4: Testing Spatial Generalization\n", - "\n", - "**Week 2, Day 5, AI and Climate Change**\n", - "\n", - "__Content creators:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Content reviewers:__ Mujeeb Abdulfatai, Nkongho Ayuketang Arreyndip, Jeffrey N. A. 
Aryee, Paul Heubel, Jenna Pearson, Abel Shibu\n", - "\n", - "__Content editors:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Production editors:__ Konstantine Tsafatinos\n", - "\n", - "**Our 2024 Sponsors:** CMIP, NFDI4Earth" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial Objectives\n", - "\n", - "*Estimated timing of tutorial:* 20 minutes\n", - "\n", - "In this tutorial, you will: \n", - "* Learn the concept of within distribution generalization\n", - "* Test your model’s ability on a certain type of out-of-distribution data\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Setup" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {}, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "# imports:\n", - "\n", - "import matplotlib.pyplot as plt # For plotting graphs\n", - "import pandas as pd # For data manipulation\n", - "import xarray as xr\n", - "import cartopy.crs as ccrs\n", - "import cartopy.feature as cfeature\n", - "\n", - "# import specific machine learning models and tools\n", - "from sklearn.model_selection import train_test_split # For splitting dataset into train and test sets\n", - "from sklearn.ensemble import RandomForestRegressor # For Random Forest Regression" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Figure Settings\n", - "import ipywidgets as widgets # interactive display\n", - "\n", - "%config InlineBackend.figure_format = 'retina'\n", - "plt.style.use(\n", - " \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\"\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Helper functions\n", - "\n", - "# Load and Prepare the Data\n", - "url_Climatebench_train_val = \"https://osf.io/y2pq7/download\" # Dataset URL\n", - "training_data = pd.read_csv(url_Climatebench_train_val) # Load the training data from the provided URL\n", - "training_data.pop('scenario') # Drop the 'scenario' column as it's just a label and won't be passed into the model\n", - "target = training_data.pop('tas_FINAL') # Extract the target variable 'tas_FINAL' which we aim to predict\n", - "\n", - "# Split the data into training and testing sets\n", - "X_train, X_test, y_train, y_test = train_test_split(training_data, target, test_size=0.2, random_state=1)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Set random seed\n", - "\n", - "# @markdown Executing `set_seed(seed=seed)` you are setting the seed\n", - "\n", - "# Call `set_seed` function in the exercises to ensure reproducibility.\n", - "import random\n", - "import numpy as np\n", - "\n", - "def set_seed(seed=None):\n", - " if seed is None:\n", - " seed = np.random.choice(2 ** 32)\n", - " random.seed(seed)\n", - " np.random.seed(seed)\n", - " print(f'Random seed {seed} has been set.')\n", - "\n", - "# Set a global seed value for reproducibility\n", - "random_state = 42 # change 42 with any number you like\n", - "\n", - "set_seed(seed=random_state)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - 
"outputs": [], - "source": [ - "# @title Plotting functions\n", - "# @markdown Run this cell to define plotting function we will be using in this code\n", - "\n", - "def visualize_decision_tree(X_train, y_train, X_test, y_test, dt_model):\n", - " # Plot decision tree and regression\n", - " plt.figure(figsize=(10, 5))\n", - "\n", - " # Plot Decision Tree\n", - " plt.subplot(1, 2, 1)\n", - " plt.scatter(X_train, y_train, color='blue', label='Training data')\n", - " plt.scatter(X_test, y_test, color='green', label='Test data')\n", - " plt.plot(np.sort(X_test, axis=0), dt_model.predict(np.sort(X_test, axis=0)), color='red', label='Model')\n", - " plt.title('Decision Tree Regression')\n", - " plt.xlabel('Feature')\n", - " plt.ylabel('Target')\n", - " plt.legend()\n", - "\n", - " # Plot Decision Tree\n", - " plt.subplot(1, 2, 2)\n", - " plot_tree(dt_model, filled=True)\n", - " plt.title(\"Decision Tree\")\n", - "\n", - " plt.tight_layout()\n", - " plt.show()\n", - "\n", - "def visualize_random_forest(X_train, y_train, X_test, y_test, rf_model):\n", - " num_trees = len(rf_model.estimators_)\n", - " num_cols = min(3, num_trees)\n", - " num_rows = (num_trees + num_cols - 1) // num_cols\n", - "\n", - " plt.figure(figsize=(15, 6 * num_rows))\n", - "\n", - " # Plot Random Forest Regression\n", - " plt.subplot(num_rows, num_cols, 1)\n", - " plt.scatter(X_train, y_train, color='blue', label='Training data')\n", - " plt.scatter(X_test, y_test, color='green', label='Test data')\n", - " plt.plot(np.sort(X_test, axis=0), rf_model.predict(np.sort(X_test, axis=0)), color='red', label='Model')\n", - " plt.title('Random Forest Regression')\n", - " plt.xlabel('Feature')\n", - " plt.ylabel('Target')\n", - " plt.legend()\n", - "\n", - " # Plot Decision Trees within Random Forest\n", - " for i, tree in enumerate(rf_model.estimators_):\n", - " plt.subplot(num_rows, num_cols, i + 2)\n", - " plot_tree(tree, filled=True)\n", - " plt.title(f\"Tree {i+1}\")\n", - "\n", - " plt.tight_layout()\n", - " plt.show()\n", - "\n", - "def plot_spatial_distribution(data, col_name, c_label):\n", - " \"\"\"\n", - " Plot the spatial distribution of a variable of interest.\n", - "\n", - " Args:\n", - " data (DataFrame): DataFrame containing latitude, longitude, and data of interest.\n", - " col_name (str): Name of the column containing data of interest.\n", - " c_label (str): Label to describe quantity and unit for the colorbar labeling.\n", - "\n", - " Returns:\n", - " None\n", - " \"\"\"\n", - " # create a xarray dataset from the pandas dataframe\n", - " # for convenient plotting with cartopy afterwards\n", - " ds = xr.Dataset({col_name: ('points', data[col_name])},\n", - " coords={'lon': ('points', data['lon']),\n", - " 'lat': ('points', data['lat'])}\n", - " )\n", - "\n", - " # create geoaxes\n", - " ax = plt.axes(projection=ccrs.PlateCarree())\n", - " ax.set_extent([0.95*min(ds.lon.values), 1.05*max(ds.lon.values), 0.95*min(ds.lat.values), 1.05*max(ds.lat.values)])\n", - "\n", - " # add coastlines\n", - " ax.coastlines()\n", - " ax.add_feature(cfeature.OCEAN, alpha=0.1)\n", - " # add state borders\n", - " ax.add_feature(cfeature.BORDERS, edgecolor='darkgrey')\n", - "\n", - " # plot the data\n", - " p = ax.scatter(ds['lon'], ds['lat'], c=ds[col_name], cmap='coolwarm', transform=ccrs.PlateCarree())\n", - "\n", - " # add a colorbar\n", - " cbar = plt.colorbar(p, orientation='vertical')\n", - " cbar.set_label(c_label)\n", - "\n", - " # add a grid and labels\n", - " ax.gridlines(draw_labels={\"bottom\": \"x\", \"left\": \"y\"})\n", - 
"\n", - " # add title\n", - " plt.title('Spatial Distribution of\\n Annual Mean Anomalies\\n')\n", - " plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Video 1: Testing spatial generalization\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == 'Bilibili':\n", - " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", - " elif source == 'Osf':\n", - " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = []\n", - " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == 'Youtube':\n", - " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", - " height=H, fs=fs, rel=0)\n", - " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", - " else:\n", - " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", - " height=H, fs=fs, autoplay=False)\n", - " if video_ids[i][0] == 'Bilibili':\n", - " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", - " elif video_ids[i][0] == 'Osf':\n", - " print(f'Video available at https://osf.io/{video.id}')\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "# curriculum or production team will provide these ids\n", - "video_ids = [('Youtube', 'U8mshdRYwuY'), ('Bilibili', ''), ('Osf', '')]\n", - "tab_contents = display_videos(video_ids, W=854, H=480)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Tutorial slides\n", - "\n", - "# @markdown\n", - "from ipywidgets import widgets\n", - "from IPython.display import IFrame\n", - "\n", - "link_id = \"26r8h\"\n", - "\n", - "download_link = f\"https://osf.io/download/{link_id}/\"\n", - "render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n", - "# @markdown\n", - "out = widgets.Output()\n", - "with out:\n", - " print(f\"If you want to download the slides: {download_link}\")\n", - " display(IFrame(src=f\"{render_link}\", width=730, height=410))\n", - "display(out)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "In the video, we discussed how we previously tested generalization to unseen data points from the same data distribution (i.e., same region and scenarios). 
\n", - "Now we will see if the model generalizes to data from a new region.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Section 1: Test generalization to held-out spatial locations" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.1: Load the New Testing Data\n", - "\n", - "We will take our random forest model that was trained on data from the region in the blue box and see if it can work well using lat/lon locations that come from the red box. We already have the data from the blue box region loaded, so now we just need to load the data from the red box.\n", - "\n", - "

[Image: W2D5_Tutorial4_map — map highlighting the blue box (training region) and the red box (new spatial test region)]

" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Loading the new Spatial test data\n", - "\n", - "url_spatial_test_data = \"https://osf.io/7tr49/download\" # location of test data\n", - "spatial_test_data = pd.read_csv(url_spatial_test_data) # Load spatial test data from the provided URL\n", - "spatial_test_data.pop('scenario') # drop the `scenario` column from the data as it is just a label, but will not be passed into the model.\n", - "spatial_test_target = spatial_test_data.pop('tas_FINAL') # extract the target variable 'tas_FINAL'\n", - "# display the prepared spatial test data\n", - "spatial_test_data" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "When we plot the temperature distribution over space, we can see that this dataset has a different range of latitude and longitude values than the initial dataset. We use a plotting function `plot_spatial_distribution()` that you completed in Coding Exercise 1.4 of Tutorial 1 that can be found in the *plotting function* of the Setup section." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# plot spatial distribution of temperature anomalies for 2015\n", - "col_name = 'tas_2015'\n", - "c_label = 'Temperature (K) in 2015'\n", - "plot_spatial_distribution(spatial_test_data, col_name, c_label)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.2: Evaluate the model" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "We've been playing around with the random forest model parameters. To make sure we know what model we are evaluating, let's train it again here on the training data specifically with `n_estimators = 80` and `max_depth = 50`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "rf_regressor = RandomForestRegressor(random_state=42, n_estimators=80, max_depth=50)\n", - "# Train the model on the training data\n", - "rf_regressor.fit(X_train, y_train)\n", - "train_score = rf_regressor.score(X_train,y_train)\n", - "test_score = rf_regressor.score(X_test,y_test)\n", - "print( \"Training Set Score : \", train_score)\n", - "print( \" Test Set Score : \", test_score)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Now that the model has been trained on data from the blue box region, let's test how well it performs on data from the red box region" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "spatial_test_score = rf_regressor.score(spatial_test_data,spatial_test_target)\n", - "print( \"Spatial Test Data Score : \", spatial_test_score)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Now it is your turn: Make a scatter plot of the predicted vs true 2050 temperature values for this data, like you did in the last tutorials." 
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "execution": {}
- },
- "source": [
- "### Coding Exercise 1.2: Scatter Plot for Spatial Data\n",
- "\n",
- "In this exercise, implement the `scatter_plot_predicted_vs_true()` function to evaluate the performance of the pre-trained Random Forest regressor on the new spatial test data and create a scatter plot of predicted vs. true temperature values."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "execution": {}
- },
- "source": [
- "```python\n",
- "def scatter_plot_predicted_vs_true(spatial_test_data, true_values):\n",
- " \"\"\"Create a scatter plot of predicted vs true temperature values.\n",
- "\n",
- " Args:\n",
- " spatial_test_data: Test features.\n",
- " true_values (ndarray): True temperature values.\n",
- "\n",
- " Returns:\n",
- " None\n",
- " \"\"\"\n",
- "\n",
- " # make predictions using the random forest regressor\n",
- " spatial_test_predicted = rf_regressor.predict(spatial_test_data)\n",
- "\n",
- " spatial_test_score = rf_regressor.score(spatial_test_data, true_values)\n",
- " print(\"\\nSpatial Test Data Score:\", spatial_test_score)\n",
- "\n",
- " # implement plt.scatter() to compare predicted and true temperature values\n",
- " _ = ...\n",
- " # implement plt.plot() to plot the diagonal line y=x\n",
- " _ = ...\n",
- "\n",
- " # aesthetics\n",
- " plt.xlabel('Predicted Temperatures (K)')\n",
- " plt.ylabel('True Temperatures (K)')\n",
- " plt.title('Annual mean temperature anomaly')\n",
- "\n",
- " # add a caption with adjusted y-coordinate to create space\n",
- " caption_text = 'The anomalies are calculated by subtracting the annual means of the pre-industrial scenario from \\nthe annual means of the respective scenario.'\n",
- " plt.figtext(0.5, -0.03, caption_text, ha='center', fontsize=10) # Adjusted y-coordinate to create space\n",
- " plt.legend(loc='upper left')\n",
- " plt.show()\n",
- "\n",
- "# test your function\n",
- "_ = scatter_plot_predicted_vs_true(spatial_test_data,spatial_test_target)\n",
- "\n",
- "```"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "execution": {}
- },
- "outputs": [],
- "source": [
- "# to_remove solution\n",
- "\n",
- "def scatter_plot_predicted_vs_true(spatial_test_data, true_values):\n",
- " \"\"\"Create a scatter plot of predicted vs true temperature values.\n",
- "\n",
- " Args:\n",
- " spatial_test_data: Test features.\n",
- " true_values (ndarray): True temperature values.\n",
- "\n",
- " Returns:\n",
- " None\n",
- " \"\"\"\n",
- "\n",
- " # make predictions using the random forest regressor\n",
- " spatial_test_predicted = rf_regressor.predict(spatial_test_data)\n",
- "\n",
- " spatial_test_score = rf_regressor.score(spatial_test_data, true_values)\n",
- " print(\"\\nSpatial Test Data Score:\", spatial_test_score)\n",
- "\n",
- " # implement plt.scatter() to compare predicted and true temperature values\n",
- " _ = plt.scatter(spatial_test_predicted, true_values, color='b', label='Comparison of Predicted and True Temperatures')\n",
- " # implement plt.plot() to plot the diagonal line y=x\n",
- " _ = plt.plot([min(spatial_test_predicted), max(spatial_test_predicted)], [min(true_values), max(true_values)], color='r', label='Ideal Line')\n",
- "\n",
- " # aesthetics\n",
- " plt.xlabel('Predicted Temperatures (K)')\n",
- " plt.ylabel('True Temperatures (K)')\n",
- " plt.title('Annual mean temperature anomaly')\n",
- "\n",
- " # add a caption with adjusted y-coordinate to create space\n",
- " 
caption_text = 'The anomalies are calculated by subtracting the annual means of the pre-industrial scenario from \\nthe annual means of the respective scenario.'\n", - " plt.figtext(0.5, -0.03, caption_text, ha='center', fontsize=10) # Adjusted y-coordinate to create space\n", - " plt.legend(loc='upper left')\n", - " plt.show()\n", - "\n", - "# test your function\n", - "_ = scatter_plot_predicted_vs_true(spatial_test_data,spatial_test_target)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Question 1.2: Performance of the model for new spatial location data\n", - "\n", - "1. Have you observed the decrease in score? \n", - "2. What do you believe could be the cause of this? \n", - "3. What do you think would happen if the model was tested on an even farther away region, for example, in North America?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# to_remove explanation\n", - "\n", - "\"\"\"\n", - "1. Yes, there appears to be a decrease in score when the model is tested on new location data.\n", - "2. The decrease in score could be attributed to the model's inability to generalize well to new locations.\n", - " It's possible that the model has learned patterns specific to the training data but fails to capture the nuances present in the new location data.\n", - "3. If the model was tested on an even farther away region, such as North America,\n", - " we might expect the performance to deteriorate further. This is because the model was trained on data from a different geographical region,\n", - " and it may struggle to accurately predict temperatures in regions with vastly different climate patterns and environmental factors.\n", - "\"\"\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Summary\n", - "\n", - "In this tutorial, you investigated the generalization capacity of machine learning models to novel geographical regions. 
The process involved assessing model performance on spatial datasets from diverse locations, shedding light on the model's adaptability across varying environmental contexts.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Resources" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "* [ClimateBench v1.0: A Benchmark for Data-Driven Climate Projections](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954) \n" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "include_colab_link": true, - "name": "W2D5_Tutorial4", - "provenance": [], - "toc_visible": true - }, - "kernel": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.19" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial5.ipynb b/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial5.ipynb deleted file mode 100644 index 54dbb8a24..000000000 --- a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial5.ipynb +++ /dev/null @@ -1,748 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/W2D5_Tutorial5.ipynb)   \"Open" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial 5: Testing generalization to new scenarios\n", - "\n", - "**Week 2, Day 5, AI and Climate Change**\n", - "\n", - "__Content creators:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Content reviewers:__ Mujeeb Abdulfatai, Nkongho Ayuketang Arreyndip, Jeffrey N. A. 
Aryee, Paul Heubel, Jenna Pearson, Abel Shibu\n", - "\n", - "__Content editors:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Production editors:__ Konstantine Tsafatinos\n", - "\n", - "**Our 2024 Sponsors:** CMIP, NFDI4Earth" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial Objectives\n", - "\n", - "*Estimated timing of tutorial:* 20 minutes\n", - "\n", - "In this tutorial, you will\n", - "* Learn about a different type of out-of-distribution test of our model\n", - "* Evaluate the model's performance\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Setup" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {}, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "# imports\n", - "\n", - "import matplotlib.pyplot as plt # For plotting graphs\n", - "import pandas as pd # For data manipulation\n", - "# # Import specific machine learning models and tools\n", - "from sklearn.model_selection import train_test_split # For splitting dataset into train and test sets\n", - "from sklearn.ensemble import RandomForestRegressor # For Random Forest Regression" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Figure Settings\n", - "import ipywidgets as widgets # interactive display\n", - "\n", - "%config InlineBackend.figure_format = 'retina'\n", - "plt.style.use(\n", - " \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\"\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Helper functions\n", - "\n", - "# Load and Prepare the Data\n", - "url_Climatebench_train_val = \"https://osf.io/y2pq7/download\" # Dataset URL\n", - "training_data = pd.read_csv(url_Climatebench_train_val) # load the training data from the provided URL\n", - "training_data.pop('scenario') # drop the 'scenario' column as it's just a label and won't be passed into the model\n", - "target = training_data.pop('tas_FINAL') # extract the target variable 'tas_FINAL' which we aim to predict\n", - "\n", - "url_spatial_test_data = \"https://osf.io/7tr49/download\" # test data with different location\n", - "spatial_test_data = pd.read_csv(url_spatial_test_data) # load spatial test data from the provided URL\n", - "spatial_test_data.pop('scenario') # drop the `scenario` column from the data as it is just a label, but will not be passed into the model.\n", - "spatial_test_target = spatial_test_data.pop('tas_FINAL') # extract the target variable 'tas_FINAL'\n", - "\n", - "# Split the data into training and testing sets: 80%/20%\n", - "X_train, X_test, y_train, y_test = train_test_split(training_data, target, test_size=0.2, random_state=1)\n", - "\n", - "# Training the model on the training data\n", - "rf_regressor = RandomForestRegressor(random_state=42, n_estimators=80, max_depth=50)\n", - "rf_regressor.fit(X_train, y_train)\n", - "\n", - "spatial_test_score = rf_regressor.score(spatial_test_data,spatial_test_target)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Set random seed\n", - "\n", - "# @markdown Executing `set_seed(seed=seed)` you are setting the seed\n", - "\n", - "# 
Call `set_seed` function in the exercises to ensure reproducibility.\n", - "import random\n", - "import numpy as np\n", - "\n", - "def set_seed(seed=None):\n", - " if seed is None:\n", - " seed = np.random.choice(2 ** 32)\n", - " random.seed(seed)\n", - " np.random.seed(seed)\n", - " print(f'Random seed {seed} has been set.')\n", - "\n", - "# Set a global seed value for reproducibility\n", - "random_state = 42 # change 42 with any number you like\n", - "\n", - "set_seed(seed=random_state)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Video 1: Testing generalization to new scenarios\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == 'Bilibili':\n", - " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", - " elif source == 'Osf':\n", - " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = []\n", - " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == 'Youtube':\n", - " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", - " height=H, fs=fs, rel=0)\n", - " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", - " else:\n", - " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", - " height=H, fs=fs, autoplay=False)\n", - " if video_ids[i][0] == 'Bilibili':\n", - " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", - " elif video_ids[i][0] == 'Osf':\n", - " print(f'Video available at https://osf.io/{video.id}')\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "# curriculum or production team will provide these ids\n", - "video_ids = [('Youtube', 'L860LmyPoSg'), ('Bilibili', ''), ('Osf', '')]\n", - "tab_contents = display_videos(video_ids, W=854, H=480)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Video Summary : \n", - "* Discussed how we previously tested generalization to an unseen region. \n", - "* Stressed that the real utility of these emulators is the ability to run new scenarios. 
\n", - "* Now we will see if the model generalizes to data from a new scenario.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Tutorial slides\n", - "\n", - "# @markdown\n", - "from ipywidgets import widgets\n", - "from IPython.display import IFrame\n", - "\n", - "link_id = \"2rq8x\"\n", - "\n", - "download_link = f\"https://osf.io/download/{link_id}/\"\n", - "render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n", - "# @markdown\n", - "out = widgets.Output()\n", - "with out:\n", - " print(f\"If you want to download the slides: {download_link}\")\n", - " display(IFrame(src=f\"{render_link}\", width=730, height=410))\n", - "display(out)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Section 1: Test Generalization to Held-out Emissions Scenario" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.1: Load the New Testing (Scenario) Data\n", - "Load the new dataset and print it. As you can see, the scenario for all of these datapoints is ssp245. This scenario was not included in our initial data set. According to the scenario descriptions included in the table in Tutorial 1, ssp245 represent a \"medium forcing future scenario\". The lat/lon locations are the same as the initial dataset (blue box region)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "url_scenario_test_data = \"https://osf.io/pkbwx/download\" # Dataset URL\n", - "scenario_test_data = pd.read_csv(url_scenario_test_data) # Load scenario test data from the provided URL\n", - "scenario_test_data" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Now we will prepare the data to be fed into the pre-trained model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "scenario_test_data.pop('scenario') # remove the 'scenario' column from the dataset\n", - "scenario_test_target = scenario_test_data.pop('tas_FINAL') # extract the target variable 'tas_FINAL'\n", - "scenario_test_data # display the prepared scenario test data" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.2: Evaluate the Model on this New (Scenario) Data" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Now let's evaluate our pre-trained model (`rf_regressor`) to see how well it performs on this new emissions scenario. Use what you know to evaluate the performance and make a scatter plot of predicted vs. true temperature values." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "execution": {} - }, - "source": [ - "```python\n", - "def evaluate_and_plot_scenario_performance(rf_regressor, scenario_test_data, scenario_test_target):\n", - " \"\"\"Evaluate the performance of the pre-trained model on the new emissions scenario\n", - " and create a scatter plot of predicted vs. 
true temperature values.\n", - "\n", - " Args:\n", - " rf_regressor (RandomForestRegressor): Pre-trained Random Forest regressor model.\n", - " scenario_test_data (ndarray): Test features for the new emissions scenario.\n", - " scenario_test_target (ndarray): True temperature values of the new emissions scenario.\n", - "\n", - " Returns:\n", - " float: Score of the model on the scenario test data.\n", - " \"\"\"\n", - "\n", - " # predict temperature values for the new emissions scenario\n", - " scenario_test_predicted = ...\n", - "\n", - " # evaluate the model on the new emissions scenario\n", - " scenario_test_score = ...\n", - " print(\"Scenario Test Score:\", scenario_test_score)\n", - "\n", - " # implement plt.scatter() to compare predicted and true temperature values\n", - " plt.figure()\n", - " _ = ...\n", - " # implement plt.plot() to plot the diagonal line y=x\n", - " _ = ...\n", - "\n", - " # aesthetics\n", - " plt.xlabel('Predicted Temperatures (K)')\n", - " plt.ylabel('True Temperatures (K)')\n", - " plt.title('Annual mean temperature anomaly\\n(New Emissions Scenario)')\n", - " plt.grid(True)\n", - " plt.show()\n", - "\n", - " return scenario_test_score\n", - "\n", - "# test your function\n", - "scenario_test_score = evaluate_and_plot_scenario_performance(rf_regressor, scenario_test_data, scenario_test_target)\n", - "\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# to_remove solution\n", - "\n", - "def evaluate_and_plot_scenario_performance(rf_regressor, scenario_test_data, scenario_test_target):\n", - " \"\"\"Evaluate the performance of the pre-trained model on the new emissions scenario\n", - " and create a scatter plot of predicted vs. 
true temperature values.\n", - "\n", - " Args:\n", - " rf_regressor (RandomForestRegressor): Pre-trained Random Forest regressor model.\n", - " scenario_test_data (ndarray): Test features for the new emissions scenario.\n", - " scenario_test_target (ndarray): True temperature values of the new emissions scenario.\n", - "\n", - " Returns:\n", - " float: Score of the model on the new emissions scenario.\n", - " \"\"\"\n", - "\n", - " # predict temperature values for the new emissions scenario\n", - " scenario_test_predicted = rf_regressor.predict(scenario_test_data)\n", - "\n", - " # evaluate the model on the new emissions scenario\n", - " scenario_test_score = rf_regressor.score(scenario_test_data, scenario_test_target)\n", - " print(\"Scenario Test Score:\", scenario_test_score)\n", - "\n", - " # implement plt.scatter() to compare predicted and true temperature values\n", - " plt.figure()\n", - " _ = plt.scatter(scenario_test_predicted, scenario_test_target, color='b', label='Comparison of Predicted and True Temperatures')\n", - " # implement plt.plot() to plot the diagonal line y=x\n", - " _ = plt.plot([min(scenario_test_predicted), max(scenario_test_predicted)], [min(scenario_test_target), max(scenario_test_target)], color='r',label='Ideal Line')\n", - "\n", - " # aesthetics\n", - " plt.xlabel('Predicted Temperatures (K)')\n", - " plt.ylabel('True Temperatures (K)')\n", - " plt.title('Annual mean temperature anomaly\\n(New Emissions Scenario)')\n", - " plt.grid(True)\n", - " plt.show()\n", - "\n", - " return scenario_test_score\n", - "\n", - "# test your function\n", - "scenario_test_score = evaluate_and_plot_scenario_performance(rf_regressor, scenario_test_data, scenario_test_target)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# to_remove explanation\n", - "\n", - "''' For TAs:\n", - "If the following error arises: \"ValueError: The feature names should match those that were passed during fit. Feature names unseen at fit time:- tas_FINAL\"\n", - "\n", - "Re-run all cells from the beginning, e.g. by clicking 'Run' -> 'Run All Above Selected Cell' in Jupyter Lab.\n", - "'''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Question 1.2: Performance of the Model on New Scenario Data\n", - "\n", - "1. Again, have you observed a decrease in the score? \n", - "2. What do you believe could be the cause of this? \n", - "3. What kind of new scenarios might the model perform better for?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# to_remove explanation\n", - "\n", - "\"\"\"\n", - "1. Yes, there appears to be a decrease in score when the model is tested on new scenario data, though it is still well-above 0 suggesting the model has learned something about predicting temperature from emissions.\n", - "2. The decrease in score could be due to the model's inability to generalize well to new scenarios. It's possible that the model was trained on a specific set of scenarios and may not perform as accurately when presented with new, unseen scenarios. Factors such as differences in data distribution, environmental conditions, or other variables not captured in the training data could contribute to this decrease in performance.\n", - "3. The model might perform better for new scenarios that are similar to the ones it was trained on. 
Additionally, if the new scenarios have data distributions and patterns that are more aligned with the training data, the model could potentially perform better. However, it's important to note that the model's performance on new scenarios will ultimately depend on how well it can adapt and generalize to the differences present in the new data.\n", - "\"\"\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "For the sake of clarity let's summarize all the result." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# summarize results\n", - "train_score = rf_regressor.score(X_train, y_train)\n", - "test_score = rf_regressor.score(X_test, y_test)\n", - "average_score = (train_score + test_score + spatial_test_score + scenario_test_score) / 4\n", - "\n", - "print(f\"\\tTraining Data Score : {train_score}\")\n", - "print(f\"\\tTesting Data Score on same Scenario/Region : {test_score}\")\n", - "print(f\"\\tHeld-out Spatial Region Test Score : {spatial_test_score}\")\n", - "print(f\"\\tHeld-out Scenario Test Score : {scenario_test_score}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "This shows us that the model does generalize somewhat (i.e. the score is well above zero even in the new regions and in the new scenario). However, it does not generalize very well. That is, it does not perform as well on data that differs from the data it was trained on. Ideally, we would be able to build a model that inherently learns the complex relationship between emissions scenarios and future temperature. A model that truly learned this relationship would be able to generalize to new scenarios and regions." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "\n", - "Do you have any ideas of how to build a better machine learning model to emulate climate models? Many scientists are working on this problem!" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Bonus Section 2: Try other Regression Models\n", - "\n", - "*Only complete this section if you are well ahead of schedule, or have already completed the final tutorial.*\n", - "\n", - "Random Forest models are not the only regression models that could be applied to this problem. In this code, we will use scikit-learn to train and evaluate various regression models on the Climate Bench dataset. We will load the data, split it, define models, train them with different settings, and evaluate their performance. We will calculate and print average scores for each model configuration and identify the best-performing model.\n", - "\n", - "For more information about the models used here and various other models, you can refer to [scikit-learn.org/stable/supervised_learning.html#supervised-learning](https://scikit-learn.org/stable/supervised_learning.html#supervised-learning). 
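The long cell below repeats one small pattern for every model. As a minimal sketch of that pattern for a single pipeline, assuming the same ClimateBench CSV, `scenario` label column, and `tas_FINAL` target used throughout this tutorial (and using `Ridge` purely as an arbitrary example model):

```python
# Minimal sketch of the per-model pattern repeated in the sweep below.
# Assumes the ClimateBench CSV and column names used earlier in this tutorial.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

data = pd.read_csv("https://osf.io/y2pq7/download")   # training/validation data
data.pop("scenario")                                  # label only, not a model input
X = data.drop(columns=["tas_FINAL"])                  # features
y = data["tas_FINAL"]                                 # target: temperature in 2050
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

model = make_pipeline(StandardScaler(), Ridge())      # standardize features, then fit
model.fit(X_train, y_train)
print("Test R^2:", model.score(X_test, y_test))       # .score() returns R^2 for regressors
```

The full sweep below simply wraps this pattern in loops over several model families with two configurations each, and evaluates on the standard test split plus the spatial and scenario held-out sets.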
\n", - "*Note: the following cell may take ~2 minutes to run.*" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Import necessary libraries\n", - "import matplotlib.pyplot as plt\n", - "from sklearn.pipeline import make_pipeline\n", - "from sklearn.preprocessing import StandardScaler\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, BaggingRegressor\n", - "from sklearn.svm import SVR\n", - "from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet\n", - "from sklearn.linear_model import RidgeCV\n", - "import pandas as pd\n", - "from sklearn.neural_network import MLPRegressor\n", - "\n", - "# Load datasets\n", - "train_val_data = pd.read_csv(\"https://osf.io/y2pq7/download\")\n", - "spatial_test_data = pd.read_csv(\"https://osf.io/7tr49/download\")\n", - "scenario_test_data = pd.read_csv(\"https://osf.io/pkbwx/download\")\n", - "\n", - "# Pop the 'scenario' column from all datasets\n", - "train_val_data.pop('scenario')\n", - "spatial_test_data.pop('scenario')\n", - "scenario_test_data.pop('scenario')\n", - "\n", - "# Split train_val_data into training and testing sets\n", - "X_train, X_test, y_train, y_test = train_test_split(train_val_data.drop(columns=[\"tas_FINAL\"]),\n", - " train_val_data[\"tas_FINAL\"],\n", - " test_size=0.2,\n", - " random_state=1)\n", - "\n", - "# Define models with different configurations\n", - "models = {\n", - " \"MLP\": [make_pipeline(StandardScaler(), MLPRegressor(hidden_layer_sizes=(50,), max_iter=1000)),\n", - " make_pipeline(StandardScaler(), MLPRegressor(hidden_layer_sizes=(500, 500, 500), random_state=1, max_iter=1000))],\n", - " \"RandomForest\": [make_pipeline(StandardScaler(), RandomForestRegressor(n_estimators=100, max_depth=None)),\n", - " make_pipeline(StandardScaler(), RandomForestRegressor(n_estimators=50, max_depth=10))],\n", - " \"GradientBoosting\": [make_pipeline(StandardScaler(), GradientBoostingRegressor(n_estimators=100, max_depth=3)),\n", - " make_pipeline(StandardScaler(), GradientBoostingRegressor(n_estimators=50, max_depth=2))],\n", - " \"BaggingRegressor\": [make_pipeline(StandardScaler(), BaggingRegressor(n_estimators=100)),\n", - " make_pipeline(StandardScaler(), BaggingRegressor(n_estimators=50))],\n", - " \"SVR\": [make_pipeline(StandardScaler(), SVR(kernel=\"linear\")),\n", - " make_pipeline(StandardScaler(), SVR(kernel=\"rbf\"))],\n", - " \"LinearRegression\": [make_pipeline(StandardScaler(), LinearRegression())],\n", - " \"Ridge\": [make_pipeline(StandardScaler(), Ridge())],\n", - " \"RidgeCV\":[RidgeCV(alphas=[167], cv=5)],\n", - " \"Lasso\": [make_pipeline(StandardScaler(), Lasso())],\n", - " \"ElasticNet\": [make_pipeline(StandardScaler(), ElasticNet())]\n", - "}\n", - "\n", - "# Train models and calculate score for each configuration\n", - "results = {}\n", - "for model_name, model_list in models.items():\n", - " model_results = []\n", - " for config_num, model in enumerate(model_list): # Add enumeration for configuration number\n", - " # Train model\n", - " model.fit(X_train, y_train)\n", - "\n", - " # Calculate scores\n", - " train_score = model.score(X_train, y_train)\n", - " test_score = model.score(X_test, y_test)\n", - " spatial_test_score = model.score(spatial_test_data.drop(columns=[\"tas_FINAL\"]), spatial_test_data[\"tas_FINAL\"])\n", - " scenario_test_score = 
model.score(scenario_test_data.drop(columns=[\"tas_FINAL\"]), scenario_test_data[\"tas_FINAL\"])\n", - "\n", - " # Append results\n", - " model_results.append({\n", - " \"Configuration\": config_num, # Add configuration number\n", - " \"Training Score\": train_score,\n", - " \"Testing Score\": test_score,\n", - " \"Spatial Test Score\": spatial_test_score,\n", - " \"Scenario Test Score\": scenario_test_score\n", - " })\n", - "\n", - " # Calculate average score for the model\n", - " average_score = sum(sum(result.values()) for result in model_results) / (len(model_results) * 4)\n", - "\n", - " # Store results including average score\n", - " results[model_name] = {\"Average Score\": average_score, \"Results\": model_results}\n", - "\n", - "# Print results including average score for each model\n", - "for model_name, model_data in results.items():\n", - " print(f\"Model:\\t{model_name}\")\n", - " print(f\"Average Score:\\t\\t\\t\\t {model_data['Average Score']}\")\n", - " print(\"Configuration-wise Average Scores:\")\n", - " for result in model_data['Results']:\n", - " print(f\"\\nConfiguration {result['Configuration']}: \"\n", - " f\"\\nTraining Score: {result['Training Score']}, \"\n", - " f\"\\nTesting Score: {result['Testing Score']}, \"\n", - " f\"\\nSpatial Test Score: {result['Spatial Test Score']}, \"\n", - " f\"\\nScenario Test Score: {result['Scenario Test Score']}\")\n", - " print()\n", - "\n", - "# Find the best model and its average score\n", - "best_model = max(results, key=lambda x: results[x][\"Average Score\"])\n", - "best_average_score = results[best_model][\"Average Score\"]\n", - "\n", - "# Print the best model and its average score\n", - "print(f\"\\nBest Model: {best_model}, Average Score: {best_average_score}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Let's plot the result. \n", - "*Note: This code will plot the actual score for positive average scores and zero for negative average scores.*" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title\n", - "# @markdown Run this cell to see the plot of results!\n", - "\n", - "import matplotlib.pyplot as plt\n", - "\n", - "# Extract model names and average scores from results\n", - "model_names = list(results.keys())\n", - "average_scores = [results[model_name][\"Average Score\"] for model_name in model_names]\n", - "\n", - "# Adjust scores to plot zero for negative scores\n", - "adjusted_scores = [score if score > 0 else 0 for score in average_scores]\n", - "\n", - "# Plotting\n", - "plt.figure()\n", - "plt.bar(model_names, adjusted_scores, color=['skyblue' if score > 0 else 'lightgray' for score in average_scores])\n", - "plt.xlabel('Model')\n", - "plt.ylabel('Average Score')\n", - "plt.title('Average Score of Different Regression Models')\n", - "plt.xticks(rotation=45, ha='right') # Rotate x-axis labels for better readability\n", - "plt.tight_layout()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "This quick sweep of models suggests Random Forest is a good choice, but recall that most of these models have hyperparameters. 
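As a hedged illustration of what such a hyperparameter search could look like (not part of the original tutorial), a small grid search with scikit-learn's `GridSearchCV` might be sketched as follows; the grid values are arbitrary assumptions rather than tuned recommendations, and `X_train`/`y_train` are the training splits defined above:

```python
# Sketch: a small hyperparameter grid search for the Random Forest.
# Grid values are illustrative assumptions, not tuned recommendations.
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor

param_grid = {
    "n_estimators": [50, 100, 200],  # number of trees in the forest
    "max_depth": [10, 20, None],     # maximum tree depth; None grows trees fully
}
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,            # 3-fold cross-validation on the training data
    scoring="r2",    # same metric as .score() for regressors
)
search.fit(X_train, y_train)  # note: fitting 9 grid points x 3 folds may take a while
print("Best parameters:", search.best_params_)
print("Best cross-validated R^2:", search.best_score_)
```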
Varying these hyperparameters may lead to different results!\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Summary\n", - "\n", - "In this tutorial, we explored how machine learning models adapt to unfamiliar emissions scenarios. Evaluating model performance on datasets representing different emission scenarios provided insights into the models' capabilities in predicting climate variables under diverse environmental conditions.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Resources\n", - "\n", - "* [ClimateBench v1.0: A Benchmark for Data-Driven Climate Projections](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954) \n", - "* [Scikit-learn.org, Supervised Learning](https://scikit-learn.org/stable/supervised_learning.html#supervised-learning)" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "include_colab_link": true, - "name": "W2D5_Tutorial5", - "provenance": [], - "toc_visible": true - }, - "kernel": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.19" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial6.ipynb b/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial6.ipynb deleted file mode 100644 index 72acc9f10..000000000 --- a/tutorials/W2D4_AIandClimateChange/instructor/W2D5_Tutorial6.ipynb +++ /dev/null @@ -1,292 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/W2D5_Tutorial6.ipynb)   \"Open" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial 6: Exploring other applications\n", - "\n", - "**Week 2, Day 5, AI and Climate Change**\n", - "\n", - "__Content creators:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Content reviewers:__ Mujeeb Abdulfatai, Nkongho Ayuketang Arreyndip, Jeffrey N. A. 
Aryee, Paul Heubel, Jenna Pearson, Abel Shibu\n", - "\n", - "__Content editors:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Production editors:__ Konstantine Tsafatinos\n", - "\n", - "**Our 2024 Sponsors:** CMIP, NFDI4Earth" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial Objectives\n", - "\n", - "*Estimated timing of tutorial:* 40 minutes\n", - "\n", - "In this tutorial, you will\n", - "* Discuss the many ways AI/machine learning can be applied to problems related to climate change\n", - "* Learn about resources in this domain\n", - "* Discuss issues when deploying an AI system on real problems\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Setup" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {}, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "# imports\n", - "import matplotlib.pyplot as plt" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Figure Settings\n", - "import ipywidgets as widgets # interactive display\n", - "\n", - "%config InlineBackend.figure_format = 'retina'\n", - "plt.style.use(\n", - " \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\"\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Video 1: Exploring other applications\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == 'Bilibili':\n", - " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", - " elif source == 'Osf':\n", - " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = []\n", - " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == 'Youtube':\n", - " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", - " height=H, fs=fs, rel=0)\n", - " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", - " else:\n", - " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", - " height=H, fs=fs, autoplay=False)\n", - " if video_ids[i][0] == 'Bilibili':\n", - " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", - " elif video_ids[i][0] == 'Osf':\n", - " print(f'Video available at https://osf.io/{video.id}')\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "video_ids = [('Youtube', 'QwVQrXeZEqM'), ('Bilibili', ''), ('Osf', '')]\n", - "tab_contents = display_videos(video_ids, W=854, H=480)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - 
"execution": {} - }, - "outputs": [], - "source": [ - "# @title Tutorial slides\n", - "\n", - "# @markdown\n", - "from ipywidgets import widgets\n", - "from IPython.display import IFrame\n", - "\n", - "link_id = \"ezvn8\"\n", - "\n", - "download_link = f\"https://osf.io/download/{link_id}/\"\n", - "render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n", - "# @markdown\n", - "out = widgets.Output()\n", - "with out:\n", - " print(f\"If you want to download the slides: {download_link}\")\n", - " display(IFrame(src=f\"{render_link}\", width=730, height=410))\n", - "display(out)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Section 1: Exploring other applications" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "As discussed in the video, the objective of this tutorial is to help you to explore and think critically about different climate-related applications, frame problems in data science terms, and consider the potential impact of machine learning solutions in the real world. By the end of this tutorial, participants should have a better understanding of how to identify relevant problems and applications and consider the ethical and practical implications of using machine learning in a given domain.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "\n", - "\n", - "## Section 1.1: Finding Other Applications\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Now that you know the basics of how machine learning tools can be applied to climate-related data, in this tutorial, you will explore more climate-related problems and think about how you would approach them using machine learning tools. Specifically, go to the Climate Change AI summaries page () and scroll to the Societal Impacts section. As a group, pick a topic you would like to discuss further and read the section on it." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.2: Questions to Consider" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Think about the example applications you just read about and reflect on these questions as a group.\n", - "- What kind of data would a machine learning algorithm need to train on for this application? What kind of domain experts would you want to interact with when building a model for this application?\n", - "- What type of generalization would you want to test for? How would you do so?\n", - "- Who would be most impacted by the use of this model in the real world? Who would be held accountable for the impacts from the model's use?" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "\n", - "# Summary\n", - "In this tutorial, we explored the importance of exploring more applications, framing problems in data science terms, and considering impact. We encourage you to continue exploring applications and framing problems in data science terms. 
Remember to consider the ethical implications of using applications and ensure that the models are appropriately and fairly integrated with stakeholders.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Resources\n", - "\n", - "* Climate change AI [wiki](https://wiki.climatechange.ai/wiki/Buildings_and_Cities).\n", - "\n", - "* If you want to gain more skills in building machine learning models, check out the Neuromatch [Deep Learning Course](https://deeplearning.neuromatch.io/tutorials/intro.html)" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "include_colab_link": true, - "name": "W2D5_Tutorial6", - "provenance": [], - "toc_visible": true - }, - "kernel": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.19" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/tutorials/W2D4_AIandClimateChange/student/W2D5_DaySummary.ipynb b/tutorials/W2D4_AIandClimateChange/student/W2D5_DaySummary.ipynb deleted file mode 100644 index cf3e5fe89..000000000 --- a/tutorials/W2D4_AIandClimateChange/student/W2D5_DaySummary.ipynb +++ /dev/null @@ -1,41 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "760f5fe4", - "metadata": {}, - "source": [ - "# Day Summary" - ] - }, - { - "cell_type": "markdown", - "id": "cbf64b85", - "metadata": {}, - "source": [ - "In this day, you learned how to explore data, ways to build and train regression models, the basics of random forest models and artificial neural networks, methods for assessing feature importance, and the nuances of how to evaluate a model's performance. 
" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.8" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/tutorials/W2D4_AIandClimateChange/student/W2D5_Intro.ipynb b/tutorials/W2D4_AIandClimateChange/student/W2D5_Intro.ipynb deleted file mode 100644 index 82846bd89..000000000 --- a/tutorials/W2D4_AIandClimateChange/student/W2D5_Intro.ipynb +++ /dev/null @@ -1,187 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "8b35183e", - "metadata": {}, - "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ClimateMatchAcademy/course-content/blob/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/student/W2D5_Intro.ipynb)   \"Open\n" - ] - }, - { - "cell_type": "markdown", - "id": "ce6deecc", - "metadata": { - "execution": {} - }, - "source": [ - "# Intro\n" - ] - }, - { - "cell_type": "markdown", - "id": "652bed63", - "metadata": { - "execution": {} - }, - "source": [ - "## Overview\n" - ] - }, - { - "cell_type": "markdown", - "id": "d3c95a91", - "metadata": { - "execution": {} - }, - "source": [ - "Today's materials will provide an overview of data science and machine learning and how these topics can be applied to topics related to climate science and climate change. Particularly, we will explore two real world data sets that represent the impact of climate change on health and agriculture and learn how to model machine learning models that can predict output values and categorize data.\n" - ] - }, - { - "cell_type": "markdown", - "id": "e16287a1", - "metadata": { - "execution": {} - }, - "source": [ - "## Video 1: Climate Change Impacts on the SDGs and the Role of AI\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "9e032d93", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @markdown\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == \"Bilibili\":\n", - " src = f\"https://player.bilibili.com/player.html?bvid={id}&page={page}\"\n", - " elif source == \"Osf\":\n", - " src = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render\"\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = []\n", - " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == \"Youtube\":\n", - " video = YouTubeVideo(\n", - " id=video_ids[i][1], width=W, height=H, fs=fs, rel=0\n", - " )\n", - " print(f\"Video available at https://youtube.com/watch?v={video.id}\")\n", - " else:\n", - " video = PlayVideo(\n", - " id=video_ids[i][1],\n", - " source=video_ids[i][0],\n", - " width=W,\n", - " height=H,\n", - " fs=fs,\n", - " autoplay=False,\n", - " )\n", - " if video_ids[i][0] == \"Bilibili\":\n", - " print(\n", - " 
f\"Video available at https://www.bilibili.com/video/{video.id}\"\n", - " )\n", - " elif video_ids[i][0] == \"Osf\":\n", - " print(f\"Video available at https://osf.io/{video.id}\")\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "\n", - "video_ids = [(\"Youtube\", \"cEBv2yhKrtk\"), (\"Bilibili\", \"BV1Du41157AM\")]\n", - "tab_contents = display_videos(video_ids, W=730, H=410)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "markdown", - "id": "3dfb0d40", - "metadata": { - "execution": {} - }, - "source": [ - "## Slides\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "ccaea697", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @markdown\n", - "from ipywidgets import widgets\n", - "from IPython.display import IFrame\n", - "\n", - "link_id = \"rqst6\"\n", - "\n", - "download_link = f\"https://osf.io/download/{link_id}/\"\n", - "render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n", - "# @markdown\n", - "out = widgets.Output()\n", - "with out:\n", - " print(f\"If you want to download the slides: {download_link}\")\n", - " display(IFrame(src=f\"{render_link}\", width=730, height=410))\n", - "display(out)\n" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "include_colab_link": true, - "name": "W2D5_Intro", - "toc_visible": true - }, - "kernel": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.8" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/tutorials/W2D4_AIandClimateChange/student/W2D5_Outro.ipynb b/tutorials/W2D4_AIandClimateChange/student/W2D5_Outro.ipynb deleted file mode 100644 index de88cde04..000000000 --- a/tutorials/W2D4_AIandClimateChange/student/W2D5_Outro.ipynb +++ /dev/null @@ -1,56 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "cbe66527", - "metadata": { - "execution": {} - }, - "source": [ - "# Outro" - ] - }, - { - "cell_type": "markdown", - "id": "223a5d62", - "metadata": { - "execution": {} - }, - "source": [ - "The tools learned on this day are widely applicable to many climate topics, some of which were already covered in this course but also many more topic areas of importance to climate change mitigation and adaptation. As discussed, however, care most always be taken when interpreting and using machine learning models in the real world." 
- ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "include_colab_link": true, - "name": "W2D5_Outro", - "toc_visible": true - }, - "kernel": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.12" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial1.ipynb b/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial1.ipynb deleted file mode 100644 index fbdb5a08a..000000000 --- a/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial1.ipynb +++ /dev/null @@ -1,789 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/W2D5_Tutorial1.ipynb)   \"Open" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial 1: ClimateBench Dataset and How Machine Learning Can Help\n", - "\n", - "**Week 2, Day 5, AI and Climate Change**\n", - "\n", - "__Content creators:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Content reviewers:__ Mujeeb Abdulfatai, Nkongho Ayuketang Arreyndip, Jeffrey N. A. Aryee, Paul Heubel, Jenna Pearson, Abel Shibu\n", - "\n", - "__Content editors:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Production editors:__ Konstantine Tsafatinos\n", - "\n", - "**Our 2024 Sponsors:** CMIP, NFDI4Earth" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "editable": true, - "execution": {}, - "slideshow": { - "slide_type": "" - }, - "tags": [] - }, - "source": [ - "# Tutorial Objectives\n", - "\n", - "*Estimated timing of tutorial:* 25 minutes\n", - "\n", - "Today, you will work on a total of 6 short tutorials. In Tutorial 1, you delve into the fundamentals, including discussions on climate model emulators and the ClimateBench dataset. You gain insights into Earth System Models (ESMs) and Shared Socioeconomic Pathways (SSPs), alongside practical visualization techniques for ClimateBench features. Tutorial 2 expands on these foundations, exploring decision trees, hyperparameters, and random forest models. You learn to evaluate regression models, focusing on the coefficient of determination (R$^2$), and gain hands-on experience implementing models using `scikit-learn`. Tutorial 3 shifts focus to mitigating overfitting in machine learning models. Here, you learn the importance of model generalization and acquire practical skills for splitting data into training and test sets. In Tutorial 4, you refine your understanding of model robustness, with emphasis on within-distribution generalization and testing model performance on similar data. Tutorial 5 challenges you to test our models on various types of out-of-distribution data, while also exploring the role of climate model emulators in climate science research. 
Finally, Tutorial 6 concludes the series by discussing practical applications of AI and machine learning in addressing climate change-related challenges, and introducing available resources and tools in the field of climate change AI.\n", - "\n", - "In this tutorial, you will\n", - "* Learn about the basics of data science and machine learning.\n", - "* Define “climate model emulators”.\n", - "* Introduce the ClimateBench dataset.\n", - "* Visualize features from this dataset.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Setup" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {}, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "# imports\n", - "import matplotlib.pyplot as plt # For plotting graphs\n", - "import pandas as pd # For data manipulation\n", - "import xarray as xr # For multidimensional data manipulation\n", - "import seaborn as sns # For advanced visualizations\n", - "import cartopy.crs as ccrs # for geospatial visualizations" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Figure Settings\n", - "import ipywidgets as widgets # interactive display\n", - "\n", - "%config InlineBackend.figure_format = 'retina'\n", - "plt.style.use(\n", - " \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\"\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Set random seed\n", - "\n", - "# @markdown Executing `set_seed(seed=seed)` you are setting the seed\n", - "\n", - "# Call `set_seed` function in the exercises to ensure reproducibility.\n", - "import random\n", - "import numpy as np\n", - "\n", - "def set_seed(seed=None):\n", - " if seed is None:\n", - " seed = np.random.choice(2 ** 32)\n", - " random.seed(seed)\n", - " np.random.seed(seed)\n", - " print(f'Random seed {seed} has been set.')\n", - "\n", - "# Set a global seed value for reproducibility\n", - "random_state = 42 # change 42 with any number you like\n", - "\n", - "set_seed(seed=random_state)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Video 1: Machine Learning on ClimateBench data\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == 'Bilibili':\n", - " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", - " elif source == 'Osf':\n", - " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = []\n", - " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == 'Youtube':\n", - " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", - " height=H, fs=fs, rel=0)\n", - " print(f'Video available at 
https://youtube.com/watch?v={video.id}')\n", - " else:\n", - " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", - " height=H, fs=fs, autoplay=False)\n", - " if video_ids[i][0] == 'Bilibili':\n", - " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", - " elif video_ids[i][0] == 'Osf':\n", - " print(f'Video available at https://osf.io/{video.id}')\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "# curriculum or production team will provide these ids\n", - "video_ids = [('Youtube', 'k1jrcheoWP8'), ('Bilibili', ''), ('Osf', '')]\n", - "tab_contents = display_videos(video_ids, W=854, H=480)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "editable": true, - "execution": {}, - "slideshow": { - "slide_type": "" - }, - "tags": [] - }, - "outputs": [], - "source": [ - "# @title Tutorial slides\n", - "\n", - "# @markdown\n", - "from ipywidgets import widgets\n", - "from IPython.display import IFrame\n", - "\n", - "link_id = \"4k3jd\"\n", - "\n", - "download_link = f\"https://osf.io/download/{link_id}/\"\n", - "render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n", - "# @markdown\n", - "out = widgets.Output()\n", - "with out:\n", - " print(f\"If you want to download the slides: {download_link}\")\n", - " display(IFrame(src=f\"{render_link}\", width=730, height=410))\n", - "display(out)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Section 1: ClimateBench Dataset and How Machine Learning Can Help\n", - "\n", - "Section Objectives:\n", - "* Understand how machine learning can be helpful generally\n", - "* Understand the climate model data we will be working with\n", - "* Understand the concept of a climate model emulator\n", - "* Learn how to explore the dataset\n", - "\n", - "\n", - "\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.1: About the ClimateBench dataset" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "The ClimateBench dataset offers a comprehensive collection of hypothetical climate data derived from sophisticated computer simulations (specifically, the NorESM2 model, available via CIMP6). It includes information on key climate variables such as temperature, precipitation, and diurnal temperature range. These values are collected by running simulations that represent the different Shared Socioeconomic Pathways (SSPs). Each pathway is associated with a different projected emissions profile over time. This data thus provides insights into how these climate variables may change in the future due to different emission scenarios. By utilizing this dataset, researchers can develop predictive models to better understand and anticipate the impacts of climate change, ultimately aiding in the development of effective mitigation strategies. 
Specifically, this data set is well-formatted for training *machine learning models*, which is exactly what you will do here.\n", - "\n", - "A brief overview of the ClimateBench dataset is provided below; for additional details, please refer to the full paper -\n", - "\n", - "[ClimateBench v1.0: A Benchmark for Data-Driven Climate Projections](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954) " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Spatial Resolution:\n", - "The simulations are conducted on a grid with a spatial resolution of approximately 2°, allowing for analysis of regional climate patterns and phenomena." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Variables:\n", - "The dataset includes four main variables defined for each point on the grid:\n", - "1. **Temperature (TAS)**: Represents the annual mean surface air temperature.\n", - "2. **Diurnal Temperature Range (DTR)**: Reflects the difference between the maximum and minimum temperatures within a day averaged annually.\n", - "3. **Precipitation (PR)**: Indicates the annual total precipitation.\n", - "4. **90th Percentile of Precipitation (PR90)**: Captures extreme precipitation events by identifying the 90th percentile of daily precipitation values. \n", - " " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### ScenarioMIP Simulations:\n", - "The dataset incorporates ScenarioMIP simulations, exploring various future emission pathways under different socio-economic scenarios. Each scenario is defined by a set of annual emissions values over future years. We will look at 5 different scenarios in total here." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Emissions Inputs:\n", - "Emissions scenarios are defined according to the following four types of emissions:\n", - "- Carbon dioxide (CO2) concentrations.\n", - "- Methane (CH4) concentrations.\n", - "- Sulfur dioxide (SO2) emissions, a precursor to sulfate aerosols.\n", - "- Black carbon (BC) emissions.\n", - "\n", - "Note: In the ClimateBench dataset, sulfur dioxide and black carbon emissions are provided as a spatial map over grid locations, but we will just look at global totals here." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Model Specifications:\n", - "- Simulation Model: the NorESM2 model is run in its low atmosphere-medium ocean resolution (LM) configuration.\n", - "- Model Components: Fully coupled earth system including the atmosphere, land, ocean, ice, and biogeochemistry components.\n", - "- Ensemble Averaging: Target variables are averaged over three ensemble members to mitigate internal variability contributions.\n", - "\n", - "By leveraging the ClimateBench dataset, researchers gain insights into climate dynamics, enabling the development and evaluation of predictive models crucial for understanding and addressing climate change challenges." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "

[Figure: W2D5_Tutorial1_climatebench_Scenario, an overview table of the ClimateBench SSP scenarios ("experiments")]
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "For simplicity's sake, we'll utilize a **condensed version of the ClimateBench dataset**. As mentioned above, we will be looking at only 5 scenarios ('SSPs', listed above as \"experiments\"), and all emissions will be given as global annual averages for the years 2015 to 2050. Furthermore, we will include climate variables for each spatial location (as defined by latitude and longitude for a restricted region) for the year 2015. The target for our model prediction will be temperature in the year 2050 for each spatial location." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.2: Load the Dataset (Condensed Version)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "We will use `pandas` to interact with the data, which is shared in the `.csv` format. First, let us load the environmental data into a pandas dataframe and print its contents." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "#Load Dataset\n", - "url_Climatebench_train_val = \"https://osf.io/y2pq7/download\"\n", - "training_data = pd.read_csv(url_Climatebench_train_val)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.3: Explore Data Structure" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Next, we will quickly explore the size of the data, check for missing data, and understand column names" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "print(training_data.shape)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "This tells us we have 3240 rows and 152 columns.\n", - "\n", - "Let's look at what these rows and columns mean:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "training_data" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Each row represents a combination of spatial location and scenario. The scenario can be found in the 'scenario' column while the location is given in the 'lat' and 'lon' columns. Climate variables for 2015 are given in the following columns and tas_FINAL represents the temperature in 2050. After these columns, we get the annual global emissions values for each of the 4 emissions types included in ClimateBench, starting in 2015 and ending in 2050." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "**Handle Missing Values (if necessary)**:\n", - "\n", - "We cannot train a machine learning model if there are values missing anywhere in this dataset. Therefore, we will check for missing values using `training_data.isnull().sum()`, which sums the number of 'null' or missing values. \n", - "If missing values exist, we can consider imputation techniques (e.g., [`fillna`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html), [`interpolate`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html)) based on the nature of the data and the specific column." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "training_data.isnull().sum()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Here, there are no missing values as the sum of all [`isnull()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isnull.html) values is zero for all columns. So we are good to go!" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.4: Visualize the data\n", - "In this section, we'll utilize visualization techniques to explore the dataset, uncovering underlying patterns and distributions of the variables. Visualizations are instrumental in making informed decisions and conducting comprehensive data analysis." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "**Spatial Distribution of Temperature and Precipitation:** \n", - "Plotting the spatial distribution of temperature can reveal geographical patterns and hotspots. We will use the temperature at 2015, the starting point of our simulation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Create a xarray dataset from the pandas dataframe\n", - "# for convenient plotting with cartopy afterwards\n", - "ds = xr.Dataset({'tas_2015': ('points', training_data['tas_2015'])},\n", - " coords={'lon': ('points', training_data['lon']),\n", - " 'lat': ('points', training_data['lat'])}\n", - " )\n", - "ds" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# create geoaxes\n", - "ax = plt.axes(projection=ccrs.PlateCarree())\n", - "\n", - "# add coastlines\n", - "ax.coastlines()\n", - "\n", - "# plot the data\n", - "p = ax.scatter(ds['lon'], ds['lat'], c=ds['tas_2015'], cmap='coolwarm', transform=ccrs.PlateCarree())\n", - "\n", - "# add a colorbar\n", - "cbar = plt.colorbar(p, orientation='vertical')\n", - "cbar.set_label('Temperature (K)')\n", - "\n", - "# add a grid and labels\n", - "ax.gridlines(draw_labels={\"bottom\": \"x\", \"left\": \"y\"})\n", - "\n", - "# add title\n", - "plt.title('Spatial Distribution of\\nAnnual Mean Temperature anomalies (2015)\\n')\n", - "\n", - "# add a caption with adjusted y-coordinate to create space\n", - "caption_text = 'The anomalies are calculated by subtracting the annual means of the pre-industrial scenario from \\nthe annual means of the respective scenario.'\n", - "plt.figtext(0.5, -0.03, caption_text, ha='center', fontsize=10) # Adjusted y-coordinate to create space" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "We can see there are clear spatial variations in 2015 temperatures. Note the range of latitude and longitude values, this dataset does not cover the entire globe. In fact, it covers roughly the geographical region represented below:\n", - "\n", - "
-    "*[Figure: W2D5_Tutorial1_map, a map of the geographical region covered by the dataset]*\n",
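-    "\n",
-    "If you want to confirm the spatial extent yourself, a quick optional check (assuming `training_data` is loaded as above) is to print the coordinate ranges:\n",
-    "\n",
-    "```python\n",
-    "# print the bounding box of the dataset's grid points\n",
-    "print('lat range:', training_data['lat'].min(), 'to', training_data['lat'].max())\n",
-    "print('lon range:', training_data['lon'].min(), 'to', training_data['lon'].max())\n",
-    "```\n"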
\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Now use the same plotting code to make a plot of the spatial distribution of total precipitation:" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Coding Exercise 1.4: Plotting Spatial Distribution of Total Precipitation\n", - "\n", - "In this exercise, you will complete the code to plot the spatial distribution of total precipitation. Use the provided plotting code as a template and replace the ellipses with appropriate values.\n", - "\n", - "*Note that you have the necessary libraries already imported* (`xarray`, `matplotlib.pyplot`, `cartopy.crs` *and* `pandas`)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "def plot_spatial_distribution(data, col_name, c_label):\n", - " \"\"\"\n", - " Plot the spatial distribution of a variable of interest.\n", - "\n", - " Args:\n", - " data (DataFrame): DataFrame containing latitude, longitude, and data of interest.\n", - " col_name (str): Name of the column containing data of interest.\n", - " c_label (str): Label to describe quantity and unit for the colorbar labeling.\n", - "\n", - " Returns:\n", - " None\n", - " \"\"\"\n", - " # create a xarray dataset from the pandas dataframe\n", - " # for convenient plotting with cartopy afterwards\n", - " ds = xr.Dataset({col_name: ('points', data[col_name])},\n", - " coords={'lon': ('points', data['lon']),\n", - " 'lat': ('points', data['lat'])}\n", - " )\n", - "\n", - " # create geoaxes\n", - " ax = plt.axes(projection=ccrs.PlateCarree())\n", - "\n", - " # add coastlines\n", - " ax.coastlines()\n", - "\n", - " # plot the data\n", - " p = ax.scatter(..., ... ,... , cmap='coolwarm', transform=ccrs.PlateCarree())\n", - "\n", - " # add a colorbar\n", - " cbar = plt.colorbar(p, orientation='vertical')\n", - " cbar.set_label(c_label)\n", - "\n", - " # add a grid and labels\n", - " ax.gridlines(draw_labels={\"bottom\": \"x\", \"left\": \"y\"})\n", - "\n", - " # add title\n", - " plt.title('Spatial Distribution of\\n Annual Mean Anomalies\\n')\n", - " plt.show()\n", - "\n", - "# test your function along precipitation data\n", - "_ = ..." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "execution": {} - }, - "source": [ - "[*Click for solution*](https://github.com/neuromatch/climate-course-content/tree/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/solutions/W2D5_Tutorial1_Solution_3f5694a6.py)\n", - "\n", - "*Example output:*\n", - "\n", - "Solution hint\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "**Time Series Plot of Emissions Scenarios:**\n", - "\n", - "\n", - "We will plot the time series of each of the four emissions scenarios in this dataset (we will get to the fifth one later). Each row in the dataset with the same 'scenario' label has the same emissions values over time. So we will only use the data from the first spatial location for each scenario." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Run this cell to plot the Time Series Plot of Emissions Scenarios:\n", - "# Don't worry about understanding this code! 
It's to set up the plot.\n", - "\n", - "# Set Seaborn style\n", - "sns.set_style(\"whitegrid\")\n", - "\n", - "# Extract emissions data for each scenario\n", - "CO2_data = training_data.filter(regex='CO2_\\d+')\n", - "SO2_data = training_data.filter(regex='SO2_\\d+')\n", - "CH4_data = training_data.filter(regex='CH4_\\d+')\n", - "BC_data = training_data.filter(regex='BC_\\d+')\n", - "\n", - "# Define the four scenarios\n", - "scenarios = ['ssp585', 'ssp370-lowNTCF','ssp126', 'ssp370',]\n", - "\n", - "# Create subplots for each emission gas\n", - "fig, axs = plt.subplots(4, 1, figsize=(8, 15), sharex=True)\n", - "\n", - "# Define units for each emission\n", - "units = {'CO2': 'GtCO2', 'CH4': 'GtCH4 / year', 'SO2': 'TgSO2 / year', 'BC': 'TgBC / year'}\n", - "\n", - "# Plot emissions data for each emission gas with enhanced styling\n", - "for i, (data, emission) in enumerate(zip([CO2_data, CH4_data, SO2_data,BC_data], ['CO2', 'CH4', 'SO2','BC'])):\n", - " # Plot each scenario for the current emission gas\n", - " for scenario in scenarios:\n", - " scenario_data = data[training_data['scenario'] == scenario]\n", - " axs[i].plot(range(2015, 2051), scenario_data.mean(axis=0), label=scenario)\n", - "\n", - " # Set ylabel and title for the current emission gas\n", - " axs[i].set_ylabel(f'{emission} Emissions ({units[emission]})', fontsize=12)\n", - " axs[i].set_title(f'{emission} Emissions', fontsize=14)\n", - " axs[i].legend()\n", - "\n", - "# Set common xlabel\n", - "plt.xlabel('Time (years)')\n", - "\n", - "# Adjust layout\n", - "plt.tight_layout()\n", - "\n", - "# Show legends\n", - "plt.legend()\n", - "\n", - "# Remove spines from all subplots\n", - "for ax in axs:\n", - " ax.spines['top'].set_visible(False)\n", - " ax.spines['right'].set_visible(False)\n", - "\n", - "# Customize ticks\n", - "plt.xticks()\n", - "plt.yticks()\n", - "\n", - "# Show the plot\n", - "plt.grid(True, linestyle='--')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "This last plot displays the global mean emissions contained in the ClimateBench dataset over the years 2015 to 2050 for four atmospheric constituents that are important for defining the forcing (cumulative anthropogenic carbon dioxide CO$_2$, methane CH$_4$, sulfur dioxide SO$_2$, black carbon BC). Each line represents a different emission scenario, which shows us trends and variations in emissions over time. The 'ssp370-lowNTCF' refers to a variation of the ssp370 scenario which includes lower emissions of near-term climate forcers (NTCFs) such as aerosol (but not methane). \n", - "These emission scenarios are used in the following tutorials as features/predictors for our prediction of the temperature in 2050.\n", - "\n", - "All time series are derived from NorESM2 ScenarioMIP simulations available. Please read the paper of [Watson-Parris et al. (2022)](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954) for a more detailed explanation of the ClimateBench dataset." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Summary\n", - "\n", - "In this tutorial, you acquainted yourself with the ClimateBench dataset and explored how machine learning contributes to climate analysis. We defined the versatility of machine learning and its role in predicting climate variables. By delving into the ClimateBench dataset, we highlight its accessibility in providing climate model data. 
We emphasize the importance of data visualization and engage in practical exercises to explore the dataset.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Resources" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "* [ClimateBench v1.0: A Benchmark for Data-Driven Climate Projections](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954) " - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "include_colab_link": true, - "name": "W2D5_Tutorial1", - "provenance": [], - "toc_visible": true - }, - "kernel": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.19" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial2.ipynb b/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial2.ipynb deleted file mode 100644 index 0958f37b6..000000000 --- a/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial2.ipynb +++ /dev/null @@ -1,843 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/W2D5_Tutorial2.ipynb)   \"Open" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial 2: Building and Training Random Forest Models\n", - "\n", - "**Week 2, Day 5, AI and Climate Change**\n", - "\n", - "__Content creators:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Content reviewers:__ Mujeeb Abdulfatai, Nkongho Ayuketang Arreyndip, Jeffrey N. A. 
Aryee, Paul Heubel, Jenna Pearson, Abel Shibu\n", - "\n", - "__Content editors:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Production editors:__ Konstantine Tsafatinos\n", - "\n", - "**Our 2024 Sponsors:** CMIP, NFDI4Earth" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial Objectives\n", - "\n", - "*Estimated timing of tutorial:* 35 minutes\n", - "\n", - "In this tutorial, you will \n", - "* Learn about decision trees and hyperparameters\n", - "* Learn about random forest models\n", - "* Understand how regression models are evaluated (R$^2$)\n", - "* Familiarize yourself with the scikit-learn package\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Setup" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {}, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "# imports\n", - "import matplotlib.pyplot as plt # For plotting graphs\n", - "import pandas as pd # For data manipulation\n", - "import ipywidgets as widgets # interactive display\n", - "from sklearn.ensemble import RandomForestRegressor # For Random Forest Regression\n", - "from sklearn.tree import DecisionTreeRegressor # For Decision Tree Regression\n", - "from sklearn.tree import plot_tree # For plotting decision trees" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Figure Settings\n", - "import ipywidgets as widgets # interactive display\n", - "\n", - "%config InlineBackend.figure_format = 'retina'\n", - "plt.style.use(\n", - " \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\"\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Set random seed\n", - "\n", - "# @markdown Executing `set_seed(seed=seed)` you are setting the seed\n", - "\n", - "# Call `set_seed` function in the exercises to ensure reproducibility.\n", - "import random\n", - "import numpy as np\n", - "\n", - "def set_seed(seed=None):\n", - " if seed is None:\n", - " seed = np.random.choice(2 ** 32)\n", - " random.seed(seed)\n", - " np.random.seed(seed)\n", - " print(f'Random seed {seed} has been set.')\n", - "\n", - "# Set a global seed value for reproducibility\n", - "random_state = 42 # change 42 with any number you like\n", - "\n", - "set_seed(seed=random_state)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Video 1: Building and training Random Forest Models\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == 'Bilibili':\n", - " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", - " elif source == 'Osf':\n", - " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = 
[]\n", - " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == 'Youtube':\n", - " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", - " height=H, fs=fs, rel=0)\n", - " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", - " else:\n", - " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", - " height=H, fs=fs, autoplay=False)\n", - " if video_ids[i][0] == 'Bilibili':\n", - " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", - " elif video_ids[i][0] == 'Osf':\n", - " print(f'Video available at https://osf.io/{video.id}')\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "# curriculum or production team will provide these ids\n", - "video_ids = [('Youtube', 'st_1ygEGQTQ'), ('Bilibili', ''), ('Osf', '')]\n", - "tab_contents = display_videos(video_ids, W=854, H=480)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Tutorial slides\n", - "\n", - "# @markdown\n", - "from ipywidgets import widgets\n", - "from IPython.display import IFrame\n", - "\n", - "link_id = \"kyv6w\"\n", - "\n", - "download_link = f\"https://osf.io/download/{link_id}/\"\n", - "render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n", - "# @markdown\n", - "out = widgets.Output()\n", - "with out:\n", - " print(f\"If you want to download the slides: {download_link}\")\n", - " display(IFrame(src=f\"{render_link}\", width=730, height=410))\n", - "display(out)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Section 1: Preparing the Data for Model Training" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "In this video, we learned about:\n", - "\n", - "1. Using regression for prediction tasks, like the one we have.\n", - "2. The conceptual understanding of decision trees and their regression capabilities.\n", - "3. Random forests as an ensemble of decision trees.\n", - "4. Training our model\n", - "4. Measuring model performance.\n", - "5. Utilizing the scikit-learn toolbox for regression tasks.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.1: Loading the data\n", - "\n", - "Remember from the previous tutorial how we loaded the `training_data`?\n", - "Let's again load the data here for this tutorial." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "#Load Dataset\n", - "url_Climatebench_train_val = \"https://osf.io/y2pq7/download\"\n", - "training_data = pd.read_csv(url_Climatebench_train_val)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Next, we will prepare the data to train a model to predict temperature anomalies in 2050. 
Let's also remind ourselves of what the data contains:" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.2: Preparing the data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Check column names (assuming a pandas DataFrame)\n", - "print(\"Column names:\")\n", - "print(training_data.columns.tolist()) # List all column names" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "First, we will drop the `scenario` column from the data as it is just a label, but will not be passed into the model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "training_data.pop('scenario')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "As we can see, scenario is no longer in the dataset:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "print(\"Column names:\")\n", - "print(training_data.columns.tolist()) # List all column names" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Next, we need to pull out our target variable (that is, the variable we want our model to predict). Here that is `tas_FINAL`, the temperature anomaly in 2050. The anomalies in every case are calculated by subtracting the annual means of the pre-industrial scenario from the annual means of the respective scenario of interest." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "target = training_data.pop('tas_FINAL')\n", - "target" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "*Note: we will need to repeat these preprocessing steps anytime we load this (or other) data.*" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Section 2: Fit Decision Tree and Random Forest" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Now we can train our models. As mentioned in the video, Decision Trees and Random Forest Models can both do regression. Specifically:\n", - "\n", - "***Decision Tree Regression***: \n", - "* Decision trees recursively partition the feature space into regions based on feature values to predict the target variable.\n", - "* Each leaf node represents a prediction.\n", - "* Single trees can be prone to capturing noise in the data (not what we want!). \n", - "\n", - "***Random Forest Regression***: \n", - "* An ensemble method that combines multiple decision trees to improve predictive performance.\n", - "* Each tree is trained on a random subset of the data.\n", - "* Aggregates predictions of individual trees to improve performance.\n", - "* Typically more robust/doesn't capture noise.\n", - "\n", - "We will see an example of both here.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "First, let's train a single decision tree to try to predict 2050 temperature anomalies using 2015 temperature anomalies and emissions data. We can control the depth of our decision tree (which is the maximum number of splits performed), which we will set to 20 here." 
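-    "\n",
-    "To build intuition for what `max_depth` does before fitting the real data, here is a small optional sketch on synthetic data (the numbers below are made up for illustration):\n",
-    "\n",
-    "```python\n",
-    "import numpy as np\n",
-    "from sklearn.tree import DecisionTreeRegressor\n",
-    "\n",
-    "# toy 1-D regression problem\n",
-    "rng = np.random.default_rng(0)\n",
-    "X_toy = rng.uniform(0, 10, size=(200, 1))\n",
-    "y_toy = np.sin(X_toy[:, 0]) + rng.normal(0, 0.1, size=200)\n",
-    "\n",
-    "for depth in [2, 5, 20]:\n",
-    "    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_toy, y_toy)\n",
-    "    print(f'max_depth={depth:2d}  leaves={tree.get_n_leaves():4d}  train R^2={tree.score(X_toy, y_toy):.3f}')\n",
-    "```\n",
-    "\n",
-    "Deeper trees are allowed more splits, so both the number of leaves and the training score grow with `max_depth`.\n"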
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 2.0: Scikit-learn\n", - "\n", - "In this and coming sub-sections, we will utilize [Scikit-learn](https://scikit-learn.org/stable/), commonly referred to as `sklearn`, a renowned Python library extensively employed for machine learning endeavors. It provides a comprehensive array of functions and tools tailored for various machine learning tasks. Specifically, we will concentrate on the [`DecisionTreeRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html) and [`RandomForestRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) modules offered by Scikit-learn." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 2.1: Training the Decision Tree and Analyzing the Results" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# instantiate the model:\n", - "dt_regressor = DecisionTreeRegressor(random_state=random_state,max_depth=20)\n", - "\n", - "# fit/train the model with the data:\n", - "dt_regressor.fit(training_data, target) #pass in the model inputs and the target it should predict" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "We've trained our first model! Now let's see how well it performs. As discussed in the video, we will use the coefficient of determination (also known as the R-squared value, $R^2$) as the measure of how well the model is doing.\n", - "\n", - "We can get this value by calling the `score` function and providing the data we want the score calculated on. Here we will evaluate the model on the same data it was trained on." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "
\n",
-    "<details>\n",
-    "<summary>Learn more about the R-Squared value and Coefficient of determination</summary>\n",
-    "\n",
-    "The **R-squared** value indicates the proportion of the variance in the target variable that is predicted from the model.\n",
-    "\n",
-    "Specifically, the ***coefficient of determination*** is calculated using the formula:\n",
-    "\n",
-    "$$\n",
-    "\\color{#3182CE}{R^2} = 1 - \\frac{\\color{#DC3912}{SS_{\\text{residual}}}}{\\color{#FF9900}{SS_{\\text{total}}}}\n",
-    "$$\n",
-    "\n",
-    "where:\n",
-    "- $\\color{#FF9900}{SS_{\\text{total}}}$ represents the total sum of squares, calculated as the sum of squared differences between the target variable $\\color{#2CA02C}{y}$ and its mean $\\color{#2CA02C}{\\bar{y}}$:\n",
-    "\n",
-    "$$\n",
-    "\\color{#FF9900}{SS_{\\text{total}}} = \\sum_{i=1}^{n} (\\color{#2CA02C}{y_i} - \\color{#2CA02C}{\\bar{y}})^2\n",
-    "$$\n",
-    "\n",
-    "- $\\color{#DC3912}{SS_{\\text{residual}}}$ denotes the residual sum of squares, computed as the sum of squared differences between the observed target values $\\color{#2CA02C}{y}$ and the predicted values $\\color{#FF5733}{\\hat{y}}$ provided by the model:\n",
-    "\n",
-    "$$\n",
-    "\\color{#DC3912}{SS_{\\text{residual}}} = \\sum_{i=1}^{n} (\\color{#2CA02C}{y_i} - \\color{#FF5733}{\\hat{y}_i})^2\n",
-    "$$\n",
-    "\n",
-    "The $\\color{#3182CE}{R^2}$ score thus quantifies the proportion of variance in the target variable that is predictable from the independent variables in the model.\n",
-    "\n",
-    "A value of 1 indicates a perfect fit, meaning the model explains all the variability in the target variable; a value of 0 means the model does no better than always predicting the mean, and scikit-learn's `score` can even return negative values for models that do worse than that.\n",
-    "</details>\n",
-    "
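\n",
-    "\n",
-    "To make the formula concrete, here is a tiny optional check with toy numbers, confirming that scikit-learn's `r2_score` matches the definition above:\n",
-    "\n",
-    "```python\n",
-    "import numpy as np\n",
-    "from sklearn.metrics import r2_score\n",
-    "\n",
-    "y_true = np.array([1.0, 2.0, 3.0, 4.0])\n",
-    "y_pred = np.array([1.1, 1.9, 3.2, 3.8])\n",
-    "\n",
-    "ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares (~0.10)\n",
-    "ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares (5.0)\n",
-    "print(1 - ss_res / ss_tot)       # ~0.98, from the definition\n",
-    "print(r2_score(y_true, y_pred))  # same value from scikit-learn\n",
-    "```\n",
-    "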
\n", - "\n", - "---" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "dt_regressor.score(training_data, target)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "\n", - "Now, let's create a scatter plot to compare the true temperature anomaly values in 2050 to those predicted by the model:\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Scatter Plot: Predicted vs. True Temperatures for Decision Tree\n", - "\n", - "# Get predicted values\n", - "predicted = dt_regressor.predict(training_data)\n", - "\n", - "# Create scatter plot\n", - "plt.scatter(predicted, target, color='b', label='Comparison of Predicted and True Temperatures')\n", - "plt.plot([0, 4], [0, 4], color='r', label='Ideal Line') # Add a diagonal line for reference\n", - "plt.xlabel('Predicted Temperatures (K)')\n", - "plt.ylabel('True Temperatures (K)')\n", - "plt.title('Annual mean temperature anomaly', fontsize=14)\n", - "\n", - "# Add a caption with adjusted y-coordinate to create space\n", - "caption_text = 'The anomalies are calculated by subtracting the annual means of the pre-industrial scenario from \\nthe annual means of the respective scenario.'\n", - "plt.figtext(0.5, -0.03, caption_text, ha='center', fontsize=10) # Adjusted y-coordinate to create space\n", - "\n", - "plt.legend()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "
\n",
-    "<details>\n",
-    "<summary>What can we conclude from this score and the scatter plot?</summary>\n",
-    "\n",
-    "First, pause and think by yourself. Then, compare it with the information provided here:\n",
-    "\n",
-    "As we can see, the model achieves a high score of ~0.9984 on the training data. This indicates that the model can explain approximately 99.84% of the variance in the target variable based on the features in the training dataset. Such a high score suggests that the model fits the training data very well and can effectively capture the underlying patterns or relationships between the features and the target variable. We can see the close alignment between the true value and the value predicted by the model in the plot.\n",
-    "\n",
-    "However, it's essential to note that achieving a high score on the training data does not guarantee the model's performance on unseen data (i.e., the test or validation datasets). We will explore this more in the next tutorial.\n",
-    "</details>\n",
-    "
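\n",
-    "\n",
-    "As a small optional preview of that idea (using scikit-learn's `train_test_split`, which the next tutorial introduces properly), you could hold out part of the data and score on it:\n",
-    "\n",
-    "```python\n",
-    "from sklearn.model_selection import train_test_split\n",
-    "\n",
-    "# hold out 20% of the rows and compare training vs. held-out scores\n",
-    "X_tr, X_te, y_tr, y_te = train_test_split(training_data, target, test_size=0.2, random_state=1)\n",
-    "probe = DecisionTreeRegressor(max_depth=20, random_state=random_state).fit(X_tr, y_tr)\n",
-    "print('train R^2:', probe.score(X_tr, y_tr))\n",
-    "print('test  R^2:', probe.score(X_te, y_te))\n",
-    "```\n",
-    "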
\n", - "\n", - "---" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Interactive Demo 2.1: Variation in Performance with depth | Visualizing Decision Trees and Scatter plot\n", - "\n", - "In this interactive demo, we'll visualize decision trees using a widget. This widget enables interactive exploration of decision trees by adjusting two parameters: \n", - "`max_depth` controls the tree's complexity during training, while `dt_vis_depth` determines the depth of the tree to visualize. It dynamically trains a decision tree regressor based on `max_depth`, evaluates its performance with a scatter plot, and visualizes the tree structure up to `dt_vis_depth` using the plot_tree function. \n", - "This allows users to balance model complexity and interpretability, gaining insights into how different depths affect predictive accuracy and tree structure." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @markdown Make sure you execute this cell to enable the widget!\n", - "# Don't worry about understanding this code! It's to set up an interactive plot.\n", - "\n", - "# Function to train decision tree and display scatter plot\n", - "def train_and_plot(max_depth, visualize_depth):\n", - " global dt_regressor, training_data\n", - "\n", - " # Instantiate and train the decision tree regressor\n", - " dt_regressor = DecisionTreeRegressor(max_depth=max_depth)\n", - " dt_regressor.fit(training_data, target)\n", - "\n", - " # Calculate and print the score\n", - " score = dt_regressor.score(training_data, target)\n", - " print(f\"Model Score: {score}\")\n", - " print(f\"Please wait for ~{visualize_depth+visualize_depth/2} sec for the figure to render\")\n", - " # Generate scatter plot: Predicted vs. True Temperatures\n", - " predicted = dt_regressor.predict(training_data)\n", - " fig, axes = plt.subplots(1, 2, figsize=(15+pow(1.3,visualize_depth), 6+pow(1.2,visualize_depth)), gridspec_kw={'width_ratios': [1, 1+visualize_depth/4]})\n", - "\n", - " # Scatter plot\n", - " axes[0].scatter(predicted, target, color='blue', alpha=0.7, label='Comparison of Predicted and True Temperatures', edgecolors='black')\n", - " axes[0].plot([min(target), max(target)], [min(target), max(target)], color='red', linestyle='--', label='Ideal Prediction Line')\n", - " axes[0].set_xlabel('Predicted Temperature (K)', fontsize=12)\n", - " axes[0].set_ylabel('True Temperature (K)', fontsize=12)\n", - " axes[0].set_title('Annual mean temperature anomaly', fontsize=14)\n", - " axes[0].legend()\n", - " axes[0].grid(True)\n", - "\n", - " # Decision tree visualization\n", - " plot_tree(dt_regressor, feature_names=training_data.columns, filled=True, fontsize=8, max_depth=visualize_depth, ax=axes[1])\n", - " axes[1].set_title(f'Decision Tree Visualization (Train_max_depth = {max_depth}, dt_visualize_depth = {visualize_depth})')\n", - "\n", - " plt.tight_layout()\n", - " plt.show()\n", - "\n", - "# Interactive widget to control max_depth\n", - "@widgets.interact(max_depth=(1, 31, 1), dt_vis_depth=(1, 10, 1))\n", - "def visualize_tree_with_max_depth(max_depth=20, dt_vis_depth=3):\n", - " train_and_plot(max_depth, dt_vis_depth)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "### Interactive Demo 2.1 Discussion\n", - "\n", - "1. How does changing the max_depth parameter affect the decision tree's predictive accuracy and complexity? 
\n",
-    "\n",
-    "2. What insights can be gained by visualizing the decision tree at different depths (dt_vis_depth)?\n",
-    "\n",
-    "3. What patterns or trends do you observe in the residuals (differences between predicted and true temperatures) on the scatter plot? How can these insights guide adjustments to improve the model's predictive accuracy?"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "execution": {}
-   },
-   "source": [
-    "[*Click for solution*](https://github.com/neuromatch/climate-course-content/tree/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/solutions/W2D5_Tutorial2_Solution_054dc038.py)\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "## Section 2.2: Training the Random Forest and Analyzing the Results"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "Now we will train an ensemble of decision trees, known as a random forest. For this we can use the built-in `RandomForestRegressor` from [sklearn.ensemble](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html), which we have already imported."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "rf_regressor = RandomForestRegressor(random_state=random_state)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "This line of code creates a random forest regressor object named `rf_regressor`. This regressor is configured to use a specified `random_state` parameter, ensuring that the random number generation process within the algorithm is consistent across different runs. This helps maintain reproducibility in our experiments and ensures consistent results."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "Now you will train the model on our data and calculate its score on the same data. Create a plot like the one above in order to visually inspect its performance."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "### Coding Exercise 2.2: Model Training and Performance Visualization of Random Forest\n",
-    "\n",
-    "In this exercise, you will train a random forest regressor model on your data and evaluate its performance by calculating its score on the same data. Additionally, you will create a scatter plot to visually inspect its performance."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "def fit_and_visualize_rf(training_data, target):\n", - " \"\"\"Fit a random forest regressor to the training data and visualize the results.\n", - "\n", - " Args:\n", - " training_data (array-like): Input data for training the model.\n", - " target (array-like): Target variable for training the model.\n", - "\n", - " Returns:\n", - " None\n", - " \"\"\"\n", - " #################################################\n", - " ## TODO for students: Fit the random forest regressor and visualize the results ##\n", - " # Remove the following line of code once you have completed the exercise:\n", - " raise NotImplementedError(\"Student exercise: Fit the random forest regressor and visualize the results.\")\n", - " #################################################\n", - "\n", - " # fit the random forest regressor to the training data\n", - " _ = ...\n", - "\n", - " # print the R-squared score of the model\n", - " print('...')\n", - "\n", - " # predict the target variable using the trained model\n", - " predicted = rf_regressor.predict(training_data)\n", - "\n", - " # Create scatter plot\n", - " plt.scatter(predicted,target,color='b',label='Comparison of Predicted and True Temperatures')\n", - " plt.plot([0,4],[0,4],color='r', label='Ideal Line') # add a diagonal line for reference\n", - " plt.xlabel('Predicted Temperatures (K)')\n", - " plt.ylabel('True Temperatures (K)')\n", - " plt.legend()\n", - " plt.title('Annual mean temperature anomaly')\n", - " # add a caption with adjusted y-coordinate to create space\n", - " caption_text = 'The anomalies are calculated by subtracting the annual means of the pre-industrial scenario from \\nthe annual means of the respective scenario.'\n", - " plt.figtext(0.5, -0.03, caption_text, ha='center', fontsize=10) # adjusted y-coordinate to create space\n", - "\n", - "# test your function\n", - "_ = fit_and_visualize_rf(training_data, target)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "execution": {} - }, - "source": [ - "[*Click for solution*](https://github.com/neuromatch/climate-course-content/tree/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/solutions/W2D5_Tutorial2_Solution_cef4c6f7.py)\n", - "\n", - "*Example output:*\n", - "\n", - "Solution hint\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "It seems like our models are performing very well! Let's think a bit more in the next tutorial about what else we should do to evaluate our models...\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Summary\n", - "\n", - "Estimated timing of tutorial: 35 minutes\n", - "\n", - "In this tutorial, we delved into Random Forest Models and their application in climate prediction. We gained an understanding of regression and how Random Forests combine decision trees to improve predictive accuracy. 
Through practical exercises, we learned how to evaluate model performance and implement Random Forests using tools like scikit-learn.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Resources" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "* [ClimateBench v1.0: A Benchmark for Data-Driven Climate Projections](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954) " - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "include_colab_link": true, - "name": "W2D5_Tutorial2", - "provenance": [], - "toc_visible": true - }, - "kernel": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.19" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial3.ipynb b/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial3.ipynb deleted file mode 100644 index 33d5516e7..000000000 --- a/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial3.ipynb +++ /dev/null @@ -1,612 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/W2D5_Tutorial3.ipynb)   \"Open" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial 3: Testing Model Generalization\n", - "\n", - "**Week 2, Day 5, AI and Climate Change**\n", - "\n", - "__Content creators:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Content reviewers:__ Mujeeb Abdulfatai, Nkongho Ayuketang Arreyndip, Jeffrey N. A. 
Aryee, Paul Heubel, Jenna Pearson, Abel Shibu\n", - "\n", - "__Content editors:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Production editors:__ Konstantine Tsafatinos\n", - "\n", - "**Our 2024 Sponsors:** CMIP, NFDI4Earth" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial Objectives\n", - "\n", - "*Estimated timing of tutorial:* 25 minutes\n", - "\n", - "In this tutorial, you will\n", - "* Understand the problem of overfitting\n", - "* Understand generalization\n", - "* Learn to split data into train and test data\n", - "* Evaluate trained models on held-out test data\n", - "* Think about the relationship between model capacity and overfitting\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Setup" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {}, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "# imports:\n", - "\n", - "import pandas as pd # For data manipulation\n", - "from sklearn.model_selection import train_test_split # For splitting dataset into train and test sets\n", - "from sklearn.ensemble import RandomForestRegressor # For Random Forest Regression\n", - "from sklearn.tree import DecisionTreeRegressor # For Decision Tree Regression" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Figure Settings\n", - "import ipywidgets as widgets # interactive display\n", - "import matplotlib.pyplot as plt\n", - "\n", - "%config InlineBackend.figure_format = 'retina'\n", - "plt.style.use(\n", - " \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\"\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Set random seed\n", - "\n", - "# @markdown Executing `set_seed(seed=seed)` you are setting the seed\n", - "\n", - "# Call `set_seed` function in the exercises to ensure reproducibility.\n", - "import random\n", - "import numpy as np\n", - "\n", - "def set_seed(seed=None):\n", - " if seed is None:\n", - " seed = np.random.choice(2 ** 32)\n", - " random.seed(seed)\n", - " np.random.seed(seed)\n", - " print(f'Random seed {seed} has been set.')\n", - "\n", - "# Set a global seed value for reproducibility\n", - "random_state = 42 # change 42 with any number you like\n", - "\n", - "set_seed(seed=random_state)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Video 1: Testing model generalization\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == 'Bilibili':\n", - " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", - " elif source == 'Osf':\n", - " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = []\n", 
- " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == 'Youtube':\n", - " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", - " height=H, fs=fs, rel=0)\n", - " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", - " else:\n", - " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", - " height=H, fs=fs, autoplay=False)\n", - " if video_ids[i][0] == 'Bilibili':\n", - " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", - " elif video_ids[i][0] == 'Osf':\n", - " print(f'Video available at https://osf.io/{video.id}')\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "\n", - "video_ids = [('Youtube', 'gPM64fog-dc'), ('Bilibili', ''), ('Osf', '')]\n", - "tab_contents = display_videos(video_ids, W=854, H=480)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Tutorial slides\n", - "\n", - "# @markdown\n", - "from ipywidgets import widgets\n", - "from IPython.display import IFrame\n", - "\n", - "link_id = \"t48yb\"\n", - "\n", - "download_link = f\"https://osf.io/download/{link_id}/\"\n", - "render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n", - "# @markdown\n", - "out = widgets.Output()\n", - "with out:\n", - " print(f\"If you want to download the slides: {download_link}\")\n", - " display(IFrame(src=f\"{render_link}\", width=730, height=410))\n", - "display(out)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Section 1: Model generalization" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "As discussed in the video, machine learning models can *overfit*. This means they essentially memorize the data points they were trained on. This makes them perform very well on those data points, but when they are presented with data they weren't trained on their predictions are not very good. Therefore, we need to evaluate our models according to how well they perform on data they weren't trained on.\n", - "\n", - "To do this, we will split the data into training and testing sets. The training set will be used to train the model, while the testing set will be used to evaluate how well the model performs on unseen data. This helps us ensure that our model can generalize well to new data and avoid overfitting.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.1: Load and Prepare the Data\n", - "\n", - "As we've learned in the previous tutorial, here we load our dataset and prepare it by removing unnecessary columns and extracting the target variable `tas_FINAL`, representing temperature anomalies in 2050. The anomalies in every case are calculated by subtracting the annual means of the pre-industrial scenario from the annual means of the respective scenario of interest." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Load and Prepare the Data\n", - "url_Climatebench_train_val = \"https://osf.io/y2pq7/download\" # Dataset URL\n", - "training_data = pd.read_csv(url_Climatebench_train_val) # Load the training data from the provided URL\n", - "training_data.pop('scenario') # drop the `scenario` column from the data as it is just a label, but will not be passed into the model.\n", - "target = training_data.pop('tas_FINAL') # Extract the target variable 'tas_FINAL' which we aim to predict" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.2: Data Splitting for Training and Testing\n", - "\n", - "Now, our primary objective is to prepare our dataset for model training and evaluation. To achieve this, we'll utilize the `train_test_split` function from Scikit-learn, which conveniently splits our dataset into training and testing subsets.\n", - "\n", - "To facilitate this process, we've imported the essential `train_test_split` function from Scikit-learn earlier in the code:\n", - "\n", - "```python\n", - "from sklearn.model_selection import train_test_split \n", - "```\n", - "\n", - "Our strategy involves randomly allocating 20% of the data for testing purposes, while reserving the remaining 80% for model training. This ensures that our model is evaluated on unseen data, which is crucial for assessing its real-world performance.\n", - "\n", - "With this function ready to use, let's seamlessly proceed to split our dataset and go ahead on the journey of model training and evaluation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Split the data into training and testing sets\n", - "X_train, X_test, y_train, y_test = train_test_split(\n", - " training_data, target, test_size=0.2, random_state=1\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "We now have separated the input features (now called `X`) and the target variable (now called `y`) into a training set (`X_train`, `y_train`) and a test set (`X_test`, `y_test`)." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.3: Train a decision tree model on the training data and evaluate it\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Training the model on the training data\n", - "dt_regressor = DecisionTreeRegressor(random_state=random_state,max_depth=20)\n", - "dt_regressor.fit(X_train, y_train)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Now we will evaluate the model on both the training and test data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "print('Performance on training data:', dt_regressor.score(X_train, y_train))\n", - "print('Performance on test data :', dt_regressor.score(X_test, y_test))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "We can see here that our model is overfitting: it is performing much better on the data it was trained on than on held-out test data." 
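-    "\n",
-    "As an optional aside beyond this tutorial's scope: a single split can be lucky or unlucky, and k-fold cross-validation averages over several splits. A minimal sketch with scikit-learn's `cross_val_score`:\n",
-    "\n",
-    "```python\n",
-    "from sklearn.model_selection import cross_val_score\n",
-    "from sklearn.tree import DecisionTreeRegressor\n",
-    "\n",
-    "# 5-fold cross-validation: each fold takes one turn as the held-out set\n",
-    "scores = cross_val_score(\n",
-    "    DecisionTreeRegressor(max_depth=20, random_state=random_state),\n",
-    "    training_data, target, cv=5, scoring='r2',\n",
-    ")\n",
-    "print('per-fold R^2:', scores)\n",
-    "print('mean R^2    :', scores.mean())\n",
-    "```\n"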
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "## Section 1.4: Train a random forest model on the training data and evaluate it\n",
-    "\n",
-    "Use what you know to train a random forest model on the training data and evaluate it on both the training and test data.\n",
-    "We have already imported `RandomForestRegressor` in the Setup section via\n",
-    "```python\n",
-    "from sklearn.ensemble import RandomForestRegressor\n",
-    "```\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "def train_random_forest_model(X_train, y_train, X_test, y_test, random_state):\n",
-    "    \"\"\"Train a Random Forest model and evaluate its performance.\n",
-    "\n",
-    "    Args:\n",
-    "        X_train (ndarray): Training features.\n",
-    "        y_train (ndarray): Training labels.\n",
-    "        X_test (ndarray): Test features.\n",
-    "        y_test (ndarray): Test labels.\n",
-    "        random_state (int): Random seed for reproducibility.\n",
-    "\n",
-    "    Returns:\n",
-    "        RandomForestRegressor: Trained Random Forest regressor model.\n",
-    "    \"\"\"\n",
-    "    #################################################\n",
-    "    ## TODO for students: Train a random forest model on the training data and evaluate it ##\n",
-    "    # Implement training a RandomForestRegressor model using X_train and y_train\n",
-    "    # Then, evaluate its performance on both training and test data using the .score() method\n",
-    "    # Print out the performance on training and test data\n",
-    "    # Please remove the following line of code once you have completed the exercise:\n",
-    "    raise NotImplementedError(\"Student exercise: Implement the training and evaluation process.\")\n",
-    "    #################################################\n",
-    "\n",
-    "    # Train the model on the training data\n",
-    "    rf_regressor = RandomForestRegressor(random_state=random_state)\n",
-    "\n",
-    "    # fit the model\n",
-    "    _ = rf_regressor.fit(..., ...)\n",
-    "\n",
-    "    print('Performance on training data :', rf_regressor.score(..., y_train))\n",
-    "    print('Performance on test data :', rf_regressor.score(X_test, ...))\n",
-    "\n",
-    "    return rf_regressor\n",
-    "\n",
-    "# test the function\n",
-    "rf_model = ..."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "execution": {}
-   },
-   "source": [
-    "[*Click for solution*](https://github.com/neuromatch/climate-course-content/tree/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/solutions/W2D5_Tutorial3_Solution_f952faa5.py)\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "### Question 1.4: Overfitting - Decision Tree vs Random Forest\n",
-    "\n",
-    "1. Does the random forest model overfit less than a single decision tree?\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "execution": {}
-   },
-   "source": [
-    "[*Click for solution*](https://github.com/neuromatch/climate-course-content/tree/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/solutions/W2D5_Tutorial3_Solution_91ef3636.py)\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "## Section 1.5: Explore Parameters of the Random Forest Model\n",
-    "\n",
-    "In the previous tutorial, you saw how we can control the depth of a single decision tree.\n",
-    "We can also control the depth of the decision trees used in our random forest model by passing a `max_depth` argument, and the number of trees by setting `n_estimators`.\n",
-    "\n",
-    "Intuitively, these variables control the *capacity* of the model. Capacity loosely refers to the number of trainable parameters in the model. The more trees and the deeper they are, the more free parameters the model has to capture the training data. If the model has too little capacity, it won't be powerful enough to capture complex relationships between the input features and the target variable. If it has too many parameters that it can move around, however, it may end up memorizing every single training point and therefore overfit.\n",
-    "\n",
-    "Use the sliders below to experiment with different values of `n_estimators` and `max_depth` and see how they impact performance on training and test data."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "### Interactive Demo 1.5: Performance of the Random Forest Regression\n",
-    "In this activity, you can adjust the sliders for `n_estimators` and `max_depth` to observe their effect on model performance:\n",
-    "\n",
-    "* `n_estimators`: Controls the number of trees in the Random Forest.\n",
-    "* `max_depth`: Sets the maximum depth of each tree.\n",
-    "\n",
-    "After adjusting the sliders, the code fits a new Random Forest model and prints the training and testing scores, showing how changes in these parameters impact model performance."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "cellView": "form",
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "# @title Use the slider to change the values of 'n_estimators' and 'max_depth' and observe the effect on performance.\n",
-    "# @markdown Make sure you execute this cell to enable the widget!\n",
-    "\n",
-    "# Function to train random forest and display scatter plot\n",
-    "def train_rf_and_plot(X_tr, y_train, X_test, y_test, max_depth, n_estim):\n",
-    "    global rf_regressor, X_train\n",
-    "\n",
-    "    # Instantiate and train the random forest regressor\n",
-    "    rf_regressor = RandomForestRegressor(n_estimators=n_estim, max_depth=max_depth)\n",
-    "    rf_regressor.fit(X_tr, y_train)\n",
-    "\n",
-    "    # Calculate and print the scores\n",
-    "    score_train = rf_regressor.score(X_tr, y_train)\n",
-    "    score_test = rf_regressor.score(X_test, y_test)\n",
-    "    print(f\"\\n\\tTraining Score: {score_train}\")\n",
-    "    print(f\"\\tTesting Score : {score_test}\\n\")\n",
-    "\n",
-    "    # Generate scatter plot: Predicted vs. True Temperatures\n",
-    "    predicted = rf_regressor.predict(X_tr)\n",
-    "\n",
-    "    fig, ax = plt.subplots()\n",
-    "\n",
-    "    # Scatter plot\n",
-    "    ax.scatter(predicted, y_train, color='blue', alpha=0.7, label='Comparison of Predicted and True Temperatures', edgecolors='black')\n",
-    "    ax.plot([min(y_train), max(y_train)], [min(y_train), max(y_train)], color='red', linestyle='--', label='Ideal Prediction Line')\n",
-    "    ax.set_xlabel('Predicted Temperature (K)')\n",
-    "    ax.set_ylabel('True Temperature (K)')\n",
-    "    ax.set_title('Annual mean temperature anomaly')\n",
-    "    # add a caption\n",
-    "    caption_text = 'The anomalies are calculated by subtracting the annual means of the pre-industrial scenario from \\nthe annual means of the respective scenario.'\n",
-    "    plt.figtext(0.5, -0.03, caption_text, ha='center', fontsize=10)  # Adjusted y-coordinate to create space\n",
-    "    ax.legend()\n",
-    "    ax.grid(True)\n",
-    "\n",
-    "    plt.tight_layout()\n",
-    "    plt.show()\n",
-    "\n",
-    "\n",
-    "# Interactive widget to control max_depth and n_estimators\n",
-    "@widgets.interact(max_depth=(1, 41, 1), n_estimators=(10, 100, 5))\n",
-    "def visualize_scores_with_max_depth(max_depth=20, n_estimators=50):\n",
-    "    train_rf_and_plot(X_train, y_train, X_test, y_test, max_depth, n_estimators)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "### Interactive Demo 1.5: Discussion\n",
-    "\n",
-    "1. Did you observe any trends in how the performance changes?\n",
-    "2. Try to explain in your own words the concepts of capacity and overfitting and how they relate.\n",
-    "3. In addition to model capacity, what else could be changed to prevent overfitting?"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "execution": {}
-   },
-   "source": [
-    "[*Click for solution*](https://github.com/neuromatch/climate-course-content/tree/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/solutions/W2D5_Tutorial3_Solution_47df169a.py)\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "# Summary\n",
-    "\n",
-    "In this tutorial, we delved into the importance of training and testing sets in constructing robust machine learning models. Understanding the concept of overfitting and the necessity of using separate test sets for model assessment were pivotal. 
Through practical exercises, we acquired hands-on proficiency in data partitioning, model training, and performance evaluation.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Resources\n", - "\n", - "* [ClimateBench v1.0: A Benchmark for Data-Driven Climate Projections](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954) \n" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "include_colab_link": true, - "name": "W2D5_Tutorial3", - "provenance": [], - "toc_visible": true - }, - "kernel": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.19" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial4.ipynb b/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial4.ipynb deleted file mode 100644 index 2d9b935e2..000000000 --- a/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial4.ipynb +++ /dev/null @@ -1,635 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/W2D5_Tutorial4.ipynb)   \"Open" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial 4: Testing Spatial Generalization\n", - "\n", - "**Week 2, Day 5, AI and Climate Change**\n", - "\n", - "__Content creators:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Content reviewers:__ Mujeeb Abdulfatai, Nkongho Ayuketang Arreyndip, Jeffrey N. A. 
Aryee, Paul Heubel, Jenna Pearson, Abel Shibu\n", - "\n", - "__Content editors:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Production editors:__ Konstantine Tsafatinos\n", - "\n", - "**Our 2024 Sponsors:** CMIP, NFDI4Earth" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial Objectives\n", - "\n", - "*Estimated timing of tutorial:* 20 minutes\n", - "\n", - "In this tutorial, you will: \n", - "* Learn the concept of within distribution generalization\n", - "* Test your model’s ability on a certain type of out-of-distribution data\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Setup" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {}, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "# imports:\n", - "\n", - "import matplotlib.pyplot as plt # For plotting graphs\n", - "import pandas as pd # For data manipulation\n", - "import xarray as xr\n", - "import cartopy.crs as ccrs\n", - "import cartopy.feature as cfeature\n", - "\n", - "# import specific machine learning models and tools\n", - "from sklearn.model_selection import train_test_split # For splitting dataset into train and test sets\n", - "from sklearn.ensemble import RandomForestRegressor # For Random Forest Regression" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Figure Settings\n", - "import ipywidgets as widgets # interactive display\n", - "\n", - "%config InlineBackend.figure_format = 'retina'\n", - "plt.style.use(\n", - " \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\"\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Helper functions\n", - "\n", - "# Load and Prepare the Data\n", - "url_Climatebench_train_val = \"https://osf.io/y2pq7/download\" # Dataset URL\n", - "training_data = pd.read_csv(url_Climatebench_train_val) # Load the training data from the provided URL\n", - "training_data.pop('scenario') # Drop the 'scenario' column as it's just a label and won't be passed into the model\n", - "target = training_data.pop('tas_FINAL') # Extract the target variable 'tas_FINAL' which we aim to predict\n", - "\n", - "# Split the data into training and testing sets\n", - "X_train, X_test, y_train, y_test = train_test_split(training_data, target, test_size=0.2, random_state=1)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Set random seed\n", - "\n", - "# @markdown Executing `set_seed(seed=seed)` you are setting the seed\n", - "\n", - "# Call `set_seed` function in the exercises to ensure reproducibility.\n", - "import random\n", - "import numpy as np\n", - "\n", - "def set_seed(seed=None):\n", - " if seed is None:\n", - " seed = np.random.choice(2 ** 32)\n", - " random.seed(seed)\n", - " np.random.seed(seed)\n", - " print(f'Random seed {seed} has been set.')\n", - "\n", - "# Set a global seed value for reproducibility\n", - "random_state = 42 # change 42 with any number you like\n", - "\n", - "set_seed(seed=random_state)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - 
"outputs": [], - "source": [ - "# @title Plotting functions\n", - "# @markdown Run this cell to define plotting function we will be using in this code\n", - "\n", - "def visualize_decision_tree(X_train, y_train, X_test, y_test, dt_model):\n", - " # Plot decision tree and regression\n", - " plt.figure(figsize=(10, 5))\n", - "\n", - " # Plot Decision Tree\n", - " plt.subplot(1, 2, 1)\n", - " plt.scatter(X_train, y_train, color='blue', label='Training data')\n", - " plt.scatter(X_test, y_test, color='green', label='Test data')\n", - " plt.plot(np.sort(X_test, axis=0), dt_model.predict(np.sort(X_test, axis=0)), color='red', label='Model')\n", - " plt.title('Decision Tree Regression')\n", - " plt.xlabel('Feature')\n", - " plt.ylabel('Target')\n", - " plt.legend()\n", - "\n", - " # Plot Decision Tree\n", - " plt.subplot(1, 2, 2)\n", - " plot_tree(dt_model, filled=True)\n", - " plt.title(\"Decision Tree\")\n", - "\n", - " plt.tight_layout()\n", - " plt.show()\n", - "\n", - "def visualize_random_forest(X_train, y_train, X_test, y_test, rf_model):\n", - " num_trees = len(rf_model.estimators_)\n", - " num_cols = min(3, num_trees)\n", - " num_rows = (num_trees + num_cols - 1) // num_cols\n", - "\n", - " plt.figure(figsize=(15, 6 * num_rows))\n", - "\n", - " # Plot Random Forest Regression\n", - " plt.subplot(num_rows, num_cols, 1)\n", - " plt.scatter(X_train, y_train, color='blue', label='Training data')\n", - " plt.scatter(X_test, y_test, color='green', label='Test data')\n", - " plt.plot(np.sort(X_test, axis=0), rf_model.predict(np.sort(X_test, axis=0)), color='red', label='Model')\n", - " plt.title('Random Forest Regression')\n", - " plt.xlabel('Feature')\n", - " plt.ylabel('Target')\n", - " plt.legend()\n", - "\n", - " # Plot Decision Trees within Random Forest\n", - " for i, tree in enumerate(rf_model.estimators_):\n", - " plt.subplot(num_rows, num_cols, i + 2)\n", - " plot_tree(tree, filled=True)\n", - " plt.title(f\"Tree {i+1}\")\n", - "\n", - " plt.tight_layout()\n", - " plt.show()\n", - "\n", - "def plot_spatial_distribution(data, col_name, c_label):\n", - " \"\"\"\n", - " Plot the spatial distribution of a variable of interest.\n", - "\n", - " Args:\n", - " data (DataFrame): DataFrame containing latitude, longitude, and data of interest.\n", - " col_name (str): Name of the column containing data of interest.\n", - " c_label (str): Label to describe quantity and unit for the colorbar labeling.\n", - "\n", - " Returns:\n", - " None\n", - " \"\"\"\n", - " # create a xarray dataset from the pandas dataframe\n", - " # for convenient plotting with cartopy afterwards\n", - " ds = xr.Dataset({col_name: ('points', data[col_name])},\n", - " coords={'lon': ('points', data['lon']),\n", - " 'lat': ('points', data['lat'])}\n", - " )\n", - "\n", - " # create geoaxes\n", - " ax = plt.axes(projection=ccrs.PlateCarree())\n", - " ax.set_extent([0.95*min(ds.lon.values), 1.05*max(ds.lon.values), 0.95*min(ds.lat.values), 1.05*max(ds.lat.values)])\n", - "\n", - " # add coastlines\n", - " ax.coastlines()\n", - " ax.add_feature(cfeature.OCEAN, alpha=0.1)\n", - " # add state borders\n", - " ax.add_feature(cfeature.BORDERS, edgecolor='darkgrey')\n", - "\n", - " # plot the data\n", - " p = ax.scatter(ds['lon'], ds['lat'], c=ds[col_name], cmap='coolwarm', transform=ccrs.PlateCarree())\n", - "\n", - " # add a colorbar\n", - " cbar = plt.colorbar(p, orientation='vertical')\n", - " cbar.set_label(c_label)\n", - "\n", - " # add a grid and labels\n", - " ax.gridlines(draw_labels={\"bottom\": \"x\", \"left\": \"y\"})\n", - 
"\n", - " # add title\n", - " plt.title('Spatial Distribution of\\n Annual Mean Anomalies\\n')\n", - " plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Video 1: Testing spatial generalization\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == 'Bilibili':\n", - " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", - " elif source == 'Osf':\n", - " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = []\n", - " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == 'Youtube':\n", - " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", - " height=H, fs=fs, rel=0)\n", - " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", - " else:\n", - " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", - " height=H, fs=fs, autoplay=False)\n", - " if video_ids[i][0] == 'Bilibili':\n", - " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", - " elif video_ids[i][0] == 'Osf':\n", - " print(f'Video available at https://osf.io/{video.id}')\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "# curriculum or production team will provide these ids\n", - "video_ids = [('Youtube', 'U8mshdRYwuY'), ('Bilibili', ''), ('Osf', '')]\n", - "tab_contents = display_videos(video_ids, W=854, H=480)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Tutorial slides\n", - "\n", - "# @markdown\n", - "from ipywidgets import widgets\n", - "from IPython.display import IFrame\n", - "\n", - "link_id = \"26r8h\"\n", - "\n", - "download_link = f\"https://osf.io/download/{link_id}/\"\n", - "render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n", - "# @markdown\n", - "out = widgets.Output()\n", - "with out:\n", - " print(f\"If you want to download the slides: {download_link}\")\n", - " display(IFrame(src=f\"{render_link}\", width=730, height=410))\n", - "display(out)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "In the video, we discussed how we previously tested generalization to unseen data points from the same data distribution (i.e., same region and scenarios). 
\n", - "Now we will see if the model generalizes to data from a new region.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Section 1: Test generalization to held-out spatial locations" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.1: Load the New Testing Data\n", - "\n", - "We will take our random forest model that was trained on data from the region in the blue box and see if it can work well using lat/lon locations that come from the red box. We already have the data from the blue box region loaded, so now we just need to load the data from the red box.\n", - "\n", - "

[Figure: W2D5_Tutorial4_map, a map of the study area showing the original training region (blue box) and the new test region (red box)]
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Loading the new Spatial test data\n", - "\n", - "url_spatial_test_data = \"https://osf.io/7tr49/download\" # location of test data\n", - "spatial_test_data = pd.read_csv(url_spatial_test_data) # Load spatial test data from the provided URL\n", - "spatial_test_data.pop('scenario') # drop the `scenario` column from the data as it is just a label, but will not be passed into the model.\n", - "spatial_test_target = spatial_test_data.pop('tas_FINAL') # extract the target variable 'tas_FINAL'\n", - "# display the prepared spatial test data\n", - "spatial_test_data" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "When we plot the temperature distribution over space, we can see that this dataset has a different range of latitude and longitude values than the initial dataset. We use a plotting function `plot_spatial_distribution()` that you completed in Coding Exercise 1.4 of Tutorial 1 that can be found in the *plotting function* of the Setup section." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# plot spatial distribution of temperature anomalies for 2015\n", - "col_name = 'tas_2015'\n", - "c_label = 'Temperature (K) in 2015'\n", - "plot_spatial_distribution(spatial_test_data, col_name, c_label)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.2: Evaluate the model" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "We've been playing around with the random forest model parameters. To make sure we know what model we are evaluating, let's train it again here on the training data specifically with `n_estimators = 80` and `max_depth = 50`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "rf_regressor = RandomForestRegressor(random_state=42, n_estimators=80, max_depth=50)\n", - "# Train the model on the training data\n", - "rf_regressor.fit(X_train, y_train)\n", - "train_score = rf_regressor.score(X_train,y_train)\n", - "test_score = rf_regressor.score(X_test,y_test)\n", - "print( \"Training Set Score : \", train_score)\n", - "print( \" Test Set Score : \", test_score)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Now that the model has been trained on data from the blue box region, let's test how well it performs on data from the red box region" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "spatial_test_score = rf_regressor.score(spatial_test_data,spatial_test_target)\n", - "print( \"Spatial Test Data Score : \", spatial_test_score)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Now it is your turn: Make a scatter plot of the predicted vs true 2050 temperature values for this data, like you did in the last tutorials." 
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "### Coding Exercise 1.2: Scatter Plot for Spatial data\n",
-    "\n",
-    "In this exercise, implement the `scatter_plot_predicted_vs_true()` function to evaluate the performance of the pre-trained Random Forest regressor on data from the new spatial region and create a scatter plot of predicted vs. true temperature values."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "def scatter_plot_predicted_vs_true(spatial_test_data, true_values):\n",
-    "    \"\"\"Create a scatter plot of predicted vs true temperature values.\n",
-    "\n",
-    "    Args:\n",
-    "        spatial_test_data: Test features.\n",
-    "        true_values (ndarray): True temperature values.\n",
-    "\n",
-    "    Returns:\n",
-    "        None\n",
-    "    \"\"\"\n",
-    "\n",
-    "    # make predictions using the random forest regressor\n",
-    "    spatial_test_predicted = rf_regressor.predict(spatial_test_data)\n",
-    "\n",
-    "    spatial_test_score = rf_regressor.score(spatial_test_data, true_values)\n",
-    "    print(\"\\nSpatial Test Data Score:\", spatial_test_score)\n",
-    "\n",
-    "    # implement plt.scatter() to compare predicted and true temperature values\n",
-    "    _ = ...\n",
-    "    # implement plt.plot() to plot the diagonal line y=x\n",
-    "    _ = ...\n",
-    "\n",
-    "    # aesthetics\n",
-    "    plt.xlabel('Predicted Temperatures (K)')\n",
-    "    plt.ylabel('True Temperatures (K)')\n",
-    "    plt.title('Annual mean temperature anomaly')\n",
-    "\n",
-    "    # add a caption with adjusted y-coordinate to create space\n",
-    "    caption_text = 'The anomalies are calculated by subtracting the annual means of the pre-industrial scenario from \\nthe annual means of the respective scenario.'\n",
-    "    plt.figtext(0.5, -0.03, caption_text, ha='center', fontsize=10)  # Adjusted y-coordinate to create space\n",
-    "    plt.legend(loc='upper left')\n",
-    "    plt.show()\n",
-    "\n",
-    "# test your function\n",
-    "_ = scatter_plot_predicted_vs_true(spatial_test_data, spatial_test_target)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "execution": {}
-   },
-   "source": [
-    "[*Click for solution*](https://github.com/neuromatch/climate-course-content/tree/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/solutions/W2D5_Tutorial4_Solution_a66d4b87.py)\n",
-    "\n",
-    "*Example output:*\n",
-    "\n",
-    "[Solution hint image]\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "### Question 1.2: Performance of the model for new spatial location data\n",
-    "\n",
-    "1. Did you observe a decrease in the score?\n",
-    "2. What do you believe could be the cause of this?\n",
-    "3. What do you think would happen if the model were tested on an even more distant region, for example, in North America?"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "execution": {}
-   },
-   "source": [
-    "[*Click for solution*](https://github.com/neuromatch/climate-course-content/tree/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/solutions/W2D5_Tutorial4_Solution_00900b53.py)\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "# Summary\n",
-    "\n",
-    "In this tutorial, you investigated the generalization capacity of machine learning models to novel geographical regions. 
The process involved assessing model performance on spatial datasets from diverse locations, shedding light on the model's adaptability across varying environmental contexts.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Resources" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "* [ClimateBench v1.0: A Benchmark for Data-Driven Climate Projections](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954) \n" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "include_colab_link": true, - "name": "W2D5_Tutorial4", - "provenance": [], - "toc_visible": true - }, - "kernel": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.19" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial5.ipynb b/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial5.ipynb deleted file mode 100644 index 8e5576043..000000000 --- a/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial5.ipynb +++ /dev/null @@ -1,700 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/W2D5_Tutorial5.ipynb)   \"Open" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial 5: Testing generalization to new scenarios\n", - "\n", - "**Week 2, Day 5, AI and Climate Change**\n", - "\n", - "__Content creators:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Content reviewers:__ Mujeeb Abdulfatai, Nkongho Ayuketang Arreyndip, Jeffrey N. A. 
Aryee, Paul Heubel, Jenna Pearson, Abel Shibu\n", - "\n", - "__Content editors:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Production editors:__ Konstantine Tsafatinos\n", - "\n", - "**Our 2024 Sponsors:** CMIP, NFDI4Earth" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial Objectives\n", - "\n", - "*Estimated timing of tutorial:* 20 minutes\n", - "\n", - "In this tutorial, you will\n", - "* Learn about a different type of out-of-distribution test of our model\n", - "* Evaluate the model's performance\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Setup" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {}, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "# imports\n", - "\n", - "import matplotlib.pyplot as plt # For plotting graphs\n", - "import pandas as pd # For data manipulation\n", - "# # Import specific machine learning models and tools\n", - "from sklearn.model_selection import train_test_split # For splitting dataset into train and test sets\n", - "from sklearn.ensemble import RandomForestRegressor # For Random Forest Regression" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Figure Settings\n", - "import ipywidgets as widgets # interactive display\n", - "\n", - "%config InlineBackend.figure_format = 'retina'\n", - "plt.style.use(\n", - " \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\"\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Helper functions\n", - "\n", - "# Load and Prepare the Data\n", - "url_Climatebench_train_val = \"https://osf.io/y2pq7/download\" # Dataset URL\n", - "training_data = pd.read_csv(url_Climatebench_train_val) # load the training data from the provided URL\n", - "training_data.pop('scenario') # drop the 'scenario' column as it's just a label and won't be passed into the model\n", - "target = training_data.pop('tas_FINAL') # extract the target variable 'tas_FINAL' which we aim to predict\n", - "\n", - "url_spatial_test_data = \"https://osf.io/7tr49/download\" # test data with different location\n", - "spatial_test_data = pd.read_csv(url_spatial_test_data) # load spatial test data from the provided URL\n", - "spatial_test_data.pop('scenario') # drop the `scenario` column from the data as it is just a label, but will not be passed into the model.\n", - "spatial_test_target = spatial_test_data.pop('tas_FINAL') # extract the target variable 'tas_FINAL'\n", - "\n", - "# Split the data into training and testing sets: 80%/20%\n", - "X_train, X_test, y_train, y_test = train_test_split(training_data, target, test_size=0.2, random_state=1)\n", - "\n", - "# Training the model on the training data\n", - "rf_regressor = RandomForestRegressor(random_state=42, n_estimators=80, max_depth=50)\n", - "rf_regressor.fit(X_train, y_train)\n", - "\n", - "spatial_test_score = rf_regressor.score(spatial_test_data,spatial_test_target)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Set random seed\n", - "\n", - "# @markdown Executing `set_seed(seed=seed)` you are setting the seed\n", - "\n", - "# 
Call `set_seed` function in the exercises to ensure reproducibility.\n", - "import random\n", - "import numpy as np\n", - "\n", - "def set_seed(seed=None):\n", - " if seed is None:\n", - " seed = np.random.choice(2 ** 32)\n", - " random.seed(seed)\n", - " np.random.seed(seed)\n", - " print(f'Random seed {seed} has been set.')\n", - "\n", - "# Set a global seed value for reproducibility\n", - "random_state = 42 # change 42 with any number you like\n", - "\n", - "set_seed(seed=random_state)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Video 1: Testing generalization to new scenarios\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == 'Bilibili':\n", - " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", - " elif source == 'Osf':\n", - " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = []\n", - " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == 'Youtube':\n", - " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", - " height=H, fs=fs, rel=0)\n", - " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", - " else:\n", - " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", - " height=H, fs=fs, autoplay=False)\n", - " if video_ids[i][0] == 'Bilibili':\n", - " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", - " elif video_ids[i][0] == 'Osf':\n", - " print(f'Video available at https://osf.io/{video.id}')\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "# curriculum or production team will provide these ids\n", - "video_ids = [('Youtube', 'L860LmyPoSg'), ('Bilibili', ''), ('Osf', '')]\n", - "tab_contents = display_videos(video_ids, W=854, H=480)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Video Summary : \n", - "* Discussed how we previously tested generalization to an unseen region. \n", - "* Stressed that the real utility of these emulators is the ability to run new scenarios. 
\n", - "* Now we will see if the model generalizes to data from a new scenario.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Tutorial slides\n", - "\n", - "# @markdown\n", - "from ipywidgets import widgets\n", - "from IPython.display import IFrame\n", - "\n", - "link_id = \"2rq8x\"\n", - "\n", - "download_link = f\"https://osf.io/download/{link_id}/\"\n", - "render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n", - "# @markdown\n", - "out = widgets.Output()\n", - "with out:\n", - " print(f\"If you want to download the slides: {download_link}\")\n", - " display(IFrame(src=f\"{render_link}\", width=730, height=410))\n", - "display(out)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Section 1: Test Generalization to Held-out Emissions Scenario" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.1: Load the New Testing (Scenario) Data\n", - "Load the new dataset and print it. As you can see, the scenario for all of these datapoints is ssp245. This scenario was not included in our initial data set. According to the scenario descriptions included in the table in Tutorial 1, ssp245 represent a \"medium forcing future scenario\". The lat/lon locations are the same as the initial dataset (blue box region)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "url_scenario_test_data = \"https://osf.io/pkbwx/download\" # Dataset URL\n", - "scenario_test_data = pd.read_csv(url_scenario_test_data) # Load scenario test data from the provided URL\n", - "scenario_test_data" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Now we will prepare the data to be fed into the pre-trained model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "scenario_test_data.pop('scenario') # remove the 'scenario' column from the dataset\n", - "scenario_test_target = scenario_test_data.pop('tas_FINAL') # extract the target variable 'tas_FINAL'\n", - "scenario_test_data # display the prepared scenario test data" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.2: Evaluate the Model on this New (Scenario) Data" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Now let's evaluate our pre-trained model (`rf_regressor`) to see how well it performs on this new emissions scenario. Use what you know to evaluate the performance and make a scatter plot of predicted vs. true temperature values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "def evaluate_and_plot_scenario_performance(rf_regressor, scenario_test_data, scenario_test_target):\n", - " \"\"\"Evaluate the performance of the pre-trained model on the new emissions scenario\n", - " and create a scatter plot of predicted vs. 
true temperature values.\n",
-    "\n",
-    "    Args:\n",
-    "        rf_regressor (RandomForestRegressor): Pre-trained Random Forest regressor model.\n",
-    "        scenario_test_data (ndarray): Test features for the new emissions scenario.\n",
-    "        scenario_test_target (ndarray): True temperature values of the new emissions scenario.\n",
-    "\n",
-    "    Returns:\n",
-    "        float: Score of the model on the scenario test data.\n",
-    "    \"\"\"\n",
-    "\n",
-    "    # predict temperature values for the new emissions scenario\n",
-    "    scenario_test_predicted = ...\n",
-    "\n",
-    "    # evaluate the model on the new emissions scenario\n",
-    "    scenario_test_score = ...\n",
-    "    print(\"Scenario Test Score:\", scenario_test_score)\n",
-    "\n",
-    "    # implement plt.scatter() to compare predicted and true temperature values\n",
-    "    plt.figure()\n",
-    "    _ = ...\n",
-    "    # implement plt.plot() to plot the diagonal line y=x\n",
-    "    _ = ...\n",
-    "\n",
-    "    # aesthetics\n",
-    "    plt.xlabel('Predicted Temperatures (K)')\n",
-    "    plt.ylabel('True Temperatures (K)')\n",
-    "    plt.title('Annual mean temperature anomaly\\n(New Emissions Scenario)')\n",
-    "    plt.grid(True)\n",
-    "    plt.show()\n",
-    "\n",
-    "    return scenario_test_score\n",
-    "\n",
-    "# test your function\n",
-    "scenario_test_score = evaluate_and_plot_scenario_performance(rf_regressor, scenario_test_data, scenario_test_target)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "execution": {}
-   },
-   "source": [
-    "[*Click for solution*](https://github.com/neuromatch/climate-course-content/tree/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/solutions/W2D5_Tutorial5_Solution_467c5f77.py)\n",
-    "\n",
-    "*Example output:*\n",
-    "\n",
-    "[Solution hint image]\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "execution": {}
-   },
-   "source": [
-    "[*Click for solution*](https://github.com/neuromatch/climate-course-content/tree/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/solutions/W2D5_Tutorial5_Solution_ecbd72cf.py)\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "### Question 1.2: Performance of the Model on New Scenario Data\n",
-    "\n",
-    "1. Again, did you observe a decrease in the score?\n",
-    "2. What do you believe could be the cause of this?\n",
-    "3. What kind of new scenarios might the model perform better for?"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "execution": {}
-   },
-   "source": [
-    "[*Click for solution*](https://github.com/neuromatch/climate-course-content/tree/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/solutions/W2D5_Tutorial5_Solution_593ef299.py)\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "For the sake of clarity, let's summarize all the results."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# summarize results\n", - "train_score = rf_regressor.score(X_train, y_train)\n", - "test_score = rf_regressor.score(X_test, y_test)\n", - "average_score = (train_score + test_score + spatial_test_score + scenario_test_score) / 4\n", - "\n", - "print(f\"\\tTraining Data Score : {train_score}\")\n", - "print(f\"\\tTesting Data Score on same Scenario/Region : {test_score}\")\n", - "print(f\"\\tHeld-out Spatial Region Test Score : {spatial_test_score}\")\n", - "print(f\"\\tHeld-out Scenario Test Score : {scenario_test_score}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "This shows us that the model does generalize somewhat (i.e. the score is well above zero even in the new regions and in the new scenario). However, it does not generalize very well. That is, it does not perform as well on data that differs from the data it was trained on. Ideally, we would be able to build a model that inherently learns the complex relationship between emissions scenarios and future temperature. A model that truly learned this relationship would be able to generalize to new scenarios and regions." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "\n", - "Do you have any ideas of how to build a better machine learning model to emulate climate models? Many scientists are working on this problem!" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Bonus Section 2: Try other Regression Models\n", - "\n", - "*Only complete this section if you are well ahead of schedule, or have already completed the final tutorial.*\n", - "\n", - "Random Forest models are not the only regression models that could be applied to this problem. In this code, we will use scikit-learn to train and evaluate various regression models on the Climate Bench dataset. We will load the data, split it, define models, train them with different settings, and evaluate their performance. We will calculate and print average scores for each model configuration and identify the best-performing model.\n", - "\n", - "For more information about the models used here and various other models, you can refer to [scikit-learn.org/stable/supervised_learning.html#supervised-learning](https://scikit-learn.org/stable/supervised_learning.html#supervised-learning). 
\n", - "*Note: the following cell may take ~2 minutes to run.*" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Import necessary libraries\n", - "import matplotlib.pyplot as plt\n", - "from sklearn.pipeline import make_pipeline\n", - "from sklearn.preprocessing import StandardScaler\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, BaggingRegressor\n", - "from sklearn.svm import SVR\n", - "from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet\n", - "from sklearn.linear_model import RidgeCV\n", - "import pandas as pd\n", - "from sklearn.neural_network import MLPRegressor\n", - "\n", - "# Load datasets\n", - "train_val_data = pd.read_csv(\"https://osf.io/y2pq7/download\")\n", - "spatial_test_data = pd.read_csv(\"https://osf.io/7tr49/download\")\n", - "scenario_test_data = pd.read_csv(\"https://osf.io/pkbwx/download\")\n", - "\n", - "# Pop the 'scenario' column from all datasets\n", - "train_val_data.pop('scenario')\n", - "spatial_test_data.pop('scenario')\n", - "scenario_test_data.pop('scenario')\n", - "\n", - "# Split train_val_data into training and testing sets\n", - "X_train, X_test, y_train, y_test = train_test_split(train_val_data.drop(columns=[\"tas_FINAL\"]),\n", - " train_val_data[\"tas_FINAL\"],\n", - " test_size=0.2,\n", - " random_state=1)\n", - "\n", - "# Define models with different configurations\n", - "models = {\n", - " \"MLP\": [make_pipeline(StandardScaler(), MLPRegressor(hidden_layer_sizes=(50,), max_iter=1000)),\n", - " make_pipeline(StandardScaler(), MLPRegressor(hidden_layer_sizes=(500, 500, 500), random_state=1, max_iter=1000))],\n", - " \"RandomForest\": [make_pipeline(StandardScaler(), RandomForestRegressor(n_estimators=100, max_depth=None)),\n", - " make_pipeline(StandardScaler(), RandomForestRegressor(n_estimators=50, max_depth=10))],\n", - " \"GradientBoosting\": [make_pipeline(StandardScaler(), GradientBoostingRegressor(n_estimators=100, max_depth=3)),\n", - " make_pipeline(StandardScaler(), GradientBoostingRegressor(n_estimators=50, max_depth=2))],\n", - " \"BaggingRegressor\": [make_pipeline(StandardScaler(), BaggingRegressor(n_estimators=100)),\n", - " make_pipeline(StandardScaler(), BaggingRegressor(n_estimators=50))],\n", - " \"SVR\": [make_pipeline(StandardScaler(), SVR(kernel=\"linear\")),\n", - " make_pipeline(StandardScaler(), SVR(kernel=\"rbf\"))],\n", - " \"LinearRegression\": [make_pipeline(StandardScaler(), LinearRegression())],\n", - " \"Ridge\": [make_pipeline(StandardScaler(), Ridge())],\n", - " \"RidgeCV\":[RidgeCV(alphas=[167], cv=5)],\n", - " \"Lasso\": [make_pipeline(StandardScaler(), Lasso())],\n", - " \"ElasticNet\": [make_pipeline(StandardScaler(), ElasticNet())]\n", - "}\n", - "\n", - "# Train models and calculate score for each configuration\n", - "results = {}\n", - "for model_name, model_list in models.items():\n", - " model_results = []\n", - " for config_num, model in enumerate(model_list): # Add enumeration for configuration number\n", - " # Train model\n", - " model.fit(X_train, y_train)\n", - "\n", - " # Calculate scores\n", - " train_score = model.score(X_train, y_train)\n", - " test_score = model.score(X_test, y_test)\n", - " spatial_test_score = model.score(spatial_test_data.drop(columns=[\"tas_FINAL\"]), spatial_test_data[\"tas_FINAL\"])\n", - " scenario_test_score = 
model.score(scenario_test_data.drop(columns=[\"tas_FINAL\"]), scenario_test_data[\"tas_FINAL\"])\n",
-    "\n",
-    "        # Append results\n",
-    "        model_results.append({\n",
-    "            \"Configuration\": config_num,  # Add configuration number\n",
-    "            \"Training Score\": train_score,\n",
-    "            \"Testing Score\": test_score,\n",
-    "            \"Spatial Test Score\": spatial_test_score,\n",
-    "            \"Scenario Test Score\": scenario_test_score\n",
-    "        })\n",
-    "\n",
-    "    # Calculate the average of the four scores for the model\n",
-    "    # (the 'Configuration' entry is a label, so it must not enter the mean)\n",
-    "    score_keys = [\"Training Score\", \"Testing Score\", \"Spatial Test Score\", \"Scenario Test Score\"]\n",
-    "    average_score = sum(result[key] for result in model_results for key in score_keys) / (len(model_results) * len(score_keys))\n",
-    "\n",
-    "    # Store results including average score\n",
-    "    results[model_name] = {\"Average Score\": average_score, \"Results\": model_results}\n",
-    "\n",
-    "# Print results including average score for each model\n",
-    "for model_name, model_data in results.items():\n",
-    "    print(f\"Model:\\t{model_name}\")\n",
-    "    print(f\"Average Score:\\t\\t\\t\\t {model_data['Average Score']}\")\n",
-    "    print(\"Configuration-wise scores:\")\n",
-    "    for result in model_data['Results']:\n",
-    "        print(f\"\\nConfiguration {result['Configuration']}: \"\n",
-    "              f\"\\nTraining Score: {result['Training Score']}, \"\n",
-    "              f\"\\nTesting Score: {result['Testing Score']}, \"\n",
-    "              f\"\\nSpatial Test Score: {result['Spatial Test Score']}, \"\n",
-    "              f\"\\nScenario Test Score: {result['Scenario Test Score']}\")\n",
-    "    print()\n",
-    "\n",
-    "# Find the best model and its average score\n",
-    "best_model = max(results, key=lambda x: results[x][\"Average Score\"])\n",
-    "best_average_score = results[best_model][\"Average Score\"]\n",
-    "\n",
-    "# Print the best model and its average score\n",
-    "print(f\"\\nBest Model: {best_model}, Average Score: {best_average_score}\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "Let's plot the results.\n",
-    "*Note: This code will plot the actual score for positive average scores and zero for negative average scores.*"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "cellView": "form",
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "# @title\n",
-    "# @markdown Run this cell to see the plot of results!\n",
-    "\n",
-    "import matplotlib.pyplot as plt\n",
-    "\n",
-    "# Extract model names and average scores from results\n",
-    "model_names = list(results.keys())\n",
-    "average_scores = [results[model_name][\"Average Score\"] for model_name in model_names]\n",
-    "\n",
-    "# Adjust scores to plot zero for negative scores\n",
-    "adjusted_scores = [score if score > 0 else 0 for score in average_scores]\n",
-    "\n",
-    "# Plotting\n",
-    "plt.figure()\n",
-    "plt.bar(model_names, adjusted_scores, color=['skyblue' if score > 0 else 'lightgray' for score in average_scores])\n",
-    "plt.xlabel('Model')\n",
-    "plt.ylabel('Average Score')\n",
-    "plt.title('Average Score of Different Regression Models')\n",
-    "plt.xticks(rotation=45, ha='right')  # Rotate x-axis labels for better readability\n",
-    "plt.tight_layout()\n",
-    "plt.show()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "This quick sweep of models suggests Random Forest is a good choice, but recall that most of these models have hyperparameters. 
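One way to probe that sensitivity is a small cross-validated grid search over the training set; a minimal sketch, assuming the `X_train`/`y_train` split defined above (the grid values are illustrative, not tuned recommendations):

```python
# Sketch: cross-validated grid search over Random Forest hyperparameters.
# Assumes X_train and y_train from the train/test split above.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [50, 100],      # illustrative values only
    "max_depth": [10, 20, None],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,        # 3-fold cross-validation within the training set
    n_jobs=-1,   # use all available cores
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best CV score (R^2):", search.best_score_)
```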
Varying these hyperparameters may lead to different results!\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Summary\n", - "\n", - "In this tutorial, we explored how machine learning models adapt to unfamiliar emissions scenarios. Evaluating model performance on datasets representing different emission scenarios provided insights into the models' capabilities in predicting climate variables under diverse environmental conditions.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Resources\n", - "\n", - "* [ClimateBench v1.0: A Benchmark for Data-Driven Climate Projections](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954) \n", - "* [Scikit-learn.org, Supervised Learning](https://scikit-learn.org/stable/supervised_learning.html#supervised-learning)" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "include_colab_link": true, - "name": "W2D5_Tutorial5", - "provenance": [], - "toc_visible": true - }, - "kernel": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.19" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial6.ipynb b/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial6.ipynb deleted file mode 100644 index 72acc9f10..000000000 --- a/tutorials/W2D4_AIandClimateChange/student/W2D5_Tutorial6.ipynb +++ /dev/null @@ -1,292 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W2D5_ClimateResponse-AdaptationImpact/W2D5_Tutorial6.ipynb)   \"Open" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial 6: Exploring other applications\n", - "\n", - "**Week 2, Day 5, AI and Climate Change**\n", - "\n", - "__Content creators:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Content reviewers:__ Mujeeb Abdulfatai, Nkongho Ayuketang Arreyndip, Jeffrey N. A. 
Aryee, Paul Heubel, Jenna Pearson, Abel Shibu\n", - "\n", - "__Content editors:__ Deepak Mewada, Grace Lindsay\n", - "\n", - "__Production editors:__ Konstantine Tsafatinos\n", - "\n", - "**Our 2024 Sponsors:** CMIP, NFDI4Earth" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Tutorial Objectives\n", - "\n", - "*Estimated timing of tutorial:* 40 minutes\n", - "\n", - "In this tutorial, you will\n", - "* Discuss the many ways AI/machine learning can be applied to problems related to climate change\n", - "* Learn about resources in this domain\n", - "* Discuss issues when deploying an AI system on real problems\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Setup" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": {}, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "# imports\n", - "import matplotlib.pyplot as plt" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Figure Settings\n", - "import ipywidgets as widgets # interactive display\n", - "\n", - "%config InlineBackend.figure_format = 'retina'\n", - "plt.style.use(\n", - " \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\"\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Video 1: Exploring other applications\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == 'Bilibili':\n", - " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", - " elif source == 'Osf':\n", - " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = []\n", - " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == 'Youtube':\n", - " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", - " height=H, fs=fs, rel=0)\n", - " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", - " else:\n", - " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", - " height=H, fs=fs, autoplay=False)\n", - " if video_ids[i][0] == 'Bilibili':\n", - " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", - " elif video_ids[i][0] == 'Osf':\n", - " print(f'Video available at https://osf.io/{video.id}')\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "video_ids = [('Youtube', 'QwVQrXeZEqM'), ('Bilibili', ''), ('Osf', '')]\n", - "tab_contents = display_videos(video_ids, W=854, H=480)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - 
"execution": {} - }, - "outputs": [], - "source": [ - "# @title Tutorial slides\n", - "\n", - "# @markdown\n", - "from ipywidgets import widgets\n", - "from IPython.display import IFrame\n", - "\n", - "link_id = \"ezvn8\"\n", - "\n", - "download_link = f\"https://osf.io/download/{link_id}/\"\n", - "render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n", - "# @markdown\n", - "out = widgets.Output()\n", - "with out:\n", - " print(f\"If you want to download the slides: {download_link}\")\n", - " display(IFrame(src=f\"{render_link}\", width=730, height=410))\n", - "display(out)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "# Section 1: Exploring other applications" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "As discussed in the video, the objective of this tutorial is to help you to explore and think critically about different climate-related applications, frame problems in data science terms, and consider the potential impact of machine learning solutions in the real world. By the end of this tutorial, participants should have a better understanding of how to identify relevant problems and applications and consider the ethical and practical implications of using machine learning in a given domain.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "\n", - "\n", - "## Section 1.1: Finding Other Applications\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Now that you know the basics of how machine learning tools can be applied to climate-related data, in this tutorial, you will explore more climate-related problems and think about how you would approach them using machine learning tools. Specifically, go to the Climate Change AI summaries page () and scroll to the Societal Impacts section. As a group, pick a topic you would like to discuss further and read the section on it." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "## Section 1.2: Questions to Consider" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "Think about the example applications you just read about and reflect on these questions as a group.\n", - "- What kind of data would a machine learning algorithm need to train on for this application? What kind of domain experts would you want to interact with when building a model for this application?\n", - "- What type of generalization would you want to test for? How would you do so?\n", - "- Who would be most impacted by the use of this model in the real world? Who would be held accountable for the impacts from the model's use?" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "execution": {} - }, - "source": [ - "\n", - "# Summary\n", - "In this tutorial, we explored the importance of exploring more applications, framing problems in data science terms, and considering impact. We encourage you to continue exploring applications and framing problems in data science terms. 
Remember to consider the ethical implications of these applications and to ensure that models are developed and deployed appropriately and fairly, in collaboration with the stakeholders they affect.\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "execution": {}
-   },
-   "source": [
-    "# Resources\n",
-    "\n",
-    "* Climate Change AI [wiki](https://wiki.climatechange.ai/wiki/Buildings_and_Cities).\n",
-    "\n",
-    "* If you want to gain more skills in building machine learning models, check out the Neuromatch [Deep Learning Course](https://deeplearning.neuromatch.io/tutorials/intro.html)."
-   ]
-  }
- ],
- "metadata": {
-  "colab": {
-   "collapsed_sections": [],
-   "include_colab_link": true,
-   "name": "W2D5_Tutorial6",
-   "provenance": [],
-   "toc_visible": true
-  },
-  "kernel": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.9.19"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}