updated day6 morning

kcaylor · Sep 10, 2024 · 834a9fa · 834a9fa
1 parent 91a5011
commit 834a9fa
Show file tree

Hide file tree

Showing 124 changed files with 8,320 additions and 1,825 deletions.
diff --git a/_quarto.yml b/_quarto.yml
@@ -73,9 +73,9 @@ website:
           - text: Session 5b - 🐼 Cleaning Data 
             href: course-materials/interactive-sessions/5b_cleaning_data.qmd
           - text: Session 6a - 🐼 Grouping, Joining, and Sorting (Part I)
-            href: course-materials/interactive-sessions/6a_grouping_joining_sorting_1.qmd
-          - text: Session 6b - 🐼 Grouping, Joining, and Sorting (Part II)
-            href: course-materials/interactive-sessions/6b_grouping_joining_sorting_2.qmd
+            href: course-materials/interactive-sessions/6a_grouping_joining_sorting.qmd
+          # - text: Session 6b - 🐼 Grouping, Joining, and Sorting (Part II)
+          #   href: course-materials/interactive-sessions/6b_grouping_joining_sorting_2.qmd
           - text: Session 6c - 📆 Working with Dates
             href: course-materials/interactive-sessions/6c_dates.qmd
           - text: Session 7a - 📊 Data Visualization with Seaborn & Matplotlib (Part I)
@@ -94,6 +94,8 @@ website:
             href: course-materials/coding-colabs/4b_pandas_dataframes.qmd
           - text: 🙌 Session 5c - Data Cleaning
             href: course-materials/coding-colabs/5c_cleaning_data.qmd
+          - text: 🙌 Session 6b - Data Manipulation
+            href: course-materials/coding-colabs/6b_advanced_data_manipulation.qmd
           - text: 🙌 Session 7c - Exploring data using visualizations
             href: course-materials/coding-colabs/7c_visualizations.qmd
       - text: 👀 cheatsheets

diff --git a/course-materials/cheatsheets/matplotlib.qmd b/course-materials/cheatsheets/matplotlib.qmd
@@ -49,15 +49,15 @@ ax.plot(x, y, label='sin(x)')
 ```python
 x = np.random.rand(50)
 y = np.random.rand(50)
-ax.scatter(x, y, alpha=0.5)
+plt.scatter(x, y, alpha=0.5)
 ```
 
 ### Bar Plot
 
 ```python
 categories = ['A', 'B', 'C', 'D']
 values = [3, 7, 2, 5]
-ax.bar(categories, values)
+plt.bar(categories, values)
 ```
 
 ### Histogram

diff --git a/course-materials/coding-colabs/6b_advanced_data_manipulation.qmd b/course-materials/coding-colabs/6b_advanced_data_manipulation.qmd
@@ -0,0 +1,158 @@
+---
+title: "Coding Colab 6B"
+subtitle: "Analyzing Global Temperature Anomalies and CO2 Concentrations"
+jupyter: eds217_2024
+format:
+  html:
+    toc: true
+    toc-depth: 3
+    code-fold: show
+---
+
+## Introduction
+
+In this coding colab, you'll analyze global temperature anomalies and CO2 concentration data. You'll practice data manipulation, joining datasets, time series analysis, and visualization techniques.
+
+## Learning Objectives
+
+By the end of this colab, you will be able to:
+
+1. Load and preprocess time series data
+2. Join datasets based on datetime indices
+3. Perform basic time series analysis and resampling
+5. Apply data manipulation techniques to extract insights from environmental datasets
+
+## Setup
+
+Let's start by importing necessary libraries and loading our datasets:
+
+```{python}
+#| echo: true
+
+import pandas as pd
+import matplotlib.pyplot as plt
+import seaborn as sns
+
+# Load the temperature anomaly dataset
+temp_url = "monthly_temperature_data.csv"
+temp_df = pd.read_csv(temp_url, parse_dates=['Date'])
+
+# Load the CO2 concentration dataset
+co2_url = "monthly_co2_concentration.csv"
+co2_df = pd.read_csv(co2_url, parse_dates=['Date'])
+
+print("Temperature data:")
+print(temp_df.head())
+print("\nCO2 data:")
+print(co2_df.head())
+```
+
+## Task 1: Data Preparation
+
+1. Set the 'Date' column as the index for both dataframes.
+2. Ensure that there are no missing values in either dataset.
+
+```{python}
+#| echo: false
+#| include: false
+
+# Solution
+temp_df.set_index('Date', inplace=True)
+co2_df.set_index('Date', inplace=True)
+
+print("Missing values in temperature data:", temp_df.isnull().sum())
+print("Missing values in CO2 data:", co2_df.isnull().sum())
+
+```
+
+## Task 2: Joining Datasets
+
+1. Merge the temperature and CO2 datasets based on their date index.
+2. Handle any missing values that may have been introduced by the merge.
+3. Create some plots showing temperature anomalies and CO2 concentrations over time using pandas built-in plotting functions. 
+
+```{python}
+#| echo: false
+#| include: false
+
+# Solution
+combined_df = pd.merge(temp_df, co2_df, left_index=True, right_index=True, how='outer')
+combined_df.fillna(method='ffill', inplace=True)
+
+fig, ax1 = plt.subplots(figsize=(12, 6))
+
+ax1.set_xlabel('Year')
+ax1.set_ylabel('Temperature Anomaly (°C)', color='tab:red')
+ax1.plot(combined_df.index, combined_df['MonthlyAnomaly'], color='tab:red')
+ax1.tick_params(axis='y', labelcolor='tab:red')
+
+ax2 = ax1.twinx()
+ax2.set_ylabel('CO2 Concentration (ppm)', color='tab:blue')
+ax2.plot(combined_df.index, combined_df['CO2Concentration'], color='tab:blue')
+ax2.tick_params(axis='y', labelcolor='tab:blue')
+
+plt.title('Monthly Global Temperature Anomalies and CO2 Concentration')
+plt.show()
+```
+
+## Task 3: Time Series Analysis
+
+1. Resample the data to annual averages.
+2. Calculate the year-over-year change in temperature anomalies and CO2 concentrations.
+3. Create a scatter plot (use the [`plt.scatter()`](../cheatsheets/matplotlib.qmd) function) of annual temperature anomalies vs CO2 concentrations.
+
+```{python}
+#| echo: false
+#| include: false
+
+# Solution
+annual_data = combined_df.resample('YE').mean()
+
+annual_data['Temp_YoY_Change'] = annual_data['MonthlyAnomaly'].diff()
+annual_data['CO2_YoY_Change'] = annual_data['CO2Concentration'].diff()
+
+plt.figure(figsize=(10, 6))
+plt.scatter(annual_data['CO2Concentration'], annual_data['MonthlyAnomaly'])
+plt.xlabel('CO2 Concentration (ppm)')
+plt.ylabel('Temperature Anomaly (°C)')
+plt.title('Annual Temperature Anomaly vs CO2 Concentration')
+plt.show()
+```
+
+## Task 4: Seasonal Analysis
+
+1. Create a function to extract the season from a given date (hint: use the `date.month` attribute and if-elif-else to assign the season in your function).
+2. Use the function to create a new column called `Season`
+3. Calculate the average temperature anomaly and CO2 concentration for each season.
+4. Create a box plot (use [`sns.boxplot`](../cheatsheets/seaborn.qmd)) showing the distribution of temperature anomalies for each season.
+
+
+```{python}
+#| echo: false
+#| include: false
+
+# Solution
+def get_season(date):
+    month = date.month
+    if month in [12, 1, 2]:
+        return 'Winter'
+    elif month in [3, 4, 5]:
+        return 'Spring'
+    elif month in [6, 7, 8]:
+        return 'Summer'
+    else:
+        return 'Fall'
+
+combined_df['Season'] = combined_df.index.map(get_season)
+
+seasonal_avg = combined_df.groupby('Season').mean()
+print("Seasonal Averages:")
+print(seasonal_avg)
+
+plt.figure(figsize=(10, 6))
+sns.boxplot(x='Season', y='MonthlyAnomaly', data=combined_df)
+plt.title('Distribution of Temperature Anomalies by Season')
+plt.ylabel('Temperature Anomaly (°C)')
+plt.show()
+```
+