updated day6 morning
kcaylor committed Sep 10, 2024
1 parent 91a5011 commit 834a9fa
Showing 124 changed files with 8,320 additions and 1,825 deletions.
8 changes: 5 additions & 3 deletions _quarto.yml
@@ -73,9 +73,9 @@ website:
- text: Session 5b - 🐼 Cleaning Data
href: course-materials/interactive-sessions/5b_cleaning_data.qmd
- text: Session 6a - 🐼 Grouping, Joining, and Sorting (Part I)
href: course-materials/interactive-sessions/6a_grouping_joining_sorting_1.qmd
- text: Session 6b - 🐼 Grouping, Joining, and Sorting (Part II)
href: course-materials/interactive-sessions/6b_grouping_joining_sorting_2.qmd
href: course-materials/interactive-sessions/6a_grouping_joining_sorting.qmd
# - text: Session 6b - 🐼 Grouping, Joining, and Sorting (Part II)
# href: course-materials/interactive-sessions/6b_grouping_joining_sorting_2.qmd
- text: Session 6c - 📆 Working with Dates
href: course-materials/interactive-sessions/6c_dates.qmd
- text: Session 7a - 📊 Data Visualization with Seaborn & Matplotlib (Part I)
@@ -94,6 +94,8 @@ website:
href: course-materials/coding-colabs/4b_pandas_dataframes.qmd
- text: 🙌 Session 5c - Data Cleaning
href: course-materials/coding-colabs/5c_cleaning_data.qmd
- text: 🙌 Session 6b - Data Manipulation
href: course-materials/coding-colabs/6b_advanced_data_manipulation.qmd
- text: 🙌 Session 7c - Exploring data using visualizations
href: course-materials/coding-colabs/7c_visualizations.qmd
- text: 👀 cheatsheets
4 changes: 2 additions & 2 deletions course-materials/cheatsheets/matplotlib.qmd
@@ -49,15 +49,15 @@ ax.plot(x, y, label='sin(x)')
```python
x = np.random.rand(50)
y = np.random.rand(50)
ax.scatter(x, y, alpha=0.5)
plt.scatter(x, y, alpha=0.5)
```

### Bar Plot

```python
categories = ['A', 'B', 'C', 'D']
values = [3, 7, 2, 5]
ax.bar(categories, values)
plt.bar(categories, values)
```

### Histogram
158 changes: 158 additions & 0 deletions course-materials/coding-colabs/6b_advanced_data_manipulation.qmd
@@ -0,0 +1,158 @@
---
title: "Coding Colab 6B"
subtitle: "Analyzing Global Temperature Anomalies and CO2 Concentrations"
jupyter: eds217_2024
format:
  html:
    toc: true
    toc-depth: 3
    code-fold: show
---

## Introduction

In this coding colab, you'll analyze global temperature anomalies and CO2 concentration data. You'll practice data manipulation, joining datasets, time series analysis, and visualization techniques.

## Learning Objectives

By the end of this colab, you will be able to:

1. Load and preprocess time series data
2. Join datasets based on datetime indices
3. Perform basic time series analysis and resampling
4. Apply data manipulation techniques to extract insights from environmental datasets

## Setup

Let's start by importing the necessary libraries and loading our datasets:

```{python}
#| echo: true
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the temperature anomaly dataset
temp_url = "monthly_temperature_data.csv"
temp_df = pd.read_csv(temp_url, parse_dates=['Date'])
# Load the CO2 concentration dataset
co2_url = "monthly_co2_concentration.csv"
co2_df = pd.read_csv(co2_url, parse_dates=['Date'])
print("Temperature data:")
print(temp_df.head())
print("\nCO2 data:")
print(co2_df.head())
```

## Task 1: Data Preparation

1. Set the 'Date' column as the index for both dataframes.
2. Ensure that there are no missing values in either dataset.

```{python}
#| echo: false
#| include: false
# Solution
temp_df.set_index('Date', inplace=True)
co2_df.set_index('Date', inplace=True)
print("Missing values in temperature data:", temp_df.isnull().sum())
print("Missing values in CO2 data:", co2_df.isnull().sum())
```
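
As a minimal sketch of the two steps in Task 1, the example below uses a small made-up dataframe rather than the course files. The name `toy_df` and its values are hypothetical; only the `Date` and `MonthlyAnomaly` column names are taken from the colab.

```python
import pandas as pd

# Hypothetical stand-in for the temperature file: three monthly rows, one missing value
toy_df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "MonthlyAnomaly": [0.9, None, 1.1],
})

# Step 1: use the dates as the index
toy_df = toy_df.set_index("Date")

# Step 2: count missing values per column and check whether any exist at all
missing_per_column = toy_df.isnull().sum()
has_missing = bool(toy_df.isnull().values.any())

print(missing_per_column)
print("Any missing values?", has_missing)
```

Reassigning the result of `set_index()` is used here instead of `inplace=True`; both give the same dataframe, so this is a style choice.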

## Task 2: Joining Datasets

1. Merge the temperature and CO2 datasets based on their date index.
2. Handle any missing values that may have been introduced by the merge.
3. Create a plot showing temperature anomalies and CO2 concentrations over time (for example, with pandas' built-in plotting functions or with matplotlib).

```{python}
#| echo: false
#| include: false
# Solution
combined_df = pd.merge(temp_df, co2_df, left_index=True, right_index=True, how='outer')
# Forward-fill gaps introduced by the outer join
# (fillna(method='ffill') is deprecated in recent pandas releases)
combined_df.ffill(inplace=True)
fig, ax1 = plt.subplots(figsize=(12, 6))
ax1.set_xlabel('Year')
ax1.set_ylabel('Temperature Anomaly (°C)', color='tab:red')
ax1.plot(combined_df.index, combined_df['MonthlyAnomaly'], color='tab:red')
ax1.tick_params(axis='y', labelcolor='tab:red')
ax2 = ax1.twinx()
ax2.set_ylabel('CO2 Concentration (ppm)', color='tab:blue')
ax2.plot(combined_df.index, combined_df['CO2Concentration'], color='tab:blue')
ax2.tick_params(axis='y', labelcolor='tab:blue')
plt.title('Monthly Global Temperature Anomalies and CO2 Concentration')
plt.show()
```
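
To see why an outer merge can introduce missing values, here is a small sketch on made-up data. The frames `temp_toy` and `co2_toy` and their values are hypothetical; the column names follow the colab.

```python
import pandas as pd

# Hypothetical monthly temperature data: three months
temp_toy = pd.DataFrame(
    {"MonthlyAnomaly": [0.8, 0.9, 1.0]},
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
)

# Hypothetical CO2 data with February missing
co2_toy = pd.DataFrame(
    {"CO2Concentration": [413.0, 414.0]},
    index=pd.to_datetime(["2020-01-01", "2020-03-01"]),
)

# An outer join keeps every date found in either frame,
# so February gets NaN in the CO2 column
merged = pd.merge(temp_toy, co2_toy, left_index=True, right_index=True, how="outer")
n_missing_before = int(merged["CO2Concentration"].isna().sum())

# Forward fill copies the last known value into each gap
filled = merged.ffill()

print(merged)
print(filled)
```

Forward filling is only one option; interpolation or dropping incomplete rows may suit a particular dataset better.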

## Task 3: Time Series Analysis

1. Resample the data to annual averages.
2. Calculate the year-over-year change in temperature anomalies and CO2 concentrations.
3. Create a scatter plot (use the [`plt.scatter()`](../cheatsheets/matplotlib.qmd) function) of annual temperature anomalies vs CO2 concentrations.

```{python}
#| echo: false
#| include: false
# Solution
annual_data = combined_df.resample('YE').mean()
annual_data['Temp_YoY_Change'] = annual_data['MonthlyAnomaly'].diff()
annual_data['CO2_YoY_Change'] = annual_data['CO2Concentration'].diff()
plt.figure(figsize=(10, 6))
plt.scatter(annual_data['CO2Concentration'], annual_data['MonthlyAnomaly'])
plt.xlabel('CO2 Concentration (ppm)')
plt.ylabel('Temperature Anomaly (°C)')
plt.title('Annual Temperature Anomaly vs CO2 Concentration')
plt.show()
```
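
The annual averaging and year-over-year steps can also be sketched without `resample()`, by grouping on the year of the index. This sidesteps the `'YE'` frequency alias, which, as far as I know, only exists in pandas 2.2 and later. The dataframe `toy` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: 24 monthly values, constant within each year
dates = pd.date_range("2019-01-01", periods=24, freq="MS")
toy = pd.DataFrame(
    {"CO2Concentration": [410.0] * 12 + [412.5] * 12},
    index=dates,
)

# Annual mean: group rows by the calendar year of the DatetimeIndex
annual = toy.groupby(toy.index.year).mean()

# Year-over-year change: difference from the previous row (first year is NaN)
annual["CO2_YoY_Change"] = annual["CO2Concentration"].diff()

print(annual)
```

Note that the grouped result is indexed by integer year rather than by year-end timestamps, which is the main practical difference from `resample()`.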

## Task 4: Seasonal Analysis

1. Create a function to extract the season from a given date (hint: use the `date.month` attribute and if-elif-else to assign the season in your function).
2. Use the function to create a new column called `Season`.
3. Calculate the average temperature anomaly and CO2 concentration for each season.
4. Create a box plot (use [`sns.boxplot`](../cheatsheets/seaborn.qmd)) showing the distribution of temperature anomalies for each season.


```{python}
#| echo: false
#| include: false
# Solution
def get_season(date):
    month = date.month
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    else:
        return 'Fall'
combined_df['Season'] = combined_df.index.map(get_season)
seasonal_avg = combined_df.groupby('Season').mean()
print("Seasonal Averages:")
print(seasonal_avg)
plt.figure(figsize=(10, 6))
sns.boxplot(x='Season', y='MonthlyAnomaly', data=combined_df)
plt.title('Distribution of Temperature Anomalies by Season')
plt.ylabel('Temperature Anomaly (°C)')
plt.show()
```
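
As an alternative to the if-elif-else function, the month-to-season assignment can be sketched with a dictionary lookup on the index's `month` attribute. This is not the approach the task asks for, just another way to reach the same column. The name `MONTH_TO_SEASON`, the dataframe `toy`, and its values are hypothetical, and the mapping assumes the same three-month seasons as the task.

```python
import pandas as pd

# Hypothetical lookup table: month number -> season name
MONTH_TO_SEASON = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

# Hypothetical data: one value per month, 0.0 for January through 11.0 for December
dates = pd.date_range("2021-01-01", periods=12, freq="MS")
toy = pd.DataFrame({"MonthlyAnomaly": [float(i) for i in range(12)]}, index=dates)

# Map each month number in the index to its season
toy["Season"] = toy.index.month.map(MONTH_TO_SEASON)

# Average anomaly per season
seasonal_avg = toy.groupby("Season")["MonthlyAnomaly"].mean()
print(seasonal_avg)
```

The function-based version is easier to extend with extra logic, while the dictionary keeps the mapping in one visible place.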
