-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
124 changed files
with
8,320 additions
and
1,825 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
158 changes: 158 additions & 0 deletions
158
course-materials/coding-colabs/6b_advanced_data_manipulation.qmd
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,158 @@ | ||
--- | ||
title: "Coding Colab 6B" | ||
subtitle: "Analyzing Global Temperature Anomalies and CO2 Concentrations" | ||
jupyter: eds217_2024 | ||
format: | ||
html: | ||
toc: true | ||
toc-depth: 3 | ||
code-fold: show | ||
--- | ||
|
||
## Introduction | ||
|
||
In this coding colab, you'll analyze global temperature anomalies and CO2 concentration data. You'll practice data manipulation, joining datasets, time series analysis, and visualization techniques. | ||
|
||
## Learning Objectives | ||
|
||
By the end of this colab, you will be able to: | ||
|
||
1. Load and preprocess time series data | ||
2. Join datasets based on datetime indices | ||
3. Perform basic time series analysis and resampling | ||
5. Apply data manipulation techniques to extract insights from environmental datasets | ||
|
||
## Setup | ||
|
||
Let's start by importing necessary libraries and loading our datasets: | ||
|
||
```{python} | ||
#| echo: true | ||
import pandas as pd | ||
import matplotlib.pyplot as plt | ||
import seaborn as sns | ||
# Load the temperature anomaly dataset | ||
temp_url = "monthly_temperature_data.csv" | ||
temp_df = pd.read_csv(temp_url, parse_dates=['Date']) | ||
# Load the CO2 concentration dataset | ||
co2_url = "monthly_co2_concentration.csv" | ||
co2_df = pd.read_csv(co2_url, parse_dates=['Date']) | ||
print("Temperature data:") | ||
print(temp_df.head()) | ||
print("\nCO2 data:") | ||
print(co2_df.head()) | ||
``` | ||
|
||
## Task 1: Data Preparation | ||
|
||
1. Set the 'Date' column as the index for both dataframes. | ||
2. Ensure that there are no missing values in either dataset. | ||
|
||
```{python} | ||
#| echo: false | ||
#| include: false | ||
# Solution | ||
temp_df.set_index('Date', inplace=True) | ||
co2_df.set_index('Date', inplace=True) | ||
print("Missing values in temperature data:", temp_df.isnull().sum()) | ||
print("Missing values in CO2 data:", co2_df.isnull().sum()) | ||
``` | ||
|
||
## Task 2: Joining Datasets | ||
|
||
1. Merge the temperature and CO2 datasets based on their date index. | ||
2. Handle any missing values that may have been introduced by the merge. | ||
3. Create some plots showing temperature anomalies and CO2 concentrations over time using pandas built-in plotting functions. | ||
|
||
```{python} | ||
#| echo: false | ||
#| include: false | ||
# Solution | ||
combined_df = pd.merge(temp_df, co2_df, left_index=True, right_index=True, how='outer') | ||
combined_df.fillna(method='ffill', inplace=True) | ||
fig, ax1 = plt.subplots(figsize=(12, 6)) | ||
ax1.set_xlabel('Year') | ||
ax1.set_ylabel('Temperature Anomaly (°C)', color='tab:red') | ||
ax1.plot(combined_df.index, combined_df['MonthlyAnomaly'], color='tab:red') | ||
ax1.tick_params(axis='y', labelcolor='tab:red') | ||
ax2 = ax1.twinx() | ||
ax2.set_ylabel('CO2 Concentration (ppm)', color='tab:blue') | ||
ax2.plot(combined_df.index, combined_df['CO2Concentration'], color='tab:blue') | ||
ax2.tick_params(axis='y', labelcolor='tab:blue') | ||
plt.title('Monthly Global Temperature Anomalies and CO2 Concentration') | ||
plt.show() | ||
``` | ||
|
||
## Task 3: Time Series Analysis | ||
|
||
1. Resample the data to annual averages. | ||
2. Calculate the year-over-year change in temperature anomalies and CO2 concentrations. | ||
3. Create a scatter plot (use the [`plt.scatter()`](../cheatsheets/matplotlib.qmd) function) of annual temperature anomalies vs CO2 concentrations. | ||
|
||
```{python} | ||
#| echo: false | ||
#| include: false | ||
# Solution | ||
annual_data = combined_df.resample('YE').mean() | ||
annual_data['Temp_YoY_Change'] = annual_data['MonthlyAnomaly'].diff() | ||
annual_data['CO2_YoY_Change'] = annual_data['CO2Concentration'].diff() | ||
plt.figure(figsize=(10, 6)) | ||
plt.scatter(annual_data['CO2Concentration'], annual_data['MonthlyAnomaly']) | ||
plt.xlabel('CO2 Concentration (ppm)') | ||
plt.ylabel('Temperature Anomaly (°C)') | ||
plt.title('Annual Temperature Anomaly vs CO2 Concentration') | ||
plt.show() | ||
``` | ||
|
||
## Task 4: Seasonal Analysis | ||
|
||
1. Create a function to extract the season from a given date (hint: use the `date.month` attribute and if-elif-else to assign the season in your function). | ||
2. Use the function to create a new column called `Season` | ||
3. Calculate the average temperature anomaly and CO2 concentration for each season. | ||
4. Create a box plot (use [`sns.boxplot`](../cheatsheets/seaborn.qmd)) showing the distribution of temperature anomalies for each season. | ||
|
||
|
||
```{python} | ||
#| echo: false | ||
#| include: false | ||
# Solution | ||
def get_season(date): | ||
month = date.month | ||
if month in [12, 1, 2]: | ||
return 'Winter' | ||
elif month in [3, 4, 5]: | ||
return 'Spring' | ||
elif month in [6, 7, 8]: | ||
return 'Summer' | ||
else: | ||
return 'Fall' | ||
combined_df['Season'] = combined_df.index.map(get_season) | ||
seasonal_avg = combined_df.groupby('Season').mean() | ||
print("Seasonal Averages:") | ||
print(seasonal_avg) | ||
plt.figure(figsize=(10, 6)) | ||
sns.boxplot(x='Season', y='MonthlyAnomaly', data=combined_df) | ||
plt.title('Distribution of Temperature Anomalies by Season') | ||
plt.ylabel('Temperature Anomaly (°C)') | ||
plt.show() | ||
``` | ||
|
Oops, something went wrong.