diff --git a/course-materials/interactive-sessions/8a_github.qmd b/course-materials/interactive-sessions/8a_github.qmd
index 0e98629..4ec9174 100644
--- a/course-materials/interactive-sessions/8a_github.qmd
+++ b/course-materials/interactive-sessions/8a_github.qmd
@@ -86,13 +86,23 @@ Before we commit our notebooks, let's clean them to remove output cells and exec
 pip install nbstripout
 ```
 
+:::{.callout-warning}
+By default, the server won't add a new Python package to the shared system-wide library on `workbench`; instead, `pip` installs it under your user directory. For this reason, you will see a warning when running the pip command that looks something like this:
+
+```
+WARNING: The script nbstripout is installed in '/Users/[your user id]/.local/bin' which is not on PATH
+```
+
+Therefore, we need to run the `nbstripout` command by specifying its full path inside your local user folder:
+:::
+
 2. Configure `nbstripout` for your repository:
 
 ```{python}
 #| echo: true
 #| eval: false
-nbstripout --install --attributes .gitattributes
+~/.local/bin/nbstripout --install --attributes .gitattributes
 ```
 
 This sets up `nbstripout` to automatically clean your notebooks when you commit them.
 
diff --git a/docs/course-materials/answer-keys/2c_lists_dictionaries_sets-key.html b/docs/course-materials/answer-keys/2c_lists_dictionaries_sets-key.html
index bf8fa11..8572d50 100644
--- a/docs/course-materials/answer-keys/2c_lists_dictionaries_sets-key.html
+++ b/docs/course-materials/answer-keys/2c_lists_dictionaries_sets-key.html
@@ -483,7 +483,7 @@

Task 1: List Operat
  • Remove the second fruit from the list.
  • Print the final list.
  • -
    +
    # Example code for instructor
     fruits = ["apple", "banana", "cherry"]
     print("Original list:", fruits)
    @@ -512,7 +512,7 @@ 

    Task 2: Dicti
  • Update the quantity of an existing item.
  • Print the final inventory.
  • -
    +
    # Example code for instructor
     inventory = {
         "apples": 50,
    @@ -549,7 +549,7 @@ 

    Task
  • Find and print the intersection of the two sets.
  • Add a new element to the evens set.
  • -
    +
    # Example code for instructor
     evens = {2, 4, 6, 8, 10}
     odds = {1, 3, 5, 7, 9}
    @@ -583,7 +583,7 @@ 

  • Use a list comprehension to remove duplicates.
  • Print the results of both methods.
  • -
    +
    # Example code for instructor
     numbers = [1, 2, 2, 3, 3, 3, 4, 4, 5]
     
    diff --git a/docs/course-materials/answer-keys/3b_control_flows-key.html b/docs/course-materials/answer-keys/3b_control_flows-key.html
    index 5de93ae..be52878 100644
    --- a/docs/course-materials/answer-keys/3b_control_flows-key.html
    +++ b/docs/course-materials/answer-keys/3b_control_flows-key.html
    @@ -470,10 +470,10 @@ 

    Task 1: Simpl
  • Otherwise, print “Enjoy the pleasant weather!”
  • -
    +
    temperature = 20
    -
    +
    if temperature > 25:
         print("It's a hot day, stay hydrated!")
     else:
    @@ -497,10 +497,10 @@ 

    Task 2: Grade Clas
  • Below 60: “F”
  • -
    +
    score = 85
    -
    +
    if score >= 90:
         grade = 'A'
     elif score >= 80:
    @@ -528,7 +528,7 @@ 

    Task 3: Counting She
  • Use a for loop with the range() function
  • Print each number followed by “sheep”
  • -
    +
    for i in range(1,6):
         print(f"{i} sheep")
    @@ -548,10 +548,10 @@

    Task 4: Sum of Numbe
  • Use a for loop with the range() function to add each number to total
  • After the loop, print the total
  • -
    +
    total = 0
    -
    +
    for i in range(1,11):
         total = total + i
     
    @@ -573,10 +573,10 @@ 

    Task 5: Countdown

  • After each print, decrease the countdown by 1
  • When the countdown reaches 0, print “Blast off!”
  • -
    +
    countdown = 5
    -
    +
    while countdown > 0:
         print(countdown)
         # (-= is a python syntax shortcut inherited from C)
    diff --git a/docs/course-materials/answer-keys/3d_pandas_series-key.html b/docs/course-materials/answer-keys/3d_pandas_series-key.html
    index a092e0c..90dd055 100644
    --- a/docs/course-materials/answer-keys/3d_pandas_series-key.html
    +++ b/docs/course-materials/answer-keys/3d_pandas_series-key.html
    @@ -440,7 +440,7 @@ 

    Resources

    Setup

    First, let’s import the necessary libraries and create a sample Series.

    -
    +
    import pandas as pd
     import numpy as np
     
    @@ -463,7 +463,7 @@ 

    Exercise 1: C

    apple: $0.5, banana: $0.3, cherry: $1.0, date: $1.5, elderberry: $2.0

    -
    +
    # Create a Series called 'prices' with the same index as 'fruits'
     # Use these prices: apple: $0.5, banana: $0.3, cherry: $1.0, date: $1.5, elderberry: $2.0
     prices = pd.Series([0.5, 0.3, 1.0, 2.5, 3.0], index=fruits.values, name='Prices')
    @@ -486,7 +486,7 @@ 

    Exercise 2: S
  • Find the most expensive fruit.
  • Apply a 10% discount to all fruits priced over 1.0.
  • -
    +
    # 1. Calculate the total price of all fruits
     total_price = prices.sum()
     
    @@ -522,7 +522,7 @@ 

    Exercise 3: Ser
  • How many fruits cost less than $1.0?
  • What is the price range (difference between max and min prices)?
  • -
    +
    # 1. Calculate the average price of the fruits
     average_price = prices.mean()
     
    @@ -550,7 +550,7 @@ 

    Exercise 4:
  • Remove ‘banana’ from both Series.
  • Sort both Series by fruit name (alphabetically).
  • -
    +
    # 1. Add 'fig' to both Series (price: $1.2)
     fruits = pd.concat([fruits, pd.Series(['fig'], name='Fruits')])
     prices = pd.concat([prices, pd.Series([1.2], index=['fig'], name='Prices')])
    diff --git a/docs/course-materials/answer-keys/5c_cleaning_data-key.html b/docs/course-materials/answer-keys/5c_cleaning_data-key.html
    index 9821089..91245b8 100644
    --- a/docs/course-materials/answer-keys/5c_cleaning_data-key.html
    +++ b/docs/course-materials/answer-keys/5c_cleaning_data-key.html
    @@ -435,7 +435,7 @@ 

    Resources

    Setup

    First, let’s import the necessary libraries and load an example messy dataframe.

    -
    +
    import pandas as pd
     import numpy as np
     
    @@ -450,36 +450,36 @@ 

  • Removing duplicates
  • -
    +
    messy_df.drop_duplicates(inplace=True)
    1. Handling missing values (either fill or dropna to remove rows with missing data)
    -
    +
    messy_df = messy_df.dropna()
    1. Ensuring consistent data types (dates, strings)
    -
    +
    messy_df['site'] = messy_df['site'].astype('string')
     messy_df['collection date'] = pd.to_datetime(messy_df['collection date'])
    1. Formatting the ‘site’ column for consistency
    -
    +
    messy_df['site'] = messy_df['site'].str.lower().replace('sitec','site_c')
    1. Making sure all column names are lower case, without whitespace.
    -
    +
    messy_df.rename(columns={'collection date': 'collection_date'}, inplace=True)

    Try to implement these steps using the techniques we’ve learned.

    -
    +
    cleaned_df = messy_df.copy()
     
     print("Cleaned DataFrame:")
    diff --git a/docs/course-materials/answer-keys/7c_visualizations-key.html b/docs/course-materials/answer-keys/7c_visualizations-key.html
    index 72df041..00e73d8 100644
    --- a/docs/course-materials/answer-keys/7c_visualizations-key.html
    +++ b/docs/course-materials/answer-keys/7c_visualizations-key.html
    @@ -437,7 +437,7 @@ 

    Introduction

    Setup

    First, let’s import the necessary libraries and load our dataset.

    -
    +
    Code
    import pandas as pd
    @@ -495,7 +495,7 @@ 

    +
    Code
    # Answer for Task 1
    @@ -536,7 +536,7 @@ 

    Task 2: Exam
  • Modify the pairplot to show the species information using different colors.
  • Interpret the pairplot: which variables seem to be most strongly correlated? Do you notice any patterns related to species?
  • -
    +
    Code
    # Answer for Task 2
    @@ -572,7 +572,7 @@ 

    +
    Code
    # Answer for Task 3
    @@ -621,7 +621,7 @@ 

    Task 4: Jo
  • Experiment with different kind parameters in the joint plot (e.g., ‘scatter’, ‘kde’, ‘hex’).
  • Create another joint plot, this time for ‘bill_length_mm’ and ‘bill_depth_mm’, colored by species.
  • -
    +
    Code
    # Answer for Task 4
    @@ -696,7 +696,7 @@ 

    Bonus Challenge

  • Customize the heatmap by adding annotations and adjusting the colormap.
  • Compare the insights from this heatmap with those from the pairplot. What additional information does each visualization provide?
  • -
    +
    Code
    # Answer for Bonus Challenge
    diff --git a/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-1.png b/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-1.png
    index d5a28f6..355e6a3 100644
    Binary files a/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-1.png and b/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-1.png differ
    diff --git a/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-2.png b/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-2.png
    index 2aeee46..ef2cb0f 100644
    Binary files a/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-2.png and b/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-2.png differ
    diff --git a/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-3.png b/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-3.png
    index 3e3232d..21b3190 100644
    Binary files a/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-3.png and b/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-3.png differ
    diff --git a/docs/course-materials/answer-keys/eod-day1-key.html b/docs/course-materials/answer-keys/eod-day1-key.html
    index 792dbc5..37dd881 100644
    --- a/docs/course-materials/answer-keys/eod-day1-key.html
    +++ b/docs/course-materials/answer-keys/eod-day1-key.html
    @@ -441,7 +441,7 @@ 

    Instructions

  • Import the necessary libraries to work with data (pandas) and create plots (matplotlib.pyplot). Use the standard python conventions that import pandas as pd and import matplotlib.pyplot as plt
  • -
    +
    import pandas as pd
     import matplotlib.pyplot as plt
    @@ -454,7 +454,7 @@

    Instructions

  • Create a variable called url that stores the URL provided above.
  • Use the pandas library’s read_csv() function from pandas to load the data from the URL into a new DataFrame called df. Any pandas function will always be called using the pd object and dot notation: pd.read_csv()
  • -
    +
    url = 'https://raw.githubusercontent.com/environmental-data-science/eds217-day0-comp/main/data/raw_data/toolik_weather.csv'
     df = pd.read_csv(url)
    @@ -467,7 +467,7 @@

    Instructions

    Note: Because the head() function is a method of a DataFrame, you will call it using dot notation and the dataframe you just created: df.head()

    -
    +
    df.head()
    @@ -635,7 +635,7 @@

    Instructions

  • Use the isnull() method combined with sum() to count missing values in each column.
  • -
    +
    df.isnull().sum()
    Year                                   0
    @@ -671,7 +671,7 @@ 

    Instructions

  • Use the info() method to get an overview of the DataFrame, including data types and non-null counts. Just like the head() function, these are methods associated with your df object, so you call them with dot notation.
  • -
    +
    df.describe()
     df.info()
    @@ -712,7 +712,7 @@

    Instructions

    - Choose a strategy to handle missing data in the columns. For example, fill missing values with the mean of the column using the `fillna()` method or drop rows with missing data using the `dropna()` method. -::: {#85e9c032 .cell execution_count=6} +::: {#2cdb04f6 .cell execution_count=6} ``` {.python .cell-code} df['Daily_AirTemp_Mean_C'].fillna(df['Daily_AirTemp_Mean_C'].mean(), inplace=True) df.dropna(subset=['Daily_globalrad_total_jcm2'], inplace=True) @@ -720,7 +720,7 @@

    Instructions

    ::: {.cell-output .cell-output-stderr} ``` -/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85158/1318736512.py:1: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. +/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_90091/1318736512.py:1: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. @@ -741,7 +741,7 @@

    Instructions

  • Calculate the mean of the ‘Daily_AirTemp_Mean_C’ column for each month in the monthly using the mean() function. Save this result to a new variable called monthly_means.
  • -
    +
    monthly = df.groupby('Month')
     monthly_means = monthly['Daily_AirTemp_Mean_C'].mean()
    @@ -755,7 +755,7 @@

    Instructions

    Syntax Similarity: Use plt.plot() or plot.bar() to create plots. In R, you would use ggplot().

    -
    +
    plt.plot(monthly_means)
    @@ -765,7 +765,7 @@

    Instructions

    -
    +
    months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
     plt.bar(months, monthly_means)
    @@ -784,7 +784,7 @@

    Instructions

    Hint: Similar to calculating monthly averages, group by the ‘Year’ column.

    -
    +
    year = df.groupby('Year')
     yearly_means = year['Daily_AirTemp_Mean_C'].mean()
     plt.plot(yearly_means)
    @@ -796,7 +796,7 @@

    Instructions

    -
    +
    year_list = df['Year'].unique()
     plt.bar(year_list, yearly_means)
    diff --git a/docs/course-materials/answer-keys/eod-day2-key.html b/docs/course-materials/answer-keys/eod-day2-key.html index ea4b6e1..9df42dd 100644 --- a/docs/course-materials/answer-keys/eod-day2-key.html +++ b/docs/course-materials/answer-keys/eod-day2-key.html @@ -464,7 +464,7 @@

    Learning Objectives

    Setup

    First, let’s import the necessary libraries:

    -
    +
    Code
    # We won't use the random library until the end of this exercise, 
    @@ -480,7 +480,7 @@ 

    Part 1: Data Collec

    Task 1: Create a List of Classmates

    Create a list containing the names of at least 4 of your classmates in this course.

    -
    +
    Code
    classmates = ["Alice", "Bob", "Charlie", "David", "Eve"]
    @@ -500,7 +500,7 @@ 

    +
    Code
    classmate_info = {
    @@ -549,7 +549,7 @@ 

    Task 3: List Operat
  • Sort the list alphabetically
  • Find and print the index of a specific classmate
  • -
    +
    Code
    # a) Add a new classmate
    @@ -584,7 +584,7 @@ 

    Task 4: Dicti
  • Update the “number of pets” for one classmate
  • Create a list of all the favorite colors your classmates mentioned
  • -
    +
    Code
    # a) Add favorite_study_spot
    @@ -643,7 +643,7 @@ 

    Task 5: Basic Stat
  • The average number of pets among your classmates
  • The name of the classmate who got the most sleep last night
  • -
    +
    Code
    # a) Average number of pets
    @@ -663,7 +663,7 @@ 

    Task 5: Basic Stat

    Task 6: Data Filtering

    Create a new list containing only the classmates who have at least one pet.

    -
    +
    Code
    classmates_with_pets = [name for name, info in classmate_info.items() if info["number_of_pets"] > 0]
    @@ -681,7 +681,7 @@ 

    Part 4:

    Example: Random Selection from a Dictionary

    Here’s a simple example of how to select random items from a dictionary:

    -
    +
    Code
    import random
    @@ -710,10 +710,10 @@ 

    print(f"Randomly selected {num_selections} fruits: {random_fruits}")

    -
    Randomly selected fruit: grape
    -Its color: purple
    -Another randomly selected fruit: kiwi
    -Randomly selected 3 fruits: ['orange', 'apple', 'banana']
    +
    Randomly selected fruit: orange
    +Its color: orange
    +Another randomly selected fruit: apple
    +Randomly selected 3 fruits: ['grape', 'orange', 'banana']

    This example demonstrates how to:

    @@ -734,7 +734,7 @@

    Task 7: Random # Test your function assign_random_snacks(classmate_info)

    -
    +
    Code
    def assign_random_snacks(classmate_info):
    @@ -746,7 +746,7 @@ 

    Task 7: Random assign_random_snacks(classmate_info)

    -
    Alice will share almonds with Charlie
    +
    Charlie will share carrots with Eve
    diff --git a/docs/course-materials/answer-keys/eod-day3-key.html b/docs/course-materials/answer-keys/eod-day3-key.html index 350fb3b..d1be420 100644 --- a/docs/course-materials/answer-keys/eod-day3-key.html +++ b/docs/course-materials/answer-keys/eod-day3-key.html @@ -447,7 +447,7 @@

    Introduction

    Setup

    First, let’s import the necessary libraries and set up our environment.

    -
    +
    Code
    import pandas as pd
    @@ -461,7 +461,7 @@ 

    Creating a Random Number Generator

    We can create a random number generator object like this:

    -
    +
    Code
    rng = np.random.default_rng()
    @@ -472,7 +472,7 @@

    Creatin

    Using a Seed for Reproducibility

    In data science, it’s often crucial to be able to reproduce our results. We can do this by setting a seed for our random number generator. Here’s how:

    -
    +
    Code
    rng = np.random.default_rng(seed=42)
    @@ -487,7 +487,7 @@

    Creating t
  • Create a series called scores that contains 10 elements representing monthly test scores. We’ll use random integers between 70 and 100 to generate the monthly scores, and set the index to be the month names from September to June:
  • months = ['Sep', 'Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
    -
    +
    Code
    # Create the month list:
    @@ -505,7 +505,7 @@ 

    Analyzing the Te

    1. What is the student’s average test score for the entire year?

    Calculate the mean of all scores in the series.

    -
    +
    Code
    # 1. Average score for the entire year
    @@ -520,7 +520,7 @@ 

    2. What is the student’s average test score during the first half of the year?

    Calculate the mean of the first five months’ scores.

    -
    +
    Code
    # 2. Average score for the first half of the year
    @@ -537,7 +537,7 @@ 

    3. What is the student’s average test score during the second half of the year?

    Calculate the mean of the last five months’ scores.

    -
    +
    Code
    second_half_average = scores.iloc[5:].mean()
    @@ -553,7 +553,7 @@ 

    4. Did the student improve their performance in the second half? If so, by how much?

    Compare the average scores from the first and second half of the year.

    -
    +
    Code
    # 4. Performance improvement
    @@ -572,7 +572,7 @@ 

    Exploring Reproducibility

    To demonstrate the importance of seeding, try creating two series with different random number generators:

    -
    +
    Code
    rng1 = np.random.default_rng(seed=42)
    @@ -588,7 +588,7 @@ 

    Exploring Reprod

    Now try creating two series with random number generators that have different seeds:

    -
    +
    Code
    rng3 = np.random.default_rng(seed=42)
    diff --git a/docs/course-materials/answer-keys/eod-day4-key.html b/docs/course-materials/answer-keys/eod-day4-key.html
    index 8c63702..e9156c3 100644
    --- a/docs/course-materials/answer-keys/eod-day4-key.html
    +++ b/docs/course-materials/answer-keys/eod-day4-key.html
    @@ -455,7 +455,7 @@ 

    Introduction

    This end-of-day session is focused on using pandas for loading, visualizing, and analyzing marine microplastics data. This session is designed to help you become more comfortable with the pandas library, equipping you with the skills needed to perform data analysis effectively.

    The National Oceanic and Atmospheric Administration, via its National Centers for Environmental Information has an entire section related to marine microplastics – that is, microplastics found in water — at https://www.ncei.noaa.gov/products/microplastics.

    We will be working with a recent download of the entire marine microplastics dataset. The url for this data is located here:

    -
    +
    Code
    url = 'https://ucsb.box.com/shared/static/dnnu59jsnkymup6o8aaovdywrtxiy3a9.csv'
    @@ -468,7 +468,7 @@

    1. Loading the Data

    Objective: Learn to load data into a pandas DataFrame and display the first few records.

    Task 1.1 Import the pandas library.

    -
    +
    Code
    import pandas as pd
    @@ -490,7 +490,7 @@

    +
    Code
    df = pd.read_csv(url, parse_dates=['Date'], date_format='%m/%d/%Y %I:%M:%S %p')
    @@ -502,7 +502,7 @@

    Task 1.3:

    • Display the first five rows of the DataFrame to get an initial understanding of the data structure.
    -
    +
    Code
    print(df.head())
    @@ -562,7 +562,7 @@

    Task 2.1:

    • Display summary statistics of the dataset to understand the central tendency and variability.
    -
    +
    Code
    summary_statistics = df.describe()
    @@ -614,7 +614,7 @@ 

    Task 2.2:

    Note that the results of the built-in function - df['column'].isnull() need to be wrapped in ( ) for the ~ operator to work properly.

    -
    +
    Code
    print("DataFrame info:",df.info())
    @@ -668,7 +668,7 @@ 

    Task 3.1:

    • Create a groupby object called oceans that groups the data in df according to the value of the Oceans column.
    -
    +
    Code
    oceans = df.groupby(['Oceans'])
    @@ -680,7 +680,7 @@

    Task 3.2:

    • Determine the total number of Measurements taken from each Ocean.
    -
    +
    Code
    print(oceans['Measurement'].count())
    @@ -700,7 +700,7 @@

    Task 3.3:

    • Determine the average value of Measurement taken from each Ocean.
    -
    +
    Code
    print(oceans['Measurement'].mean())
    @@ -723,7 +723,7 @@

    Task 4.1:

    • Filter the data to a new df (called df2) that only contains rows where the Unit of measurement is pieces/m3
    -
    +
    Code
    df2 = df[df['Unit'] == 'pieces/m3']
    @@ -735,7 +735,7 @@

    Task 4.2:

    • Use the groupby and the max() command to determine the Maximum value of pieces/m3 measured for each Ocean
    -
    +
    Code
    # Instructor code
    @@ -759,7 +759,7 @@ 

    Task 5.1:

    • Make a histogram of the latitude of every sample in your filtered dataframe using the DataFrame plot command.
    -
    +
    Code
    df2['Latitude'].hist()
    @@ -791,7 +791,7 @@

    Task 5.2:

    Using .copy() when filtering a dataframe ensures that you’re working with a new DataFrame, not a view of the original. This is especially important when you’re filtering data and then modifying the result, which is common in data science workflows.

    -
    +
    Code
    df3 = df2[df2['Measurement'] > 0].copy()
    @@ -816,7 +816,7 @@

    Task 5.3

    The numpy library has a log10() function that you will find useful for this step!

    -
    +
    Code
    import numpy as np
    @@ -829,7 +829,7 @@ 

    Task 5.4

    • Make a histogram of the log-transformed values in df3
    -
    +
    Code
    df3['log10Measurement'].hist()
    diff --git a/docs/course-materials/answer-keys/eod-day5-key.html b/docs/course-materials/answer-keys/eod-day5-key.html index 2144727..20a0c32 100644 --- a/docs/course-materials/answer-keys/eod-day5-key.html +++ b/docs/course-materials/answer-keys/eod-day5-key.html @@ -453,7 +453,7 @@

    Reference:

    Setup

    First, let’s import the necessary libraries and load the data:

    -
    +
    Code
    import pandas as pd
    @@ -465,7 +465,7 @@ 

    Setup

    df = pd.read_csv(url)
    -
    +
    Code
    # Display the first few rows:
    @@ -508,7 +508,7 @@ 

    Setup

    4 3.775280 True 1 NaN NaN
    -
    +
    Code
    # Display the dataframe info:
    @@ -550,7 +550,7 @@ 

    1. Data Preparation

    1. Set the index of the DataFrame to be the ‘entity’ column.
    -
    +
    Code
    # The fastest way to set the index is when loading the dataframe:
    @@ -564,7 +564,7 @@ 

    1. Data Preparation

    1. Remove the ‘year’, ‘Banana values’, ‘type’, ‘Unnamed: 16’, and ‘Chart?’ columns.
    -
    +
    Code
    df = df.drop([
    @@ -581,7 +581,7 @@ 

    1. Data Preparation

    1. Display the first few rows of the modified DataFrame.
    -
    +
    Code
    print(df.head())
    @@ -634,7 +634,7 @@

    2. Exploring Banan
    1. For each of the pre-computed banana score columns (kg, calories, and protein), show the 10 highest-scoring food products.
    -
    +
    Code
    print("\nTop 10 for Bananas index (kg):")
    @@ -692,7 +692,7 @@ 

    2. Exploring Banan

    Note: We could also use the df.filter() command to select all the columns that contain ‘Bana’:

    -
    +
    Code
    for each_column in df.filter(like='Bana'):
    @@ -747,7 +747,7 @@ 

    2. Exploring Banan
    1. Create a function to return the top 10 scores for a given column.
    -
    +
    Code
    def return_top_10(df, column):
    @@ -758,7 +758,7 @@ 

    2. Exploring Banan
    1. Use your function to display the results for each of the three Banana index columns.
    -
    +
    Code
    banana_columns = [
    @@ -837,7 +837,7 @@ 

    3. Common High-S

    Python sets allow you to quickly determine intersections: in_all_three = set.intersection(seta, setb, setc), or you can use the * operator to unpack a list of sets directly: in_all_three = set.intersection(*list_of_sets)

    -
    +
    Code
    top_10_kg = set(return_top_10(df, 'Bananas index (kg)').index)
    @@ -855,7 +855,7 @@ 

    3. Common High-S
    
     Foods in top 10 for all three metrics:
    -{'Beef mince', 'Beef steak', 'Beef meatballs'}
    +{'Beef mince', 'Beef meatballs', 'Beef steak'}

    @@ -877,7 +877,7 @@

    4. Land Use Analysis

    The data on land_use_1000kcal for bananas is in the Bananas row.

    -
    +
    Code
    bananas_land_use_1000kcal = df.loc['Bananas', 'land_use_1000kcal']
    @@ -888,7 +888,7 @@ 

    4. Land Use Analysis

  • Display the 10 foods with the highest land use score.
  • -
    +
    Code
    print("\nTop 10 foods by land use score:")
    @@ -914,7 +914,7 @@ 

    4. Land Use Analysis

  • Compare this list with the previous top 10 lists. Are there any common foods?
  • -
    +
    Code
    # Use a list comprehension and df.filter to make a list of sets:
    @@ -933,7 +933,7 @@ 

    4. Land Use Analysis

    5. Cheese Analysis

    Identify the type of cheese with the highest banana score per 1,000 kcal. How does it compare to other cheeses in the dataset?

    -
    +
    Code
    # 5. Cheese Analysis
    @@ -976,7 +976,7 @@ 

    6. Correlation Analys
    1. Calculate and display the correlations among the four computed banana scores (including the new land use score).
    -
    +
    Code
    # 6a. Correlation Analysis
    @@ -987,7 +987,7 @@ 

    6. Correlation Analys
    1. Create a heatmap to visualize these correlations.
    -
    +
    Code
    plt.figure(figsize=(10, 8))
    @@ -1007,7 +1007,7 @@ 

    6. Correlation Analys

    7. Using Pandas styles

    Style your correlation dataframe to highlight values in the range between 0.8 and 0.99.

    -
    +
    Code
    # 7. Visualization
    @@ -1015,49 +1015,49 @@ 

    7. Using Pandas styles

    - +
    - - - - + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + +
     Bananas index (kg)Bananas index (1000 kcalories)Bananas index (100g protein)Bananas index (land use 1000 kcal)Bananas index (kg)Bananas index (1000 kcalories)Bananas index (100g protein)Bananas index (land use 1000 kcal)
    Bananas index (kg)1.0000000.8826390.2245550.926726Bananas index (kg)1.0000000.8826390.2245550.926726
    Bananas index (1000 kcalories)0.8826391.0000000.3680010.880739Bananas index (1000 kcalories)0.8826391.0000000.3680010.880739
    Bananas index (100g protein)0.2245550.3680011.0000000.224511Bananas index (100g protein)0.2245550.3680011.0000000.224511
    Bananas index (land use 1000 kcal)0.9267260.8807390.2245111.000000Bananas index (land use 1000 kcal)0.9267260.8807390.2245111.000000
    @@ -1068,7 +1068,7 @@

    7. Using Pandas styles

    Bonus Challenge

    If you finish early, try to create a “Banana Equivalence” calculator. This function should take a food item, an amount, and a metric (kg, calories, or protein) as input, and return how many bananas would have the same environmental impact.

    -
    +
    Code
    # Bonus Challenge
    diff --git a/docs/course-materials/answer-keys/eod-day6-key.html b/docs/course-materials/answer-keys/eod-day6-key.html
    index b0a749f..f5123f7 100644
    --- a/docs/course-materials/answer-keys/eod-day6-key.html
    +++ b/docs/course-materials/answer-keys/eod-day6-key.html
    @@ -430,7 +430,7 @@ 

    On this page

    Setup

    First, import the necessary libraries and load the dataset:

    -
    +
    Code
    import pandas as pd
    @@ -449,7 +449,7 @@ 

    Task
    1. Display the first few rows of the dataset.
    -
    +
    Code
    print(eurovision_df.head())
    @@ -503,7 +503,7 @@

    Task
    1. Check the data types of each column.
    -
    +
    Code
    print(eurovision_df.dtypes)
    @@ -536,7 +536,7 @@

    Task
    1. Identify and handle any missing values.
    -
    +
    Code
    print(eurovision_df.isnull().sum())
    @@ -570,7 +570,7 @@ 

    Task
    1. Convert the ‘year’ column to datetime type.
    -
    +
    Code
    eurovision_df['year'] = pd.to_datetime(eurovision_df['year'], format='%Y')
    @@ -595,7 +595,7 @@

    Task 2

    Use .copy() to make sure you create a new dataframe and not just a view.

    -
    +
    Code
    eurovision_1990 = eurovision_df[eurovision_df['year'].dt.year >= 1990].copy()
    @@ -604,7 +604,7 @@

    Task 2
    1. Calculate the difference between final points and semi-final points for each entry and make a histogram of these values using the builtin dataframe .hist() command.
    -
    +
    Code
    eurovision_1990['points_difference'] = eurovision_1990['points_final'] - eurovision_1990['points_sf']
    @@ -624,7 +624,7 @@ 

    Task 3: Sor
    1. Find the top 10 countries with the most Eurovision appearances (use the entire dataset for this calculation)
    -
    +
    Code
    top_10_countries = eurovision_df['to_country'].value_counts().head(10)
    @@ -648,7 +648,7 @@ 

    Task 3: Sor
    1. Calculate the average final points for each country across all years. Make a simple bar plot of these data.
    -
    +
    Code
    avg_points_by_country = eurovision_df.groupby('to_country')['points_final'].mean().sort_values(ascending=False)
    @@ -756,7 +756,7 @@ 

    Task 4: Group

    These methods create a new column that you can use with groupby() for aggregations across your chosen time intervals.

    -
    +
    Code
    eurovision_df['decade'] = (eurovision_df['year'].dt.year // 10) * 10
    @@ -782,13 +782,13 @@ 

    Task 5: Joining Data
  • Read in a new dataframe that contains population data stored at this url:
  • -
    +
    Code
    population_url = 'https://bit.ly/euro_pop'
    -
    +
    Code
    population_df = pd.read_csv(population_url)
    @@ -797,7 +797,7 @@

    Task 5: Joining Data
  • Join this data with the Eurovision dataframe.
  • -
    +
    Code
    merged_df = pd.merge(eurovision_df, population_df, left_on='to_country', right_on='country_name')
    @@ -825,7 +825,7 @@

    Task 5: Joining Data3d. Sort the results by entries per capita

    3e. Print the top 10 values

    -
    +
    Code
    # Step 1. Count the number of records for each country
    @@ -883,7 +883,7 @@ 

    Task 6: Time S
    1. Plot the trend of maximum final points awarded over the years.
    -
    +
    Code
    yearly_max_points = eurovision_df.groupby('year')['points_final'].max()
    diff --git a/docs/course-materials/answer-keys/eod-day7-key.html b/docs/course-materials/answer-keys/eod-day7-key.html
    index 71510cd..51f75b5 100644
    --- a/docs/course-materials/answer-keys/eod-day7-key.html
    +++ b/docs/course-materials/answer-keys/eod-day7-key.html
    @@ -460,7 +460,7 @@ 

    Tasks

    1. Setup

    First, import pandas, matplotlib, and seaborn and load the three datasets.

    -
    +
    Code
    import pandas as pd
    @@ -479,7 +479,7 @@ 

    1. Setup

    Next, display the first few rows and print out the dataset info to get an idea of the contents of each dataset.

    -
    +
    Code
    # Display the first few rows:
    @@ -529,7 +529,7 @@ 

    1. Setup

    4 0 0 NaN
    -
    +
    Code
    # Display the dataframe info:
    @@ -589,7 +589,7 @@ 

    1. Setup

    You may have noticed that the zipcodes were read in as integers rather than strings, and therefore might not be 5 digits long. Ensure the zipcode or zip column in all datasets is a 5-character string, filling in any zeros that were dropped.

    -
    +
    Code
    # Ensure 5-character string zipcodes
    @@ -599,7 +599,7 @@ 

    1. Setup

    Combine the 2012 and 2023 data together by adding a year column and then stacking them together.

    -
    +
    Code
    # Add the year column
    @@ -612,7 +612,7 @@ 

    1. Setup

    In the combined plant hardiness dataframe, create two new columns, trange_min and trange_max, containing the min and max temperatures of the trange column. Remove the original trange column.

    Hint: use str.split() to split the trange strings where they have spaces and retrieve the first and last components (min and max, respectively)

    -
    +
    Code
    # Split the trange string and get the first (min) and last (max) pieces of it
    @@ -629,7 +629,7 @@ 

    Tasks

    2. Exploration and visualization

    On average, how much has the minimum temperature in a zip code changed from 2012 to 2023?

    -
    +
    Code
    # Get the mean of the minimum temperatures 
    @@ -644,7 +644,7 @@ 

    2. Explorati

    Merge together the combined plant hardiness dataset and the zipcode dataset by zipcode. This will give us more informtaion in the plant hardiness dataset, such as the latitude and longitude for each zipcode.

    -
    +
    Code
    df = pd.merge(df, df_zipcodes, left_on='zipcode', right_on='zip')
    @@ -810,7 +810,7 @@ 

    2. Explorati

    Create two scatter plot where the x axis is the longitude, the y axis is the latitude, the color is based on the minimum temperature in 2012 for one and 2023 for the other. Only look at longitude < -60.

    -
    +
    Code
    # Filter the data for longitude less than -60
    @@ -852,7 +852,7 @@ 

    2. Explorati

    Now create a single scatter plot where you look at the difference between the minimum temperature in 2012 and 2023. Only look at longitude < -60. Color any zipcodes where you do not have information from both years in grey.

    -
    +
    Code
    # Find the difference in minimum temperature between 2023 and 2012
    @@ -879,7 +879,7 @@ 

    2. Explorati

    Create a bar plot showing the top 10 states where the average minimum temperature increased the most. Label your axes appropriately.

    -
    +
    Code
    # Filter the data for only 2012 and 2023
    @@ -905,7 +905,7 @@ 

    2. Explorati plt.show()

    -
    /var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85403/2045411015.py:17: FutureWarning: 
    +
    /var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_90366/2045411015.py:17: FutureWarning: 
     
     Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.
     
    diff --git a/docs/course-materials/cheatsheets/JupyterLab.html b/docs/course-materials/cheatsheets/JupyterLab.html
    index 8456980..2bdd3d5 100644
    --- a/docs/course-materials/cheatsheets/JupyterLab.html
    +++ b/docs/course-materials/cheatsheets/JupyterLab.html
    @@ -471,7 +471,7 @@ 

    Variable Inspector

    The variable inspector is not suitable for use with large dataframes or large arrays. You should use standard commands like df.head(), df.tail(), df.info(), df.describe() to inspect large dataframes.

    -
    +
    Code
    # Example variables
    @@ -489,7 +489,7 @@ 

    Essential Magic C

    Magic commands start with % (line magics) or %% (cell magics). Note that available magic commands may vary depending on your Jupyter environment and installed extensions.

    Viewing Variables

    -
    +
    Code
    # List all variables
    @@ -501,7 +501,7 @@ 

    Viewing Variables

    Variable     Type        Data/Info
     ----------------------------------
    -ojs_define   function    <function ojs_define at 0x1a6303e20>
    +ojs_define   function    <function ojs_define at 0x1a3ab3e20>
     x            int         5
     y            str         Hello
     z            list        n=3
    @@ -511,7 +511,7 @@ 

    Viewing Variables

    Running Shell Commands

    -
    +
    Code
    # Run a shell command
    diff --git a/docs/course-materials/cheatsheets/chart_customization.html b/docs/course-materials/cheatsheets/chart_customization.html
    index 8f76fae..17d5f4a 100644
    --- a/docs/course-materials/cheatsheets/chart_customization.html
    +++ b/docs/course-materials/cheatsheets/chart_customization.html
    @@ -439,7 +439,7 @@ 

    On this page

    Matplotlib Customization

    Basic Plot Setup

    -
    +
    Code
    import matplotlib.pyplot as plt
    @@ -463,7 +463,7 @@ 

    Basic Plot Setup

    Customizing Line Plots

    -
    +
    Code
    plt.figure(figsize=(10, 6))
    @@ -485,7 +485,7 @@ 

    Customizing Line Pl

    Adjusting Axes

    -
    +
    Code
    plt.figure(figsize=(10, 6))
    @@ -507,7 +507,7 @@ 

    Adjusting Axes

    Adding Legend

    -
    +
    Code
    plt.figure(figsize=(10, 6))
    @@ -527,7 +527,7 @@ 

    Adding Legend

    Customizing Text and Annotations

    -
    +
    Code
    plt.figure(figsize=(10, 6))
    @@ -552,7 +552,7 @@ 

    Customizi

    Seaborn Customization

    Setting the Style

    -
    +
    Code
    import seaborn as sns
    @@ -565,7 +565,7 @@ 

    Setting the Style

    Loading and Preparing Data

    -
    +
    Code
    # Load the tips dataset
    @@ -602,7 +602,7 @@ 

    Loading and Pre

    Customizing a Scatter Plot

    -
    +
    Code
    plt.figure(figsize=(10, 6))
    @@ -621,7 +621,7 @@ 

    Customizing a S

    Customizing a Box Plot

    -
    +
    Code
    plt.figure(figsize=(10, 6))
    @@ -630,7 +630,7 @@ 

    Customizing a Box P plt.show()

    -
    /var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85582/2996973332.py:2: FutureWarning: 
    +
    /var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_90593/2996973332.py:2: FutureWarning: 
     
     Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect.
     
    @@ -647,7 +647,7 @@ 

    Customizing a Box P

    Customizing a Heatmap (Correlation of Numeric Columns)

    -
    +
    Code
    corr = tips_numeric.corr()
    @@ -667,7 +667,7 @@ 

    Customizing a Pair Plot

    -
    +
    Code
    sns.pairplot(tips, hue="time", palette="husl", height=2.5, 
    @@ -686,7 +686,7 @@ 

    Customizing a Pair

    Customizing a Regression Plot

    -
    +
    Code
    plt.figure(figsize=(10, 6))
    @@ -705,7 +705,7 @@ 

    Customizing

    Customizing a Categorical Plot

    -
    +
    Code
    plt.figure(figsize=(12, 6))
    diff --git a/docs/course-materials/cheatsheets/chart_customization_files/figure-html/cell-13-output-1.png b/docs/course-materials/cheatsheets/chart_customization_files/figure-html/cell-13-output-1.png
    index 1e6b146..b027a0d 100644
    Binary files a/docs/course-materials/cheatsheets/chart_customization_files/figure-html/cell-13-output-1.png and b/docs/course-materials/cheatsheets/chart_customization_files/figure-html/cell-13-output-1.png differ
    diff --git a/docs/course-materials/cheatsheets/comprehensions.html b/docs/course-materials/cheatsheets/comprehensions.html
    index b91d146..928d8b7 100644
    --- a/docs/course-materials/cheatsheets/comprehensions.html
    +++ b/docs/course-materials/cheatsheets/comprehensions.html
    @@ -439,7 +439,7 @@ 

    List Comprehensions

    Basic Syntax

    A list comprehension provides a concise way to create lists. The basic syntax is:

    -
    +
    Code
    # [expression for item in iterable]
    @@ -454,7 +454,7 @@ 

    Basic Syntax

    With Conditional Logic

    You can add a condition to include only certain items in the new list:

    -
    +
    Code
    # [expression for item in iterable if condition]
    @@ -469,7 +469,7 @@ 

    With Conditional Lo

    Nested List Comprehensions

    List comprehensions can be nested to handle more complex data structures:

    -
    +
    Code
    # [(expression1, expression2) for item1 in iterable1 for item2 in iterable2]
    @@ -484,7 +484,7 @@ 

    Nested List Com

    Evaluating Functions in a List Comprehension

    You can use list comprehensions to apply a function to each item in an iterable:

    -
    +
    Code
    # Function to evaluate
    @@ -507,7 +507,7 @@ 

    Dictionary Compr

    Basic Syntax

    Dictionary comprehensions provide a concise way to create dictionaries. The basic syntax is:

    -
    +
    Code
    # {key_expression: value_expression for item in iterable}
    @@ -524,7 +524,7 @@ 

    Basic Syntax

    Without zip

    You can create a dictionary without using zip by leveraging the index:

    -
    +
    Code
    # {key_expression: value_expression for index in range(len(list))}
    @@ -542,7 +542,7 @@ 

    Without zip

    With Conditional Logic

    You can include conditions to filter out key-value pairs:

    -
    +
    Code
    # {key_expression: value_expression for item in iterable if condition}
    @@ -560,7 +560,7 @@ 

    With Conditional

    Evaluating Functions in a Dictionary Comprehension

    You can use dictionary comprehensions to apply a function to values in an iterable:

    -
    +
    Code
    # Function to evaluate
    diff --git a/docs/course-materials/cheatsheets/control_flows.html b/docs/course-materials/cheatsheets/control_flows.html
    index 995d905..dab49fe 100644
    --- a/docs/course-materials/cheatsheets/control_flows.html
    +++ b/docs/course-materials/cheatsheets/control_flows.html
    @@ -444,7 +444,7 @@ 

    On this page

    Conditional Statements

    if-elif-else

    -
    +
    Code
    x = 10
    @@ -465,7 +465,7 @@ 

    if-elif-else

    Loops

    for loop

    -
    +
    Code
    fruits = ["apple", "banana", "cherry"]
    @@ -481,7 +481,7 @@ 

    for loop

    while loop

    -
    +
    Code
    count = 0
    @@ -503,7 +503,7 @@ 

    while loop

    Loop Control

    break

    -
    +
    Code
    for i in range(10):
    @@ -522,7 +522,7 @@ 

    break

    continue

    -
    +
    Code
    for i in range(5):
    @@ -543,7 +543,7 @@ 

    continue

    Comprehensions

    List Comprehension

    -
    +
    Code
    squares = [x**2 for x in range(5)]
    @@ -556,7 +556,7 @@ 

    List Comprehension

    Dictionary Comprehension

    -
    +
    Code
    squares_dict = {x: x**2 for x in range(5)}
    @@ -572,7 +572,7 @@ 

    Dictionary Compre

    Exception Handling

    try-except

    -
    +
    Code
    try:
    @@ -587,7 +587,7 @@ 

    try-except

    try-except-else-finally

    -
    +
    Code
    try:
    diff --git a/docs/course-materials/cheatsheets/data_grouping.html b/docs/course-materials/cheatsheets/data_grouping.html
    index 0318731..a916a5a 100644
    --- a/docs/course-materials/cheatsheets/data_grouping.html
    +++ b/docs/course-materials/cheatsheets/data_grouping.html
    @@ -431,7 +431,7 @@ 

    On this page

    Grouping Data

    Grouping data allows you to split your DataFrame into groups based on one or more columns.

    -
    +
    Code
    import pandas as pd
    @@ -455,7 +455,7 @@ 

    Grouping Data

    Creating a groupby object:

    -
    +
    Code
    # Group by 'category'
    @@ -469,7 +469,7 @@ 

    Aggregating Data

    After grouping, you can apply various aggregation functions to summarize the data within each group.

    Basic aggregation

    -
    +
    Code
    # Basic aggregations
    @@ -490,7 +490,7 @@ 

    Basic aggregation

    Doing multiple aggregations at the same time using agg()

    -
    +
    Code
    # Multiple aggregations
    @@ -506,7 +506,7 @@ 

    Aggregation using a custom function

    -
    +
    Code
    # Custom aggregation function
    @@ -541,7 +541,7 @@ 

    Grouped Operations

    You can apply operations to each group separately using transform() or apply().

    Using transform() to alter each group in a group by object

    -
    +
    Code
    # Transform: apply function to each group, return same-sized DataFrame
    @@ -554,7 +554,7 @@ 

    Using apply() to alter each group in a group by object

    -
    +
    Code
    # Apply: apply function to each group, return a DataFrame or Series
    @@ -564,7 +564,7 @@ 

    result = grouped.apply(group_range)

    -
    /var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85349/114114075.py:5: DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
    +
    /var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_90307/114114075.py:5: DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
       result = grouped.apply(group_range)
    @@ -575,7 +575,7 @@

    Pivot Tables

    Pivot tables are a powerful tool for reorganizing and summarizing data. They allow you to transform your data from a long format to a wide format, making it easier to analyze and visualize patterns.

    Working with Pivot Tables

    -
    +
    Code
    # Sample DataFrame
    @@ -596,7 +596,7 @@ 

    Working with Piv

    Pivot tables with a single aggregation function

    -
    +
    Code
    # Create a pivot table
    @@ -613,7 +613,7 @@ 

    Pivot tables with multiple aggregation

    -
    +
    Code
    # Pivot table with multiple aggregation functions
    @@ -629,9 +629,9 @@ 

    Piv 2023-01-02 120 180 120.0 180.0

    -
    /var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85349/1326309547.py:2: FutureWarning: The provided callable <function sum at 0x11053b2e0> is currently using DataFrameGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
    +
    /var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_90307/1326309547.py:2: FutureWarning: The provided callable <function sum at 0x111b7f2e0> is currently using DataFrameGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
       pivot_multi = pd.pivot_table(df, values='sales', index='date', columns='product',
    -/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85349/1326309547.py:2: FutureWarning: The provided callable <function mean at 0x11054c400> is currently using DataFrameGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
    +/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_90307/1326309547.py:2: FutureWarning: The provided callable <function mean at 0x111b90400> is currently using DataFrameGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
       pivot_multi = pd.pivot_table(df, values='sales', index='date', columns='product',
    diff --git a/docs/course-materials/cheatsheets/data_selection.html b/docs/course-materials/cheatsheets/data_selection.html index 66f671f..4a754f3 100644 --- a/docs/course-materials/cheatsheets/data_selection.html +++ b/docs/course-materials/cheatsheets/data_selection.html @@ -488,7 +488,7 @@

    Selection vs. 

    Setup

    First, let’s import pandas and load our dataset.

    -
    +
    Code
    import pandas as pd
    @@ -513,7 +513,7 @@ 

    Setup

    Basic Selection

    Select a Single Column

    -
    +
    Code
    # Using square brackets
    @@ -526,7 +526,7 @@ 

    Select a Single Col

    Select Multiple Columns

    -
    +
    Code
    # Select age and gpa columns
    @@ -536,7 +536,7 @@ 

    Select Multiple Co

    Select Rows by Index

    -
    +
    Code
    # Select first 5 rows
    @@ -549,7 +549,7 @@ 

    Select Rows by Index<

    Select Rows and Columns

    -
    +
    Code
    # Select first 3 rows and 'age', 'gpa' columns
    @@ -562,7 +562,7 @@ 

    Select Rows and Co

    Filtering

    Filter by a Single Condition

    -
    +
    Code
    # Students with age greater than 21
    @@ -572,7 +572,7 @@ 

    Filter by a S

    Filter by Multiple Conditions

    -
    +
    Code
    # Students with age > 21 and gpa > 3.5
    @@ -582,7 +582,7 @@ 

    Filter by Mu

    Filter Using .isin()

    -
    +
    Code
    # Students majoring in Computer Science or Biology
    @@ -592,7 +592,7 @@ 

    Filter Using .isin()

    Filter Using String Methods

    -
    +
    Code
    # Majors starting with 'E'
    @@ -603,7 +603,7 @@ 

    Filter Using S

    Combining Selection and Filtering

    -
    +
    Code
    # Select 'age' and 'gpa' for students with gpa > 3.5
    @@ -622,7 +622,7 @@ 

    .loc[] vs .iloc[]

    .query() Method

    -
    +
    Code
    # Filter using query method
    @@ -632,7 +632,7 @@ 

    .query() Method

    .where() Method

    -
    +
    Code
    # Replace values not meeting the condition with NaN
    diff --git a/docs/course-materials/cheatsheets/functions.html b/docs/course-materials/cheatsheets/functions.html
    index 84e0e67..cbdfcf7 100644
    --- a/docs/course-materials/cheatsheets/functions.html
    +++ b/docs/course-materials/cheatsheets/functions.html
    @@ -455,7 +455,7 @@ 

    Basics of Functions

    Defining a Function

    In Python, a function is defined using the def keyword, followed by the function name and parentheses () that may include parameters.

    -
    +
    Code
    def function_name(parameters):
    @@ -466,7 +466,7 @@ 

    Defining a Function

    Example: Convert Celsius to Fahrenheit

    -
    +
    Code
    def celsius_to_fahrenheit(celsius):
    @@ -479,7 +479,7 @@ 

    Exam

    Calling a Function

    Call a function by using its name followed by parentheses, and pass arguments if the function requires them.

    -
    +
    Code
    temperature_celsius = 25
    @@ -496,7 +496,7 @@ 

    Calling a Function

    Common Unit Conversions

    Example: Convert Kilometers to Miles

    -
    +
    Code
    def kilometers_to_miles(kilometers):
    diff --git a/docs/course-materials/cheatsheets/pandas_dataframes.html b/docs/course-materials/cheatsheets/pandas_dataframes.html
    index 2f96c1a..87f2b7e 100644
    --- a/docs/course-materials/cheatsheets/pandas_dataframes.html
    +++ b/docs/course-materials/cheatsheets/pandas_dataframes.html
    @@ -447,7 +447,7 @@ 

    Introduction

    Importing Pandas

    Always start by importing pandas:

    -
    +
    Code
    import pandas as pd
    @@ -458,7 +458,7 @@

    Importing Pandas

    Creating a DataFrame

    From a dictionary

    -
    +
    Code
    data = {'Name': ['Alice', 'Bob', 'Charlie'],
    @@ -477,7 +477,7 @@ 

    From a dictionary

    From a CSV file

    -
    +
    Code
    # Here's an example csv file we can use for read_csv:
    @@ -508,7 +508,7 @@ 

    From a CSV file

    Basic DataFrame Information

    -
    +
    Code
    # Display the first few rows
    @@ -560,7 +560,7 @@ 

    Basic DataFram

    Selecting Data

    Selecting columns

    -
    +
    Code
    # Select a single column
    @@ -573,7 +573,7 @@ 

    Selecting columns

    Selecting rows

    -
    +
    Code
    # Select rows by index
    @@ -589,7 +589,7 @@ 

    Selecting rows

    Basic Data Manipulation

    Adding a new column

    -
    +
    Code
    df['Is Adult'] = df['Age'] >= 18
    @@ -598,7 +598,7 @@

    Adding a new column

    Renaming columns

    -
    +
    Code
    df = df.rename(columns={'Name': 'Full Name'})
    @@ -607,7 +607,7 @@

    Renaming columns

    Handling missing values

    -
    +
    Code
    # Drop rows with any missing values
    @@ -621,7 +621,7 @@ 

    Handling missing v

    Basic Calculations

    -
    +
    Code
    # Calculate mean age
    @@ -634,7 +634,7 @@ 

    Basic Calculations

    Grouping and Aggregation

    -
    +
    Code
    # Group by city and calculate mean age
    @@ -644,7 +644,7 @@ 

    Grouping and Aggr

    Sorting

    -
    +
    Code
    # Sort by Age in descending order
    @@ -654,7 +654,7 @@ 

    Sorting

    Saving a DataFrame

    -
    +
    Code
    # Save to CSV
    diff --git a/docs/course-materials/cheatsheets/pandas_series.html b/docs/course-materials/cheatsheets/pandas_series.html
    index dc19cc5..e930d1e 100644
    --- a/docs/course-materials/cheatsheets/pandas_series.html
    +++ b/docs/course-materials/cheatsheets/pandas_series.html
    @@ -454,7 +454,7 @@ 

    Introduction

    Importing Pandas

    Always start by importing pandas:

    -
    +
    Code
    import pandas as pd
    @@ -465,7 +465,7 @@

    Importing Pandas

    Creating a Series

    From a list

    -
    +
    Code
    data = [1, 2, 3, 4, 5]
    @@ -484,7 +484,7 @@ 

    From a list

    From a dictionary

    -
    +
    Code
    data = {'a': 0., 'b': 1., 'c': 2.}
    @@ -501,7 +501,7 @@ 

    From a dictionary

    With custom index

    -
    +
    Code
    s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
    @@ -520,7 +520,7 @@ 

    With custom index

    Basic Series Information

    -
    +
    Code
    # Display the first few elements
    @@ -572,7 +572,7 @@ 

    Basic Series Info

    Selecting Data

    By label

    -
    +
    Code
    # Select a single element
    @@ -585,7 +585,7 @@ 

    By label

    By position

    -
    +
    Code
    # Select by integer index (direct selection is being deprecated)
    @@ -601,7 +601,7 @@ 

    By position

    By condition

    -
    +
    Code
    # Select elements greater than 2
    @@ -614,7 +614,7 @@ 

    By condition

    Basic Data Manipulation

    Updating values

    -
    +
    Code
    s['a'] = 10
    @@ -623,7 +623,7 @@

    Updating values

    Removing elements

    -
    +
    Code
    s=s.drop(labels=['a'])
    @@ -632,7 +632,7 @@

    Removing elements

    Adding elements to a Series

    -
    +
    Code
    another_series = pd.Series(
    @@ -659,7 +659,7 @@ 

    Adding element

    Updating elements based on their value using mask

    -
    +
    Code
    print(s)
    @@ -689,7 +689,7 @@ 

    Replacing elements based on their value using where

    -
    +
    Code
    print(s)
    @@ -721,7 +721,7 @@ 

    Applying functions

    Applying a newly-defined function

    -
    +
    Code
    def squared(x):
    @@ -735,7 +735,7 @@ 

    Applying

    Applying a lambda (temporary) function

    -
    +
    Code
    s_squared = s.apply(lambda x: x**2)
    @@ -746,7 +746,7 @@

    Apply

    Handling missing values

    -
    +
    Code
    # Drop missing values
    @@ -760,7 +760,7 @@ 

    Handling missing v

    Basic Calculations

    -
    +
    Code
    # Calculate mean
    @@ -776,7 +776,7 @@ 

    Basic Calculations

    Sorting

    -
    +
    Code
    print(s)
    @@ -800,7 +800,7 @@ 

    Sorting

    Reindexing

    -
    +
    Code
    print(f"Original Series:\n{s}\n", sep='')
    @@ -839,7 +839,7 @@ 

    Reindexing

    Combining Series

    -
    +
    Code
    s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
    @@ -850,7 +850,7 @@ 

    Combining Series

    Converting to Other Data Types

    -
    +
    Code
    # To list
    diff --git a/docs/course-materials/cheatsheets/random_numbers.html b/docs/course-materials/cheatsheets/random_numbers.html
    index bb01aa8..381335b 100644
    --- a/docs/course-materials/cheatsheets/random_numbers.html
    +++ b/docs/course-materials/cheatsheets/random_numbers.html
    @@ -437,7 +437,7 @@ 

    Importing NumPy

    -
    +
    Code
    import numpy as np
    @@ -446,7 +446,7 @@

    Importing NumPy

    Creating a Generator

    -
    +
    Code
    # Create a Generator with the default BitGenerator
    @@ -461,7 +461,7 @@ 

Creating a Generator

    Basic Random Number Generation

    Uniform Distribution (0 to 1)

    -
    +
    Code
    # Single random float
    @@ -471,14 +471,14 @@ 

Uniform Distri
print(rng.random(5))

    -
    0.3278102376191233
    -[0.16307633 0.40392116 0.45921329 0.26359239 0.32981461]
    +
    0.6803022305128279
    +[0.17038781 0.23263899 0.38613713 0.05561754 0.72097601]

    Integers

    -
    +
    Code
    # Single random integer from 0 to 10 (inclusive)
    @@ -488,14 +488,14 @@ 

    Integers

    print(rng.integers(1, 101, size=5))
    -
    4
    -[73 74 35 82 86]
    +
    6
    +[55  6 33 55 48]

    Normal (Gaussian) Distribution

    -
    +
    Code
    # Single value from standard normal distribution
    @@ -505,15 +505,15 @@ 

Normal (Gauss
print(rng.normal(loc=0, scale=1, size=5))

    -
    -0.39813817964311515
    -[ 0.28262105 -1.23483313  0.42485523 -0.12518701 -0.53235252]
    +
    -1.8528418798684148
    +[ 0.34382396  0.14170714 -0.58774396 -0.12801022 -0.9131221 ]

    Sampling

    -
    +
    Code
    # Random choice from an array
    @@ -524,14 +524,14 @@ 

    Sampling

    print(rng.choice(arr, size=3, replace=False))
    -
    4
    -[2 5 4]
    +
    2
    +[1 3 2]

    Shuffling

    -
    +
    Code
    arr = np.arange(10)
    @@ -539,14 +539,14 @@ 

    Shuffling

    print(arr)
    -
    [4 1 6 5 8 2 3 7 9 0]
    +
    [3 1 7 0 6 9 8 2 4 5]

    Other Distributions

    Generators provide methods for many other distributions:

    -
    +
    Code
    # Poisson distribution
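For reference, a minimal sketch of a few of these distribution methods, assuming a seeded default generator (the seed value here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Poisson distribution (counts with an average rate of 3)
print(rng.poisson(lam=3, size=3))

# Exponential distribution (waiting times with scale 1.0)
print(rng.exponential(scale=1.0, size=3))

# Binomial distribution (10 trials, 50% success probability)
print(rng.binomial(n=10, p=0.5, size=3))
```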
    @@ -559,16 +559,16 @@ 

Other Distributions
print(rng.binomial(n=10, p=0.5, size=3))

    -
    [8 3 4]
    -[1.16757058 0.27745692 0.73386612]
    -[6 5 5]
    +
    [2 4 4]
    +[0.33994151 0.5071583  0.27743568]
    +[5 3 6]

    Generating on Existing Arrays

    Generators can fill existing arrays, which can be more efficient:

    -
    +
    Code
    arr = np.empty(5)
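A minimal sketch of the fill-in-place pattern, using the `out=` argument that `Generator.random` and several other distribution methods accept:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Allocate the array once, then fill it in place instead of creating a new one
arr = np.empty(5)
rng.random(out=arr)
print(arr)

# standard_normal also accepts out= for the same pattern
rng.standard_normal(out=arr)
print(arr)
```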
    @@ -576,14 +576,14 @@ 

Generating o
print(arr)

    -
    [0.91186338 0.70114601 0.45616672 0.46383814 0.5871376 ]
    +
    [0.48613036 0.36527278 0.12150668 0.89769943 0.84216537]

    Bit Generators

    You can use different Bit Generators with varying statistical qualities:

    -
    +
    Code
    from numpy.random import PCG64, MT19937
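A short sketch of wrapping different bit generators in `Generator` objects so they expose the same high-level API (seed values are arbitrary):

```python
import numpy as np
from numpy.random import PCG64, MT19937

# Wrap each BitGenerator in a Generator to get the same high-level methods
rng_pcg = np.random.Generator(PCG64(seed=1))
rng_mt = np.random.Generator(MT19937(seed=1))

print("PCG64:", rng_pcg.random())
print("MT19937:", rng_mt.random())
```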
    @@ -595,14 +595,14 @@ 

    Bit Generators

    print("MT19937:", rng_mt.random())
    -
    PCG64: 0.8210003140505849
    -MT19937: 0.41349136323428615
    +
    PCG64: 0.29959495316546103
    +MT19937: 0.35000413311606726

    Saving and Restoring State

    -
    +
    Code
    # Save state
    @@ -616,15 +616,15 @@ 

Saving and Rest
print("Restored:", rng.random(3))

    -
    Original: [0.8069469  0.67878297 0.48562037]
    -Restored: [0.8069469  0.67878297 0.48562037]
    +
    Original: [0.43138676 0.482716   0.89713589]
    +Restored: [0.43138676 0.482716   0.89713589]

    Spawning New Generators

    You can create independent generators from an existing one:

    -
    +
    Code
    child1, child2 = rng.spawn(2)
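A minimal spawning sketch, assuming a NumPy version recent enough to provide `Generator.spawn` (1.25+); older versions expose the same idea through `SeedSequence.spawn`:

```python
import numpy as np

parent = np.random.default_rng(seed=42)

# Each child gets its own independent stream derived from the parent's seed sequence
child1, child2 = parent.spawn(2)

print("Child 1:", child1.random())
print("Child 2:", child2.random())
```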
    @@ -632,22 +632,22 @@ 

Spawning New Gener
print("Child 2:", child2.random())

    -
    Child 1: 0.11547306758577636
    -Child 2: 0.8805861858890862
    +
    Child 1: 0.3963628299042784
    +Child 2: 0.7642472116582459

    Thread Safety and Jumping

Generators support “jumping” ahead in the sequence, which makes it straightforward to create independent, non-overlapping streams for parallel or threaded workloads:

    -
    +
    Code
    rng = np.random.Generator(PCG64())
     rng.bit_generator.advance(1000)  # Jump ahead 1000 steps
    -
    <numpy.random._pcg64.PCG64 at 0x10dc90040>
    +
    <numpy.random._pcg64.PCG64 at 0x10f82c040>
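A sketch of using jumping to build non-overlapping streams: `PCG64.jumped()` returns a copy of the bit generator advanced by a very large number of steps, so each wrapped `Generator` draws from a separate part of the sequence.

```python
import numpy as np
from numpy.random import PCG64

bg = PCG64(seed=42)
rng_a = np.random.Generator(bg)

# jumped() returns a new bit generator far ahead in the sequence,
# so the two streams will not overlap
rng_b = np.random.Generator(bg.jumped())

print(rng_a.random(3))
print(rng_b.random(3))
```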
diff --git a/docs/course-materials/cheatsheets/read_csv.html b/docs/course-materials/cheatsheets/read_csv.html
index 3418e6d..bc1649f 100644
--- a/docs/course-materials/cheatsheets/read_csv.html
+++ b/docs/course-materials/cheatsheets/read_csv.html
@@ -458,7 +458,7 @@

    On this page

    Basic Usage of pd.read_csv

    Reading a Simple CSV File

    -
    +
    Code
    import pandas as pd
    @@ -489,7 +489,7 @@ 

    Reading a Simple

    Selecting Specific Columns

    Using the usecols Parameter

    -
    +
    Code
    # Read only specific columns
    @@ -509,7 +509,7 @@ 

Using the
Naming Columns

    Using the names Parameter

    -
    +
    Code
    # Rename columns while reading
    @@ -530,7 +530,7 @@ 

    Using the

    Specifying an Index

    Using the index_col Parameter

    -
    +
    Code
    # Set 'name' column as index
    @@ -551,7 +551,7 @@ 

Using the
Parsing Dates

    Automatic Date Parsing

    -
    +
    Code
    csv_data_with_dates = """
    @@ -578,7 +578,7 @@ 

    Automatic Date Pars

    Custom Date Parsing

    -
    +
    Code
    csv_data_custom_dates = """
    @@ -608,7 +608,7 @@ 

Custom Date Parsing
Handling Headers

    CSV with Multi-line Header

    -
    +
    Code
    csv_data_with_header = """
    @@ -646,7 +646,7 @@ 

    CSV with Multi-

    CSV with No Header

    -
    +
    Code
    csv_data_no_header = """
    @@ -671,7 +671,7 @@ 

    CSV with No Header

    Dealing with Missing Data

    Customizing NA Values

    -
    +
    Code
    csv_data_missing = """
    @@ -697,7 +697,7 @@ 

    Customizing NA Value

    Coercing Columns to Specific Data Types

    Using the dtype Parameter

    -
    +
    Code
    csv_data_types = """
    @@ -728,7 +728,7 @@ 

    Using the

    Reading Large CSV Files

    Using chunksize for Memory Efficiency

    -
    +
    Code
    import numpy as np
    diff --git a/docs/course-materials/cheatsheets/seaborn.html b/docs/course-materials/cheatsheets/seaborn.html
    index 047040f..4ce29bb 100644
    --- a/docs/course-materials/cheatsheets/seaborn.html
    +++ b/docs/course-materials/cheatsheets/seaborn.html
    @@ -444,7 +444,7 @@ 

    Introduction to Se

    Setting up Seaborn

    To use Seaborn, you need to import it along with other necessary libraries:

    -
    +
    Code
    import seaborn as sns
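A typical setup cell, as a sketch (the theme call is optional but gives seaborn's default styling to all subsequent plots):

```python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Apply seaborn's default styling to matplotlib figures
sns.set_theme()
```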
    @@ -463,7 +463,7 @@ 

    1. Scatter Plots

    Useful for showing relationships between two continuous variables.

    -
    +
    Code
    # Load the tips dataset
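A minimal scatter-plot sketch using the built-in tips dataset (column names as shipped with seaborn):

```python
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

plt.figure(figsize=(8, 5))
# Each point is one bill; hue adds a third (categorical) variable
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time")
plt.title("Tip vs. total bill")
plt.show()
```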
    @@ -479,7 +479,7 @@ 

    1. Scatter Plots

    4 24.59 3.61 Female No Sun Dinner 4
    -
    +
    Code
    # Basic scatter plot
    @@ -496,7 +496,7 @@ 

    1. Scatter Plots

    -
    +
    Code
    # Add hue for a third variable
    @@ -518,7 +518,7 @@ 

    1. Scatter Plots

    2. Line Plots

    Ideal for time series data or showing trends.

    -
    +
    Code
    # Load the flights dataset
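A minimal line-plot sketch using the built-in flights dataset; rows sharing the same year are aggregated and a confidence band is drawn automatically:

```python
import seaborn as sns
import matplotlib.pyplot as plt

flights = sns.load_dataset("flights")

plt.figure(figsize=(8, 5))
sns.lineplot(data=flights, x="year", y="passengers")
plt.title("Airline passengers over time")
plt.show()
```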
    @@ -534,7 +534,7 @@ 

    2. Line Plots

    4 1949 May 121
    -
    +
    Code
# Basic line plot (uncertainty bounds are calculated automatically by aggregating rows that share the same year!)
    @@ -551,7 +551,7 @@ 

    2. Line Plots

    -
    +
    Code
    # Multiple lines with confidence intervals
    @@ -573,7 +573,7 @@ 

    2. Line Plots

    3. Bar Plots

    Great for comparing quantities across different categories.

    -
    +
    Code
    # Load the titanic dataset
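A minimal bar-plot sketch using the built-in titanic dataset; bar height is the mean of the numeric column within each category:

```python
import seaborn as sns
import matplotlib.pyplot as plt

titanic = sns.load_dataset("titanic")

plt.figure(figsize=(8, 5))
# Mean survival rate per class, split by sex; error bars show uncertainty
sns.barplot(data=titanic, x="class", y="survived", hue="sex")
plt.title("Survival rate by class and sex")
plt.show()
```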
    @@ -596,7 +596,7 @@ 

    3. Bar Plots

    4 man True NaN Southampton no True
    -
    +
    Code
    # Basic bar plot
    @@ -613,7 +613,7 @@ 

    3. Bar Plots

    -
    +
    Code
    # Grouped bar plot
    @@ -635,7 +635,7 @@ 

    3. Bar Plots

    4. Box Plots

    Useful for showing distribution of data across categories.

    -
    +
    Code
    # Basic box plot
    @@ -652,7 +652,7 @@ 

    4. Box Plots

    -
    +
    Code
    # Add individual data points
    @@ -674,7 +674,7 @@ 

    4. Box Plots

    5. Violin Plots

    Similar to box plots but show the full distribution of data.

    -
    +
    Code
    plt.figure(figsize=(10, 6))
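A minimal violin-plot sketch, assuming the same tips dataset used in the earlier examples:

```python
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

plt.figure(figsize=(10, 6))
# The violin width shows the estimated density of total_bill for each day
sns.violinplot(data=tips, x="day", y="total_bill")
plt.title("Total bill distribution by day")
plt.show()
```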
    @@ -694,7 +694,7 @@ 

    5. Violin Plots

    6. Heatmaps

    Excellent for visualizing correlation matrices or gridded data.

    -
    +
    Code
    # Load the penguins dataset
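A minimal correlation-heatmap sketch using the built-in penguins dataset; only the numeric columns go into the correlation matrix:

```python
import seaborn as sns
import matplotlib.pyplot as plt

penguins = sns.load_dataset("penguins")

# Correlation matrix of the numeric columns only
corr = penguins.select_dtypes("number").corr()

plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation between penguin measurements")
plt.show()
```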
    @@ -717,7 +717,7 @@ 

    6. Heatmaps

    4 3450.0 Female
    -
    +
    Code
    # Correlation heatmap
    @@ -744,7 +744,7 @@ 

    Quick Data Overview

    Recall the structure of the penguins dataframe, which has a combination of measured and categorical values:

    -
    +
    Code
    print(penguins.head())
    @@ -766,7 +766,7 @@

    Quick Data Overview

We can explore the distribution of every numerical variable, as well as the pair-wise relationships between all variables in a dataframe, using pairplot. A categorical variable can be used to further organize the data within each plot via the hue argument.

    -
    +
    Code
    # Get a quick overview of numerical variables
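A minimal pairplot sketch, again using the penguins dataset loaded above:

```python
import seaborn as sns

penguins = sns.load_dataset("penguins")

# One scatter plot per pair of numeric columns, with distributions on the diagonal;
# hue colors each point by species
sns.pairplot(penguins, hue="species")
```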
    @@ -782,7 +782,7 @@ 

    Quick Data Overview

    -
    +
    Code
    # Visualize distributions of all numerical variables
    @@ -801,7 +801,7 @@ 

    Quick Data Overview

    Exploring Relationships

    -
    +
    Code
    # Explore relationship between variables
    @@ -817,7 +817,7 @@ 

    Exploring Relation

    -
    +
    Code
    # Facet plots for multi-dimensional exploration
    @@ -838,7 +838,7 @@ 

    Exploring Relation

    Categorical Data Exploration

    -
    +
    Code
    # Compare distributions across categories
    @@ -854,7 +854,7 @@ 

    Categorical D

    -
    +
    Code
    # Count plots for categorical variables
    @@ -873,7 +873,7 @@ 

    Categorical D

    Time Series Exploration

    -
    +
    Code
    # Visualize trends over time
    diff --git a/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-10-output-1.png b/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-10-output-1.png
    index 41302ff..56d3ee5 100644
    Binary files a/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-10-output-1.png and b/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-10-output-1.png differ
    diff --git a/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-11-output-1.png b/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-11-output-1.png
    index db1074e..06a775f 100644
    Binary files a/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-11-output-1.png and b/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-11-output-1.png differ
    diff --git a/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-7-output-1.png b/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-7-output-1.png
    index 9229072..d1e1324 100644
    Binary files a/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-7-output-1.png and b/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-7-output-1.png differ
    diff --git a/docs/course-materials/cheatsheets/sets.html b/docs/course-materials/cheatsheets/sets.html
    index 923f4b7..e0497a5 100644
    --- a/docs/course-materials/cheatsheets/sets.html
    +++ b/docs/course-materials/cheatsheets/sets.html
    @@ -428,7 +428,7 @@ 

    On this page

    Creating Sets

    -
    +
    Code
    # Empty set
    @@ -452,7 +452,7 @@ 

    Creating Sets

    Basic Operations

    -
    +
    Code
    s = {1, 2, 3, 4, 5}
    @@ -495,7 +495,7 @@ 

    Basic Operations

    Set Methods

    -
    +
    Code
    a = {1, 2, 3}
    @@ -510,7 +510,7 @@ 

    Set Methods

    Union

    -
    +
    Code
    union_set = a.union(b)
    @@ -523,7 +523,7 @@ 

    Union

    Intersection

    -
    +
    Code
    intersection_set = a.intersection(b)
    @@ -536,7 +536,7 @@ 

    Intersection

    Difference

    -
    +
    Code
    difference_set = a.difference(b)
    @@ -549,7 +549,7 @@ 

    Difference

    Symmetric difference

    -
    +
    Code
    symmetric_difference_set = a.symmetric_difference(b)
    @@ -562,7 +562,7 @@ 

Symmetric difference

    Subset and superset

    -
    +
    Code
    is_subset = a.issubset(b)
    diff --git a/docs/course-materials/coding-colabs/3b_control_flows.html b/docs/course-materials/coding-colabs/3b_control_flows.html
    index e82dc40..649d62e 100644
    --- a/docs/course-materials/coding-colabs/3b_control_flows.html
    +++ b/docs/course-materials/coding-colabs/3b_control_flows.html
    @@ -470,7 +470,7 @@ 

    Task 1: Simpl
  • Otherwise, print “Enjoy the pleasant weather!”
  • -
    +
    temperature = 20
     
     # Your code here
    @@ -491,7 +491,7 @@ 

    Task 2: Grade Clas
  • Below 60: “F”
  • -
    +
    score = 85
     
     # Your code here
    @@ -508,7 +508,7 @@ 

    Task 3: Counting She
  • Use a for loop with the range() function
  • Print each number followed by “sheep”
  • -
    +
    # Your code here
     # Use a for loop to count sheep
    @@ -521,7 +521,7 @@

    Task 4: Sum of Numbe
  • Use a for loop with the range() function to add each number to total
  • After the loop, print the total
  • -
    +
    total = 0
     
     # Your code here
    @@ -540,7 +540,7 @@ 

    Task 5: Countdown

  • After each print, decrease the countdown by 1
  • When the countdown reaches 0, print “Blast off!”
  • -
    +
    countdown = 5
     
     # Your code here
    diff --git a/docs/course-materials/coding-colabs/3d_pandas_series.html b/docs/course-materials/coding-colabs/3d_pandas_series.html
    index c92fe65..1e2f16b 100644
    --- a/docs/course-materials/coding-colabs/3d_pandas_series.html
    +++ b/docs/course-materials/coding-colabs/3d_pandas_series.html
    @@ -440,7 +440,7 @@ 

    Resources

    Setup

    First, let’s import the necessary libraries and create a sample Series.

    -
    +
    import pandas as pd
     import numpy as np
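The exact sample Series is not visible in this hunk; a minimal sketch consistent with the fruit names used in the exercises below (the values are illustrative, since the exercises only rely on the index):

```python
import pandas as pd
import numpy as np

# Sample Series of fruit counts, indexed by fruit name
fruits = pd.Series([10, 5, 8, 3, 7],
                   index=['apple', 'banana', 'cherry', 'date', 'elderberry'])
print(fruits)
```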
     
    @@ -460,7 +460,7 @@ 

    Setup

    Exercise 1: Creating a Series

    Work together to create a Series representing the prices of the fruits in our fruits Series.

    -
    +
    # Your code here
     # Create a Series called 'prices' with the same index as 'fruits'
     # Use these prices: apple: $0.5, banana: $0.3, cherry: $1.0, date: $1.5, elderberry: $2.0
    @@ -474,7 +474,7 @@

    Exercise 2: S
  • Find the most expensive fruit.
  • Apply a 10% discount to all fruits priced over $1.0.
  • -
    +
    # Your code here
     # 1. Calculate the total price of all fruits
     # 2. Find the most expensive fruit
    @@ -489,7 +489,7 @@ 

    Exercise 3: Ser
  • How many fruits cost less than $1.0?
  • What is the price range (difference between max and min prices)?
  • -
    +
    # Your code here
     # 1. Calculate the average price of the fruits
     # 2. Count how many fruits cost less than $1.0
    @@ -504,7 +504,7 @@ 

    Exercise 4:
  • Remove ‘banana’ from both Series.
  • Sort both Series by fruit name (alphabetically).
  • -
    +
    # Your code here
     # 1. Add 'fig' to both Series (price: $1.2)
     # 2. Remove 'banana' from both Series
    diff --git a/docs/course-materials/coding-colabs/4b_pandas_dataframes.html b/docs/course-materials/coding-colabs/4b_pandas_dataframes.html
    index eb218d2..4e49b07 100644
    --- a/docs/course-materials/coding-colabs/4b_pandas_dataframes.html
    +++ b/docs/course-materials/coding-colabs/4b_pandas_dataframes.html
    @@ -436,7 +436,7 @@ 

    Introduction

    Setup

    First, let’s import the necessary libraries and load our dataset.

    -
    +
    Code
    import pandas as pd
    diff --git a/docs/course-materials/coding-colabs/5c_cleaning_data.html b/docs/course-materials/coding-colabs/5c_cleaning_data.html
    index 9f81bfe..bcd144e 100644
    --- a/docs/course-materials/coding-colabs/5c_cleaning_data.html
    +++ b/docs/course-materials/coding-colabs/5c_cleaning_data.html
    @@ -435,7 +435,7 @@ 

    Resources

    Setup

    First, let’s import the necessary libraries and load an example messy dataframe.

    -
    +
    import pandas as pd
     import numpy as np
     
    diff --git a/docs/course-materials/coding-colabs/6b_advanced_data_manipulation.html b/docs/course-materials/coding-colabs/6b_advanced_data_manipulation.html
    index 2a86f72..b4a9a6e 100644
    --- a/docs/course-materials/coding-colabs/6b_advanced_data_manipulation.html
    +++ b/docs/course-materials/coding-colabs/6b_advanced_data_manipulation.html
    @@ -440,7 +440,7 @@ 

    Learning Objectives

    Setup

    Let’s start by importing necessary libraries and loading our datasets:

    -
    +
    Code
    import pandas as pd
    diff --git a/docs/course-materials/coding-colabs/7c_visualizations.html b/docs/course-materials/coding-colabs/7c_visualizations.html
    index 979d778..4dde60b 100644
    --- a/docs/course-materials/coding-colabs/7c_visualizations.html
    +++ b/docs/course-materials/coding-colabs/7c_visualizations.html
    @@ -437,7 +437,7 @@ 

    Introduction

    Setup

    First, let’s import the necessary libraries and load our dataset.

    -
    +
    Code
    import pandas as pd
    diff --git a/docs/course-materials/eod-practice/eod-day2.html b/docs/course-materials/eod-practice/eod-day2.html
    index 9f3074f..f30d8bb 100644
    --- a/docs/course-materials/eod-practice/eod-day2.html
    +++ b/docs/course-materials/eod-practice/eod-day2.html
    @@ -459,7 +459,7 @@ 

    Learning Objectives

    Setup

    First, let’s import the necessary libraries:

    -
    +
    Code
    # We won't use the random library until the end of this exercise, 
    @@ -474,7 +474,7 @@ 

    Part 1: Data Collec

    Task 1: Create a List of Classmates

    Create a list containing the names of at least 4 of your classmates in this course.

    -
    +
    Code
    # Your code here
    @@ -489,7 +489,7 @@

    +
    Code
    # Your code here
    @@ -509,7 +509,7 @@

    Task 3: List Operat
  • Sort the list alphabetically
  • Find and print the index of a specific classmate
  • -
    +
    Code
    # Your code here
    @@ -524,7 +524,7 @@

    Task 4: Dicti
  • Update the “number of pets” for one classmate
  • Create a list of all the favorite colors your classmates mentioned
  • -
    +
    Code
    # Your code here
    @@ -538,7 +538,7 @@

    Part 3:

    Example: Random Selection from a Dictionary

    Here’s a simple example of how to select random items from a dictionary:

    -
    +
    Code
    import random
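A sketch of the pattern described above, with a small illustrative dictionary (the fruit/color pairs are assumptions standing in for the ones in the original cell):

```python
import random

fruit_colors = {"apple": "red", "banana": "yellow", "grape": "purple",
                "kiwi": "green", "orange": "orange"}

# Pick one random key, then look up its value
fruit = random.choice(list(fruit_colors.keys()))
print(f"Randomly selected fruit: {fruit}")
print(f"Its color: {fruit_colors[fruit]}")

# Pick several distinct keys at once
num_selections = 3
random_fruits = random.sample(list(fruit_colors.keys()), k=num_selections)
print(f"Randomly selected {num_selections} fruits: {random_fruits}")
```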
    @@ -567,10 +567,10 @@ 

    print(f"Randomly selected {num_selections} fruits: {random_fruits}")

    -
    Randomly selected fruit: grape
    -Its color: purple
    -Another randomly selected fruit: apple
    -Randomly selected 3 fruits: ['kiwi', 'grape', 'apple']
    +
    Randomly selected fruit: orange
    +Its color: orange
    +Another randomly selected fruit: orange
    +Randomly selected 3 fruits: ['kiwi', 'apple', 'banana']

    This example demonstrates how to:

diff --git a/docs/course-materials/eod-practice/eod-day3.html b/docs/course-materials/eod-practice/eod-day3.html
index f675f9e..7791191 100644
--- a/docs/course-materials/eod-practice/eod-day3.html
+++ b/docs/course-materials/eod-practice/eod-day3.html
@@ -447,7 +447,7 @@

    Introduction

    Setup

    First, let’s import the necessary libraries and set up our environment.

    -
    +
    Code
    import pandas as pd
    @@ -461,7 +461,7 @@ 

    Creating a Random Number Generator

    We can create a random number generator object like this:

    -
    +
    Code
    rng = np.random.default_rng()
    @@ -472,7 +472,7 @@

    Creatin

    Using a Seed for Reproducibility

    In data science, it’s often crucial to be able to reproduce our results. We can do this by setting a seed for our random number generator. Here’s how:

    -
    +
    Code
    rng = np.random.default_rng(seed=42)
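A short sketch of why the seed matters: two generators built with the same seed produce identical sequences, while an unseeded generator gives a fresh stream every run.

```python
import numpy as np

rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

# Same seed -> identical draws
print(rng_a.random(3))
print(rng_b.random(3))

# No seed -> an unpredictable stream each time the script runs
print(np.random.default_rng().random(3))
```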
    @@ -511,7 +511,7 @@

    Exploring Reproducibility

    To demonstrate the importance of seeding, try creating two series with different random number generators:

    -
    +
    Code
    rng1 = np.random.default_rng(seed=42)
    @@ -527,7 +527,7 @@ 

    Exploring Reprod

    Now try creating two series with random number generators that have different seeds:

    -
    +
    Code
    rng3 = np.random.default_rng(seed=42)
    diff --git a/docs/course-materials/eod-practice/eod-day4.html b/docs/course-materials/eod-practice/eod-day4.html
    index 99c54a9..509cca3 100644
    --- a/docs/course-materials/eod-practice/eod-day4.html
    +++ b/docs/course-materials/eod-practice/eod-day4.html
    @@ -455,7 +455,7 @@ 

    Introduction

This end-of-day session focuses on using pandas to load, visualize, and analyze marine microplastics data. It is designed to help you become more comfortable with the pandas library, equipping you with the skills needed to perform data analysis effectively.

The National Oceanic and Atmospheric Administration, via its National Centers for Environmental Information, has an entire section related to marine microplastics (that is, microplastics found in water) at https://www.ncei.noaa.gov/products/microplastics.

    We will be working with a recent download of the entire marine microplastics dataset. The url for this data is located here:

    -
    +
    Code
    url = 'https://ucsb.box.com/shared/static/dnnu59jsnkymup6o8aaovdywrtxiy3a9.csv'
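Loading the file is a one-liner with pandas; a sketch of a typical first look (the column names depend on the dataset itself):

```python
import pandas as pd

url = 'https://ucsb.box.com/shared/static/dnnu59jsnkymup6o8aaovdywrtxiy3a9.csv'
df = pd.read_csv(url)

# Quick first look at the data
print(df.head())
print(df.info())
```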
diff --git a/docs/course-materials/eod-practice/eod-day5.html b/docs/course-materials/eod-practice/eod-day5.html
index 18e0233..4836aba 100644
--- a/docs/course-materials/eod-practice/eod-day5.html
+++ b/docs/course-materials/eod-practice/eod-day5.html
@@ -447,7 +447,7 @@

    Reference:

    Setup

    First, let’s import the necessary libraries and load the data:

    -
    +
    Code
    import pandas as pd
    @@ -459,7 +459,7 @@ 

    Setup

    df = pd.read_csv(url)
    -
    +
    Code
    # Display the first few rows:
    @@ -502,7 +502,7 @@ 

    Setup

    4 3.775280 True 1 NaN NaN
    -
    +
    Code
    # Display the dataframe info:
    @@ -553,7 +553,7 @@ 

    2. Exploring Banan
  • For each of the pre-computed banana score columns (kg, calories, and protein), show the 10 highest-scoring food products.

• Edit the function below so that it returns the top 10 scores for a given column:

  • -
    +
    Code
    def return_top_ten(df, column):
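One possible completion of the function, as a sketch, assuming higher scores are better and that `column` names a numeric banana-score column (the example column name in the comment is hypothetical):

```python
import pandas as pd

def return_top_ten(df, column):
    # Sort descending on the requested score column and keep the first ten rows
    return df.sort_values(by=column, ascending=False).head(10)

# Example usage (hypothetical column name):
# top_kg = return_top_ten(df, 'Bananas index (kg)')
```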
    diff --git a/docs/course-materials/eod-practice/eod-day6.html b/docs/course-materials/eod-practice/eod-day6.html
    index 41f95fb..807a6fc 100644
    --- a/docs/course-materials/eod-practice/eod-day6.html
    +++ b/docs/course-materials/eod-practice/eod-day6.html
    @@ -430,7 +430,7 @@ 

    On this page

    Setup

    First, import the necessary libraries and load the dataset:

    -
    +
    Code
    import pandas as pd
    @@ -532,7 +532,7 @@ 

    Task 5: Joining Data
  • Read in a new dataframe that contains population data stored at this url:
  • -
    +
    Code
    population_url = 'https://bit.ly/euro_pop'
diff --git a/docs/course-materials/final_project.html b/docs/course-materials/final_project.html
index 0addc86..fc7eb49 100644
--- a/docs/course-materials/final_project.html
+++ b/docs/course-materials/final_project.html
@@ -508,7 +508,7 @@

    +
    Code
    import pandas as pd
    diff --git a/docs/course-materials/interactive-sessions/1b_Jupyter_Notebooks.html b/docs/course-materials/interactive-sessions/1b_Jupyter_Notebooks.html
    index cdd867d..c27a76c 100644
    --- a/docs/course-materials/interactive-sessions/1b_Jupyter_Notebooks.html
    +++ b/docs/course-materials/interactive-sessions/1b_Jupyter_Notebooks.html
    @@ -572,7 +572,7 @@ 

    Rendering Images

    Jupyter Notebooks can render images directly in the output cells, which is particularly useful for data visualization.

    Example: Displaying an Image

    -
    +
    Code
    from IPython.display import Image, display
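A minimal sketch of displaying an image in an output cell; the URL here is a placeholder, and a local file path works the same way:

```python
from IPython.display import Image, display

# Display an image from a URL (placeholder address) at a fixed width
display(Image(url="https://example.com/some_figure.png", width=400))
```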
    @@ -593,7 +593,7 @@ 

    Interactive Features<

    Example: Using Interactive Widgets

    Widgets allow users to interact with your code and visualize results dynamically.

    -
    +
    Code
    import ipywidgets as widgets
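A minimal interactive-widget sketch using `interact`, which builds a slider from the numeric range and re-runs the function whenever the slider moves:

```python
import ipywidgets as widgets
from ipywidgets import interact

def square(x):
    return x ** 2

# interact builds an integer slider from the (min, max) tuple
interact(square, x=(0, 10))
```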
    @@ -604,7 +604,7 @@ 

    Example:

    @@ -698,7 +698,7 @@

    6. Google Colab