diff --git a/course-materials/day8.qmd b/course-materials/day8.qmd index 1a5f63c..45adf44 100644 --- a/course-materials/day8.qmd +++ b/course-materials/day8.qmd @@ -21,6 +21,10 @@ description: "Thursday September 12^th^, 2024" : {.hover .bordered tbl-colwidths="[20,40,40]"} +## Syncing your classwork to GitHub + +[Here](interactive-sessions/8a_github.qmd) are some directions for syncing your classwork with a GitHub repository. + ## End-of-day practice There are no additional end-of-day tasks / activities today! diff --git a/course-materials/day9.qmd b/course-materials/day9.qmd index c95738d..5f34007 100644 --- a/course-materials/day9.qmd +++ b/course-materials/day9.qmd @@ -20,6 +20,11 @@ description: "Friday September 13^th^, 2024" | day 9 / afternoon | Data Science Project Presentations
(all afternoon) | | : {.hover .bordered tbl-colwidths="[20,40,40]"} +## Syncing your classwork to GitHub + +[Here](interactive-sessions/8a_github.qmd) are some directions for syncing your classwork with a GitHub repository. + + ## End-of-day practice End of Class! Congratulations!! diff --git a/course-materials/final_project.qmd b/course-materials/final_project.qmd index f841319..e6af52b 100644 --- a/course-materials/final_project.qmd +++ b/course-materials/final_project.qmd @@ -37,11 +37,15 @@ You can include visualizations as part of your data exploration (step 2), or any Additional figures and graphics are also welcome - you are encouraged to make your notebooks as engaging and visually interesting as possible. -Here are some links to potential data resources that you can use to develop your analyses: +## Syncing your data to GitHub + +[Here](interactive-sessions/8a_github.qmd) are some directions for syncing your classwork with GitHub. ### General places to find fun data +Here are some links to potential data resources that you can use to develop your analyses: + - [Kaggle](https://www.kaggle.com/datasets?fileType=csv) - [Data is Plural](https://www.data-is-plural.com/) - [US Data.gov](https://data.gov/) diff --git a/course-materials/interactive-sessions/8a_github.qmd b/course-materials/interactive-sessions/8a_github.qmd new file mode 100644 index 0000000..0e98629 --- /dev/null +++ b/course-materials/interactive-sessions/8a_github.qmd @@ -0,0 +1,147 @@ +--- +title: "Interactive Session" +subtitle: "Creating a GitHub Repository for Your EDS 217 Course Work" +jupyter: eds217_2024 +format: + html: + toc: true + toc-depth: 3 + code-fold: show +--- + +## Introduction + +This session contains detailed instructions for creating a new GitHub repository and pushing your EDS 217 course work to it. It also covers how to clean your Jupyter notebooks before committing them to ensure your repository is clean and professional.
+ +:::{.callout-warning} +Jupyter notebook files can be hard to manage in GitHub because they store information about code execution order and cell output inside the file. For this reason, you should clean notebooks before pushing them to a GitHub repo. We will clean them using the `nbstripout` Python package. +::: + +## Steps to Set Up a GitHub Repo for Your Coursework + +1. Create a new repository on GitHub +2. Initialize a local Git repository +3. Clean Jupyter notebooks of output and execution data +4. Add, commit, and push files to a GitHub repository + +## Creating a New GitHub Repository + +Let's start by creating a new repository on GitHub: + +1. Log in to your GitHub account +2. Click the '+' icon in the top-right corner and select 'New repository' +3. Name your repository (e.g., "EDS-217-Course-Work") +4. Add a description (optional) +5. Choose to make the repository public or private +6. Don't initialize the repository with a README, .gitignore, or license +7. Click 'Create repository' + +After creating the repository, you'll see a page with instructions. We'll use these in the next steps. + +## Initializing a Local Git Repository + +Now, let's set up your local directory as a Git repository: + +1. Open a terminal on the class `workbench` server +2. Navigate to your course work directory: + +```{python} +#| echo: true +#| eval: false + +cd path/to/your/EDS-217 +``` + +3. Initialize the repository: + +```{python} +#| echo: true +#| eval: false + +git init +``` + +4. Add your GitHub repository as the remote origin: + +```{python} +#| echo: true +#| eval: false + +git remote add origin https://github.com/your-username/EDS-217-Course-Work.git +``` + +Replace `your-username` with your actual GitHub username. + +## Cleaning Jupyter Notebooks + +Before we commit our notebooks, let's clean them to remove output cells and execution data: + +1. 
Install the `nbstripout` tool if you haven't already: + +```{python} +#| echo: true +#| eval: false + +pip install nbstripout +``` + +2. Configure `nbstripout` for your repository: + +```{python} +#| echo: true +#| eval: false + +nbstripout --install --attributes .gitattributes +``` + +This sets up `nbstripout` to automatically clean your notebooks when you commit them. + +## Adding, Committing, and Pushing Files + +Now we're ready to add our files to the repository: + +1. Add all files in the directory: + +```{python} +#| echo: true +#| eval: false + +git add . +``` + +2. Commit the files: + +```{python} +#| echo: true +#| eval: false + +git commit -m "Initial commit: Adding EDS 217 course work" +``` + +3. Push the files to GitHub: + +```{python} +#| echo: true +#| eval: false + +git push -u origin main +``` + +Note: If your default branch is named "master" instead of "main", use `git push -u origin master`. + +## Verifying Your Repository + +1. Go to your GitHub repository page in your web browser +2. Refresh the page +3. You should now see all your course files listed in the repository + +## Conclusion + +Congratulations! You've successfully created a GitHub repository for your EDS 217 course work, cleaned your Jupyter notebooks, and pushed your files to GitHub. This process helps you maintain a clean, professional repository of your work that you can easily share or refer back to in the future. 
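The cleaning step above works because stripping a notebook is just a transformation of its JSON. As a rough illustration of the idea (not `nbstripout`'s actual implementation), the sketch below blanks the outputs and execution counts of code cells in a toy notebook dict; the dict is an assumption for illustration, and real notebooks carry more metadata:

```python
# A toy notebook in the JSON structure Jupyter uses (nbformat 4).
# Assumed here purely for illustration; real notebooks have more metadata.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "source": ["print('hello')"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hello\n"]}
            ],
        },
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

def strip_outputs(nb):
    """Blank outputs and execution counts in every code cell."""
    for cell in nb["cells"]:
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

cleaned = strip_outputs(notebook)
# The code cell no longer records output or execution order, so
# re-running the notebook does not change the committed file.
print(cleaned["cells"][0]["outputs"])          # []
print(cleaned["cells"][0]["execution_count"])  # None
```

Because `nbstripout` is registered as a Git filter via `.gitattributes`, a transformation like this is applied automatically when you stage notebooks, so your diffs show only real code changes.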
+ +## Additional Resources + +- [GitHub Docs: Creating a new repository](https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-new-repository) +- [Git Documentation](https://git-scm.com/doc) +- [nbstripout Documentation](https://github.com/kynan/nbstripout) diff --git a/docs/course-materials/answer-keys/2c_lists_dictionaries_sets-key.html b/docs/course-materials/answer-keys/2c_lists_dictionaries_sets-key.html index 11e7c1f..bf8fa11 100644 --- a/docs/course-materials/answer-keys/2c_lists_dictionaries_sets-key.html +++ b/docs/course-materials/answer-keys/2c_lists_dictionaries_sets-key.html @@ -483,7 +483,7 @@

Task 1: List Operat
  • Remove the second fruit from the list.
  • Print the final list.
  • -
    +
    # Example code for instructor
     fruits = ["apple", "banana", "cherry"]
     print("Original list:", fruits)
    @@ -512,7 +512,7 @@ 

    Task 2: Dicti
  • Update the quantity of an existing item.
  • Print the final inventory.
  • -
    +
    # Example code for instructor
     inventory = {
         "apples": 50,
    @@ -549,7 +549,7 @@ 

    Task
  • Find and print the intersection of the two sets.
  • Add a new element to the evens set.
  • -
    +
    # Example code for instructor
     evens = {2, 4, 6, 8, 10}
     odds = {1, 3, 5, 7, 9}
    @@ -583,7 +583,7 @@ 

  • Use a list comprehension to remove duplicates.
  • Print the results of both methods.
  • -
    +
    # Example code for instructor
     numbers = [1, 2, 2, 3, 3, 3, 4, 4, 5]
     
    diff --git a/docs/course-materials/answer-keys/3b_control_flows-key.html b/docs/course-materials/answer-keys/3b_control_flows-key.html
    index 81bd3eb..5de93ae 100644
    --- a/docs/course-materials/answer-keys/3b_control_flows-key.html
    +++ b/docs/course-materials/answer-keys/3b_control_flows-key.html
    @@ -470,10 +470,10 @@ 

    Task 1: Simpl
  • Otherwise, print “Enjoy the pleasant weather!”
  • -
    +
    temperature = 20
    -
    +
    if temperature > 25:
         print("It's a hot day, stay hydrated!")
     else:
    @@ -497,10 +497,10 @@ 

    Task 2: Grade Clas
  • Below 60: “F”
  • -
    +
    score = 85
    -
    +
    if score >= 90:
         grade = 'A'
     elif score >= 80:
    @@ -528,7 +528,7 @@ 

    Task 3: Counting She
  • Use a for loop with the range() function
  • Print each number followed by “sheep”
  • -
    +
    for i in range(1,6):
         print(f"{i} sheep")
    @@ -548,10 +548,10 @@

    Task 4: Sum of Numbe
  • Use a for loop with the range() function to add each number to total
  • After the loop, print the total
  • -
    +
    total = 0
    -
    +
    for i in range(1,11):
         total = total + i
     
    @@ -573,10 +573,10 @@ 

    Task 5: Countdown

  • After each print, decrease the countdown by 1
  • When the countdown reaches 0, print “Blast off!”
  • -
    +
    countdown = 5
    -
    +
    while countdown > 0:
         print(countdown)
         # (-= is a python syntax shortcut inherited from C)
    diff --git a/docs/course-materials/answer-keys/3d_pandas_series-key.html b/docs/course-materials/answer-keys/3d_pandas_series-key.html
    index ff2f163..a092e0c 100644
    --- a/docs/course-materials/answer-keys/3d_pandas_series-key.html
    +++ b/docs/course-materials/answer-keys/3d_pandas_series-key.html
    @@ -440,7 +440,7 @@ 

    Resources

    Setup

    First, let’s import the necessary libraries and create a sample Series.

    -
    +
    import pandas as pd
     import numpy as np
     
    @@ -463,7 +463,7 @@ 

    Exercise 1: C

    apple: $0.5, banana: $0.3, cherry: $1.0, date: $1.5, elderberry: $2.0

    -
    +
    # Create a Series called 'prices' with the same index as 'fruits'
     # Use these prices: apple: $0.5, banana: $0.3, cherry: $1.0, date: $1.5, elderberry: $2.0
     prices = pd.Series([0.5, 0.3, 1.0, 2.5, 3.0], index=fruits.values, name='Prices')
    @@ -486,7 +486,7 @@ 

    Exercise 2: S
  • Find the most expensive fruit.
  • Apply a 10% discount to all fruits priced over 1.0.
  • -
    +
    # 1. Calculate the total price of all fruits
     total_price = prices.sum()
     
    @@ -522,7 +522,7 @@ 

    Exercise 3: Ser
  • How many fruits cost less than $1.0?
  • What is the price range (difference between max and min prices)?
  • -
    +
    # 1. Calculate the average price of the fruits
     average_price = prices.mean()
     
    @@ -550,7 +550,7 @@ 

    Exercise 4:
  • Remove ‘banana’ from both Series.
  • Sort both Series by fruit name (alphabetically).
  • -
    +
    # 1. Add 'fig' to both Series (price: $1.2)
     fruits = pd.concat([fruits, pd.Series(['fig'], name='Fruits')])
     prices = pd.concat([prices, pd.Series([1.2], index=['fig'], name='Prices')])
    diff --git a/docs/course-materials/answer-keys/5c_cleaning_data-key.html b/docs/course-materials/answer-keys/5c_cleaning_data-key.html
    index dfe2b9c..9821089 100644
    --- a/docs/course-materials/answer-keys/5c_cleaning_data-key.html
    +++ b/docs/course-materials/answer-keys/5c_cleaning_data-key.html
    @@ -435,7 +435,7 @@ 

    Resources

    Setup

    First, let’s import the necessary libraries and load an example messy dataframe.

    -
    +
    import pandas as pd
     import numpy as np
     
    @@ -450,36 +450,36 @@ 

  • Removing duplicates
  • -
    +
    messy_df.drop_duplicates(inplace=True)
    1. Handling missing values (either fill or dropna to remove rows with missing data)
    -
    +
    messy_df = messy_df.dropna()
    1. Ensuring consistent data types (dates, strings)
    -
    +
    messy_df['site'] = messy_df['site'].astype('string')
     messy_df['collection date'] = pd.to_datetime(messy_df['collection date'])
    1. Formatting the ‘site’ column for consistency
    -
    +
    messy_df['site'] = messy_df['site'].str.lower().replace('sitec','site_c')
    1. Making sure all column names are lower case, without whitespace.
    -
    +
    messy_df.rename(columns={'collection date': 'collection_date'}, inplace=True)

    Try to implement these steps using the techniques we’ve learned.

    -
    +
    cleaned_df = messy_df.copy()
     
     print("Cleaned DataFrame:")
    diff --git a/docs/course-materials/answer-keys/7c_visualizations-key.html b/docs/course-materials/answer-keys/7c_visualizations-key.html
    index a7a1e2b..72df041 100644
    --- a/docs/course-materials/answer-keys/7c_visualizations-key.html
    +++ b/docs/course-materials/answer-keys/7c_visualizations-key.html
    @@ -437,7 +437,7 @@ 

    Introduction

    Setup

    First, let’s import the necessary libraries and load our dataset.

    -
    +
    Code
    import pandas as pd
    @@ -495,7 +495,7 @@ 

    +
    Code
    # Answer for Task 1
    @@ -536,7 +536,7 @@ 

    Task 2: Exam
  • Modify the pairplot to show the species information using different colors.
  • Interpret the pairplot: which variables seem to be most strongly correlated? Do you notice any patterns related to species?
  • -
    +
    Code
    # Answer for Task 2
    @@ -572,7 +572,7 @@ 

    +
    Code
    # Answer for Task 3
    @@ -621,7 +621,7 @@ 

    Task 4: Jo
  • Experiment with different kind parameters in the joint plot (e.g., ‘scatter’, ‘kde’, ‘hex’).
  • Create another joint plot, this time for ‘bill_length_mm’ and ‘bill_depth_mm’, colored by species.
  • -
    +
    Code
    # Answer for Task 4
    @@ -696,7 +696,7 @@ 

    Bonus Challenge

  • Customize the heatmap by adding annotations and adjusting the colormap.
  • Compare the insights from this heatmap with those from the pairplot. What additional information does each visualization provide?
  • -
    +
    Code
    # Answer for Bonus Challenge
    diff --git a/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-1.png b/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-1.png
    index c095130..d5a28f6 100644
    Binary files a/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-1.png and b/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-1.png differ
    diff --git a/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-2.png b/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-2.png
    index f21785e..2aeee46 100644
    Binary files a/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-2.png and b/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-2.png differ
    diff --git a/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-3.png b/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-3.png
    index 1eea260..3e3232d 100644
    Binary files a/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-3.png and b/docs/course-materials/answer-keys/7c_visualizations-key_files/figure-html/cell-5-output-3.png differ
    diff --git a/docs/course-materials/answer-keys/eod-day1-key.html b/docs/course-materials/answer-keys/eod-day1-key.html
    index 976bb04..792dbc5 100644
    --- a/docs/course-materials/answer-keys/eod-day1-key.html
    +++ b/docs/course-materials/answer-keys/eod-day1-key.html
    @@ -441,7 +441,7 @@ 

    Instructions

  • Import the necessary libraries to work with data (pandas) and create plots (matplotlib.pyplot). Use the standard python conventions that import pandas as pd and import matplotlib.pyplot as plt
  • -
    +
    import pandas as pd
     import matplotlib.pyplot as plt
    @@ -454,7 +454,7 @@

    Instructions

  • Create a variable called url that stores the URL provided above.
  • Use the pandas library’s read_csv() function from pandas to load the data from the URL into a new DataFrame called df. Any pandas function will always be called using the pd object and dot notation: pd.read_csv()
  • -
    +
    url = 'https://raw.githubusercontent.com/environmental-data-science/eds217-day0-comp/main/data/raw_data/toolik_weather.csv'
     df = pd.read_csv(url)
    @@ -467,7 +467,7 @@

    Instructions

    Note: Because the head() function is a method of a DataFrame, you will call it using dot notation and the dataframe you just created: df.head()

    -
    +
    df.head()
    @@ -635,7 +635,7 @@

    Instructions

  • Use the isnull() method combined with sum() to count missing values in each column.
  • -
    +
    df.isnull().sum()
    Year                                   0
    @@ -671,7 +671,7 @@ 

    Instructions

  • Use the info() method to get an overview of the DataFrame, including data types and non-null counts. Just like the head() function, these are methods associated with your df object, so you call them with dot notation.
  • -
    +
    df.describe()
     df.info()
    @@ -712,7 +712,7 @@

    Instructions

    - Choose a strategy to handle missing data in the columns. For example, fill missing values with the mean of the column using the `fillna()` method or drop rows with missing data using the `dropna()` method. -::: {#a87d2b0b .cell execution_count=6} +::: {#85e9c032 .cell execution_count=6} ``` {.python .cell-code} df['Daily_AirTemp_Mean_C'].fillna(df['Daily_AirTemp_Mean_C'].mean(), inplace=True) df.dropna(subset=['Daily_globalrad_total_jcm2'], inplace=True) @@ -720,7 +720,7 @@

    Instructions

    ::: {.cell-output .cell-output-stderr} ``` -/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_61613/1318736512.py:1: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. +/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85158/1318736512.py:1: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. @@ -741,7 +741,7 @@

    Instructions

  • Calculate the mean of the ‘Daily_AirTemp_Mean_C’ column for each month in the monthly using the mean() function. Save this result to a new variable called monthly_means.
  • -
    +
    monthly = df.groupby('Month')
     monthly_means = monthly['Daily_AirTemp_Mean_C'].mean()
    @@ -755,7 +755,7 @@

    Instructions

    Syntax Similarity: Use plt.plot() or plot.bar() to create plots. In R, you would use ggplot().

    -
    +
    plt.plot(monthly_means)
    @@ -765,7 +765,7 @@

    Instructions

    -
    +
    months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
     plt.bar(months, monthly_means)
    @@ -784,7 +784,7 @@

    Instructions

    Hint: Similar to calculating monthly averages, group by the ‘Year’ column.

    -
    +
    year = df.groupby('Year')
     yearly_means = year['Daily_AirTemp_Mean_C'].mean()
     plt.plot(yearly_means)
    @@ -796,7 +796,7 @@

    Instructions

    -
    +
    year_list = df['Year'].unique()
     plt.bar(year_list, yearly_means)
    diff --git a/docs/course-materials/answer-keys/eod-day2-key.html b/docs/course-materials/answer-keys/eod-day2-key.html index f5c36d4..ea4b6e1 100644 --- a/docs/course-materials/answer-keys/eod-day2-key.html +++ b/docs/course-materials/answer-keys/eod-day2-key.html @@ -464,7 +464,7 @@

    Learning Objectives

    Setup

    First, let’s import the necessary libraries:

    -
    +
    Code
    # We won't use the random library until the end of this exercise, 
    @@ -480,7 +480,7 @@ 

    Part 1: Data Collec

    Task 1: Create a List of Classmates

    Create a list containing the names of at least 4 of your classmates in this course.

    -
    +
    Code
    classmates = ["Alice", "Bob", "Charlie", "David", "Eve"]
    @@ -500,7 +500,7 @@ 

    +
    Code
    classmate_info = {
    @@ -549,7 +549,7 @@ 

    Task 3: List Operat
  • Sort the list alphabetically
  • Find and print the index of a specific classmate
  • -
    +
    Code
    # a) Add a new classmate
    @@ -584,7 +584,7 @@ 

    Task 4: Dicti
  • Update the “number of pets” for one classmate
  • Create a list of all the favorite colors your classmates mentioned
  • -
    +
    Code
    # a) Add favorite_study_spot
    @@ -643,7 +643,7 @@ 

    Task 5: Basic Stat
  • The average number of pets among your classmates
  • The name of the classmate who got the most sleep last night
  • -
    +
    Code
    # a) Average number of pets
    @@ -663,7 +663,7 @@ 

    Task 5: Basic Stat

    Task 6: Data Filtering

    Create a new list containing only the classmates who have at least one pet.

    -
    +
    Code
    classmates_with_pets = [name for name, info in classmate_info.items() if info["number_of_pets"] > 0]
    @@ -681,7 +681,7 @@ 

    Part 4:

    Example: Random Selection from a Dictionary

    Here’s a simple example of how to select random items from a dictionary:

    -
    +
    Code
    import random
    @@ -710,10 +710,10 @@ 

    print(f"Randomly selected {num_selections} fruits: {random_fruits}")

    -
    Randomly selected fruit: banana
    -Its color: yellow
    +
    Randomly selected fruit: grape
    +Its color: purple
     Another randomly selected fruit: kiwi
    -Randomly selected 3 fruits: ['grape', 'apple', 'kiwi']
    +Randomly selected 3 fruits: ['orange', 'apple', 'banana']

    This example demonstrates how to:

    @@ -734,7 +734,7 @@

    Task 7: Random # Test your function assign_random_snacks(classmate_info)

    -
    +
    Code
    def assign_random_snacks(classmate_info):
    @@ -746,7 +746,7 @@ 

    Task 7: Random assign_random_snacks(classmate_info)

    -
    Alice will share almonds with Bob
    +
    Alice will share almonds with Charlie
    diff --git a/docs/course-materials/answer-keys/eod-day3-key.html b/docs/course-materials/answer-keys/eod-day3-key.html index 2fc130c..350fb3b 100644 --- a/docs/course-materials/answer-keys/eod-day3-key.html +++ b/docs/course-materials/answer-keys/eod-day3-key.html @@ -447,7 +447,7 @@

    Introduction

    Setup

    First, let’s import the necessary libraries and set up our environment.

    -
    +
    Code
    import pandas as pd
    @@ -461,7 +461,7 @@ 

    Creating a Random Number Generator

    We can create a random number generator object like this:

    -
    +
    Code
    rng = np.random.default_rng()
    @@ -472,7 +472,7 @@

    Creatin

    Using a Seed for Reproducibility

    In data science, it’s often crucial to be able to reproduce our results. We can do this by setting a seed for our random number generator. Here’s how:

    -
    +
    Code
    rng = np.random.default_rng(seed=42)
    @@ -487,7 +487,7 @@

    Creating t
  • Create a series called scores that contains 10 elements representing monthly test scores. We’ll use random integers between 70 and 100 to generate the monthly scores, and set the index to be the month names from September to June:
  • months = ['Sep', 'Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
    -
    +
    Code
    # Create the month list:
    @@ -505,7 +505,7 @@ 

    Analyzing the Te

    1. What is the student’s average test score for the entire year?

    Calculate the mean of all scores in the series.

    -
    +
    Code
    # 1. Average score for the entire year
    @@ -520,7 +520,7 @@ 

    2. What is the student’s average test score during the first half of the year?

    Calculate the mean of the first five months’ scores.

    -
    +
    Code
    # 2. Average score for the first half of the year
    @@ -537,7 +537,7 @@ 

    3. What is the student’s average test score during the second half of the year?

    Calculate the mean of the last five months’ scores.

    -
    +
    Code
    second_half_average = scores.iloc[5:].mean()
    @@ -553,7 +553,7 @@ 

    4. Did the student improve their performance in the second half? If so, by how much?

    Compare the average scores from the first and second half of the year.

    -
    +
    Code
    # 4. Performance improvement
    @@ -572,7 +572,7 @@ 

    Exploring Reproducibility

    To demonstrate the importance of seeding, try creating two series with different random number generators:

    -
    +
    Code
    rng1 = np.random.default_rng(seed=42)
    @@ -588,7 +588,7 @@ 

    Exploring Reprod

    Now try creating two series with random number generators that have different seeds:

    -
    +
    Code
    rng3 = np.random.default_rng(seed=42)
    diff --git a/docs/course-materials/answer-keys/eod-day4-key.html b/docs/course-materials/answer-keys/eod-day4-key.html
    index 425cfd2..8c63702 100644
    --- a/docs/course-materials/answer-keys/eod-day4-key.html
    +++ b/docs/course-materials/answer-keys/eod-day4-key.html
    @@ -455,7 +455,7 @@ 

    Introduction

    This end-of-day session is focused on using pandas for loading, visualizing, and analyzing marine microplastics data. This session is designed to help you become more comfortable with the pandas library, equipping you with the skills needed to perform data analysis effectively.

    The National Oceanic and Atmospheric Administration, via its National Centers for Environmental Information has an entire section related to marine microplastics – that is, microplastics found in water — at https://www.ncei.noaa.gov/products/microplastics.

    We will be working with a recent download of the entire marine microplastics dataset. The url for this data is located here:

    -
    +
    Code
    url = 'https://ucsb.box.com/shared/static/dnnu59jsnkymup6o8aaovdywrtxiy3a9.csv'
    @@ -468,7 +468,7 @@

    1. Loading the Data

    Objective: Learn to load data into a pandas DataFrame and display the first few records.

    Task 1.1 Import the pandas library.

    -
    +
    Code
    import pandas as pd
    @@ -490,7 +490,7 @@

    +
    Code
    df = pd.read_csv(url, parse_dates=['Date'], date_format='%m/%d/%Y %I:%M:%S %p')
    @@ -502,7 +502,7 @@

    Task 1.3:

    • Display the first five rows of the DataFrame to get an initial understanding of the data structure.
    -
    +
    Code
    print(df.head())
    @@ -562,7 +562,7 @@

    Task 2.1:

    • Display summary statistics of the dataset to understand the central tendency and variability.
    -
    +
    Code
    summary_statistics = df.describe()
    @@ -614,7 +614,7 @@ 

    Task 2.2:

    Note that the results of the built-in function - df['column'].isnull() need to be wrapped in ( ) for the ~ operator to work properly.

    -
    +
    Code
    print("DataFrame info:",df.info())
    @@ -668,7 +668,7 @@ 

    Task 3.1:

    • Create a groupby object called oceans that groups the data in df according to the value of the Oceans column.
    -
    +
    Code
    oceans = df.groupby(['Oceans'])
    @@ -680,7 +680,7 @@

    Task 3.2:

    • Determine the total number of Measurements taken from each Ocean.
    -
    +
    Code
    print(oceans['Measurement'].count())
    @@ -700,7 +700,7 @@

    Task 3.3:

    • Determine the average value of Measurement taken from each Ocean.
    -
    +
    Code
    print(oceans['Measurement'].mean())
    @@ -723,7 +723,7 @@

    Task 4.1:

    • Filter the data to a new df (called df2) that only contains rows where the Unit of measurement is pieces/m3
    -
    +
    Code
    df2 = df[df['Unit'] == 'pieces/m3']
    @@ -735,7 +735,7 @@

    Task 4.2:

    • Use the groupby and the max() command to determine the Maximum value of pieces/m3 measured for each Ocean
    -
    +
    Code
    # Instructor code
    @@ -759,7 +759,7 @@ 

    Task 5.1:

    • Make a histogram of the latitude of every sample in your filtered dataframe using the DataFrame plot command.
    -
    +
    Code
    df2['Latitude'].hist()
    @@ -791,7 +791,7 @@

    Task 5.2:

    Using .copy() when filtering a dataframe ensures that you’re working with a new DataFrame, not a view of the original. This is especially important when you’re filtering data and then modifying the result, which is common in data science workflows.

    -
    +
    Code
    df3 = df2[df2['Measurement'] > 0].copy()
    @@ -816,7 +816,7 @@

    Task 5.3

    The numpy library has a log10() function that you will find useful for this step!

    -
    +
    Code
    import numpy as np
    @@ -829,7 +829,7 @@ 

    Task 5.4

    • Make a histogram of the log-transformed values in df3
    -
    +
    Code
    df3['log10Measurement'].hist()
    diff --git a/docs/course-materials/answer-keys/eod-day5-key.html b/docs/course-materials/answer-keys/eod-day5-key.html index ac69c22..2144727 100644 --- a/docs/course-materials/answer-keys/eod-day5-key.html +++ b/docs/course-materials/answer-keys/eod-day5-key.html @@ -453,7 +453,7 @@

    Reference:

    Setup

    First, let’s import the necessary libraries and load the data:

    -
    +
    Code
    import pandas as pd
    @@ -465,7 +465,7 @@ 

    Setup

    df = pd.read_csv(url)
    -
    +
    Code
    # Display the first few rows:
    @@ -508,7 +508,7 @@ 

    Setup

    4 3.775280 True 1 NaN NaN
    -
    +
    Code
    # Display the dataframe info:
    @@ -550,7 +550,7 @@ 

    1. Data Preparation

    1. Set the index of the DataFrame to be the ‘entity’ column.
    -
    +
    Code
    # The fastest way to set the index is when loading the dataframe:
    @@ -564,7 +564,7 @@ 

    1. Data Preparation

    1. Remove the ‘year’, ‘Banana values’, ‘type’, ‘Unnamed: 16’, and ‘Chart?’ columns.
    -
    +
    Code
    df = df.drop([
    @@ -581,7 +581,7 @@ 

    1. Data Preparation

    1. Display the first few rows of the modified DataFrame.
    -
    +
    Code
    print(df.head())
    @@ -634,7 +634,7 @@

    2. Exploring Banan
    1. For each of the pre-computed banana score columns (kg, calories, and protein), show the 10 highest-scoring food products.
    -
    +
    Code
    print("\nTop 10 for Bananas index (kg):")
    @@ -692,7 +692,7 @@ 

    2. Exploring Banan

    Note: We could also use the df.filter() command to select all the columns that contain ‘Bana’:

    -
    +
    Code
    for each_column in df.filter(like='Bana'):
    @@ -747,7 +747,7 @@ 

    2. Exploring Banan
    1. Create a function to return the top 10 scores for a given column.
    -
    +
    Code
    def return_top_10(df, column):
    @@ -758,7 +758,7 @@ 

    2. Exploring Banan
    1. Use your function to display the results for each of the three Banana index columns.
    -
    +
    Code
    banana_columns = [
    @@ -837,7 +837,7 @@ 

    3. Common High-S

    Python sets allow you to quickly determine intersections: in_all_three = set.intersection(seta, setb, setc), or you can use the * operator to unpack a list of sets directly: in_all_three = set.intersection(*list_of_sets)

    -
    +
    Code
    top_10_kg = set(return_top_10(df, 'Bananas index (kg)').index)
    @@ -855,7 +855,7 @@ 

    3. Common High-S
    
     Foods in top 10 for all three metrics:
    -{'Beef meatballs', 'Beef steak', 'Beef mince'}
    +{'Beef mince', 'Beef steak', 'Beef meatballs'}

    @@ -877,7 +877,7 @@

    4. Land Use Analysis

    The data on land_use_1000kcal for bananas is in the Bananas row.

    -
    +
    Code
    bananas_land_use_1000kcal = df.loc['Bananas', 'land_use_1000kcal']
    @@ -888,7 +888,7 @@ 

    4. Land Use Analysis

  • Display the 10 foods with the highest land use score.
  • -
    +
    Code
    print("\nTop 10 foods by land use score:")
    @@ -914,7 +914,7 @@ 

    4. Land Use Analysis

  • Compare this list with the previous top 10 lists. Are there any common foods?
  • -
    +
    Code
    # Use a list comprehension and df.filter to make a list of sets:
    @@ -926,14 +926,14 @@ 

    4. Land Use Analysis

    
     Foods in top 10 for all four metrics:
    -{'Beef meatballs', 'Beef steak', 'Beef mince'}
    +{'Beef mince', 'Beef steak', 'Beef meatballs'}

    5. Cheese Analysis

    Identify the type of cheese with the highest banana score per 1,000 kcal. How does it compare to other cheeses in the dataset?

    -
    +
    Code
    # 5. Cheese Analysis
    @@ -976,7 +976,7 @@ 

    6. Correlation Analys
    1. Calculate and display the correlations among the four computed banana scores (including the new land use score).
    -
    +
    Code
    # 6a. Correlation Analysis
    @@ -987,7 +987,7 @@ 

    6. Correlation Analys
    1. Create a heatmap to visualize these correlations.
    -
    +
    Code
    plt.figure(figsize=(10, 8))
    @@ -1007,7 +1007,7 @@ 

    6. Correlation Analys

    7. Using Pandas styles

    Style your correlation dataframe to highlight values in the range between 0.8 and 0.99.

    -
    +
    Code
    # 7. Visualization
    @@ -1015,49 +1015,49 @@ 

    7. Using Pandas styles

                                     Bananas index (kg)  Bananas index (1000 kcalories)  Bananas index (100g protein)  Bananas index (land use 1000 kcal)
     Bananas index (kg)                        1.000000                        0.882639                      0.224555                             0.926726
     Bananas index (1000 kcalories)            0.882639                        1.000000                      0.368001                             0.880739
     Bananas index (100g protein)              0.224555                        0.368001                      1.000000                             0.224511
     Bananas index (land use 1000 kcal)        0.926726                        0.880739                      0.224511                             1.000000
    @@ -1068,7 +1068,7 @@

    7. Using Pandas styles

    Bonus Challenge

    If you finish early, try to create a “Banana Equivalence” calculator. This function should take a food item, an amount, and a metric (kg, calories, or protein) as input, and return how many bananas would have the same environmental impact.

    -
    +
    Code
    # Bonus Challenge
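One minimal sketch of the "Banana Equivalence" calculator, assuming a dataframe indexed by food name with the three banana index columns (the `toy` frame and its values below are invented for illustration):

```python
import pandas as pd

# Toy frame standing in for the course dataset (values are made up)
toy = pd.DataFrame(
    {'Bananas index (kg)': [1.0, 109.0],
     'Bananas index (1000 kcalories)': [1.0, 54.0],
     'Bananas index (100g protein)': [1.0, 7.0]},
    index=['Bananas', 'Beef steak'])

# Map the user-facing metric names to the column names
METRIC_COLUMNS = {
    'kg': 'Bananas index (kg)',
    'calories': 'Bananas index (1000 kcalories)',
    'protein': 'Bananas index (100g protein)',
}

def banana_equivalence(df, food, amount, metric):
    """Return how many bananas match the impact of `amount` units of `food`."""
    score = df.loc[food, METRIC_COLUMNS[metric]]
    # The banana index is already "bananas per unit", so scaling
    # by the amount gives the banana-equivalent impact.
    return amount * score

print(banana_equivalence(toy, 'Beef steak', 2, 'kg'))  # 218.0
```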
    diff --git a/docs/course-materials/answer-keys/eod-day6-key.html b/docs/course-materials/answer-keys/eod-day6-key.html
    index 8a06c8c..b0a749f 100644
    --- a/docs/course-materials/answer-keys/eod-day6-key.html
    +++ b/docs/course-materials/answer-keys/eod-day6-key.html
    @@ -430,7 +430,7 @@ 

    On this page

    Setup

    First, import the necessary libraries and load the dataset:

    -
    +
    Code
    import pandas as pd
    @@ -449,7 +449,7 @@ 

    Task
    1. Display the first few rows of the dataset.
    -
    +
    Code
    print(eurovision_df.head())
    @@ -503,7 +503,7 @@

    Task
    1. Check the data types of each column.
    -
    +
    Code
    print(eurovision_df.dtypes)
    @@ -536,7 +536,7 @@

    Task
    1. Identify and handle any missing values.
    -
    +
    Code
    print(eurovision_df.isnull().sum())
    @@ -570,7 +570,7 @@ 

    Task
    1. Convert the ‘year’ column to datetime type.
    -
    +
    Code
    eurovision_df['year'] = pd.to_datetime(eurovision_df['year'], format='%Y')
    @@ -595,7 +595,7 @@

    Task 2

    Use .copy() to make sure you create a new dataframe and not just a view.

    -
    +
    Code
    eurovision_1990 = eurovision_df[eurovision_df['year'].dt.year >= 1990].copy()
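The reason for `.copy()` can be shown on a small frame: slicing without it may return a view, and assigning into a view triggers `SettingWithCopyWarning` and leaves the result ambiguous (toy data below, not the Eurovision dataset):

```python
import pandas as pd

df = pd.DataFrame({'year': [1989, 1995, 2001], 'points': [10, 20, 30]})

# Explicit, independent copy of the filtered rows
recent = df[df['year'] >= 1990].copy()
recent['points'] = recent['points'] * 2  # safe: modifies the copy only

# The original dataframe is untouched
print(df['points'].tolist())      # [10, 20, 30]
print(recent['points'].tolist())  # [40, 60]
```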
    @@ -604,7 +604,7 @@

    Task 2
    1. Calculate the difference between final points and semi-final points for each entry and make a histogram of these values using the built-in DataFrame .hist() method.
    -
    +
    Code
    eurovision_1990['points_difference'] = eurovision_1990['points_final'] - eurovision_1990['points_sf']
    @@ -624,7 +624,7 @@ 

    Task 3: Sor
    1. Find the top 10 countries with the most Eurovision appearances (use the entire dataset for this calculation)
    -
    +
    Code
    top_10_countries = eurovision_df['to_country'].value_counts().head(10)
    @@ -648,7 +648,7 @@ 

    Task 3: Sor
    1. Calculate the average final points for each country across all years. Make a simple bar plot of these data.
    -
    +
    Code
    avg_points_by_country = eurovision_df.groupby('to_country')['points_final'].mean().sort_values(ascending=False)
    @@ -756,7 +756,7 @@ 

    Task 4: Group

    These methods create a new column that you can use with groupby() for aggregations across your chosen time intervals.

    -
    +
    Code
    eurovision_df['decade'] = (eurovision_df['year'].dt.year // 10) * 10
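The decade-binning trick works by integer division: `year // 10` drops the last digit, and multiplying by 10 restores the decade label, which can then feed a `groupby()`. A minimal sketch on toy data:

```python
import pandas as pd

df = pd.DataFrame({
    'year': pd.to_datetime(['1987', '1992', '1999', '2004'], format='%Y'),
    'points_final': [50, 100, 150, 200]})

# Integer-divide the year by 10, then multiply back to get the decade
df['decade'] = (df['year'].dt.year // 10) * 10

# Aggregate across the chosen time intervals
print(df.groupby('decade')['points_final'].mean())
```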
    @@ -782,13 +782,13 @@ 

    Task 5: Joining Data
  • Read in a new dataframe that contains population data stored at this url:
  • -
    +
    Code
    population_url = 'https://bit.ly/euro_pop'
    -
    +
    Code
    population_df = pd.read_csv(population_url)
    @@ -797,7 +797,7 @@

    Task 5: Joining Data
  • Join this data with the Eurovision dataframe.
  • -
    +
    Code
    merged_df = pd.merge(eurovision_df, population_df, left_on='to_country', right_on='country_name')
    @@ -825,7 +825,7 @@

    Task 5: Joining Data

    3d. Sort the results by entries per capita

    3e. Print the top 10 values

    -
    +
    Code
    # Step 1. Count the number of records for each country
    @@ -883,7 +883,7 @@ 

    Task 6: Time S
    1. Plot the trend of maximum final points awarded over the years.
    -
    +
    Code
    yearly_max_points = eurovision_df.groupby('year')['points_final'].max()
    diff --git a/docs/course-materials/answer-keys/eod-day7-key.html b/docs/course-materials/answer-keys/eod-day7-key.html
    index 2120405..71510cd 100644
    --- a/docs/course-materials/answer-keys/eod-day7-key.html
    +++ b/docs/course-materials/answer-keys/eod-day7-key.html
    @@ -460,7 +460,7 @@ 

    Tasks

    1. Setup

    First, import pandas, matplotlib, and seaborn and load the three datasets.

    -
    +
    Code
    import pandas as pd
    @@ -479,7 +479,7 @@ 

    1. Setup

    Next, display the first few rows and print out the dataset info to get an idea of the contents of each dataset.

    -
    +
    Code
    # Display the first few rows:
    @@ -529,7 +529,7 @@ 

    1. Setup

    4 0 0 NaN
    -
    +
    Code
    # Display the dataframe info:
    @@ -589,7 +589,7 @@ 

    1. Setup

    You may have noticed that the zipcodes were read in as integers rather than strings, and therefore might not be 5 digits long. Ensure the zipcode or zip column in all datasets is a 5-character string, filling in any zeros that were dropped.

    -
    +
    Code
    # Ensure 5-character string zipcodes
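One way to restore the dropped leading zeros is `str.zfill()`, which left-pads a string with zeros to a fixed width. A small sketch (toy zipcodes):

```python
import pandas as pd

# Zipcodes read as integers lose leading zeros (e.g. 02134 -> 2134)
zips = pd.Series([2134, 90210, 501])

# Cast to string, then left-pad with zeros to a width of 5
zips_fixed = zips.astype(str).str.zfill(5)
print(zips_fixed.tolist())  # ['02134', '90210', '00501']
```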
    @@ -599,7 +599,7 @@ 

    1. Setup

    Combine the 2012 and 2023 data together by adding a year column and then stacking them together.

    -
    +
    Code
    # Add the year column
    @@ -612,7 +612,7 @@ 

    1. Setup

    In the combined plant hardiness dataframe, create two new columns, trange_min and trange_max, containing the min and max temperatures of the trange column. Remove the original trange column.

    Hint: use str.split() to split the trange strings where they have spaces and retrieve the first and last components (min and max, respectively)

    -
    +
    Code
    # Split the trange string and get the first (min) and last (max) pieces of it
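The hint above can be sketched as follows; the `trange` strings here are invented examples, assuming the real column looks like `"min to max"`:

```python
import pandas as pd

df = pd.DataFrame({'trange': ['-10 to -5', '0 to 5']})

# Split on whitespace; take the first piece as the min, the last as the max
parts = df['trange'].str.split()
df['trange_min'] = parts.str[0].astype(int)
df['trange_max'] = parts.str[-1].astype(int)

# Remove the original trange column
df = df.drop(columns='trange')
print(df)
```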
    @@ -629,7 +629,7 @@ 

    Tasks

    2. Exploration and visualization

    On average, how much has the minimum temperature in a zip code changed from 2012 to 2023?

    -
    +
    Code
    # Get the mean of the minimum temperatures 
    @@ -644,7 +644,7 @@ 

    2. Explorati

    Merge together the combined plant hardiness dataset and the zipcode dataset by zipcode. This will give us more information in the plant hardiness dataset, such as the latitude and longitude for each zipcode.

    -
    +
    Code
    df = pd.merge(df, df_zipcodes, left_on='zipcode', right_on='zip')
    @@ -810,7 +810,7 @@ 

    2. Explorati

    Create two scatter plots where the x-axis is longitude, the y-axis is latitude, and the color is based on the minimum temperature (2012 for one plot, 2023 for the other). Only look at longitudes < -60.

    -
    +
    Code
    # Filter the data for longitude less than -60
    @@ -852,7 +852,7 @@ 

    2. Explorati

    Now create a single scatter plot where you look at the difference between the minimum temperature in 2012 and 2023. Only look at longitude < -60. Color any zipcodes where you do not have information from both years in grey.

    -
    +
    Code
    # Find the difference in minimum temperature between 2023 and 2012
    @@ -879,7 +879,7 @@ 

    2. Explorati

    Create a bar plot showing the top 10 states where the average minimum temperature increased the most. Label your axes appropriately.

    -
    +
    Code
    # Filter the data for only 2012 and 2023
    @@ -905,7 +905,7 @@ 

    2. Explorati plt.show()

    -
    /var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_61959/2045411015.py:17: FutureWarning: 
    +
    /var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85403/2045411015.py:17: FutureWarning: 
     
     Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.
     
    diff --git a/docs/course-materials/cheatsheets/JupyterLab.html b/docs/course-materials/cheatsheets/JupyterLab.html
    index b192013..8456980 100644
    --- a/docs/course-materials/cheatsheets/JupyterLab.html
    +++ b/docs/course-materials/cheatsheets/JupyterLab.html
    @@ -471,7 +471,7 @@ 

    Variable Inspector

    The variable inspector is not suitable for use with large dataframes or large arrays. You should use standard commands like df.head(), df.tail(), df.info(), df.describe() to inspect large dataframes.

    -
    +
    Code
    # Example variables
    @@ -489,7 +489,7 @@ 

    Essential Magic C

    Magic commands start with % (line magics) or %% (cell magics). Note that available magic commands may vary depending on your Jupyter environment and installed extensions.

    Viewing Variables

    -
    +
    Code
    # List all variables
    @@ -501,7 +501,7 @@ 

    Viewing Variables

    Variable     Type        Data/Info
     ----------------------------------
    -ojs_define   function    <function ojs_define at 0x1a6637e20>
    +ojs_define   function    <function ojs_define at 0x1a6303e20>
     x            int         5
     y            str         Hello
     z            list        n=3
    @@ -511,7 +511,7 @@ 

    Viewing Variables

    Running Shell Commands

    -
    +
    Code
    # Run a shell command
    diff --git a/docs/course-materials/cheatsheets/chart_customization.html b/docs/course-materials/cheatsheets/chart_customization.html
    index 0ff5e22..8f76fae 100644
    --- a/docs/course-materials/cheatsheets/chart_customization.html
    +++ b/docs/course-materials/cheatsheets/chart_customization.html
    @@ -439,7 +439,7 @@ 

    On this page

    Matplotlib Customization

    Basic Plot Setup

    -
    +
    Code
    import matplotlib.pyplot as plt
    @@ -463,7 +463,7 @@ 

    Basic Plot Setup

    Customizing Line Plots

    -
    +
    Code
    plt.figure(figsize=(10, 6))
    @@ -485,7 +485,7 @@ 

    Customizing Line Pl

    Adjusting Axes

    -
    +
    Code
    plt.figure(figsize=(10, 6))
    @@ -507,7 +507,7 @@ 

    Adjusting Axes

    Adding Legend

    -
    +
    Code
    plt.figure(figsize=(10, 6))
    @@ -527,7 +527,7 @@ 

    Adding Legend

    Customizing Text and Annotations

    -
    +
    Code
    plt.figure(figsize=(10, 6))
    @@ -552,7 +552,7 @@ 

    Customizi

    Seaborn Customization

    Setting the Style

    -
    +
    Code
    import seaborn as sns
    @@ -565,7 +565,7 @@ 

    Setting the Style

    Loading and Preparing Data

    -
    +
    Code
    # Load the tips dataset
    @@ -602,7 +602,7 @@ 

    Loading and Pre

    Customizing a Scatter Plot

    -
    +
    Code
    plt.figure(figsize=(10, 6))
    @@ -621,7 +621,7 @@ 

    Customizing a S

    Customizing a Box Plot

    -
    +
    Code
    plt.figure(figsize=(10, 6))
    @@ -630,7 +630,7 @@ 

    Customizing a Box P plt.show()

    -
    /var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_62194/2996973332.py:2: FutureWarning: 
    +
    /var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85582/2996973332.py:2: FutureWarning: 
     
     Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect.
     
    @@ -647,7 +647,7 @@ 

    Customizing a Box P

    Customizing a Heatmap (Correlation of Numeric Columns)

    -
    +
    Code
    corr = tips_numeric.corr()
    @@ -667,7 +667,7 @@ 

    Customizing a Pair Plot

    -
    +
    Code
    sns.pairplot(tips, hue="time", palette="husl", height=2.5, 
    @@ -686,7 +686,7 @@ 

    Customizing a Pair

    Customizing a Regression Plot

    -
    +
    Code
    plt.figure(figsize=(10, 6))
    @@ -705,7 +705,7 @@ 

    Customizing

    Customizing a Categorical Plot

    -
    +
    Code
    plt.figure(figsize=(12, 6))
    diff --git a/docs/course-materials/cheatsheets/chart_customization_files/figure-html/cell-13-output-1.png b/docs/course-materials/cheatsheets/chart_customization_files/figure-html/cell-13-output-1.png
    index 7e8ec1f..1e6b146 100644
    Binary files a/docs/course-materials/cheatsheets/chart_customization_files/figure-html/cell-13-output-1.png and b/docs/course-materials/cheatsheets/chart_customization_files/figure-html/cell-13-output-1.png differ
    diff --git a/docs/course-materials/cheatsheets/comprehensions.html b/docs/course-materials/cheatsheets/comprehensions.html
    index f085cb3..b91d146 100644
    --- a/docs/course-materials/cheatsheets/comprehensions.html
    +++ b/docs/course-materials/cheatsheets/comprehensions.html
    @@ -439,7 +439,7 @@ 

    List Comprehensions

    Basic Syntax

    A list comprehension provides a concise way to create lists. The basic syntax is:

    -
    +
    Code
    # [expression for item in iterable]
    @@ -454,7 +454,7 @@ 

    Basic Syntax

    With Conditional Logic

    You can add a condition to include only certain items in the new list:

    -
    +
    Code
    # [expression for item in iterable if condition]
    @@ -469,7 +469,7 @@ 

    With Conditional Lo

    Nested List Comprehensions

    List comprehensions can be nested to handle more complex data structures:

    -
    +
    Code
    # [(expression1, expression2) for item1 in iterable1 for item2 in iterable2]
    @@ -484,7 +484,7 @@ 

    Nested List Com

    Evaluating Functions in a List Comprehension

    You can use list comprehensions to apply a function to each item in an iterable:

    -
    +
    Code
    # Function to evaluate
    @@ -507,7 +507,7 @@ 

    Dictionary Compr

    Basic Syntax

    Dictionary comprehensions provide a concise way to create dictionaries. The basic syntax is:

    -
    +
    Code
    # {key_expression: value_expression for item in iterable}
    @@ -524,7 +524,7 @@ 

    Basic Syntax

    Without zip

    You can create a dictionary without using zip by leveraging the index:

    -
    +
    Code
    # {key_expression: value_expression for index in range(len(list))}
    @@ -542,7 +542,7 @@ 

    Without zip

    With Conditional Logic

    You can include conditions to filter out key-value pairs:

    -
    +
    Code
    # {key_expression: value_expression for item in iterable if condition}
    @@ -560,7 +560,7 @@ 

    With Conditional

    Evaluating Functions in a Dictionary Comprehension

    You can use dictionary comprehensions to apply a function to values in an iterable:

    -
    +
    Code
    # Function to evaluate
    diff --git a/docs/course-materials/cheatsheets/control_flows.html b/docs/course-materials/cheatsheets/control_flows.html
    index 1ba91d7..995d905 100644
    --- a/docs/course-materials/cheatsheets/control_flows.html
    +++ b/docs/course-materials/cheatsheets/control_flows.html
    @@ -444,7 +444,7 @@ 

    On this page

    Conditional Statements

    if-elif-else

    -
    +
    Code
    x = 10
    @@ -465,7 +465,7 @@ 

    if-elif-else

    Loops

    for loop

    -
    +
    Code
    fruits = ["apple", "banana", "cherry"]
    @@ -481,7 +481,7 @@ 

    for loop

    while loop

    -
    +
    Code
    count = 0
    @@ -503,7 +503,7 @@ 

    while loop

    Loop Control

    break

    -
    +
    Code
    for i in range(10):
    @@ -522,7 +522,7 @@ 

    break

    continue

    -
    +
    Code
    for i in range(5):
    @@ -543,7 +543,7 @@ 

    continue

    Comprehensions

    List Comprehension

    -
    +
    Code
    squares = [x**2 for x in range(5)]
    @@ -556,7 +556,7 @@ 

    List Comprehension

    Dictionary Comprehension

    -
    +
    Code
    squares_dict = {x: x**2 for x in range(5)}
    @@ -572,7 +572,7 @@ 

    Dictionary Compre

    Exception Handling

    try-except

    -
    +
    Code
    try:
    @@ -587,7 +587,7 @@ 

    try-except

    try-except-else-finally

    -
    +
    Code
    try:
    diff --git a/docs/course-materials/cheatsheets/data_grouping.html b/docs/course-materials/cheatsheets/data_grouping.html
    index 68ed649..0318731 100644
    --- a/docs/course-materials/cheatsheets/data_grouping.html
    +++ b/docs/course-materials/cheatsheets/data_grouping.html
    @@ -431,7 +431,7 @@ 

    On this page

    Grouping Data

    Grouping data allows you to split your DataFrame into groups based on one or more columns.

    -
    +
    Code
    import pandas as pd
    @@ -455,7 +455,7 @@ 

    Grouping Data

    Creating a groupby object:

    -
    +
    Code
    # Group by 'category'
    @@ -469,7 +469,7 @@ 

    Aggregating Data

    After grouping, you can apply various aggregation functions to summarize the data within each group.

    Basic aggregation

    -
    +
    Code
    # Basic aggregations
    @@ -490,7 +490,7 @@ 

    Basic aggregation

    Doing multiple aggregations at the same time using agg()

    -
    +
    Code
    # Multiple aggregations
    @@ -506,7 +506,7 @@ 

    Aggregation using a custom function

    -
    +
    Code
    # Custom aggregation function
    @@ -541,7 +541,7 @@ 

    Grouped Operations

    You can apply operations to each group separately using transform() or apply().

    Using transform() to alter each group in a group by object

    -
    +
    Code
    # Transform: apply function to each group, return same-sized DataFrame
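The key contrast between the two can be sketched on toy data: `transform()` returns a result aligned row-for-row with the original frame, while `apply()` can reduce each group to a single value:

```python
import pandas as pd

df = pd.DataFrame({'category': ['A', 'A', 'B', 'B'],
                   'value': [1, 3, 10, 30]})

# transform(): same length as the input, so it can become a new column
df['group_mean'] = df.groupby('category')['value'].transform('mean')
print(df['group_mean'].tolist())  # [2.0, 2.0, 20.0, 20.0]

# apply(): one result per group
ranges = df.groupby('category')['value'].apply(lambda g: g.max() - g.min())
print(ranges.to_dict())  # {'A': 2, 'B': 20}
```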
    @@ -554,7 +554,7 @@ 

    Using apply() to alter each group in a group by object

    -
    +
    Code
    # Apply: apply function to each group, return a DataFrame or Series
    @@ -564,7 +564,7 @@ 

    result = grouped.apply(group_range)

    -
    /var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_61889/114114075.py:5: DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
    +
    /var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85349/114114075.py:5: DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
       result = grouped.apply(group_range)
    @@ -575,7 +575,7 @@

    Pivot Tables

    Pivot tables are a powerful tool for reorganizing and summarizing data. They allow you to transform your data from a long format to a wide format, making it easier to analyze and visualize patterns.

    Working with Pivot Tables

    -
    +
    Code
    # Sample DataFrame
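The long-to-wide transformation can be sketched on a small sales frame (toy values, chosen to mirror the example output shown later in this section):

```python
import pandas as pd

# Long-format sales data (toy values)
df = pd.DataFrame({'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
                   'product': ['A', 'B', 'A', 'B'],
                   'sales': [100, 150, 120, 180]})

# Pivot to wide format: one row per date, one column per product
pivot = pd.pivot_table(df, values='sales', index='date',
                       columns='product', aggfunc='sum')
print(pivot)
```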
    @@ -596,7 +596,7 @@ 

    Working with Piv

    Pivot tables with a single aggregation function

    -
    +
    Code
    # Create a pivot table
    @@ -613,7 +613,7 @@ 

    Pivot tables with multiple aggregation

    -
    +
    Code
    # Pivot table with multiple aggregation functions
    @@ -629,9 +629,9 @@ 

    Piv 2023-01-02 120 180 120.0 180.0

    -
    /var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_61889/1326309547.py:2: FutureWarning: The provided callable <function sum at 0x10e2c72e0> is currently using DataFrameGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
    +
    /var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85349/1326309547.py:2: FutureWarning: The provided callable <function sum at 0x11053b2e0> is currently using DataFrameGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
       pivot_multi = pd.pivot_table(df, values='sales', index='date', columns='product',
    -/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_61889/1326309547.py:2: FutureWarning: The provided callable <function mean at 0x10e2d8400> is currently using DataFrameGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
    +/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85349/1326309547.py:2: FutureWarning: The provided callable <function mean at 0x11054c400> is currently using DataFrameGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
       pivot_multi = pd.pivot_table(df, values='sales', index='date', columns='product',
    diff --git a/docs/course-materials/cheatsheets/data_selection.html b/docs/course-materials/cheatsheets/data_selection.html
    index d3da2bf..66f671f 100644
    --- a/docs/course-materials/cheatsheets/data_selection.html
    +++ b/docs/course-materials/cheatsheets/data_selection.html
    @@ -488,7 +488,7 @@

    Selection vs. 

    Setup

    First, let’s import pandas and load our dataset.

    -
    +
    Code
    import pandas as pd
    @@ -513,7 +513,7 @@ 

    Setup

    Basic Selection

    Select a Single Column

    -
    +
    Code
    # Using square brackets
    @@ -526,7 +526,7 @@ 

    Select a Single Col

    Select Multiple Columns

    -
    +
    Code
    # Select age and gpa columns
    @@ -536,7 +536,7 @@ 

    Select Multiple Co

    Select Rows by Index

    -
    +
    Code
    # Select first 5 rows
    @@ -549,7 +549,7 @@ 

    Select Rows by Index<

    Select Rows and Columns

    -
    +
    Code
    # Select first 3 rows and 'age', 'gpa' columns
    @@ -562,7 +562,7 @@ 

    Select Rows and Co

    Filtering

    Filter by a Single Condition

    -
    +
    Code
    # Students with age greater than 21
    @@ -572,7 +572,7 @@ 

    Filter by a S

    Filter by Multiple Conditions

    -
    +
    Code
    # Students with age > 21 and gpa > 3.5
    @@ -582,7 +582,7 @@ 

    Filter by Mu

    Filter Using .isin()

    -
    +
    Code
    # Students majoring in Computer Science or Biology
    @@ -592,7 +592,7 @@ 

    Filter Using .isin()

    Filter Using String Methods

    -
    +
    Code
    # Majors starting with 'E'
    @@ -603,7 +603,7 @@ 

    Filter Using S

    Combining Selection and Filtering

    -
    +
    Code
    # Select 'age' and 'gpa' for students with gpa > 3.5
    @@ -622,7 +622,7 @@ 

    .loc[] vs .iloc[]

    .query() Method

    -
    +
    Code
    # Filter using query method
    @@ -632,7 +632,7 @@ 

    .query() Method

    .where() Method

    -
    +
    Code
    # Replace values not meeting the condition with NaN
    diff --git a/docs/course-materials/cheatsheets/functions.html b/docs/course-materials/cheatsheets/functions.html
    index aca2580..84e0e67 100644
    --- a/docs/course-materials/cheatsheets/functions.html
    +++ b/docs/course-materials/cheatsheets/functions.html
    @@ -455,7 +455,7 @@ 

    Basics of Functions

    Defining a Function

    In Python, a function is defined using the def keyword, followed by the function name and parentheses () that may include parameters.

    -
    +
    Code
    def function_name(parameters):
    @@ -466,7 +466,7 @@ 

    Defining a Function

    Example: Convert Celsius to Fahrenheit

    -
    +
    Code
    def celsius_to_fahrenheit(celsius):
    @@ -479,7 +479,7 @@ 

    Exam

    Calling a Function

    Call a function by using its name followed by parentheses, and pass arguments if the function requires them.

    -
    +
    Code
    temperature_celsius = 25
    @@ -496,7 +496,7 @@ 

    Calling a Function

    Common Unit Conversions

    Example: Convert Kilometers to Miles

    -
    +
    Code
    def kilometers_to_miles(kilometers):
    diff --git a/docs/course-materials/cheatsheets/pandas_dataframes.html b/docs/course-materials/cheatsheets/pandas_dataframes.html
    index b85893b..2f96c1a 100644
    --- a/docs/course-materials/cheatsheets/pandas_dataframes.html
    +++ b/docs/course-materials/cheatsheets/pandas_dataframes.html
    @@ -447,7 +447,7 @@ 

    Introduction

    Importing Pandas

    Always start by importing pandas:

    -
    +
    Code
    import pandas as pd
    @@ -458,7 +458,7 @@

    Importing Pandas

    Creating a DataFrame

    From a dictionary

    -
    +
    Code
    data = {'Name': ['Alice', 'Bob', 'Charlie'],
    @@ -477,7 +477,7 @@ 

    From a dictionary

    From a CSV file

    -
    +
    Code
    # Here's an example csv file we can use for read_csv:
    @@ -508,7 +508,7 @@ 

    From a CSV file

    Basic DataFrame Information

    -
    +
    Code
    # Display the first few rows
    @@ -560,7 +560,7 @@ 

    Basic DataFram

    Selecting Data

    Selecting columns

    -
    +
    Code
    # Select a single column
    @@ -573,7 +573,7 @@ 

    Selecting columns

    Selecting rows

    -
    +
    Code
    # Select rows by index
    @@ -589,7 +589,7 @@ 

    Selecting rows

    Basic Data Manipulation

    Adding a new column

    -
    +
    Code
    df['Is Adult'] = df['Age'] >= 18
    @@ -598,7 +598,7 @@

    Adding a new column

    Renaming columns

    -
    +
    Code
    df = df.rename(columns={'Name': 'Full Name'})
    @@ -607,7 +607,7 @@

    Renaming columns

    Handling missing values

    -
    +
    Code
    # Drop rows with any missing values
    @@ -621,7 +621,7 @@ 

    Handling missing v

    Basic Calculations

    -
    +
    Code
    # Calculate mean age
    @@ -634,7 +634,7 @@ 

    Basic Calculations

    Grouping and Aggregation

    -
    +
    Code
    # Group by city and calculate mean age
    @@ -644,7 +644,7 @@ 

    Grouping and Aggr

    Sorting

    -
    +
    Code
    # Sort by Age in descending order
    @@ -654,7 +654,7 @@ 

    Sorting

    Saving a DataFrame

    -
    +
    Code
    # Save to CSV
    diff --git a/docs/course-materials/cheatsheets/pandas_series.html b/docs/course-materials/cheatsheets/pandas_series.html
    index afd158a..dc19cc5 100644
    --- a/docs/course-materials/cheatsheets/pandas_series.html
    +++ b/docs/course-materials/cheatsheets/pandas_series.html
    @@ -454,7 +454,7 @@ 

    Introduction

    Importing Pandas

    Always start by importing pandas:

    -
    +
    Code
    import pandas as pd
    @@ -465,7 +465,7 @@

    Importing Pandas

    Creating a Series

    From a list

    -
    +
    Code
    data = [1, 2, 3, 4, 5]
    @@ -484,7 +484,7 @@ 

    From a list

    From a dictionary

    -
    +
    Code
    data = {'a': 0., 'b': 1., 'c': 2.}
    @@ -501,7 +501,7 @@ 

    From a dictionary

    With custom index

    -
    +
    Code
    s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
    @@ -520,7 +520,7 @@ 

    With custom index

    Basic Series Information

    -
    +
    Code
    # Display the first few elements
    @@ -572,7 +572,7 @@ 

    Basic Series Info

    Selecting Data

    By label

    -
    +
    Code
    # Select a single element
    @@ -585,7 +585,7 @@ 

    By label

    By position

    -
    +
    Code
    # Select by integer index (direct selection is being deprecated)
    @@ -601,7 +601,7 @@ 

    By position

    By condition

    -
    +
    Code
    # Select elements greater than 2
    @@ -614,7 +614,7 @@ 

    By condition

    Basic Data Manipulation

    Updating values

    -
    +
    Code
    s['a'] = 10
    @@ -623,7 +623,7 @@

    Updating values

    Removing elements

    -
    +
    Code
    s = s.drop(labels=['a'])
    @@ -632,7 +632,7 @@

    Removing elements

    Adding elements to a Series

    -
    +
    Code
    another_series = pd.Series(
    @@ -659,7 +659,7 @@ 

    Adding element

    Updating elements based on their value using mask

    -
    +
    Code
    print(s)
    @@ -689,7 +689,7 @@ 

    Replacing elements based on their value using where

    -
    +
    Code
    print(s)
    @@ -721,7 +721,7 @@ 

    Applying functions

    Applying a newly-defined function

    -
    +
    Code
    def squared(x):
    @@ -735,7 +735,7 @@ 

    Applying

    Applying a lambda (temporary) function

    -
    +
    Code
    s_squared = s.apply(lambda x: x**2)
    @@ -746,7 +746,7 @@

    Apply

    Handling missing values

    -
    +
    Code
    # Drop missing values
    @@ -760,7 +760,7 @@ 

    Handling missing v

    Basic Calculations

    -
    +
    Code
    # Calculate mean
    @@ -776,7 +776,7 @@ 

    Basic Calculations

    Sorting

    -
    +
    Code
    print(s)
    @@ -800,7 +800,7 @@ 

    Sorting

    Reindexing

    -
    +
    Code
    print(f"Original Series:\n{s}\n", sep='')
    @@ -839,7 +839,7 @@ 

    Reindexing

    Combining Series

    -
    +
    Code
    s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
    @@ -850,7 +850,7 @@ 

    Combining Series

    Converting to Other Data Types

    -
    +
    Code
    # To list
    diff --git a/docs/course-materials/cheatsheets/random_numbers.html b/docs/course-materials/cheatsheets/random_numbers.html
    index 581e9c8..bb01aa8 100644
    --- a/docs/course-materials/cheatsheets/random_numbers.html
    +++ b/docs/course-materials/cheatsheets/random_numbers.html
    @@ -437,7 +437,7 @@ 

    Importing NumPy

    -
    +
    Code
    import numpy as np
    @@ -446,7 +446,7 @@

    Importing NumPy

    Creating a Generator

    -
    +
    Code
    # Create a Generator with the default BitGenerator
    @@ -461,7 +461,7 @@ 

    Creating a Generator<

    Basic Random Number Generation

    Uniform Distribution (0 to 1)

    -
    +
    Code
    # Single random float
    @@ -471,14 +471,14 @@ 

    Uniform Distri print(rng.random(5))

    -
    0.22092102589868978
    -[0.28943778 0.09584537 0.78135092 0.97740274 0.44528883]
    +
    0.3278102376191233
    +[0.16307633 0.40392116 0.45921329 0.26359239 0.32981461]

    Integers

    -
    +
    Code
    # Single random integer from 0 to 10 (inclusive)
    @@ -489,13 +489,13 @@ 

    Integers

    4
    -[74 44 73 84 90]
    +[73 74 35 82 86]

    Normal (Gaussian) Distribution

    -
    +
    Code
    # Single value from standard normal distribution
    @@ -505,15 +505,15 @@ 

    Normal (Gauss print(rng.normal(loc=0, scale=1, size=5))

    -
    -0.8129275335805446
    -[ 1.1815606  -1.15590009  0.26582165  0.43263691  0.75292281]
    +
    -0.39813817964311515
    +[ 0.28262105 -1.23483313  0.42485523 -0.12518701 -0.53235252]

    Sampling

    -
    +
    Code
    # Random choice from an array
    @@ -525,13 +525,13 @@ 

    Sampling

    4
    -[4 3 1]
    +[2 5 4]

    Shuffling

    -
    +
    Code
    arr = np.arange(10)
    @@ -539,14 +539,14 @@ 

    Shuffling

    print(arr)
    -
    [9 5 7 3 1 2 4 8 6 0]
    +
    [4 1 6 5 8 2 3 7 9 0]

    Other Distributions

    Generators provide methods for many other distributions:

    -
    +
    Code
    # Poisson distribution
    @@ -559,16 +559,16 @@ 

Other Distributions

print(rng.binomial(n=10, p=0.5, size=3))

    -
    [11  4  5]
    -[0.16471365 0.67393881 0.09664034]
    -[4 6 6]
    +
    [8 3 4]
    +[1.16757058 0.27745692 0.73386612]
    +[6 5 5]

    Generating on Existing Arrays

    Generators can fill existing arrays, which can be more efficient:

    -
    +
    Code
    arr = np.empty(5)
    @@ -576,14 +576,14 @@ 

Generating on Existing Arrays

print(arr)

    -
    [0.51132826 0.60823193 0.59053934 0.87320215 0.3752571 ]
    +
    [0.91186338 0.70114601 0.45616672 0.46383814 0.5871376 ]
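The fill-in-place pattern from the hunk above can be sketched as a runnable snippet; `Generator.random` accepts an `out=` argument that writes into a pre-allocated array instead of allocating a new one:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Pre-allocate a float64 array and fill it in place, avoiding a new allocation
arr = np.empty(5)
rng.random(out=arr)

print(arr)
```

Output values will vary with the seed, but every entry lies in [0, 1).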

    Bit Generators

    You can use different Bit Generators with varying statistical qualities:

    -
    +
    Code
    from numpy.random import PCG64, MT19937
    @@ -595,14 +595,14 @@ 

    Bit Generators

    print("MT19937:", rng_mt.random())
    -
    PCG64: 0.47047396600491753
    -MT19937: 0.7397295290585331
    +
    PCG64: 0.8210003140505849
    +MT19937: 0.41349136323428615
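A minimal sketch of the BitGenerator comparison shown in this hunk: the same seed drives the two algorithms to different (but each individually reproducible) streams.

```python
from numpy.random import Generator, PCG64, MT19937

# Same seed, different BitGenerator algorithms -> different streams
rng_pcg = Generator(PCG64(seed=42))
rng_mt = Generator(MT19937(seed=42))

print("PCG64:", rng_pcg.random())
print("MT19937:", rng_mt.random())
```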

    Saving and Restoring State

    -
    +
    Code
    # Save state
    @@ -616,15 +616,15 @@ 

Saving and Restoring State

print("Restored:", rng.random(3))

    -
    Original: [0.45477352 0.27105728 0.16726136]
    -Restored: [0.45477352 0.27105728 0.16726136]
    +
    Original: [0.8069469  0.67878297 0.48562037]
    +Restored: [0.8069469  0.67878297 0.48562037]
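The save/restore pattern in this hunk relies on `bit_generator.state`, a plain dict snapshot that can be reassigned to replay the exact same draws:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Snapshot the underlying BitGenerator state before drawing
state = rng.bit_generator.state
first = rng.random(3)

# Restoring the state replays the exact same sequence
rng.bit_generator.state = state
second = rng.random(3)

print(np.array_equal(first, second))  # True
```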

    Spawning New Generators

    You can create independent generators from an existing one:

    -
    +
    Code
    child1, child2 = rng.spawn(2)
    @@ -632,22 +632,22 @@ 

Spawning New Generators

print("Child 2:", child2.random())

    -
    Child 1: 0.5815447472196176
    -Child 2: 0.23081422202004886
    +
    Child 1: 0.11547306758577636
    +Child 2: 0.8805861858890862

    Thread Safety and Jumping

    Generators are designed to be thread-safe and support “jumping” ahead in the sequence:

    -
    +
    Code
    rng = np.random.Generator(PCG64())
     rng.bit_generator.advance(1000)  # Jump ahead 1000 steps
    -
    <numpy.random._pcg64.PCG64 at 0x10e2b0040>
    +
    <numpy.random._pcg64.PCG64 at 0x10dc90040>
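The jumping behavior above can be demonstrated with two identically seeded generators: advancing one BitGenerator moves it past the other's position in the shared stream without producing any values.

```python
from numpy.random import Generator, PCG64

a = Generator(PCG64(seed=3))
b = Generator(PCG64(seed=3))

# Jump b ahead 1000 steps without generating any values
b.bit_generator.advance(1000)

x, y = a.random(), b.random()
print(x, y)  # the two streams now disagree
```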
    diff --git a/docs/course-materials/cheatsheets/read_csv.html b/docs/course-materials/cheatsheets/read_csv.html index 022fc19..3418e6d 100644 --- a/docs/course-materials/cheatsheets/read_csv.html +++ b/docs/course-materials/cheatsheets/read_csv.html @@ -458,7 +458,7 @@

    On this page

    Basic Usage of pd.read_csv

    Reading a Simple CSV File

    -
    +
    Code
    import pandas as pd
    @@ -489,7 +489,7 @@ 

    Reading a Simple

    Selecting Specific Columns

    Using the usecols Parameter

    -
    +
    Code
    # Read only specific columns
    @@ -509,7 +509,7 @@ 

    Using the Naming Columns

    Using the names Parameter

    -
    +
    Code
    # Rename columns while reading
    @@ -530,7 +530,7 @@ 

    Using the

    Specifying an Index

    Using the index_col Parameter

    -
    +
    Code
    # Set 'name' column as index
    @@ -551,7 +551,7 @@ 

    Using the Parsing Dates

    Automatic Date Parsing

    -
    +
    Code
    csv_data_with_dates = """
    @@ -578,7 +578,7 @@ 

    Automatic Date Pars

    Custom Date Parsing

    -
    +
    Code
    csv_data_custom_dates = """
    @@ -608,7 +608,7 @@ 

Custom Date Parsing

Handling Headers

    CSV with Multi-line Header

    -
    +
    Code
    csv_data_with_header = """
    @@ -646,7 +646,7 @@ 

    CSV with Multi-

    CSV with No Header

    -
    +
    Code
    csv_data_no_header = """
    @@ -671,7 +671,7 @@ 

    CSV with No Header

    Dealing with Missing Data

    Customizing NA Values

    -
    +
    Code
    csv_data_missing = """
    @@ -697,7 +697,7 @@ 

    Customizing NA Value

    Coercing Columns to Specific Data Types

    Using the dtype Parameter

    -
    +
    Code
    csv_data_types = """
    @@ -728,7 +728,7 @@ 

    Using the

    Reading Large CSV Files

    Using chunksize for Memory Efficiency

    -
    +
    Code
    import numpy as np
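The `chunksize` pattern this hunk introduces can be sketched end-to-end with a small in-memory CSV standing in for a large file (the column name `value` is illustrative):

```python
import io
import pandas as pd

# A tiny in-memory CSV standing in for a large on-disk file
csv_data = "value\n" + "\n".join(str(i) for i in range(10))

# Process the file in chunks of 4 rows to bound memory use
total = 0
for chunk in pd.read_csv(io.StringIO(csv_data), chunksize=4):
    total += chunk["value"].sum()

print(total)  # 45
```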
    diff --git a/docs/course-materials/cheatsheets/seaborn.html b/docs/course-materials/cheatsheets/seaborn.html
    index 3640656..047040f 100644
    --- a/docs/course-materials/cheatsheets/seaborn.html
    +++ b/docs/course-materials/cheatsheets/seaborn.html
    @@ -444,7 +444,7 @@ 

    Introduction to Se

    Setting up Seaborn

    To use Seaborn, you need to import it along with other necessary libraries:

    -
    +
    Code
    import seaborn as sns
    @@ -463,7 +463,7 @@ 

    1. Scatter Plots

    Useful for showing relationships between two continuous variables.

    -
    +
    Code
    # Load the tips dataset
    @@ -479,7 +479,7 @@ 

    1. Scatter Plots

    4 24.59 3.61 Female No Sun Dinner 4
    -
    +
    Code
    # Basic scatter plot
    @@ -496,7 +496,7 @@ 

    1. Scatter Plots

    -
    +
    Code
    # Add hue for a third variable
    @@ -518,7 +518,7 @@ 

    1. Scatter Plots

    2. Line Plots

    Ideal for time series data or showing trends.

    -
    +
    Code
    # Load the flights dataset
    @@ -534,7 +534,7 @@ 

    2. Line Plots

    4 1949 May 121
    -
    +
    Code
    # Basic line plot (uncertainty bounds calculated auto-magically by grouping rows containing the same year!)
    @@ -551,7 +551,7 @@ 

    2. Line Plots

    -
    +
    Code
    # Multiple lines with confidence intervals
    @@ -573,7 +573,7 @@ 

    2. Line Plots

    3. Bar Plots

    Great for comparing quantities across different categories.

    -
    +
    Code
    # Load the titanic dataset
    @@ -596,7 +596,7 @@ 

    3. Bar Plots

    4 man True NaN Southampton no True
    -
    +
    Code
    # Basic bar plot
    @@ -613,7 +613,7 @@ 

    3. Bar Plots

    -
    +
    Code
    # Grouped bar plot
    @@ -635,7 +635,7 @@ 

    3. Bar Plots

    4. Box Plots

    Useful for showing distribution of data across categories.

    -
    +
    Code
    # Basic box plot
    @@ -652,7 +652,7 @@ 

    4. Box Plots

    -
    +
    Code
    # Add individual data points
    @@ -674,7 +674,7 @@ 

    4. Box Plots

    5. Violin Plots

    Similar to box plots but show the full distribution of data.

    -
    +
    Code
    plt.figure(figsize=(10, 6))
    @@ -694,7 +694,7 @@ 

    5. Violin Plots

    6. Heatmaps

    Excellent for visualizing correlation matrices or gridded data.

    -
    +
    Code
    # Load the penguins dataset
    @@ -717,7 +717,7 @@ 

    6. Heatmaps

    4 3450.0 Female
    -
    +
    Code
    # Correlation heatmap
    @@ -744,7 +744,7 @@ 

    Quick Data Overview

    Recall the structure of the penguins dataframe, which has a combination of measured and categorical values:

    -
    +
    Code
    print(penguins.head())
    @@ -766,7 +766,7 @@

    Quick Data Overview

    We can explore the distribution of every numerical variable as well as the pair-wise relationship between all the variables in a dataframe using pairplot and can use a categorical variable to further organize the data within each plot using the hue argument.

    -
    +
    Code
    # Get a quick overview of numerical variables
    @@ -782,7 +782,7 @@ 

    Quick Data Overview

    -
    +
    Code
    # Visualize distributions of all numerical variables
    @@ -801,7 +801,7 @@ 

    Quick Data Overview

    Exploring Relationships

    -
    +
    Code
    # Explore relationship between variables
    @@ -817,7 +817,7 @@ 

    Exploring Relation

    -
    +
    Code
    # Facet plots for multi-dimensional exploration
    @@ -838,7 +838,7 @@ 

    Exploring Relation

    Categorical Data Exploration

    -
    +
    Code
    # Compare distributions across categories
    @@ -854,7 +854,7 @@ 

    Categorical D

    -
    +
    Code
    # Count plots for categorical variables
    @@ -873,7 +873,7 @@ 

    Categorical D

    Time Series Exploration

    -
    +
    Code
    # Visualize trends over time
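As a self-contained sketch of the scatter-plot-with-hue idiom this cheatsheet diff keeps revisiting — using a tiny synthetic frame instead of the built-in `tips` dataset, so no network access is needed:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is required
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips dataset
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 15.0, 30.0],
    "tip": [1.5, 3.0, 2.0, 5.0],
    "time": ["Lunch", "Lunch", "Dinner", "Dinner"],
})

# Scatter plot with a categorical hue, as in the cheatsheet
ax = sns.scatterplot(data=df, x="total_bill", y="tip", hue="time")
ax.set_title("Tips vs. total bill")
```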
    diff --git a/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-10-output-1.png b/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-10-output-1.png
    index 509edd3..41302ff 100644
    Binary files a/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-10-output-1.png and b/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-10-output-1.png differ
    diff --git a/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-11-output-1.png b/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-11-output-1.png
    index f0261c5..db1074e 100644
    Binary files a/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-11-output-1.png and b/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-11-output-1.png differ
    diff --git a/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-7-output-1.png b/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-7-output-1.png
    index db90035..9229072 100644
    Binary files a/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-7-output-1.png and b/docs/course-materials/cheatsheets/seaborn_files/figure-html/cell-7-output-1.png differ
    diff --git a/docs/course-materials/cheatsheets/sets.html b/docs/course-materials/cheatsheets/sets.html
    index d7f080c..923f4b7 100644
    --- a/docs/course-materials/cheatsheets/sets.html
    +++ b/docs/course-materials/cheatsheets/sets.html
    @@ -428,7 +428,7 @@ 

    On this page

    Creating Sets

    -
    +
    Code
    # Empty set
    @@ -452,7 +452,7 @@ 

    Creating Sets

    Basic Operations

    -
    +
    Code
    s = {1, 2, 3, 4, 5}
    @@ -495,7 +495,7 @@ 

    Basic Operations

    Set Methods

    -
    +
    Code
    a = {1, 2, 3}
    @@ -510,7 +510,7 @@ 

    Set Methods

    Union

    -
    +
    Code
    union_set = a.union(b)
    @@ -523,7 +523,7 @@ 

    Union

    Intersection

    -
    +
    Code
    intersection_set = a.intersection(b)
    @@ -536,7 +536,7 @@ 

    Intersection

    Difference

    -
    +
    Code
    difference_set = a.difference(b)
    @@ -549,7 +549,7 @@ 

    Difference

    Symmetric difference

    -
    +
    Code
    symmetric_difference_set = a.symmetric_difference(b)
    @@ -562,7 +562,7 @@ 

Symmetric difference

    Subset and superset

    -
    +
    Code
    is_subset = a.issubset(b)
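The set methods walked through in this cheatsheet also have operator spellings; a compact sketch of both forms:

```python
a = {1, 2, 3}
b = {3, 4, 5}

# The four core set-algebra operations, plus subset/superset tests
union_set = a | b                     # same as a.union(b)
intersection_set = a & b              # a.intersection(b)
difference_set = a - b                # a.difference(b)
symmetric_difference_set = a ^ b      # a.symmetric_difference(b)

print(union_set, intersection_set, difference_set, symmetric_difference_set)
print({1, 2}.issubset(a), a.issuperset({1, 2}))
```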
    diff --git a/docs/course-materials/coding-colabs/3b_control_flows.html b/docs/course-materials/coding-colabs/3b_control_flows.html
    index a0f9802..e82dc40 100644
    --- a/docs/course-materials/coding-colabs/3b_control_flows.html
    +++ b/docs/course-materials/coding-colabs/3b_control_flows.html
    @@ -470,7 +470,7 @@ 

    Task 1: Simpl
  • Otherwise, print “Enjoy the pleasant weather!”
  • -
    +
    temperature = 20
     
     # Your code here
    @@ -491,7 +491,7 @@ 

    Task 2: Grade Clas
  • Below 60: “F”
  • -
    +
    score = 85
     
     # Your code here
    @@ -508,7 +508,7 @@ 

    Task 3: Counting She
  • Use a for loop with the range() function
  • Print each number followed by “sheep”
  • -
    +
    # Your code here
     # Use a for loop to count sheep
    @@ -521,7 +521,7 @@

    Task 4: Sum of Numbe
  • Use a for loop with the range() function to add each number to total
  • After the loop, print the total
  • -
    +
    total = 0
     
     # Your code here
    @@ -540,7 +540,7 @@ 

    Task 5: Countdown

  • After each print, decrease the countdown by 1
  • When the countdown reaches 0, print “Blast off!”
  • -
    +
    countdown = 5
     
     # Your code here
    diff --git a/docs/course-materials/coding-colabs/3d_pandas_series.html b/docs/course-materials/coding-colabs/3d_pandas_series.html
    index eaf5da7..c92fe65 100644
    --- a/docs/course-materials/coding-colabs/3d_pandas_series.html
    +++ b/docs/course-materials/coding-colabs/3d_pandas_series.html
    @@ -440,7 +440,7 @@ 

    Resources

    Setup

    First, let’s import the necessary libraries and create a sample Series.

    -
    +
    import pandas as pd
     import numpy as np
     
    @@ -460,7 +460,7 @@ 

    Setup

    Exercise 1: Creating a Series

    Work together to create a Series representing the prices of the fruits in our fruits Series.

    -
    +
    # Your code here
     # Create a Series called 'prices' with the same index as 'fruits'
     # Use these prices: apple: $0.5, banana: $0.3, cherry: $1.0, date: $1.5, elderberry: $2.0
    @@ -474,7 +474,7 @@

    Exercise 2: S
  • Find the most expensive fruit.
  • Apply a 10% discount to all fruits priced over $1.0.
  • -
    +
    # Your code here
     # 1. Calculate the total price of all fruits
     # 2. Find the most expensive fruit
    @@ -489,7 +489,7 @@ 

    Exercise 3: Ser
  • How many fruits cost less than $1.0?
  • What is the price range (difference between max and min prices)?
  • -
    +
    # Your code here
     # 1. Calculate the average price of the fruits
     # 2. Count how many fruits cost less than $1.0
    @@ -504,7 +504,7 @@ 

    Exercise 4:
  • Remove ‘banana’ from both Series.
  • Sort both Series by fruit name (alphabetically).
  • -
    +
    # Your code here
     # 1. Add 'fig' to both Series (price: $1.2)
     # 2. Remove 'banana' from both Series
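A hedged sketch of Exercises 1 and 2 above, assuming the fruit names serve as the index labels (the colab's own `fruits` Series may be set up differently):

```python
import pandas as pd

# Hypothetical setup: fruit names as the index, prices as values
prices = pd.Series(
    [0.5, 0.3, 1.0, 1.5, 2.0],
    index=["apple", "banana", "cherry", "date", "elderberry"],
)

total = prices.sum()
most_expensive = prices.idxmax()

# Apply a 10% discount only where the price exceeds $1.0
discounted = prices.where(prices <= 1.0, prices * 0.9)

print(total, most_expensive)
```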
    diff --git a/docs/course-materials/coding-colabs/4b_pandas_dataframes.html b/docs/course-materials/coding-colabs/4b_pandas_dataframes.html
    index 412ed40..eb218d2 100644
    --- a/docs/course-materials/coding-colabs/4b_pandas_dataframes.html
    +++ b/docs/course-materials/coding-colabs/4b_pandas_dataframes.html
    @@ -436,7 +436,7 @@ 

    Introduction

    Setup

    First, let’s import the necessary libraries and load our dataset.

    -
    +
    Code
    import pandas as pd
    diff --git a/docs/course-materials/coding-colabs/5c_cleaning_data.html b/docs/course-materials/coding-colabs/5c_cleaning_data.html
    index d3b8d63..9f81bfe 100644
    --- a/docs/course-materials/coding-colabs/5c_cleaning_data.html
    +++ b/docs/course-materials/coding-colabs/5c_cleaning_data.html
    @@ -435,7 +435,7 @@ 

    Resources

    Setup

    First, let’s import the necessary libraries and load an example messy dataframe.

    -
    +
    import pandas as pd
     import numpy as np
     
    diff --git a/docs/course-materials/coding-colabs/6b_advanced_data_manipulation.html b/docs/course-materials/coding-colabs/6b_advanced_data_manipulation.html
    index f9a393d..2a86f72 100644
    --- a/docs/course-materials/coding-colabs/6b_advanced_data_manipulation.html
    +++ b/docs/course-materials/coding-colabs/6b_advanced_data_manipulation.html
    @@ -440,7 +440,7 @@ 

    Learning Objectives

    Setup

    Let’s start by importing necessary libraries and loading our datasets:

    -
    +
    Code
    import pandas as pd
    diff --git a/docs/course-materials/coding-colabs/7c_visualizations.html b/docs/course-materials/coding-colabs/7c_visualizations.html
    index a664646..979d778 100644
    --- a/docs/course-materials/coding-colabs/7c_visualizations.html
    +++ b/docs/course-materials/coding-colabs/7c_visualizations.html
    @@ -437,7 +437,7 @@ 

    Introduction

    Setup

    First, let’s import the necessary libraries and load our dataset.

    -
    +
    Code
    import pandas as pd
    diff --git a/docs/course-materials/day8.html b/docs/course-materials/day8.html
    index 3fb6c77..17e99c7 100644
    --- a/docs/course-materials/day8.html
    +++ b/docs/course-materials/day8.html
    @@ -379,6 +379,7 @@ 

    On this page

    @@ -426,6 +427,10 @@

    Class materials

    +
    +

    Syncing your classwork to Github

    +

    Here are some directions for syncing your classwork with a GitHub repository

    +

    End-of-day practice

    There are no additional end-of-day tasks / activities today!

    diff --git a/docs/course-materials/day9.html b/docs/course-materials/day9.html index 96d9974..46c5983 100644 --- a/docs/course-materials/day9.html +++ b/docs/course-materials/day9.html @@ -379,6 +379,7 @@

    On this page

    @@ -426,6 +427,10 @@

    Class materials

    +
    +

    Syncing your classwork to Github

    +

    Here are some directions for syncing your classwork with a GitHub repository

    +

    End-of-day practice

    End of Class! Congratulations!!

    diff --git a/docs/course-materials/eod-practice/eod-day2.html b/docs/course-materials/eod-practice/eod-day2.html index 55a1189..9f3074f 100644 --- a/docs/course-materials/eod-practice/eod-day2.html +++ b/docs/course-materials/eod-practice/eod-day2.html @@ -459,7 +459,7 @@

    Learning Objectives

    Setup

    First, let’s import the necessary libraries:

    -
    +
    Code
    # We won't use the random library until the end of this exercise, 
    @@ -474,7 +474,7 @@ 

    Part 1: Data Collec

    Task 1: Create a List of Classmates

    Create a list containing the names of at least 4 of your classmates in this course.

    -
    +
    Code
    # Your code here
    @@ -489,7 +489,7 @@

    +
    Code
    # Your code here
    @@ -509,7 +509,7 @@

    Task 3: List Operat
  • Sort the list alphabetically
  • Find and print the index of a specific classmate
  • -
    +
    Code
    # Your code here
    @@ -524,7 +524,7 @@

    Task 4: Dicti
  • Update the “number of pets” for one classmate
  • Create a list of all the favorite colors your classmates mentioned
  • -
    +
    Code
    # Your code here
    @@ -538,7 +538,7 @@

    Part 3:

    Example: Random Selection from a Dictionary

    Here’s a simple example of how to select random items from a dictionary:

    -
    +
    Code
    import random
    @@ -567,10 +567,10 @@ 

    print(f"Randomly selected {num_selections} fruits: {random_fruits}")

    -
    Randomly selected fruit: apple
    -Its color: red
    -Another randomly selected fruit: banana
    -Randomly selected 3 fruits: ['kiwi', 'banana', 'orange']
    +
    Randomly selected fruit: grape
    +Its color: purple
    +Another randomly selected fruit: apple
    +Randomly selected 3 fruits: ['kiwi', 'grape', 'apple']

    This example demonstrates how to:

    diff --git a/docs/course-materials/eod-practice/eod-day3.html b/docs/course-materials/eod-practice/eod-day3.html index 05a65ab..f675f9e 100644 --- a/docs/course-materials/eod-practice/eod-day3.html +++ b/docs/course-materials/eod-practice/eod-day3.html @@ -447,7 +447,7 @@

    Introduction

    Setup

    First, let’s import the necessary libraries and set up our environment.

    -
    +
    Code
    import pandas as pd
    @@ -461,7 +461,7 @@ 

    Creating a Random Number Generator

    We can create a random number generator object like this:

    -
    +
    Code
    rng = np.random.default_rng()
    @@ -472,7 +472,7 @@

    Creatin

    Using a Seed for Reproducibility

    In data science, it’s often crucial to be able to reproduce our results. We can do this by setting a seed for our random number generator. Here’s how:

    -
    +
    Code
    rng = np.random.default_rng(seed=42)
    @@ -511,7 +511,7 @@

    Exploring Reproducibility

    To demonstrate the importance of seeding, try creating two series with different random number generators:

    -
    +
    Code
    rng1 = np.random.default_rng(seed=42)
    @@ -527,7 +527,7 @@ 

    Exploring Reprod

    Now try creating two series with random number generators that have different seeds:

    -
    +
    Code
    rng3 = np.random.default_rng(seed=42)
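The reproducibility exercise above boils down to this: identical seeds give identical streams, different seeds give different streams.

```python
import numpy as np

# Two generators seeded identically produce identical streams
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)
s1 = rng1.random(5)
s2 = rng2.random(5)

# A different seed yields a different stream
rng3 = np.random.default_rng(seed=7)
s3 = rng3.random(5)

print(np.array_equal(s1, s2), np.array_equal(s1, s3))
```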
    diff --git a/docs/course-materials/eod-practice/eod-day4.html b/docs/course-materials/eod-practice/eod-day4.html
    index 4a55afd..99c54a9 100644
    --- a/docs/course-materials/eod-practice/eod-day4.html
    +++ b/docs/course-materials/eod-practice/eod-day4.html
    @@ -455,7 +455,7 @@ 

    Introduction

    This end-of-day session is focused on using pandas for loading, visualizing, and analyzing marine microplastics data. This session is designed to help you become more comfortable with the pandas library, equipping you with the skills needed to perform data analysis effectively.

    The National Oceanic and Atmospheric Administration, via its National Centers for Environmental Information has an entire section related to marine microplastics – that is, microplastics found in water — at https://www.ncei.noaa.gov/products/microplastics.

    We will be working with a recent download of the entire marine microplastics dataset. The url for this data is located here:

    -
    +
    Code
    url = 'https://ucsb.box.com/shared/static/dnnu59jsnkymup6o8aaovdywrtxiy3a9.csv'
    diff --git a/docs/course-materials/eod-practice/eod-day5.html b/docs/course-materials/eod-practice/eod-day5.html index ec7ecac..18e0233 100644 --- a/docs/course-materials/eod-practice/eod-day5.html +++ b/docs/course-materials/eod-practice/eod-day5.html @@ -447,7 +447,7 @@

    Reference:

    Setup

    First, let’s import the necessary libraries and load the data:

    -
    +
    Code
    import pandas as pd
    @@ -459,7 +459,7 @@ 

    Setup

    df = pd.read_csv(url)
    -
    +
    Code
    # Display the first few rows:
    @@ -502,7 +502,7 @@ 

    Setup

    4 3.775280 True 1 NaN NaN
    -
    +
    Code
    # Display the dataframe info:
    @@ -553,7 +553,7 @@ 

    2. Exploring Banan
  • For each of the pre-computed banana score columns (kg, calories, and protein), show the 10 highest-scoring food products.

• Edit the function below so that it returns the top 10 scores for a given column:

  • -
    +
    Code
    def return_top_ten(df, column):
    diff --git a/docs/course-materials/eod-practice/eod-day6.html b/docs/course-materials/eod-practice/eod-day6.html
    index cd8a584..41f95fb 100644
    --- a/docs/course-materials/eod-practice/eod-day6.html
    +++ b/docs/course-materials/eod-practice/eod-day6.html
    @@ -430,7 +430,7 @@ 

    On this page

    Setup

    First, import the necessary libraries and load the dataset:

    -
    +
    Code
    import pandas as pd
    @@ -532,7 +532,7 @@ 

    Task 5: Joining Data
  • Read in a new dataframe that contains population data stored at this url:
  • -
    +
    Code
    population_url = 'https://bit.ly/euro_pop'
    diff --git a/docs/course-materials/final_project.html b/docs/course-materials/final_project.html index 634bfb3..0addc86 100644 --- a/docs/course-materials/final_project.html +++ b/docs/course-materials/final_project.html @@ -410,7 +410,8 @@

    On this page

    Additional figures and graphics are also welcome - you are encouraged to make your notebooks as engaging and visually interesting as possible.

    -

    Here are some links to potential data resources that you can use to develop your analyses:

    +

    +
    +

    Syncing your data to Github

    +

    Here are some directions for syncing your classwork with GitHub

    General places to find fun data

    +

    Here are some links to potential data resources that you can use to develop your analyses:

    • Kaggle
    • Data is Plural
    • @@ -503,7 +508,7 @@

      +
      Code
      import pandas as pd
      diff --git a/docs/course-materials/interactive-sessions/1b_Jupyter_Notebooks.html b/docs/course-materials/interactive-sessions/1b_Jupyter_Notebooks.html
      index a99ebe3..cdd867d 100644
      --- a/docs/course-materials/interactive-sessions/1b_Jupyter_Notebooks.html
      +++ b/docs/course-materials/interactive-sessions/1b_Jupyter_Notebooks.html
      @@ -572,7 +572,7 @@ 

      Rendering Images

      Jupyter Notebooks can render images directly in the output cells, which is particularly useful for data visualization.

      Example: Displaying an Image

      -
      +
      Code
      from IPython.display import Image, display
      @@ -593,7 +593,7 @@ 

Interactive Features

      Example: Using Interactive Widgets

      Widgets allow users to interact with your code and visualize results dynamically.

      -
      +
      Code
      import ipywidgets as widgets
      @@ -604,7 +604,7 @@ 

      Example:

      @@ -698,7 +698,7 @@

6. Google Colab

[Extraction residue: the remainder of this hunk and the body of the new rendered page docs/course-materials/interactive-sessions/8a_github.html; only the new page's titles are recoverable:]

Interactive Session

Creating a GitHub Repository for Your EDS 217 Course Work

\ No newline at end of file
diff --git a/docs/course-materials/lectures/05_Drop_or_Impute.html b/docs/course-materials/lectures/05_Drop_or_Impute.html index 241c209..c10aaa9 100644 --- a/docs/course-materials/lectures/05_Drop_or_Impute.html +++ b/docs/course-materials/lectures/05_Drop_or_Impute.html @@ -433,7 +433,7 @@

      Missing Completely at Random (MCAR)

      MCAR Example (Assigning nan randomly)

      -
      +
      import pandas as pd
       import numpy as np
       
      @@ -460,7 +460,7 @@ 

      Missing at Random (MAR)

      MAR Example (Assigning nan randomly, filtered on column value)

      -
      +
      # Create sample data with MAR
       np.random.seed(42)
       df = pd.DataFrame({
      @@ -508,7 +508,7 @@ 

      Dropping Missing Data

      Drop Example

      -
      +
      # Dropping missing data
       df_dropped = df.dropna()
       print(f"Original shape: {df.shape}, After dropping: {df_dropped.shape}")
      @@ -538,7 +538,7 @@

      Imputation

      Imputation Example

      -
      +
      # Simple mean imputation
       df_imputed = df.fillna(df.mean())
       print(f"Original missing: {df['Income'].isnull().sum()}, After imputation: {df_imputed['Income'].isnull().sum()}")
      diff --git a/docs/course-materials/lectures/05_Session_1A.html b/docs/course-materials/lectures/05_Session_1A.html index 6072853..f0fe6c0 100644 --- a/docs/course-materials/lectures/05_Session_1A.html +++ b/docs/course-materials/lectures/05_Session_1A.html @@ -430,7 +430,7 @@

      Structure of a GroupBy Object

      GroupBy Example

      -
      +
      import pandas as pd
       
       df = pd.DataFrame({
      @@ -495,7 +495,7 @@ 

      When to Use .copy()

      .copy() Example

      -
      +
      # Filtering creates a view
       df_view = df[df['Category'] == 'A']
       df_view['Value'] += 10  # This modifies the original df!
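The fix the slide is driving at can be sketched as follows: calling `.copy()` on the filtered frame makes it independent, so modifying it leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({"Category": ["A", "A", "B"], "Value": [1, 2, 3]})

# .copy() makes the filtered frame independent of the original
df_a = df[df["Category"] == "A"].copy()
df_a["Value"] += 10

print(df["Value"].tolist())    # original is unchanged
print(df_a["Value"].tolist())
```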
      diff --git a/docs/course-materials/lectures/pandas_workflow.html b/docs/course-materials/lectures/pandas_workflow.html
      index 6ddcaa4..7835383 100644
      --- a/docs/course-materials/lectures/pandas_workflow.html
      +++ b/docs/course-materials/lectures/pandas_workflow.html
      @@ -441,7 +441,7 @@ 

      Common Operations

      Example: Loading a Dataframe

      -
      +
      import pandas as pd
       
       # URL of the CSV file (using parentheses to span multiple lines)
      @@ -455,7 +455,7 @@ 

      Example: Loading a Dataframe

      -
      +
      # Display the first few rows of the DataFrame
       print(df.head())
      @@ -498,7 +498,7 @@

      Common Operations

      An example DataFrame

      -
      +
      import pandas as pd
       
       data = {
      @@ -515,12 +515,12 @@ 

      An example DataFrame

      Example: Renaming a Column

      Code:

      -
      +
      cleaned = df.rename(columns={'weight': 'weight_kg'})

      🐍🧠

      Result:

      -
      +
        type      name  age  weight_kg
       0  Dog       Rex    5       20.0
      @@ -535,12 +535,12 @@ 

      Result:

      Example: Filling missing data

      Code:

      -
      +
      cleaned = cleaned.fillna({'weight_kg': 7})

      🐍🧠

      Result:

      -
      +
        type      name  age  weight_kg
       0  Dog       Rex    5       20.0
      @@ -567,12 +567,12 @@ 

      Common Operations

      Transformation Example: Adding a New Column for Age in Months

      Code:

      -
      +
      transformed = cleaned.assign(age_months=cleaned['age'] * 12)

      🐍🧠

      Result:

      -
      +
        type      name  age  weight_kg  age_months
       0  Dog       Rex    5       20.0          60
      @@ -598,7 +598,7 @@ 

      Common Operations

      Combining Example: Merging with a Second Dataframe

      Code:

      -
      +
      toys = pd.DataFrame({
           'name': ['Spot', 'Mittens', 'Buddy', 'Whiskers', 'Rex'],
           'number_of_toys': [6, 3, 5, 8, 2]
      @@ -608,7 +608,7 @@ 

      Code:

      🐍🧠

      Result:

      -
      +
        type      name  age  weight_kg  age_months  number_of_toys
       0  Dog       Rex    5       20.0          60               2
      @@ -635,13 +635,13 @@ 

      Common Operations

      Grouping Example: Grouping by Type

      Code:

      -
      +
      grouped = combined.groupby('type')

      🐍🧠

      Result:

      Note: GroupBy objects cannot be directly visualized.

      -
      +
      Group: Cat
         type      name  age  weight_kg  age_months  number_of_toys
      @@ -662,12 +662,12 @@ 

      Result:

      Aggregation Example: Who Has More Toys? Cats or Dogs?

      -
      +
      aggregated = grouped.agg(average_toys=('number_of_toys', 'mean'))

      🐍🧠

      Result:

      -
      +
            average_toys
       type              
      @@ -690,11 +690,11 @@ 

      Common Operations

      Example: Descriptive Statistics, describe()

      Code

      -
      +
      summary_stats = combined.describe()

      Result

      -
      +
                  age  weight_kg  age_months  number_of_toys
       count  5.000000   5.000000    5.000000        5.000000
      @@ -712,11 +712,11 @@ 

      Result

      Example: Descriptive Statistics, value_counts()

      Code

      -
      +
      counts = combined['type'].value_counts()

      Result

      -
      +
      type
       Dog    3
      @@ -750,7 +750,7 @@ 

      Introduction to Method Chaining

      A Complete Workflow Example Using “Method Chaining”

      1. Setting up our data (you should already have created these)

      -
      +
      import pandas as pd
       from pandas import DataFrame, Series
       
      @@ -771,7 +771,7 @@ 

      1. Setting up

      Complete Workflow Example Using Method Chaining

      Using method chaining to combine operations

      -
      +
      df = DataFrame(data) # read    
       processed = (df        
                       .rename(columns={'weight': 'weight_kg'})     # clean
      @@ -784,7 +784,7 @@ 

      Using method chaining to co

      🐍🧠

      Result

      -
      +
            average_toys
       type              
      diff --git a/docs/course-materials/lectures/seaborn.html b/docs/course-materials/lectures/seaborn.html
      index 42f2f23..375cf5b 100644
      --- a/docs/course-materials/lectures/seaborn.html
      +++ b/docs/course-materials/lectures/seaborn.html
      @@ -464,7 +464,7 @@ 

      Themes (continued)

      Getting ready to Seaborn

      Import the library and set a style

      -
      +
      import seaborn as sns # (but now you know it should have been ssn 🤓)
       sns.set(style="darkgrid") # This is the default, so skip it if wanted
      @@ -472,7 +472,7 @@

      Import the library and set a style

      Relational Plots (relplot)

      -
      +
      # Load the built-in example tips dataset
       tips = sns.load_dataset("tips")
       
      @@ -490,7 +490,7 @@ 

      Relational Plots (relplot)

      Categorical Plots (catplot)

      -
      +
      # Load the example titanic dataset
       titanic = sns.load_dataset("titanic")
       # Create a categorical plot
      @@ -508,7 +508,7 @@ 

      Categorical Plots (catplot)

      Categorical Plots (catplot)

      The palette keyword specifies the category color scale

      -
      +
      # Create a categorical plot
       sns.catplot(x="deck", kind="count", palette="coolwarm", data=titanic)
      @@ -523,7 +523,7 @@

      Categorical Plots (catplot)

      Distribution Plots (displot)

      -
      +
      # Create a distribution plot
       sns.set(style="ticks") 
       sns.displot(tips['total_bill'], kde=True)
      @@ -539,7 +539,7 @@

      Distribution Plots (displot)

      Regression Plots (regplot)

      -
      +
      sns.set(style="dark") 
       # Create a regression plot
       sns.regplot(x="total_bill", y="tip", data=tips)
      diff --git a/docs/course-materials/lectures/seaborn_files/figure-revealjs/cell-7-output-1.png b/docs/course-materials/lectures/seaborn_files/figure-revealjs/cell-7-output-1.png index 30d2c7a..b3df462 100644 Binary files a/docs/course-materials/lectures/seaborn_files/figure-revealjs/cell-7-output-1.png and b/docs/course-materials/lectures/seaborn_files/figure-revealjs/cell-7-output-1.png differ diff --git a/docs/course-materials/live-coding/2d_list_comprehensions_notes.html b/docs/course-materials/live-coding/2d_list_comprehensions_notes.html index 3a38839..599bdf4 100644 --- a/docs/course-materials/live-coding/2d_list_comprehensions_notes.html +++ b/docs/course-materials/live-coding/2d_list_comprehensions_notes.html @@ -489,7 +489,7 @@

      Example 1: Creating a List with a for Loop

      -
      +
      # Traditional approach to creating a list of squares
       squares = []
       for i in range(1, 6):
      @@ -506,7 +506,7 @@ 

      Using zip

      -
      +
      # Lists of Roman numerals and their Arabic equivalents
       roman = ['I', 'II', 'III', 'IV', 'V']
       arabic = [1, 2, 3, 4, 5]
      @@ -524,7 +524,7 @@ 

      Using zip

      Without Using zip

      -
      +
      # Traditional approach without zip, using index
       roman_to_arabic = {}
       for i in range(len(roman)):
      @@ -544,7 +544,7 @@ 

      Example 1: List Comprehension for Squares

      -
      +
      # List comprehension for generating squares
       squares = [i ** 2 for i in range(1, 6)]
       
      @@ -558,7 +558,7 @@ 


      Practice List Comprehension

      Instruction: Ask students to write a list comprehension that generates a list of cubes for numbers from 1 to 5.

      Answer Code:

      -
      +
      # Answer: List comprehension for generating cubes
       cubes = [i ** 3 for i in range(1, 6)]
       
      @@ -577,7 +577,7 @@ 

      Example 1: Dictionary Comprehension for Roman to Arabic Conversion

      Using zip

      -
      +
      # Dictionary comprehension for mapping Roman numerals to Arabic numbers using zip
       roman_to_arabic = {r: a for r, a in zip(roman, arabic)}
       
      @@ -589,7 +589,7 @@ 

      Using zip

      Without Using zip

      -
      +
      # Dictionary comprehension without zip, using index
       roman_to_arabic = {roman[i]: arabic[i] for i in range(len(roman))}
       
      @@ -604,7 +604,7 @@ 

Without Using zip

      Practice Dictionary Comprehension

      Instruction: Ask students to create a dictionary comprehension that maps the first five letters of the alphabet (‘A’, ‘B’, ‘C’, ‘D’, ‘E’) to their corresponding positions in the alphabet (1, 2, 3, 4, 5).

      Answer Code:

      -
      +
      # Answer: Dictionary comprehension for mapping letters to positions
       letters = ['A', 'B', 'C', 'D', 'E']
       positions = [1, 2, 3, 4, 5]
      @@ -630,7 +630,7 @@ 

      Example 1: List Comprehension with a Condition

      -
      +
# List comprehension with a condition: squares of values greater than 10 (empty for this range)
       squares_gt_10 = [i ** 2 for i in range(1, 6) if i > 10]
       
      @@ -642,7 +642,7 @@ 

      Example 2: Dictionary Comprehension with a Condition

      -
      +
# Dictionary comprehension with a condition: exclude Roman numerals containing 'V'
       no_v_roman_to_arabic = {r: a for r, a in zip(roman, arabic) if 'V' not in r}
       
      @@ -656,7 +656,7 @@ 

      Practice with Conditions

      Instruction: Ask students to write a list comprehension that generates a list of cubes for odd numbers from 1 to 5.

      Answer Code:

      -
      +
      # Answer: List comprehension for generating cubes of odd numbers
       odd_cubes = [i ** 3 for i in range(1, 6) if i % 2 != 0]
       
      diff --git a/docs/course-materials/live-coding/3a_control_flows_notes.html b/docs/course-materials/live-coding/3a_control_flows_notes.html
      index e680fda..1d44425 100644
      --- a/docs/course-materials/live-coding/3a_control_flows_notes.html
      +++ b/docs/course-materials/live-coding/3a_control_flows_notes.html
      @@ -453,7 +453,7 @@ 

      Conditionals

      Basic If Statement

      Teacher’s Note: Start with a simple if statement to check a numerical condition.

      -
      +
      number = 5
       if number > 0:
           print("Positive number")
      @@ -466,7 +466,7 @@

      Basic If Statement

      Adding Else

      Teacher’s Note: Introduce the else statement to handle cases not met by the if condition.

      -
      +
      if number > 0:
           print("Positive number")
       else:
      @@ -480,7 +480,7 @@ 

      Adding Else

      Using Elif

      Teacher’s Note: Use elif to introduce a third logical condition.

      -
      +
      number = 0
       if number > 0:
           print("Positive number")
      @@ -501,7 +501,7 @@ 

      Loops

      For Loops

      Teacher’s Note: Demonstrate a for loop with a list.

      -
      +
      fruits = ['apple', 'banana', 'cherry']
       for fruit in fruits:
           print(fruit)
      @@ -516,7 +516,7 @@

      For Loops

      While Loops

      Teacher’s Note: Show a while loop with a countdown.

      -
      +
      count = 5
       while count > 0:
           print(count)
      @@ -538,7 +538,7 @@ 

      App

      Example: Filtering Data

      Teacher’s Note: Apply a practical data filtering example using pandas.

      -
      +
      import pandas as pd
       data = pd.DataFrame({
           'Temperature': [18, 21, 24, 19, 17],
      diff --git a/docs/course-materials/live-coding/4d_data_import_export.html b/docs/course-materials/live-coding/4d_data_import_export.html
      index 921e458..7ee5f03 100644
      --- a/docs/course-materials/live-coding/4d_data_import_export.html
      +++ b/docs/course-materials/live-coding/4d_data_import_export.html
      @@ -471,7 +471,7 @@ 

      Step 1: Creat

      Step 2: Import Required Libraries

      In the first cell of your notebook, import the necessary libraries:

      -
      +
      import pandas as pd
       import numpy as np
      @@ -479,7 +479,7 @@

Step 2: Import Required Libraries

      Step 3: Set Up Data URLs

      To ensure we’re all working with the same data, copy and paste the following URLs into a new code cell and run the cell (SHIFT-ENTER):

      -
      +
      # URLs for different CSV files we'll be using
       url_basic = 'https://bit.ly/eds217-basic'
       url_missing = 'https://bit.ly/eds217-missing'
      diff --git a/docs/course-materials/live-coding/4d_data_import_export_notes.html b/docs/course-materials/live-coding/4d_data_import_export_notes.html
      index bcc1ed8..d1506ea 100644
      --- a/docs/course-materials/live-coding/4d_data_import_export_notes.html
      +++ b/docs/course-materials/live-coding/4d_data_import_export_notes.html
      @@ -429,7 +429,7 @@ 

      On this page

      Introduction (5 minutes)

      Good morning/afternoon, everyone! Today, we’re going to dive deep into one of the most fundamental functions in pandas: pd.read_csv(). This function is your gateway to working with CSV data in Python, and mastering it will significantly boost your data analysis capabilities.

      Let’s start by importing pandas and setting up our environment:

      -
      +
      Code
      import pandas as pd
      @@ -440,7 +440,7 @@ 

Introduction (5 minutes)

      Basic Usage and Column Selection (10 minutes)

      Let’s begin with the basics. We’ll use the ‘basic_data.csv’ file for this part.

      -
      +
      Code
      # URL for the basic data CSV
      @@ -476,7 +476,7 @@ 

      +
      Code
      # Read only the 'Name' and 'Age' columns
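To make the column-selection step concrete outside the live session, here is a minimal self-contained sketch using an inline CSV string (hypothetical values) in place of the course URL:

```python
import io
import pandas as pd

# Inline CSV standing in for the course file (hypothetical values)
raw = "Name,Age,City\nAna,21,Goleta\nBen,22,Ventura\n"

# usecols restricts parsing to the named columns
df = pd.read_csv(io.StringIO(raw), usecols=['Name', 'Age'])
print(df.columns.tolist())  # → ['Name', 'Age']
```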
      @@ -496,7 +496,7 @@ 

      Handling Missing Data (10 minutes)

      Next, we’ll look at how to handle missing data using the ‘missing_values.csv’ file.

      -
      +
      Code
      # URL for the missing values CSV
      @@ -539,7 +539,7 @@ 

Handling Missing Data (10 minutes)

      Let’s see how we can handle these missing values:

      -
      +
      Code
      # Fill NA values in the 'Age' column with the median age
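A small self-contained sketch of the median-fill idea, using a hypothetical three-row frame rather than the course CSV:

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing value
df = pd.DataFrame({'Name': ['Ana', 'Ben', 'Cam'], 'Age': [20.0, np.nan, 30.0]})

# Replace missing ages with the column median (median of 20 and 30 is 25)
df['Age'] = df['Age'].fillna(df['Age'].median())
print(df['Age'].tolist())  # → [20.0, 25.0, 30.0]
```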
      @@ -563,7 +563,7 @@ 

Handling Missing Data (10 minutes)

      Parsing Dates (10 minutes)

      Now, let’s work with dates using the ‘date_data.csv’ file.

      -
      +
      Code
      # URL for the date data CSV
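As a quick illustration of `parse_dates` (with an inline stand-in for the date CSV, hypothetical dates):

```python
import io
import pandas as pd

# Inline stand-in for the date CSV (hypothetical dates)
raw = "Date,Value\n2023-01-15,10\n2023-02-20,12\n"

# parse_dates converts the column to datetime64, enabling the .dt accessor
df = pd.read_csv(io.StringIO(raw), parse_dates=['Date'])
df['month'] = df['Date'].dt.month
print(df['month'].tolist())  # → [1, 2]
```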
      @@ -589,7 +589,7 @@ 

Parsing Dates (10 minutes)

      Let’s do some date-based analysis:

      -
      +
      Code
      # Extract year and month from the Date column
      @@ -613,7 +613,7 @@ 

Parsing Dates (10 minutes)

      Working with Files Without Headers (5 minutes)

      Now, let’s look at how to handle files without headers using the ‘no_header.csv’ file.

      -
      +
      Code
      # URL for the no header CSV
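A self-contained sketch of the header-less case (inline data, hypothetical values and column names):

```python
import io
import pandas as pd

# A file with no header row (hypothetical values)
raw = "101,Ana,3.9\n102,Ben,3.4\n"

# header=None keeps the first row as data; names= supplies column labels
df = pd.read_csv(io.StringIO(raw), header=None, names=['id', 'name', 'gpa'])
print(df.columns.tolist())  # → ['id', 'name', 'gpa']
```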
      @@ -637,7 +637,7 @@ 

      Working with Tab-Separated Values (TSV) Files (5 minutes)

      Sometimes, you’ll encounter files that use tabs as separators instead of commas. Let’s see how to handle TSV files:

      -
      +
      Code
      # URL for the TSV data
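A minimal TSV sketch, substituting an inline string (hypothetical values) for the course URL:

```python
import io
import pandas as pd

# Inline TSV standing in for the course URL (hypothetical values)
raw = "name\tage\nAna\t21\nBen\t22\n"

# sep='\t' tells read_csv to split fields on tabs instead of commas
df = pd.read_csv(io.StringIO(raw), sep='\t')
print(df.shape)  # → (2, 2)
```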
      @@ -674,7 +674,7 @@ 

      Handling Large Files: Reading a Subset of Data (5 minutes)

      When dealing with very large files, you might want to read only a portion of the data to get a quick overview or to work with a manageable subset. Let’s see how to do this:

      -
      +
      Code
      # URL for the large data file
      @@ -712,7 +712,7 @@ 

      +
      Code
      # Read 1000 rows, starting from row 5001
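The `skiprows`/`nrows` combination can be sketched on a small simulated file (hypothetical data); note that passing a range to `skiprows` skips those line numbers while line 0, the header, is kept:

```python
import io
import pandas as pd

# Simulate a "large" file with 20 data rows (hypothetical)
raw = "x,y\n" + "\n".join(f"{i},{i * 2}" for i in range(20))

# skiprows=range(1, 11) skips file lines 1-10 but keeps the header (line 0);
# nrows=5 then reads only the next 5 rows
subset = pd.read_csv(io.StringIO(raw), skiprows=range(1, 11), nrows=5)
print(subset['x'].tolist())  # → [10, 11, 12, 13, 14]
```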
      @@ -783,7 +783,7 @@ 

      Conclusion and

      Bonus: Combining Techniques (if time permits)

      If we have extra time, let’s combine some of these techniques:

      -
      +
      Code
      # Read the first 100 rows of the dates CSV file, selecting specific columns and parsing dates
      diff --git a/docs/course-materials/live-coding/5a_selecting_and_filtering.html b/docs/course-materials/live-coding/5a_selecting_and_filtering.html
      index b87ba56..4269133 100644
      --- a/docs/course-materials/live-coding/5a_selecting_and_filtering.html
      +++ b/docs/course-materials/live-coding/5a_selecting_and_filtering.html
      @@ -440,7 +440,7 @@ 

      Objectives

      Getting Started

      We will be using the data stored in the csv at this url:

      -
      +
      url = 'https://bit.ly/eds217-studentdata'

      To get the most out of this session, please follow these guidelines:

      diff --git a/docs/course-materials/live-coding/5a_selecting_and_filtering_notes.html b/docs/course-materials/live-coding/5a_selecting_and_filtering_notes.html index 7b07692..9b08aa0 100644 --- a/docs/course-materials/live-coding/5a_selecting_and_filtering_notes.html +++ b/docs/course-materials/live-coding/5a_selecting_and_filtering_notes.html @@ -479,7 +479,7 @@

      On this page

      1. Setup

      First, let’s import pandas and load our dataset.

      -
      +
      Code
      import pandas as pd
      @@ -518,7 +518,7 @@ 

      1. Setup

      2. Basic Selection

      Let’s start with some basic selection techniques.

      -
      +
      Code
      # Select a single column
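A compact stand-alone sketch of the selection distinction (one column returns a Series, a list of columns returns a DataFrame), using a hypothetical frame rather than the course dataset:

```python
import pandas as pd

# Hypothetical student frame (not the course dataset)
df = pd.DataFrame({
    'student_id': [1, 2, 3],
    'major': ['Biology', 'Physics', 'Chemistry'],
    'gpa': [3.9, 3.4, 3.7],
})

single = df['major']                  # one column -> Series
several = df[['student_id', 'gpa']]  # list of columns -> DataFrame
print(type(single).__name__, several.shape)  # → Series (3, 2)
```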
      @@ -577,7 +577,7 @@ 

      3. Filter

      Now, let’s explore filtering techniques.

      3a. Single Condition Filtering

      -
      +
      Code
      # Filter students with GPA above 3.7
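The boolean-mask pattern behind single-condition filtering, sketched with hypothetical GPAs:

```python
import pandas as pd

# Hypothetical GPAs
df = pd.DataFrame({'name': ['Ana', 'Ben', 'Cam'], 'gpa': [3.9, 3.4, 3.8]})

# A boolean mask keeps only rows where the condition is True
high_gpa = df[df['gpa'] > 3.7]
print(high_gpa['name'].tolist())  # → ['Ana', 'Cam']
```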
      @@ -606,7 +606,7 @@ 

3a. Single Condition Filtering

      3b. Multiple Conditions with Logical Operators

      -
      +
      Code
      # Filter students who are under 20 AND majoring in Mathematics
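Combining conditions can be sketched as follows (hypothetical ages and majors); note that `&`/`|` are used rather than Python's `and`/`or`, and each condition needs its own parentheses:

```python
import pandas as pd

# Hypothetical ages and majors
df = pd.DataFrame({
    'age': [19, 22, 18],
    'major': ['Mathematics', 'Mathematics', 'Biology'],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses
young_math = df[(df['age'] < 20) & (df['major'] == 'Mathematics')]
print(len(young_math))  # → 1
```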
      @@ -650,7 +650,7 @@ 

      3c. Using the

      Filtering using like.

Using like selects columns or rows whose names contain a given substring; the axis argument controls whether row or column names are matched.

      -
      +
      Code
      # Filtering columns that contain 'student' in their name:
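A self-contained sketch of `filter(like=...)` on a small hypothetical frame (for a DataFrame, `filter()` matches column names by default):

```python
import pandas as pd

# Hypothetical frame; two column names contain 'student'
df = pd.DataFrame({
    'student_id': [1, 2],
    'student_name': ['Ana', 'Ben'],
    'gpa': [3.9, 3.4],
})

# like='student' keeps columns whose names contain the substring
subset = df.filter(like='student')
print(subset.columns.tolist())  # → ['student_id', 'student_name']
```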
      @@ -697,7 +697,7 @@ 

Filtering using like.

      Filtering with regex

      Regular Expressions (regex) can also be used in the filter command.

      -
      +
      Code
      # Filtering columns that end with 'e'
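The regex variant, sketched with hypothetical column names (`$` anchors the match to the end of the name):

```python
import pandas as pd

# Hypothetical columns; 'age' and 'grade' end with 'e'
df = pd.DataFrame({'age': [20], 'grade': ['A'], 'major': ['Biology']})

# regex='e$' keeps columns whose names end with the letter 'e'
subset = df.filter(regex='e$')
print(subset.columns.tolist())  # → ['age', 'grade']
```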
      @@ -736,7 +736,7 @@ 

Filtering with regex

      4. Combining Selection and Filtering

      Let’s combine selection and filtering techniques.

      -
      +
      Code
      # Select 'student_id' and 'major' for students with age < 21
      @@ -788,7 +788,7 @@ 

4. Combining Selection and Filtering

      5. Using .isin() for Multiple Values

      The .isin() method is useful when we want to filter based on multiple possible values.

      -
      +
      Code
      # Filter students majoring in Engineering, Chemistry, or Physics
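The `.isin()` pattern in a self-contained form, using hypothetical majors in place of the course dataset:

```python
import pandas as pd

# Hypothetical majors
df = pd.DataFrame({'major': ['Engineering', 'Biology', 'Physics', 'Chemistry']})

# .isin() tests each value against a list of acceptable values
stem = ['Engineering', 'Chemistry', 'Physics']
filtered = df[df['major'].isin(stem)]
print(filtered['major'].tolist())  # → ['Engineering', 'Physics', 'Chemistry']
```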
      @@ -851,7 +851,7 @@ 

5. Using .isin() for Multiple Values

      6. Filtering with String Methods

      Pandas provides string methods that can be used for filtering text data.

      -
      +
      Code
      # Filter majors that contain the word 'Science'
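String-method filtering, sketched with hypothetical majors (`.str.contains()` does a substring match, interpreted as a regex by default):

```python
import pandas as pd

# Hypothetical majors; two contain the word 'Science'
df = pd.DataFrame({'major': ['Data Science', 'History', 'Environmental Science']})

# Keep rows whose major contains the substring 'Science'
science = df[df['major'].str.contains('Science')]
print(science['major'].tolist())  # → ['Data Science', 'Environmental Science']
```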
      @@ -893,7 +893,7 @@ 

      7. Advanc
    • .loc - Label-based indexing: Uses labels/index values to select data Can use labels, boolean arrays, or slices of labels

    • .iloc - Integer-based indexing: Uses integer positions to select data Works with integer position-based indexing only

    -
    +
    Code
    # Using .loc with labels
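The label-vs-position distinction above can be sketched on a hypothetical labeled frame; the key gotcha is that `.loc` label slices include both endpoints while `.iloc` position slices exclude the stop:

```python
import pandas as pd

# Hypothetical frame with string labels as its index
df = pd.DataFrame({'gpa': [3.9, 3.4, 3.7]}, index=['ana', 'ben', 'cam'])

# .loc slices by label and includes BOTH endpoints
by_label = df.loc['ana':'ben', 'gpa']

# .iloc slices by integer position and EXCLUDES the stop position
by_position = df.iloc[0:2, 0]

print(by_label.tolist(), by_position.tolist())  # → [3.9, 3.4] [3.9, 3.4]
```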
    diff --git a/docs/course-materials/live-coding/dictionaries.html b/docs/course-materials/live-coding/dictionaries.html
    index b2e852e..38911bd 100644
    --- a/docs/course-materials/live-coding/dictionaries.html
    +++ b/docs/course-materials/live-coding/dictionaries.html
    @@ -443,7 +443,7 @@ 

    I. Introduction
  • Indexed by keys, making data access fast and efficient (and insertion-ordered since Python 3.7).
  • Live Code Example:

    -
    +
    example_dict = {'name': 'Earth', 'moons': 1}
     print("Example dictionary:", example_dict)
    @@ -462,7 +462,7 @@

    II
  • Accessing values using keys, demonstrating safe access with .get().
  • Live Code Example:

    -
    +
    # Creating a dictionary using dict()
     another_dict = dict(name='Mars', moons=2)
     print("Another dictionary (dict()):", another_dict)
    @@ -497,7 +497,7 @@ 

    III. Manipu
  • Removing items using del and pop().
  • Live Code Example:

    -
    +
    # Adding a new key-value pair
     another_dict['atmosphere'] = 'thin'
     print("Updated with atmosphere:", another_dict)
    @@ -528,7 +528,7 @@ 

    IV. Iterat
  • Using .keys(), .values(), and .items() for different iteration needs.
  • Live Code Example:

    -
    +
    # Creating a new dictionary for iteration examples
     iteration_dict = {'planet': 'Earth', 'moons': 1, 'orbit': 'Sun'}
     
    @@ -603,7 +603,7 @@ 

    +
    # Nested dictionary for environmental data
     environmental_data = {
       'Location A': {'temperature': 19, 'conditions': ['sunny', 'dry']},
    @@ -625,7 +625,7 @@ 

  • Using dictionaries to count occurrences and summarize data.
  • Live Code Example:

    -
    +
    weather_log = ['sunny', 'rainy', 'sunny', 'cloudy', 'sunny', 'rainy']
     weather_count = {}
     for condition in weather_log:
    diff --git a/docs/index.html b/docs/index.html
    index 2133427..09cd03a 100644
    --- a/docs/index.html
    +++ b/docs/index.html
    @@ -401,6 +401,10 @@ 

    Course Description

  • Collaborate with peers to solve group programming tasks, and communicate the process and results to the rest of the class

  • +
    +

    Syncing your classwork to Github

    +

    Here are some directions for syncing your classwork with a GitHub repository

    +

    Teaching Team


    diff --git a/docs/search.json b/docs/search.json index 483aac2..092aa66 100644 --- a/docs/search.json +++ b/docs/search.json @@ -312,7 +312,7 @@ "href": "course-materials/interactive-sessions/6a_grouping_joining_sorting.html#working-with-dates", "title": "Interactive Session 6A", "section": "4. Working with Dates", - "text": "4. Working with Dates\nDate manipulation is crucial for analyzing seasonal patterns, long-term trends, and time-sensitive events.\n\n\nCode\n# Set date as index\ndf.set_index('date', inplace=True)\nprint(df.head())\n\n# Resample to monthly data\nmonthly_counts = df.resample('M')['count'].sum()\nprint(monthly_counts.head())\n\n\n site species count temperature\ndate \n2023-01-01 Wetland Birch 7 9.043483\n2023-01-02 Forest Birch 1 18.282768\n2023-01-03 Wetland Birch 1 10.126592\n2023-01-04 Wetland Pine 13 18.935423\n2023-01-05 Forest Pine 9 20.792978\ndate\n2023-01-31 275\n2023-02-28 232\n2023-03-31 297\n2023-04-30 91\nFreq: ME, Name: count, dtype: int64\n\n\n/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_62317/881548494.py:6: FutureWarning: 'M' is deprecated and will be removed in a future version, please use 'ME' instead.\n monthly_counts = df.resample('M')['count'].sum()\n\n\n\nUnderstanding inplace=True\nThe inplace=True parameter modifies the original DataFrame directly:\n\n\nCode\n# Without inplace=True (creates a new DataFrame)\ndf_new = df.reset_index()\nprint(\"\\nNew DataFrame with reset index:\")\nprint(df_new.head())\nprint(\"\\nOriginal DataFrame (unchanged):\")\nprint(df.head())\n\n# With inplace=True (modifies the original DataFrame)\ndf.reset_index(inplace=True)\nprint(\"\\nOriginal DataFrame after reset_index(inplace=True):\")\nprint(df.head())\n\n\n\nNew DataFrame with reset index:\n date site species count temperature\n0 2023-01-01 Wetland Birch 7 9.043483\n1 2023-01-02 Forest Birch 1 18.282768\n2 2023-01-03 Wetland Birch 1 10.126592\n3 2023-01-04 Wetland Pine 13 18.935423\n4 2023-01-05 Forest Pine 9 
20.792978\n\nOriginal DataFrame (unchanged):\n site species count temperature\ndate \n2023-01-01 Wetland Birch 7 9.043483\n2023-01-02 Forest Birch 1 18.282768\n2023-01-03 Wetland Birch 1 10.126592\n2023-01-04 Wetland Pine 13 18.935423\n2023-01-05 Forest Pine 9 20.792978\n\nOriginal DataFrame after reset_index(inplace=True):\n date site species count temperature\n0 2023-01-01 Wetland Birch 7 9.043483\n1 2023-01-02 Forest Birch 1 18.282768\n2 2023-01-03 Wetland Birch 1 10.126592\n3 2023-01-04 Wetland Pine 13 18.935423\n4 2023-01-05 Forest Pine 9 20.792978\n\n\nWhen to use inplace=True: - When preprocessing large datasets to save memory - In data cleaning pipelines for time series - When you’re sure you won’t need the original version of the data\nWhen not to use inplace=True: - When you need to preserve the original dataset for comparison - In functions where you want to return a modified copy without altering the input - When working with shared datasets that other parts of your analysis might depend on\n\n\nDate Filtering and Analysis\n\n\nCode\n# Filter by date range (e.g., spring season)\nspring_data = df[(df['date'] >= '2023-03-01') & (df['date'] < '2023-06-01')]\nprint(spring_data.head())\n\n# Extract date components\ndf['month'] = df['date'].dt.month\ndf['day_of_year'] = df['date'].dt.dayofyear\nprint(df.head())\n\n\n date site species count temperature\n59 2023-03-01 Wetland Birch 8 13.815907\n60 2023-03-02 Grassland Pine 7 12.573182\n61 2023-03-03 Forest Oak 18 15.409371\n62 2023-03-04 Grassland Birch 8 26.573293\n63 2023-03-05 Grassland Oak 1 5.663674\n date site species count temperature month day_of_year\n0 2023-01-01 Wetland Birch 7 9.043483 1 1\n1 2023-01-02 Forest Birch 1 18.282768 1 2\n2 2023-01-03 Wetland Birch 1 10.126592 1 3\n3 2023-01-04 Wetland Pine 13 18.935423 1 4\n4 2023-01-05 Forest Pine 9 20.792978 1 5\n\n\nWhen to use date manipulation: - Analyzing seasonal patterns - Studying time-specific events (e.g., flowering times, migration patterns) 
- Creating time-based features for models - Aligning climate data with other observations (resampling)" + "text": "4. Working with Dates\nDate manipulation is crucial for analyzing seasonal patterns, long-term trends, and time-sensitive events.\n\n\nCode\n# Set date as index\ndf.set_index('date', inplace=True)\nprint(df.head())\n\n# Resample to monthly data\nmonthly_counts = df.resample('M')['count'].sum()\nprint(monthly_counts.head())\n\n\n site species count temperature\ndate \n2023-01-01 Wetland Birch 7 9.043483\n2023-01-02 Forest Birch 1 18.282768\n2023-01-03 Wetland Birch 1 10.126592\n2023-01-04 Wetland Pine 13 18.935423\n2023-01-05 Forest Pine 9 20.792978\ndate\n2023-01-31 275\n2023-02-28 232\n2023-03-31 297\n2023-04-30 91\nFreq: ME, Name: count, dtype: int64\n\n\n/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85716/881548494.py:6: FutureWarning: 'M' is deprecated and will be removed in a future version, please use 'ME' instead.\n monthly_counts = df.resample('M')['count'].sum()\n\n\n\nUnderstanding inplace=True\nThe inplace=True parameter modifies the original DataFrame directly:\n\n\nCode\n# Without inplace=True (creates a new DataFrame)\ndf_new = df.reset_index()\nprint(\"\\nNew DataFrame with reset index:\")\nprint(df_new.head())\nprint(\"\\nOriginal DataFrame (unchanged):\")\nprint(df.head())\n\n# With inplace=True (modifies the original DataFrame)\ndf.reset_index(inplace=True)\nprint(\"\\nOriginal DataFrame after reset_index(inplace=True):\")\nprint(df.head())\n\n\n\nNew DataFrame with reset index:\n date site species count temperature\n0 2023-01-01 Wetland Birch 7 9.043483\n1 2023-01-02 Forest Birch 1 18.282768\n2 2023-01-03 Wetland Birch 1 10.126592\n3 2023-01-04 Wetland Pine 13 18.935423\n4 2023-01-05 Forest Pine 9 20.792978\n\nOriginal DataFrame (unchanged):\n site species count temperature\ndate \n2023-01-01 Wetland Birch 7 9.043483\n2023-01-02 Forest Birch 1 18.282768\n2023-01-03 Wetland Birch 1 10.126592\n2023-01-04 Wetland Pine 13 
18.935423\n2023-01-05 Forest Pine 9 20.792978\n\nOriginal DataFrame after reset_index(inplace=True):\n date site species count temperature\n0 2023-01-01 Wetland Birch 7 9.043483\n1 2023-01-02 Forest Birch 1 18.282768\n2 2023-01-03 Wetland Birch 1 10.126592\n3 2023-01-04 Wetland Pine 13 18.935423\n4 2023-01-05 Forest Pine 9 20.792978\n\n\nWhen to use inplace=True: - When preprocessing large datasets to save memory - In data cleaning pipelines for time series - When you’re sure you won’t need the original version of the data\nWhen not to use inplace=True: - When you need to preserve the original dataset for comparison - In functions where you want to return a modified copy without altering the input - When working with shared datasets that other parts of your analysis might depend on\n\n\nDate Filtering and Analysis\n\n\nCode\n# Filter by date range (e.g., spring season)\nspring_data = df[(df['date'] >= '2023-03-01') & (df['date'] < '2023-06-01')]\nprint(spring_data.head())\n\n# Extract date components\ndf['month'] = df['date'].dt.month\ndf['day_of_year'] = df['date'].dt.dayofyear\nprint(df.head())\n\n\n date site species count temperature\n59 2023-03-01 Wetland Birch 8 13.815907\n60 2023-03-02 Grassland Pine 7 12.573182\n61 2023-03-03 Forest Oak 18 15.409371\n62 2023-03-04 Grassland Birch 8 26.573293\n63 2023-03-05 Grassland Oak 1 5.663674\n date site species count temperature month day_of_year\n0 2023-01-01 Wetland Birch 7 9.043483 1 1\n1 2023-01-02 Forest Birch 1 18.282768 1 2\n2 2023-01-03 Wetland Birch 1 10.126592 1 3\n3 2023-01-04 Wetland Pine 13 18.935423 1 4\n4 2023-01-05 Forest Pine 9 20.792978 1 5\n\n\nWhen to use date manipulation: - Analyzing seasonal patterns - Studying time-specific events (e.g., flowering times, migration patterns) - Creating time-based features for models - Aligning climate data with other observations (resampling)" }, { "objectID": 
"course-materials/interactive-sessions/6a_grouping_joining_sorting.html#using-df.apply-to-transform-data", @@ -473,14 +473,14 @@ "href": "course-materials/cheatsheets/random_numbers.html", "title": "EDS 217 Cheatsheet", "section": "", - "text": "For information on the previous np.random API and its use cases, please refer to the NumPy documentation on legacy random generation: NumPy Legacy Random Generation\nThis cheatsheet focuses on the modern Generator-based approach to random number generation in NumPy.\n\n\n\n\nCode\nimport numpy as np\n\n\n\n\n\n\n\nCode\n# Create a Generator with the default BitGenerator\nrng = np.random.default_rng()\n\n# Create a Generator with a specific seed\nrng_seeded = np.random.default_rng(seed=42)\n\n\n\n\n\n\n\n\n\nCode\n# Single random float\nprint(rng.random())\n\n# Array of random floats\nprint(rng.random(5))\n\n\n0.22092102589868978\n[0.28943778 0.09584537 0.78135092 0.97740274 0.44528883]\n\n\n\n\n\n\n\nCode\n# Single random integer from 0 to 10 (inclusive)\nprint(rng.integers(11))\n\n# Array of random integers from 1 to 100 (inclusive)\nprint(rng.integers(1, 101, size=5))\n\n\n4\n[74 44 73 84 90]\n\n\n\n\n\n\n\nCode\n# Single value from standard normal distribution\nprint(rng.standard_normal())\n\n# Array from normal distribution with mean=0, std=1\nprint(rng.normal(loc=0, scale=1, size=5))\n\n\n-0.8129275335805446\n[ 1.1815606 -1.15590009 0.26582165 0.43263691 0.75292281]\n\n\n\n\n\n\n\n\nCode\n# Random choice from an array\narr = np.array([1, 2, 3, 4, 5])\nprint(rng.choice(arr))\n\n# Random sample without replacement\nprint(rng.choice(arr, size=3, replace=False))\n\n\n4\n[4 3 1]\n\n\n\n\n\n\n\nCode\narr = np.arange(10)\nrng.shuffle(arr)\nprint(arr)\n\n\n[9 5 7 3 1 2 4 8 6 0]\n\n\n\n\n\nGenerators provide methods for many other distributions:\n\n\nCode\n# Poisson distribution\nprint(rng.poisson(lam=5, size=3))\n\n# Exponential distribution\nprint(rng.exponential(scale=1.0, size=3))\n\n# Binomial 
distribution\nprint(rng.binomial(n=10, p=0.5, size=3))\n\n\n[11 4 5]\n[0.16471365 0.67393881 0.09664034]\n[4 6 6]\n\n\n\n\n\nGenerators can fill existing arrays, which can be more efficient:\n\n\nCode\narr = np.empty(5)\nrng.random(out=arr)\nprint(arr)\n\n\n[0.51132826 0.60823193 0.59053934 0.87320215 0.3752571 ]\n\n\n\n\n\nYou can use different Bit Generators with varying statistical qualities:\n\n\nCode\nfrom numpy.random import PCG64, MT19937\n\nrng_pcg = np.random.Generator(PCG64())\nrng_mt = np.random.Generator(MT19937())\n\nprint(\"PCG64:\", rng_pcg.random())\nprint(\"MT19937:\", rng_mt.random())\n\n\nPCG64: 0.47047396600491753\nMT19937: 0.7397295290585331\n\n\n\n\n\n\n\nCode\n# Save state\nstate = rng.bit_generator.state\n\n# Generate some numbers\nprint(\"Original:\", rng.random(3))\n\n# Restore state and regenerate\nrng.bit_generator.state = state\nprint(\"Restored:\", rng.random(3))\n\n\nOriginal: [0.45477352 0.27105728 0.16726136]\nRestored: [0.45477352 0.27105728 0.16726136]\n\n\n\n\n\nYou can create independent generators from an existing one:\n\n\nCode\nchild1, child2 = rng.spawn(2)\nprint(\"Child 1:\", child1.random())\nprint(\"Child 2:\", child2.random())\n\n\nChild 1: 0.5815447472196176\nChild 2: 0.23081422202004886\n\n\n\n\n\nGenerators are designed to be thread-safe and support “jumping” ahead in the sequence:\n\n\nCode\nrng = np.random.Generator(PCG64())\nrng.bit_generator.advance(1000) # Jump ahead 1000 steps\n\n\n<numpy.random._pcg64.PCG64 at 0x10e2b0040>\n\n\n\n\n\n\nUse default_rng() to create a Generator unless you have specific requirements for a different Bit Generator.\nSet a seed for reproducibility in scientific computations and testing.\nUse the spawn() method to create independent generators for parallel processing.\nWhen performance is critical, consider using the out parameter to fill existing arrays.\nFor very long periods or when security is important, consider using the PCG64DXSM Bit Generator.\n\nRemember, Generators provide a 
more robust, flexible, and future-proof approach to random number generation in NumPy. They offer better statistical properties and are designed to work well in both single-threaded and multi-threaded environments." + "text": "For information on the previous np.random API and its use cases, please refer to the NumPy documentation on legacy random generation: NumPy Legacy Random Generation\nThis cheatsheet focuses on the modern Generator-based approach to random number generation in NumPy.\n\n\n\n\nCode\nimport numpy as np\n\n\n\n\n\n\n\nCode\n# Create a Generator with the default BitGenerator\nrng = np.random.default_rng()\n\n# Create a Generator with a specific seed\nrng_seeded = np.random.default_rng(seed=42)\n\n\n\n\n\n\n\n\n\nCode\n# Single random float\nprint(rng.random())\n\n# Array of random floats\nprint(rng.random(5))\n\n\n0.3278102376191233\n[0.16307633 0.40392116 0.45921329 0.26359239 0.32981461]\n\n\n\n\n\n\n\nCode\n# Single random integer from 0 to 10 (inclusive)\nprint(rng.integers(11))\n\n# Array of random integers from 1 to 100 (inclusive)\nprint(rng.integers(1, 101, size=5))\n\n\n4\n[73 74 35 82 86]\n\n\n\n\n\n\n\nCode\n# Single value from standard normal distribution\nprint(rng.standard_normal())\n\n# Array from normal distribution with mean=0, std=1\nprint(rng.normal(loc=0, scale=1, size=5))\n\n\n-0.39813817964311515\n[ 0.28262105 -1.23483313 0.42485523 -0.12518701 -0.53235252]\n\n\n\n\n\n\n\n\nCode\n# Random choice from an array\narr = np.array([1, 2, 3, 4, 5])\nprint(rng.choice(arr))\n\n# Random sample without replacement\nprint(rng.choice(arr, size=3, replace=False))\n\n\n4\n[2 5 4]\n\n\n\n\n\n\n\nCode\narr = np.arange(10)\nrng.shuffle(arr)\nprint(arr)\n\n\n[4 1 6 5 8 2 3 7 9 0]\n\n\n\n\n\nGenerators provide methods for many other distributions:\n\n\nCode\n# Poisson distribution\nprint(rng.poisson(lam=5, size=3))\n\n# Exponential distribution\nprint(rng.exponential(scale=1.0, size=3))\n\n# Binomial distribution\nprint(rng.binomial(n=10, p=0.5, 
size=3))\n\n\n[8 3 4]\n[1.16757058 0.27745692 0.73386612]\n[6 5 5]\n\n\n\n\n\nGenerators can fill existing arrays, which can be more efficient:\n\n\nCode\narr = np.empty(5)\nrng.random(out=arr)\nprint(arr)\n\n\n[0.91186338 0.70114601 0.45616672 0.46383814 0.5871376 ]\n\n\n\n\n\nYou can use different Bit Generators with varying statistical qualities:\n\n\nCode\nfrom numpy.random import PCG64, MT19937\n\nrng_pcg = np.random.Generator(PCG64())\nrng_mt = np.random.Generator(MT19937())\n\nprint(\"PCG64:\", rng_pcg.random())\nprint(\"MT19937:\", rng_mt.random())\n\n\nPCG64: 0.8210003140505849\nMT19937: 0.41349136323428615\n\n\n\n\n\n\n\nCode\n# Save state\nstate = rng.bit_generator.state\n\n# Generate some numbers\nprint(\"Original:\", rng.random(3))\n\n# Restore state and regenerate\nrng.bit_generator.state = state\nprint(\"Restored:\", rng.random(3))\n\n\nOriginal: [0.8069469 0.67878297 0.48562037]\nRestored: [0.8069469 0.67878297 0.48562037]\n\n\n\n\n\nYou can create independent generators from an existing one:\n\n\nCode\nchild1, child2 = rng.spawn(2)\nprint(\"Child 1:\", child1.random())\nprint(\"Child 2:\", child2.random())\n\n\nChild 1: 0.11547306758577636\nChild 2: 0.8805861858890862\n\n\n\n\n\nGenerators are designed to be thread-safe and support “jumping” ahead in the sequence:\n\n\nCode\nrng = np.random.Generator(PCG64())\nrng.bit_generator.advance(1000) # Jump ahead 1000 steps\n\n\n<numpy.random._pcg64.PCG64 at 0x10dc90040>\n\n\n\n\n\n\nUse default_rng() to create a Generator unless you have specific requirements for a different Bit Generator.\nSet a seed for reproducibility in scientific computations and testing.\nUse the spawn() method to create independent generators for parallel processing.\nWhen performance is critical, consider using the out parameter to fill existing arrays.\nFor very long periods or when security is important, consider using the PCG64DXSM Bit Generator.\n\nRemember, Generators provide a more robust, flexible, and future-proof approach 
to random number generation in NumPy. They offer better statistical properties and are designed to work well in both single-threaded and multi-threaded environments." }, { "objectID": "course-materials/cheatsheets/random_numbers.html#numpy-generator-based-random-number-generation-cheatsheet", "href": "course-materials/cheatsheets/random_numbers.html#numpy-generator-based-random-number-generation-cheatsheet", "title": "EDS 217 Cheatsheet", "section": "", - "text": "For information on the previous np.random API and its use cases, please refer to the NumPy documentation on legacy random generation: NumPy Legacy Random Generation\nThis cheatsheet focuses on the modern Generator-based approach to random number generation in NumPy.\n\n\n\n\nCode\nimport numpy as np\n\n\n\n\n\n\n\nCode\n# Create a Generator with the default BitGenerator\nrng = np.random.default_rng()\n\n# Create a Generator with a specific seed\nrng_seeded = np.random.default_rng(seed=42)\n\n\n\n\n\n\n\n\n\nCode\n# Single random float\nprint(rng.random())\n\n# Array of random floats\nprint(rng.random(5))\n\n\n0.22092102589868978\n[0.28943778 0.09584537 0.78135092 0.97740274 0.44528883]\n\n\n\n\n\n\n\nCode\n# Single random integer from 0 to 10 (inclusive)\nprint(rng.integers(11))\n\n# Array of random integers from 1 to 100 (inclusive)\nprint(rng.integers(1, 101, size=5))\n\n\n4\n[74 44 73 84 90]\n\n\n\n\n\n\n\nCode\n# Single value from standard normal distribution\nprint(rng.standard_normal())\n\n# Array from normal distribution with mean=0, std=1\nprint(rng.normal(loc=0, scale=1, size=5))\n\n\n-0.8129275335805446\n[ 1.1815606 -1.15590009 0.26582165 0.43263691 0.75292281]\n\n\n\n\n\n\n\n\nCode\n# Random choice from an array\narr = np.array([1, 2, 3, 4, 5])\nprint(rng.choice(arr))\n\n# Random sample without replacement\nprint(rng.choice(arr, size=3, replace=False))\n\n\n4\n[4 3 1]\n\n\n\n\n\n\n\nCode\narr = np.arange(10)\nrng.shuffle(arr)\nprint(arr)\n\n\n[9 5 7 3 1 2 4 8 6 0]\n\n\n\n\n\nGenerators provide 
methods for many other distributions:\n\n\nCode\n# Poisson distribution\nprint(rng.poisson(lam=5, size=3))\n\n# Exponential distribution\nprint(rng.exponential(scale=1.0, size=3))\n\n# Binomial distribution\nprint(rng.binomial(n=10, p=0.5, size=3))\n\n\n[11 4 5]\n[0.16471365 0.67393881 0.09664034]\n[4 6 6]\n\n\n\n\n\nGenerators can fill existing arrays, which can be more efficient:\n\n\nCode\narr = np.empty(5)\nrng.random(out=arr)\nprint(arr)\n\n\n[0.51132826 0.60823193 0.59053934 0.87320215 0.3752571 ]\n\n\n\n\n\nYou can use different Bit Generators with varying statistical qualities:\n\n\nCode\nfrom numpy.random import PCG64, MT19937\n\nrng_pcg = np.random.Generator(PCG64())\nrng_mt = np.random.Generator(MT19937())\n\nprint(\"PCG64:\", rng_pcg.random())\nprint(\"MT19937:\", rng_mt.random())\n\n\nPCG64: 0.47047396600491753\nMT19937: 0.7397295290585331\n\n\n\n\n\n\n\nCode\n# Save state\nstate = rng.bit_generator.state\n\n# Generate some numbers\nprint(\"Original:\", rng.random(3))\n\n# Restore state and regenerate\nrng.bit_generator.state = state\nprint(\"Restored:\", rng.random(3))\n\n\nOriginal: [0.45477352 0.27105728 0.16726136]\nRestored: [0.45477352 0.27105728 0.16726136]\n\n\n\n\n\nYou can create independent generators from an existing one:\n\n\nCode\nchild1, child2 = rng.spawn(2)\nprint(\"Child 1:\", child1.random())\nprint(\"Child 2:\", child2.random())\n\n\nChild 1: 0.5815447472196176\nChild 2: 0.23081422202004886\n\n\n\n\n\nGenerators are designed to be thread-safe and support “jumping” ahead in the sequence:\n\n\nCode\nrng = np.random.Generator(PCG64())\nrng.bit_generator.advance(1000) # Jump ahead 1000 steps\n\n\n<numpy.random._pcg64.PCG64 at 0x10e2b0040>\n\n\n\n\n\n\nUse default_rng() to create a Generator unless you have specific requirements for a different Bit Generator.\nSet a seed for reproducibility in scientific computations and testing.\nUse the spawn() method to create independent generators for parallel processing.\nWhen performance is 
critical, consider using the out parameter to fill existing arrays.\nFor very long periods or when security is important, consider using the PCG64DXSM Bit Generator.\n\nRemember, Generators provide a more robust, flexible, and future-proof approach to random number generation in NumPy. They offer better statistical properties and are designed to work well in both single-threaded and multi-threaded environments." + "text": "For information on the previous np.random API and its use cases, please refer to the NumPy documentation on legacy random generation: NumPy Legacy Random Generation\nThis cheatsheet focuses on the modern Generator-based approach to random number generation in NumPy.\n\n\n\n\nCode\nimport numpy as np\n\n\n\n\n\n\n\nCode\n# Create a Generator with the default BitGenerator\nrng = np.random.default_rng()\n\n# Create a Generator with a specific seed\nrng_seeded = np.random.default_rng(seed=42)\n\n\n\n\n\n\n\n\n\nCode\n# Single random float\nprint(rng.random())\n\n# Array of random floats\nprint(rng.random(5))\n\n\n0.3278102376191233\n[0.16307633 0.40392116 0.45921329 0.26359239 0.32981461]\n\n\n\n\n\n\n\nCode\n# Single random integer from 0 to 10 (inclusive)\nprint(rng.integers(11))\n\n# Array of random integers from 1 to 100 (inclusive)\nprint(rng.integers(1, 101, size=5))\n\n\n4\n[73 74 35 82 86]\n\n\n\n\n\n\n\nCode\n# Single value from standard normal distribution\nprint(rng.standard_normal())\n\n# Array from normal distribution with mean=0, std=1\nprint(rng.normal(loc=0, scale=1, size=5))\n\n\n-0.39813817964311515\n[ 0.28262105 -1.23483313 0.42485523 -0.12518701 -0.53235252]\n\n\n\n\n\n\n\n\nCode\n# Random choice from an array\narr = np.array([1, 2, 3, 4, 5])\nprint(rng.choice(arr))\n\n# Random sample without replacement\nprint(rng.choice(arr, size=3, replace=False))\n\n\n4\n[2 5 4]\n\n\n\n\n\n\n\nCode\narr = np.arange(10)\nrng.shuffle(arr)\nprint(arr)\n\n\n[4 1 6 5 8 2 3 7 9 0]\n\n\n\n\n\nGenerators provide methods for many other 
distributions:\n\n\nCode\n# Poisson distribution\nprint(rng.poisson(lam=5, size=3))\n\n# Exponential distribution\nprint(rng.exponential(scale=1.0, size=3))\n\n# Binomial distribution\nprint(rng.binomial(n=10, p=0.5, size=3))\n\n\n[8 3 4]\n[1.16757058 0.27745692 0.73386612]\n[6 5 5]\n\n\n\n\n\nGenerators can fill existing arrays, which can be more efficient:\n\n\nCode\narr = np.empty(5)\nrng.random(out=arr)\nprint(arr)\n\n\n[0.91186338 0.70114601 0.45616672 0.46383814 0.5871376 ]\n\n\n\n\n\nYou can use different Bit Generators with varying statistical qualities:\n\n\nCode\nfrom numpy.random import PCG64, MT19937\n\nrng_pcg = np.random.Generator(PCG64())\nrng_mt = np.random.Generator(MT19937())\n\nprint(\"PCG64:\", rng_pcg.random())\nprint(\"MT19937:\", rng_mt.random())\n\n\nPCG64: 0.8210003140505849\nMT19937: 0.41349136323428615\n\n\n\n\n\n\n\nCode\n# Save state\nstate = rng.bit_generator.state\n\n# Generate some numbers\nprint(\"Original:\", rng.random(3))\n\n# Restore state and regenerate\nrng.bit_generator.state = state\nprint(\"Restored:\", rng.random(3))\n\n\nOriginal: [0.8069469 0.67878297 0.48562037]\nRestored: [0.8069469 0.67878297 0.48562037]\n\n\n\n\n\nYou can create independent generators from an existing one:\n\n\nCode\nchild1, child2 = rng.spawn(2)\nprint(\"Child 1:\", child1.random())\nprint(\"Child 2:\", child2.random())\n\n\nChild 1: 0.11547306758577636\nChild 2: 0.8805861858890862\n\n\n\n\n\nGenerators are designed to be thread-safe and support “jumping” ahead in the sequence:\n\n\nCode\nrng = np.random.Generator(PCG64())\nrng.bit_generator.advance(1000) # Jump ahead 1000 steps\n\n\n<numpy.random._pcg64.PCG64 at 0x10dc90040>\n\n\n\n\n\n\nUse default_rng() to create a Generator unless you have specific requirements for a different Bit Generator.\nSet a seed for reproducibility in scientific computations and testing.\nUse the spawn() method to create independent generators for parallel processing.\nWhen performance is critical, consider using the out 
parameter to fill existing arrays.\nFor very long periods or when security is important, consider using the PCG64DXSM Bit Generator.\n\nRemember, Generators provide a more robust, flexible, and future-proof approach to random number generation in NumPy. They offer better statistical properties and are designed to work well in both single-threaded and multi-threaded environments." }, { "objectID": "course-materials/cheatsheets/chart_customization.html", @@ -501,7 +501,7 @@ "href": "course-materials/cheatsheets/chart_customization.html#seaborn-customization", "title": "EDS 217 Cheatsheet", "section": "Seaborn Customization", - "text": "Seaborn Customization\n\nSetting the Style\n\n\nCode\nimport seaborn as sns\nimport pandas as pd\n\nsns.set_style(\"whitegrid\")\nsns.set_palette(\"deep\")\n\n\n\n\nLoading and Preparing Data\n\n\nCode\n# Load the tips dataset\ntips = sns.load_dataset(\"tips\")\n\n# Display the first few rows and data types\nprint(tips.head())\nprint(\"\\nData Types:\")\nprint(tips.dtypes)\n\n# Select only numeric columns for correlation\nnumeric_columns = tips.select_dtypes(include=[np.number]).columns\ntips_numeric = tips[numeric_columns]\n\n\n total_bill tip sex smoker day time size\n0 16.99 1.01 Female No Sun Dinner 2\n1 10.34 1.66 Male No Sun Dinner 3\n2 21.01 3.50 Male No Sun Dinner 3\n3 23.68 3.31 Male No Sun Dinner 2\n4 24.59 3.61 Female No Sun Dinner 4\n\nData Types:\ntotal_bill float64\ntip float64\nsex category\nsmoker category\nday category\ntime category\nsize int64\ndtype: object\n\n\n\n\nCustomizing a Scatter Plot\n\n\nCode\nplt.figure(figsize=(10, 6))\nsns.scatterplot(data=tips, x=\"total_bill\", y=\"tip\", hue=\"time\", size=\"size\")\nplt.title(\"Tips vs Total Bill\", fontsize=16)\nplt.show()\n\n\n\n\n\n\n\n\n\n\n\nCustomizing a Box Plot\n\n\nCode\nplt.figure(figsize=(10, 6))\nsns.boxplot(data=tips, x=\"day\", y=\"total_bill\", palette=\"Set3\")\nplt.title(\"Total Bill by Day\", 
fontsize=16)\nplt.show()\n\n\n/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_62194/2996973332.py:2: FutureWarning: \n\nPassing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect.\n\n sns.boxplot(data=tips, x=\"day\", y=\"total_bill\", palette=\"Set3\")\n\n\n\n\n\n\n\n\n\n\n\nCustomizing a Heatmap (Correlation of Numeric Columns)\n\n\nCode\ncorr = tips_numeric.corr()\nplt.figure(figsize=(8, 6))\nsns.heatmap(corr, annot=True, cmap=\"coolwarm\", linewidths=0.5)\nplt.title(\"Correlation Heatmap of Numeric Columns\", fontsize=16)\nplt.show()\n\n\n\n\n\n\n\n\n\n\n\nCustomizing a Pair Plot\n\n\nCode\nsns.pairplot(tips, hue=\"time\", palette=\"husl\", height=2.5, \n vars=[\"total_bill\", \"tip\", \"size\"])\nplt.suptitle(\"Pair Plot of Tips Dataset\", y=1.02, fontsize=16)\nplt.show()\n\n\n\n\n\n\n\n\n\n\n\nCustomizing a Regression Plot\n\n\nCode\nplt.figure(figsize=(10, 6))\nsns.regplot(data=tips, x=\"total_bill\", y=\"tip\", scatter_kws={\"color\": \"blue\"}, line_kws={\"color\": \"red\"})\nplt.title(\"Regression Plot: Tip vs Total Bill\", fontsize=16)\nplt.show()\n\n\n\n\n\n\n\n\n\n\n\nCustomizing a Categorical Plot\n\n\nCode\nplt.figure(figsize=(12, 6))\nsns.catplot(data=tips, x=\"day\", y=\"total_bill\", hue=\"sex\", kind=\"violin\", split=True)\nplt.title(\"Distribution of Total Bill by Day and Sex\", fontsize=16)\nplt.show()\n\n\n<Figure size 1152x576 with 0 Axes>\n\n\n\n\n\n\n\n\n\nRemember, you can always combine Matplotlib and Seaborn customizations for even more control over your visualizations!" 
+ "text": "Seaborn Customization\n\nSetting the Style\n\n\nCode\nimport seaborn as sns\nimport pandas as pd\n\nsns.set_style(\"whitegrid\")\nsns.set_palette(\"deep\")\n\n\n\n\nLoading and Preparing Data\n\n\nCode\n# Load the tips dataset\ntips = sns.load_dataset(\"tips\")\n\n# Display the first few rows and data types\nprint(tips.head())\nprint(\"\\nData Types:\")\nprint(tips.dtypes)\n\n# Select only numeric columns for correlation\nnumeric_columns = tips.select_dtypes(include=[np.number]).columns\ntips_numeric = tips[numeric_columns]\n\n\n total_bill tip sex smoker day time size\n0 16.99 1.01 Female No Sun Dinner 2\n1 10.34 1.66 Male No Sun Dinner 3\n2 21.01 3.50 Male No Sun Dinner 3\n3 23.68 3.31 Male No Sun Dinner 2\n4 24.59 3.61 Female No Sun Dinner 4\n\nData Types:\ntotal_bill float64\ntip float64\nsex category\nsmoker category\nday category\ntime category\nsize int64\ndtype: object\n\n\n\n\nCustomizing a Scatter Plot\n\n\nCode\nplt.figure(figsize=(10, 6))\nsns.scatterplot(data=tips, x=\"total_bill\", y=\"tip\", hue=\"time\", size=\"size\")\nplt.title(\"Tips vs Total Bill\", fontsize=16)\nplt.show()\n\n\n\n\n\n\n\n\n\n\n\nCustomizing a Box Plot\n\n\nCode\nplt.figure(figsize=(10, 6))\nsns.boxplot(data=tips, x=\"day\", y=\"total_bill\", palette=\"Set3\")\nplt.title(\"Total Bill by Day\", fontsize=16)\nplt.show()\n\n\n/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85582/2996973332.py:2: FutureWarning: \n\nPassing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. 
Assign the `x` variable to `hue` and set `legend=False` for the same effect.\n\n sns.boxplot(data=tips, x=\"day\", y=\"total_bill\", palette=\"Set3\")\n\n\n\n\n\n\n\n\n\n\n\nCustomizing a Heatmap (Correlation of Numeric Columns)\n\n\nCode\ncorr = tips_numeric.corr()\nplt.figure(figsize=(8, 6))\nsns.heatmap(corr, annot=True, cmap=\"coolwarm\", linewidths=0.5)\nplt.title(\"Correlation Heatmap of Numeric Columns\", fontsize=16)\nplt.show()\n\n\n\n\n\n\n\n\n\n\n\nCustomizing a Pair Plot\n\n\nCode\nsns.pairplot(tips, hue=\"time\", palette=\"husl\", height=2.5, \n vars=[\"total_bill\", \"tip\", \"size\"])\nplt.suptitle(\"Pair Plot of Tips Dataset\", y=1.02, fontsize=16)\nplt.show()\n\n\n\n\n\n\n\n\n\n\n\nCustomizing a Regression Plot\n\n\nCode\nplt.figure(figsize=(10, 6))\nsns.regplot(data=tips, x=\"total_bill\", y=\"tip\", scatter_kws={\"color\": \"blue\"}, line_kws={\"color\": \"red\"})\nplt.title(\"Regression Plot: Tip vs Total Bill\", fontsize=16)\nplt.show()\n\n\n\n\n\n\n\n\n\n\n\nCustomizing a Categorical Plot\n\n\nCode\nplt.figure(figsize=(12, 6))\nsns.catplot(data=tips, x=\"day\", y=\"total_bill\", hue=\"sex\", kind=\"violin\", split=True)\nplt.title(\"Distribution of Total Bill by Day and Sex\", fontsize=16)\nplt.show()\n\n\n<Figure size 1152x576 with 0 Axes>\n\n\n\n\n\n\n\n\n\nRemember, you can always combine Matplotlib and Seaborn customizations for even more control over your visualizations!" }, { "objectID": "course-materials/interactive-sessions/7a_visualizations_1.html#getting-started", @@ -536,7 +536,7 @@ "href": "course-materials/interactive-sessions/7a_visualizations_1.html#axes-settings", "title": "Interactive Session 7A", "section": "Axes settings", - "text": "Axes settings\nNext, we will explore how to scale and annotate a plot using axes routines that control what goes on around the edges of the plot.\n\nLimits\nBy default, matplotlib will attempt to determine x- and y-axis limits, which usually work pretty well. 
Sometimes, however, it is useful to have finer control. The simplest way to adjust the display limits is to use the plt.xlim() and plt.ylim() methods.\nIn the example below, adjust the numbers (these can be int or float values) to see how the plot changes.\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Initialize empty figure\nfig1 = plt.figure()\n# Plot sine wave \nplt.plot(x, ysin, color='darkblue')\n\n# Set axis limits\nplt.xlim(-5,5)\nplt.ylim(-2,2)\n\n\n\n\n\n\n\n\n\n\n\nTicks and Tick Labels\nYou may also find it useful to adjust the ticks and/or tick labels that matplotlib displays by default. The plt.xticks() and plt.yticks() methods allow you to control the locations of both the ticks and the labels on the x- and y-axes, respectively. Both methods accept two list or array-like arguments, as well as optional keyword arguments. The first corresponds to the ticks, while the second controls the tick labels.\n# Set x-axis ticks at 0, 0.25, 0.5, 0.75, 1.0 with all labeled\nplt.xticks([0,0.25,0.5,0.75,1.0])\n# Set y-axis ticks from 0 to 100 with ticks on 10s and labels on 20s\nplt.yticks(np.arange(0,101,10),['0','','20','','40','','60','','80','','100'])\n\n\n\n\n\n\nImportant\n\n\n\nIf the labels are not specified, all ticks will be labeled accordingly. To only label certain ticks, you must pass a list with empty strings in the location of the ticks you wish to leave unlabeled (or the ticks will be labeled in order).\n\n\n\n✏️ Try it. 
Add the cell below to your notebook and run it.\n\n\n\nCode\n# Initialize empty figure\nfig1 = plt.figure()\n# Plot sine wave \nplt.plot(x, ysin, color='darkblue')\n\n# Set x-axis limits\nplt.xlim(-5,5)\n\n# Set axis ticks\nplt.xticks([-4,-3,-2,-1,0,1,2,3,4],['-4','','-2','','0','','2','','4'])\nplt.yticks([-1,-0.5,0,0.5,1])\n\n\n([<matplotlib.axis.YTick at 0x1a8cda6d0>,\n <matplotlib.axis.YTick at 0x1a8c93950>,\n <matplotlib.axis.YTick at 0x1a8c58c10>,\n <matplotlib.axis.YTick at 0x1a8d12650>,\n <matplotlib.axis.YTick at 0x1a8d1c6d0>],\n [Text(0, -1.0, '−1.0'),\n Text(0, -0.5, '−0.5'),\n Text(0, 0.0, '0.0'),\n Text(0, 0.5, '0.5'),\n Text(0, 1.0, '1.0')])\n\n\n\n\n\n\n\n\n\nAs with any plot, it is imperative to include x- and y-axis labels. This can be done by passing strings to the plt.xlabel() and plt.ylabel() methods:\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Initialize empty figure\nfig1 = plt.figure()\n# Plot sine wave \nplt.plot(x, ysin, color='darkblue')\n\n# Set x-axis limits\nplt.xlim(-5,5)\n\n# Set axis ticks\nplt.xticks([-4,-3,-2,-1,0,1,2,3,4],['-4','','-2','','0','','2','','4'])\nplt.yticks([-1,-0.5,0,0.5,1])\n\n# Set axis labels\nplt.xlabel('x-axis')\nplt.ylabel('y-axis')\n\n\nText(0, 0.5, 'y-axis')\n\n\n\n\n\n\n\n\n\nA nice feature about matplotlib is that it supports TeX formatting for mathematical expressions. This is quite useful for displaying equations, exponents, units, and other mathematical operators. The syntax for TeX expressions is 'r$TeX expression here$'. For example, we can display the axis labels as \\(x\\) and \\(\\sin{(x)}\\) as follows:\n\n✏️ Try it. 
Add the cell below to your notebook and run it.\n\n\n\nCode\n# Initialize empty figure\nfig1 = plt.figure()\n# Plot sine wave \nplt.plot(x, ysin, color='darkblue')\n\n# Set x-axis limits\nplt.xlim(-5,5)\n\n# Set axis ticks\nplt.xticks([-4,-3,-2,-1,0,1,2,3,4],['-4','','-2','','0','','2','','4'])\nplt.yticks([-1,-0.5,0,0.5,1])\n\n# Set axis labels\nplt.xlabel(r'$x$')\nplt.ylabel(r'$\\sin{(x)}$')\n\n\nText(0, 0.5, '$\\\\sin{(x)}$')\n\n\n\n\n\n\n\n\n\n\n\nTitles\nAdding a title to your plot is analogous to labeling the x- and y-axes. The plt.title() method allows you to set the title of your plot by passing a string:\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Initialize empty figure\nfig1 = plt.figure()\n# Plot sine wave \nplt.plot(x, ysin, color='darkblue')\nplt.plot(x, ycos, color='#B8D62E')\n\n# Set x-axis limits\nplt.xlim(-5,5)\n\n# Set axis ticks\nplt.xticks([-4,-3,-2,-1,0,1,2,3,4],['-4','','-2','','0','','2','','4'])\nplt.yticks([-1,-0.5,0,0.5,1])\n\n# Set axis labels\nplt.xlabel(r'$x$')\nplt.ylabel(r'$y$')\n\n# Set title\nplt.title('Sinusoidal functions')\n\n\nText(0.5, 1.0, 'Sinusoidal functions')\n\n\n\n\n\n\n\n\n\n\n\nLegends\nWhen multiple datasets are plotted on the same axes it is often useful to include a legend that labels each line or set of points. Matplotlib has a quick way of displaying a legend using the plt.legend() method. There are multiple ways of specifying the label for each dataset; I prefer to pass a list of strings to plt.legend():\n\n✏️ Try it. 
Add the cell below to your notebook and run it.\n\n\n\nCode\n# Initialize empty figure\nfig1 = plt.figure()\n# Plot sine wave \nplt.plot(x, ysin, color='darkblue')\nplt.plot(x, ycos, color='#B8D62E')\n\n# Set x-axis limits\nplt.xlim(-5,5)\n\n# Set axis ticks\nplt.xticks([-4,-3,-2,-1,0,1,2,3,4],['-4','','-2','','0','','2','','4'])\nplt.yticks([-1,-0.5,0,0.5,1])\n\n# Set axis labels\nplt.xlabel(r'$x$')\nplt.ylabel(r'$y$')\n\n# Set title\nplt.title('Sinusoidal functions')\n\n# Legend\nplt.legend(labels=['sin(x)','cos(x)'])\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNote\n\n\n\nAnother way of setting the data labels is to use the label keyword argument in the plt.plot() (or plt.scatter()) function:\n# Plot data\nplt.plot(x1, y1, label='Data1')\nplt.plot(x2, y2, label='Data2')\n\n# Legend\nplt.legend()\nNote that you must still run plt.legend() to display the legend.\n\n\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Initialize empty figure\nfig1 = plt.figure()\n# Plot sine wave \nplt.plot(x, ysin, label='sin(x)', color='darkblue')\nplt.plot(x, ycos, label='cos(x)', color='#B8D62E')\n\n# Set x-axis limits\nplt.xlim(-5,5)\n\n# Set axis ticks\nplt.xticks([-4,-3,-2,-1,0,1,2,3,4],['-4','','-2','','0','','2','','4'])\nplt.yticks([-1,-0.5,0,0.5,1])\n\n# Set axis labels\nplt.xlabel(r'$x$')\nplt.ylabel(r'$y$')\n\n# Set title\nplt.title('Sinusoidal functions')\n\n# Legend\nplt.legend()" + "text": "Axes settings\nNext, we will explore how to scale and annotate a plot using axes routines that control what goes on around the edges of the plot.\n\nLimits\nBy default, matplotlib will attempt to determine x- and y-axis limits, which usually work pretty well. Sometimes, however, it is useful to have finer control. The simplest way to adjust the display limits is to use the plt.xlim() and plt.ylim() methods.\nIn the example below, adjust the numbers (these can be int or float values) to see how the plot changes.\n\n✏️ Try it. 
Add the cell below to your notebook and run it.\n\n\n\nCode\n# Initialize empty figure\nfig1 = plt.figure()\n# Plot sine wave \nplt.plot(x, ysin, color='darkblue')\n\n# Set axis limits\nplt.xlim(-5,5)\nplt.ylim(-2,2)\n\n\n\n\n\n\n\n\n\n\n\nTicks and Tick Labels\nYou may also find it useful to adjust the ticks and/or tick labels that matplotlib displays by default. The plt.xticks() and plt.yticks() methods allow you to control the locations of both the ticks and the labels on the x- and y-axes, respectively. Both methods accept two list or array-like arguments, as well as optional keyword arguments. The first corresponds to the ticks, while the second controls the tick labels.\n# Set x-axis ticks at 0, 0.25, 0.5, 0.75, 1.0 with all labeled\nplt.xticks([0,0.25,0.5,0.75,1.0])\n# Set y-axis ticks from 0 to 100 with ticks on 10s and labels on 20s\nplt.yticks(np.arange(0,101,10),['0','','20','','40','','60','','80','','100'])\n\n\n\n\n\n\nImportant\n\n\n\nIf the labels are not specified, all ticks will be labeled accordingly. To only label certain ticks, you must pass a list with empty strings in the location of the ticks you wish to leave unlabeled (or the ticks will be labeled in order).\n\n\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Initialize empty figure\nfig1 = plt.figure()\n# Plot sine wave \nplt.plot(x, ysin, color='darkblue')\n\n# Set x-axis limits\nplt.xlim(-5,5)\n\n# Set axis ticks\nplt.xticks([-4,-3,-2,-1,0,1,2,3,4],['-4','','-2','','0','','2','','4'])\nplt.yticks([-1,-0.5,0,0.5,1])\n\n\n([<matplotlib.axis.YTick at 0x1a9425c90>,\n <matplotlib.axis.YTick at 0x1a9463f90>,\n <matplotlib.axis.YTick at 0x1a944abd0>,\n <matplotlib.axis.YTick at 0x1a94aa7d0>,\n <matplotlib.axis.YTick at 0x1a94b0850>],\n [Text(0, -1.0, '−1.0'),\n Text(0, -0.5, '−0.5'),\n Text(0, 0.0, '0.0'),\n Text(0, 0.5, '0.5'),\n Text(0, 1.0, '1.0')])\n\n\n\n\n\n\n\n\n\nAs with any plot, it is imperative to include x- and y-axis labels. 
This can be done by passing strings to the plt.xlabel() and plt.ylabel() methods:\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Initialize empty figure\nfig1 = plt.figure()\n# Plot sine wave \nplt.plot(x, ysin, color='darkblue')\n\n# Set x-axis limits\nplt.xlim(-5,5)\n\n# Set axis ticks\nplt.xticks([-4,-3,-2,-1,0,1,2,3,4],['-4','','-2','','0','','2','','4'])\nplt.yticks([-1,-0.5,0,0.5,1])\n\n# Set axis labels\nplt.xlabel('x-axis')\nplt.ylabel('y-axis')\n\n\nText(0, 0.5, 'y-axis')\n\n\n\n\n\n\n\n\n\nA nice feature about matplotlib is that it supports TeX formatting for mathematical expressions. This is quite useful for displaying equations, exponents, units, and other mathematical operators. The syntax for TeX expressions is 'r$TeX expression here$'. For example, we can display the axis labels as \\(x\\) and \\(\\sin{(x)}\\) as follows:\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Initialize empty figure\nfig1 = plt.figure()\n# Plot sine wave \nplt.plot(x, ysin, color='darkblue')\n\n# Set x-axis limits\nplt.xlim(-5,5)\n\n# Set axis ticks\nplt.xticks([-4,-3,-2,-1,0,1,2,3,4],['-4','','-2','','0','','2','','4'])\nplt.yticks([-1,-0.5,0,0.5,1])\n\n# Set axis labels\nplt.xlabel(r'$x$')\nplt.ylabel(r'$\\sin{(x)}$')\n\n\nText(0, 0.5, '$\\\\sin{(x)}$')\n\n\n\n\n\n\n\n\n\n\n\nTitles\nAdding a title to your plot is analogous to labeling the x- and y-axes. The plt.title() method allows you to set the title of your plot by passing a string:\n\n✏️ Try it. 
Add the cell below to your notebook and run it.\n\n\n\nCode\n# Initialize empty figure\nfig1 = plt.figure()\n# Plot sine wave \nplt.plot(x, ysin, color='darkblue')\nplt.plot(x, ycos, color='#B8D62E')\n\n# Set x-axis limits\nplt.xlim(-5,5)\n\n# Set axis ticks\nplt.xticks([-4,-3,-2,-1,0,1,2,3,4],['-4','','-2','','0','','2','','4'])\nplt.yticks([-1,-0.5,0,0.5,1])\n\n# Set axis labels\nplt.xlabel(r'$x$')\nplt.ylabel(r'$y$')\n\n# Set title\nplt.title('Sinusoidal functions')\n\n\nText(0.5, 1.0, 'Sinusoidal functions')\n\n\n\n\n\n\n\n\n\n\n\nLegends\nWhen multiple datasets are plotted on the same axes it is often useful to include a legend that labels each line or set of points. Matplotlib has a quick way of displaying a legend using the plt.legend() method. There are multiple ways of specifying the label for each dataset; I prefer to pass a list of strings to plt.legend():\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Initialize empty figure\nfig1 = plt.figure()\n# Plot sine wave \nplt.plot(x, ysin, color='darkblue')\nplt.plot(x, ycos, color='#B8D62E')\n\n# Set x-axis limits\nplt.xlim(-5,5)\n\n# Set axis ticks\nplt.xticks([-4,-3,-2,-1,0,1,2,3,4],['-4','','-2','','0','','2','','4'])\nplt.yticks([-1,-0.5,0,0.5,1])\n\n# Set axis labels\nplt.xlabel(r'$x$')\nplt.ylabel(r'$y$')\n\n# Set title\nplt.title('Sinusoidal functions')\n\n# Legend\nplt.legend(labels=['sin(x)','cos(x)'])\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNote\n\n\n\nAnother way of setting the data labels is to use the label keyword argument in the plt.plot() (or plt.scatter()) function:\n# Plot data\nplt.plot(x1, y1, label='Data1')\nplt.plot(x2, y2, label='Data2')\n\n# Legend\nplt.legend()\nNote that you must still run plt.legend() to display the legend.\n\n\n\n✏️ Try it. 
Add the cell below to your notebook and run it.\n\n\n\nCode\n# Initialize empty figure\nfig1 = plt.figure()\n# Plot sine wave \nplt.plot(x, ysin, label='sin(x)', color='darkblue')\nplt.plot(x, ycos, label='cos(x)', color='#B8D62E')\n\n# Set x-axis limits\nplt.xlim(-5,5)\n\n# Set axis ticks\nplt.xticks([-4,-3,-2,-1,0,1,2,3,4],['-4','','-2','','0','','2','','4'])\nplt.yticks([-1,-0.5,0,0.5,1])\n\n# Set axis labels\nplt.xlabel(r'$x$')\nplt.ylabel(r'$y$')\n\n# Set title\nplt.title('Sinusoidal functions')\n\n# Legend\nplt.legend()" }, { "objectID": "course-materials/interactive-sessions/7a_visualizations_1.html#subplots-multiple-axes", @@ -774,14 +774,14 @@ "href": "course-materials/cheatsheets/data_grouping.html#grouped-operations", "title": "EDS 217 Cheatsheet", "section": "Grouped Operations", - "text": "Grouped Operations\nYou can apply operations to each group separately using transform() or apply().\n\nUsing transform() to alter each group in a group by object\n\n\nCode\n# Transform: apply function to each group, return same-sized DataFrame\ndef normalize(x):\n return (x - x.mean()) / x.std()\n\ndf['value_normalized'] = grouped['value'].transform(normalize)\n\n\n\n\nUsing apply() to alter each group in a group by object\n\n\nCode\n# Apply: apply function to each group, return a DataFrame or Series\ndef group_range(x):\n return x['value'].max() - x['value'].min()\n\nresult = grouped.apply(group_range)\n\n\n/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_61889/114114075.py:5: DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. 
Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.\n result = grouped.apply(group_range)" + "text": "Grouped Operations\nYou can apply operations to each group separately using transform() or apply().\n\nUsing transform() to alter each group in a group by object\n\n\nCode\n# Transform: apply function to each group, return same-sized DataFrame\ndef normalize(x):\n return (x - x.mean()) / x.std()\n\ndf['value_normalized'] = grouped['value'].transform(normalize)\n\n\n\n\nUsing apply() to alter each group in a group by object\n\n\nCode\n# Apply: apply function to each group, return a DataFrame or Series\ndef group_range(x):\n return x['value'].max() - x['value'].min()\n\nresult = grouped.apply(group_range)\n\n\n/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85349/114114075.py:5: DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.\n result = grouped.apply(group_range)" }, { "objectID": "course-materials/cheatsheets/data_grouping.html#pivot-tables", "href": "course-materials/cheatsheets/data_grouping.html#pivot-tables", "title": "EDS 217 Cheatsheet", "section": "Pivot Tables", - "text": "Pivot Tables\nPivot tables are a powerful tool for reorganizing and summarizing data. 
They allow you to transform your data from a long format to a wide format, making it easier to analyze and visualize patterns.\n\nWorking with Pivot Tables\n\n\nCode\n# Sample DataFrame\ndf = pd.DataFrame({\n 'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],\n 'product': ['A', 'B', 'A', 'B'],\n 'sales': [100, 150, 120, 180]\n})\nprint(df)\n\n\n date product sales\n0 2023-01-01 A 100\n1 2023-01-01 B 150\n2 2023-01-02 A 120\n3 2023-01-02 B 180\n\n\n\nPivot tables with a single aggregation function\n\n\nCode\n# Create a pivot table\npivot_table = pd.pivot_table(df, values='sales', index='date', columns='product', aggfunc='sum')\nprint(pivot_table)\n\n\nproduct A B\ndate \n2023-01-01 100 150\n2023-01-02 120 180\n\n\n\n\nPivot tables with multiple aggregation\n\n\nCode\n# Pivot table with multiple aggregation functions\npivot_multi = pd.pivot_table(df, values='sales', index='date', columns='product', \n aggfunc=[np.sum, np.mean])\nprint(pivot_multi)\n\n\n sum mean \nproduct A B A B\ndate \n2023-01-01 100 150 100.0 150.0\n2023-01-02 120 180 120.0 180.0\n\n\n/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_61889/1326309547.py:2: FutureWarning: The provided callable <function sum at 0x10e2c72e0> is currently using DataFrameGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string \"sum\" instead.\n pivot_multi = pd.pivot_table(df, values='sales', index='date', columns='product',\n/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_61889/1326309547.py:2: FutureWarning: The provided callable <function mean at 0x10e2d8400> is currently using DataFrameGroupBy.mean. In a future version of pandas, the provided callable will be used directly. 
To keep current behavior pass the string \"mean\" instead.\n pivot_multi = pd.pivot_table(df, values='sales', index='date', columns='product'," + "text": "Pivot Tables\nPivot tables are a powerful tool for reorganizing and summarizing data. They allow you to transform your data from a long format to a wide format, making it easier to analyze and visualize patterns.\n\nWorking with Pivot Tables\n\n\nCode\n# Sample DataFrame\ndf = pd.DataFrame({\n 'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],\n 'product': ['A', 'B', 'A', 'B'],\n 'sales': [100, 150, 120, 180]\n})\nprint(df)\n\n\n date product sales\n0 2023-01-01 A 100\n1 2023-01-01 B 150\n2 2023-01-02 A 120\n3 2023-01-02 B 180\n\n\n\nPivot tables with a single aggregation function\n\n\nCode\n# Create a pivot table\npivot_table = pd.pivot_table(df, values='sales', index='date', columns='product', aggfunc='sum')\nprint(pivot_table)\n\n\nproduct A B\ndate \n2023-01-01 100 150\n2023-01-02 120 180\n\n\n\n\nPivot tables with multiple aggregation\n\n\nCode\n# Pivot table with multiple aggregation functions\npivot_multi = pd.pivot_table(df, values='sales', index='date', columns='product', \n aggfunc=[np.sum, np.mean])\nprint(pivot_multi)\n\n\n sum mean \nproduct A B A B\ndate \n2023-01-01 100 150 100.0 150.0\n2023-01-02 120 180 120.0 180.0\n\n\n/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85349/1326309547.py:2: FutureWarning: The provided callable <function sum at 0x11053b2e0> is currently using DataFrameGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string \"sum\" instead.\n pivot_multi = pd.pivot_table(df, values='sales', index='date', columns='product',\n/var/folders/bs/x9tn9jz91cv6hb3q6p4djbmw0000gn/T/ipykernel_85349/1326309547.py:2: FutureWarning: The provided callable <function mean at 0x11054c400> is currently using DataFrameGroupBy.mean. 
In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string \"mean\" instead.\n pivot_multi = pd.pivot_table(df, values='sales', index='date', columns='product'," }, { "objectID": "course-materials/cheatsheets/data_grouping.html#key-pivot-table-parameters", @@ -840,1103 +840,1194 @@ "text": "Conclusion\nThis guide has introduced you to determining variable types and exploring the methods available for different objects in Python. By understanding how to discover and use methods, you’ll be better equipped to manipulate data and build powerful programs.\nFeel free to experiment with the code examples interactively in your Jupyter notebook to deepen your understanding.\n\nEnd interactive session 1C" }, { - "objectID": "course-materials/cheatsheets/sets.html", - "href": "course-materials/cheatsheets/sets.html", - "title": "EDS 217 Cheatsheet", + "objectID": "course-materials/interactive-sessions/8a_github.html", + "href": "course-materials/interactive-sessions/8a_github.html", + "title": "Interactive Session", "section": "", - "text": "Code\n# Empty set\nempty_set = set()\nprint(f\"Empty set: {empty_set}\")\n\n# Set from a list\nset_from_list = set([1, 2, 3, 4, 5])\nprint(f\"Set from list: {set_from_list}\")\n\n# Set literal\nset_literal = {1, 2, 3, 4, 5}\nprint(f\"Set literal: {set_literal}\")\n\n\nEmpty set: set()\nSet from list: {1, 2, 3, 4, 5}\nSet literal: {1, 2, 3, 4, 5}" + "text": "This session contains detailed instructions for creating a new GitHub repository and pushing your EDS 217 course work to it. It also covers how to clean your Jupyter notebooks before committing them to ensure your repository is clean and professional.\n\n\n\n\n\n\nWarning\n\n\n\nJupyter notebooks files can be hard to use in github because they contain information about code execution order and output in the file. For this reason, you should clean notebooks before pushing them to a github repo. 
We will clean them using the nbstripout Python package." }, { - "objectID": "course-materials/cheatsheets/sets.html#creating-sets", - "href": "course-materials/cheatsheets/sets.html#creating-sets", - "title": "EDS 217 Cheatsheet", + "objectID": "course-materials/interactive-sessions/8a_github.html#introduction", + "href": "course-materials/interactive-sessions/8a_github.html#introduction", + "title": "Interactive Session", "section": "", - "text": "Code\n# Empty set\nempty_set = set()\nprint(f\"Empty set: {empty_set}\")\n\n# Set from a list\nset_from_list = set([1, 2, 3, 4, 5])\nprint(f\"Set from list: {set_from_list}\")\n\n# Set literal\nset_literal = {1, 2, 3, 4, 5}\nprint(f\"Set literal: {set_literal}\")\n\n\nEmpty set: set()\nSet from list: {1, 2, 3, 4, 5}\nSet literal: {1, 2, 3, 4, 5}" + "text": "This session contains detailed instructions for creating a new GitHub repository and pushing your EDS 217 course work to it. It also covers how to clean your Jupyter notebooks before committing them to ensure your repository is clean and professional.\n\n\n\n\n\n\nWarning\n\n\n\nJupyter notebook files can be hard to use on GitHub because they contain information about code execution order and output in the file. For this reason, you should clean notebooks before pushing them to a GitHub repo. 
We will clean them using the nbstripout Python package." }, { - "objectID": "course-materials/cheatsheets/sets.html#basic-operations", - "href": "course-materials/cheatsheets/sets.html#basic-operations", - "title": "EDS 217 Cheatsheet", - "section": "Basic Operations", - "text": "Basic Operations\n\n\nCode\ns = {1, 2, 3, 4, 5}\nprint(f\"Initial set: {s}\")\n\n# Add an element\ns.add(6)\nprint(f\"After adding 6: {s}\")\n\n# Remove an element\ns.remove(3) # Raises KeyError if not found\nprint(f\"After removing 3: {s}\")\n\ns.discard(10) # Doesn't raise error if not found\nprint(f\"After discarding 10 (not in set): {s}\")\n\n# Pop a random element\npopped = s.pop()\nprint(f\"Popped element: {popped}\")\nprint(f\"Set after pop: {s}\")\n\n# Check membership\nprint(f\"Is 2 in the set? {2 in s}\")\n\n# Clear the set\ns.clear()\nprint(f\"Set after clear: {s}\")\n\n\nInitial set: {1, 2, 3, 4, 5}\nAfter adding 6: {1, 2, 3, 4, 5, 6}\nAfter removing 3: {1, 2, 4, 5, 6}\nAfter discarding 10 (not in set): {1, 2, 4, 5, 6}\nPopped element: 1\nSet after pop: {2, 4, 5, 6}\nIs 2 in the set? 
True\nSet after clear: set()" + "objectID": "course-materials/interactive-sessions/8a_github.html#steps-to-setting-up-a-github-repo-for-your-coursework", + "href": "course-materials/interactive-sessions/8a_github.html#steps-to-setting-up-a-github-repo-for-your-coursework", + "title": "Interactive Session", + "section": "Steps to Setting up a GitHub repo for your coursework:", + "text": "Steps to Setting up a GitHub repo for your coursework:\n\nCreate a new repository on GitHub\nInitialize a local Git repository\nClean Jupyter notebooks of output and execution data\nAdd, commit, and push files to a GitHub repository" }, { - "objectID": "course-materials/cheatsheets/sets.html#set-methods", - "href": "course-materials/cheatsheets/sets.html#set-methods", - "title": "EDS 217 Cheatsheet", - "section": "Set Methods", - "text": "Set Methods\n\n\nCode\na = {1, 2, 3}\nb = {3, 4, 5}\nprint(f\"Set a: {a}\")\nprint(f\"Set b: {b}\")\n\n\nSet a: {1, 2, 3}\nSet b: {3, 4, 5}\n\n\n\nUnion\n\n\nCode\nunion_set = a.union(b)\nprint(f\"Union of a and b: {union_set}\")\n\n\nUnion of a and b: {1, 2, 3, 4, 5}\n\n\n\n\nIntersection\n\n\nCode\nintersection_set = a.intersection(b)\nprint(f\"Intersection of a and b: {intersection_set}\")\n\n\nIntersection of a and b: {3}\n\n\n\n\nDifference\n\n\nCode\ndifference_set = a.difference(b)\nprint(f\"Difference of a and b: {difference_set}\")\n\n\nDifference of a and b: {1, 2}\n\n\n\n\nSymmetric difference\n\n\nCode\nsymmetric_difference_set = a.symmetric_difference(b)\nprint(f\"Symmetric difference of a and b: {symmetric_difference_set}\")\n\n\nSymmetric difference of a and b: {1, 2, 4, 5}\n\n\n\n\nSubset and superset\n\n\nCode\nis_subset = a.issubset(b)\nis_superset = a.issuperset(b)\nprint(f\"Is a a subset of b? {is_subset}\")\nprint(f\"Is a a superset of b? {is_superset}\")\n\n\nIs a a subset of b? False\nIs a a superset of b? 
False" + "objectID": "course-materials/interactive-sessions/8a_github.html#creating-a-new-github-repository", + "href": "course-materials/interactive-sessions/8a_github.html#creating-a-new-github-repository", + "title": "Interactive Session", + "section": "Creating a New GitHub Repository", + "text": "Creating a New GitHub Repository\nLet’s start by creating a new repository on GitHub:\n\nLog in to your GitHub account\nClick the ‘+’ icon in the top-right corner and select ‘New repository’\nName your repository (e.g., “EDS-217-Course-Work”)\nAdd a description (optional)\nChoose to make the repository public or private\nDon’t initialize the repository with a README, .gitignore, or license\nClick ‘Create repository’\n\nAfter creating the repository, you’ll see a page with instructions. We’ll use these in the next steps." }, { - "objectID": "course-materials/coding-colabs/5c_cleaning_data.html", - "href": "course-materials/coding-colabs/5c_cleaning_data.html", - "title": "Day 5: 🙌 Coding Colab", - "section": "", - "text": "In this collaborative coding exercise, you will work together and apply your new data cleaning skills to a simple dataframe that has a suprising number of problems." + "objectID": "course-materials/interactive-sessions/8a_github.html#initializing-a-local-git-repository", + "href": "course-materials/interactive-sessions/8a_github.html#initializing-a-local-git-repository", + "title": "Interactive Session", + "section": "Initializing a Local Git Repository", + "text": "Initializing a Local Git Repository\nNow, let’s set up your local directory as a Git repository:\n\nOpen a terminal on the class workbench server\nNavigate to your course work directory:\n\n\n\nCode\ncd path/to/your/EDS-217\n\n\n\nInitialize the repository:\n\n\n\nCode\ngit init\n\n\n\nAdd your GitHub repository as the remote origin:\n\n\n\nCode\ngit remote add origin https://github.com/your-username/EDS-217-Course-Work.git\n\n\nReplace your-username with your actual GitHub username." 
}, { - "objectID": "course-materials/coding-colabs/5c_cleaning_data.html#introduction", - "href": "course-materials/coding-colabs/5c_cleaning_data.html#introduction", - "title": "Day 5: 🙌 Coding Colab", - "section": "", - "text": "In this collaborative coding exercise, you will work together and apply your new data cleaning skills to a simple dataframe that has a suprising number of problems." + "objectID": "course-materials/interactive-sessions/8a_github.html#cleaning-jupyter-notebooks", + "href": "course-materials/interactive-sessions/8a_github.html#cleaning-jupyter-notebooks", + "title": "Interactive Session", + "section": "Cleaning Jupyter Notebooks", + "text": "Cleaning Jupyter Notebooks\nBefore we commit our notebooks, let’s clean them to remove output cells and execution data:\n\nInstall the nbstripout tool if you haven’t already:\n\n\n\nCode\npip install nbstripout\n\n\n\nConfigure nbstripout for your repository:\n\n\n\nCode\nnbstripout --install --attributes .gitattributes\n\n\nThis sets up nbstripout to automatically clean your notebooks when you commit them." }, { - "objectID": "course-materials/coding-colabs/5c_cleaning_data.html#resources", - "href": "course-materials/coding-colabs/5c_cleaning_data.html#resources", - "title": "Day 5: 🙌 Coding Colab", - "section": "Resources", - "text": "Resources\nHere’s our course cheatsheet on cleaning data:\n\nPandas Cleaning Cheatsheet\n\nFeel free to refer to this cheatsheet throughout the exercise if you need a quick reminder about syntax or functionality." 
+ "objectID": "course-materials/interactive-sessions/8a_github.html#adding-committing-and-pushing-files", + "href": "course-materials/interactive-sessions/8a_github.html#adding-committing-and-pushing-files", + "title": "Interactive Session", + "section": "Adding, Committing, and Pushing Files", + "text": "Adding, Committing, and Pushing Files\nNow we’re ready to add our files to the repository:\n\nAdd all files in the directory:\n\n\n\nCode\ngit add .\n\n\n\nCommit the files:\n\n\n\nCode\ngit commit -m \"Initial commit: Adding EDS 217 course work\"\n\n\n\nPush the files to GitHub:\n\n\n\nCode\ngit push -u origin main\n\n\nNote: If your default branch is named “master” instead of “main”, use git push -u origin master." }, { - "objectID": "course-materials/coding-colabs/5c_cleaning_data.html#setup", - "href": "course-materials/coding-colabs/5c_cleaning_data.html#setup", - "title": "Day 5: 🙌 Coding Colab", - "section": "Setup", - "text": "Setup\nFirst, let’s import the necessary libraries and load an example messy dataframe.\n\nimport pandas as pd\nimport numpy as np\n\nurl = 'https://bit.ly/messy_csv'\nmessy_df = pd.read_csv(url)" + "objectID": "course-materials/interactive-sessions/8a_github.html#verifying-your-repository", + "href": "course-materials/interactive-sessions/8a_github.html#verifying-your-repository", + "title": "Interactive Session", + "section": "Verifying Your Repository", + "text": "Verifying Your Repository\n\nGo to your GitHub repository page in your web browser\nRefresh the page\nYou should now see all your course files listed in the repository" }, { - "objectID": "course-materials/coding-colabs/5c_cleaning_data.html#practical-exercise-cleaning-a-messy-environmental-dataset", - "href": "course-materials/coding-colabs/5c_cleaning_data.html#practical-exercise-cleaning-a-messy-environmental-dataset", - "title": "Day 5: 🙌 Coding Colab", - "section": "Practical Exercise: Cleaning a Messy Environmental Dataset", - "text": "Practical Exercise: Cleaning 
a Messy Environmental Dataset\nLet’s apply what we’ve learned so far to clean the messy environmental dataset.\nYour task is to clean this dataframe by\n\nRemoving duplicates\nHandling missing values (either fill or dropna to remove rows with missing data)\nEnsuring consistent data types (dates, strings)\nFormatting the ‘site’ column for consistency\nMaking sure all column names are lower case, without whitespace.\n\nTry to implement these steps using the techniques we’ve learned.\n\nEnd Coding Colab Session (Day 4)" + "objectID": "course-materials/interactive-sessions/8a_github.html#conclusion", + "href": "course-materials/interactive-sessions/8a_github.html#conclusion", + "title": "Interactive Session", + "section": "Conclusion", + "text": "Conclusion\nCongratulations! You’ve successfully created a GitHub repository for your EDS 217 course work, cleaned your Jupyter notebooks, and pushed your files to GitHub. This process helps you maintain a clean, professional repository of your work that you can easily share or refer back to in the future." 
}, { - "objectID": "course-materials/interactive-sessions/2a_getting_help.html", - "href": "course-materials/interactive-sessions/2a_getting_help.html", - "title": "Interactive Session 2A", + "objectID": "course-materials/interactive-sessions/8a_github.html#additional-resources", + "href": "course-materials/interactive-sessions/8a_github.html#additional-resources", + "title": "Interactive Session", + "section": "Additional Resources", + "text": "Additional Resources\n\nGitHub Docs: Creating a new repository\nGit Documentation\nnbstripout Documentation" + }, + { + "objectID": "course-materials/cheatsheets/comprehensions.html", + "href": "course-materials/cheatsheets/comprehensions.html", + "title": "EDS 217 Cheatsheet", "section": "", - "text": "Objective: Learn how to get help, work with variables, and explore methods available for different Python objects in Jupyter Notebooks.\nEstimated Time: 45-60 minutes" + "text": "This cheatsheet provides a quick reference for using comprehensions in Python, including list comprehensions, dictionary comprehensions, and how to incorporate conditional logic. Use this as a guide during your master’s program to write more concise and readable code." }, { - "objectID": "course-materials/interactive-sessions/2a_getting_help.html#getting-help-in-python", - "href": "course-materials/interactive-sessions/2a_getting_help.html#getting-help-in-python", - "title": "Interactive Session 2A", - "section": "1. Getting Help in Python", - "text": "1. Getting Help in Python\n\nUsing help()\n\nIn a Jupyter Notebook cell, type:\n\n\nCode\n #| echo: false\n\nhelp(len)\n\n\nHelp on built-in function len in module builtins:\n\nlen(obj, /)\n Return the number of items in a container.\n\n\n\nRun the cell to see detailed information about the len() function.\n\n\n\nTrying help() Yourself\n\nUse the help() function on other built-in functions like print or sum.\n\n\n\nUsing ? 
and ??\n\nType:\nRun the cell to see quick documentation.\nTry:\nThis gives more detailed information, including source code (if available)." + "objectID": "course-materials/cheatsheets/comprehensions.html#list-comprehensions", + "href": "course-materials/cheatsheets/comprehensions.html#list-comprehensions", + "title": "EDS 217 Cheatsheet", + "section": "List Comprehensions", + "text": "List Comprehensions\n\nBasic Syntax\nA list comprehension provides a concise way to create lists. The basic syntax is:\n\n\nCode\n# [expression for item in iterable]\nsquares = [i ** 2 for i in range(1, 6)]\nprint(squares)\n\n\n[1, 4, 9, 16, 25]\n\n\n\n\nWith Conditional Logic\nYou can add a condition to include only certain items in the new list:\n\n\nCode\n# [expression for item in iterable if condition]\neven_squares = [i ** 2 for i in range(1, 6) if i % 2 == 0]\nprint(even_squares)\n\n\n[4, 16]\n\n\n\n\nNested List Comprehensions\nList comprehensions can be nested to handle more complex data structures:\n\n\nCode\n# [(expression1, expression2) for item1 in iterable1 for item2 in iterable2]\npairs = [(i, j) for i in range(1, 4) for j in range(1, 3)]\nprint(pairs)\n\n\n[(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)]\n\n\n\n\nEvaluating Functions in a List Comprehension\nYou can use list comprehensions to apply a function to each item in an iterable:\n\n\nCode\n# Function to evaluate\ndef square(x):\n return x ** 2\n\n# List comprehension applying the function\nsquares = [square(i) for i in range(1, 6)]\nprint(squares)\n\n\n[1, 4, 9, 16, 25]" }, { - "objectID": "course-materials/interactive-sessions/2a_getting_help.html#working-with-variables", - "href": "course-materials/interactive-sessions/2a_getting_help.html#working-with-variables", - "title": "Interactive Session 2A", - "section": "2. Working with Variables", - "text": "2. 
Working with Variables\n\nCreating Variables\n\nIn a new cell, create a few variables:\nUse type() to check the data type of each variable:\n\n\nCode\ntype(a)\ntype(b)\ntype(c)\n\n\nstr\n\n\n\n\n\nExploring Variables\n\nExperiment with creating your own variables and checking their types.\nChange the values and data types and see what happens." + "objectID": "course-materials/cheatsheets/comprehensions.html#dictionary-comprehensions", + "href": "course-materials/cheatsheets/comprehensions.html#dictionary-comprehensions", + "title": "EDS 217 Cheatsheet", + "section": "Dictionary Comprehensions", + "text": "Dictionary Comprehensions\n\nBasic Syntax\nDictionary comprehensions provide a concise way to create dictionaries. The basic syntax is:\n\n\nCode\n# {key_expression: value_expression for item in iterable}\n# Example: Mapping fruit names to their lengths\nfruits = ['apple', 'banana', 'cherry']\nfruit_lengths = {fruit: len(fruit) for fruit in fruits}\nprint(fruit_lengths)\n\n\n{'apple': 5, 'banana': 6, 'cherry': 6}\n\n\n\n\nWithout zip\nYou can create a dictionary without using zip by leveraging the index:\n\n\nCode\n# {key_expression: value_expression for index in range(len(list))}\n# Example: Mapping employee IDs to names\nemployee_ids = [101, 102, 103]\nemployee_names = ['Alice', 'Bob', 'Charlie']\nid_to_name = {employee_ids[i]: employee_names[i] for i in range(len(employee_ids))}\nprint(id_to_name)\n\n\n{101: 'Alice', 102: 'Bob', 103: 'Charlie'}\n\n\n\n\nWith Conditional Logic\nYou can include conditions to filter out key-value pairs:\n\n\nCode\n# {key_expression: value_expression for item in iterable if condition}\n# Example: Filtering students who passed\nstudents = ['Alice', 'Bob', 'Charlie']\nscores = [85, 62, 90]\npassing_students = {students[i]: scores[i] for i in range(len(students)) if scores[i] >= 70}\nprint(passing_students)\n\n\n{'Alice': 85, 'Charlie': 90}\n\n\n\n\nEvaluating Functions in a Dictionary Comprehension\nYou can use dictionary 
comprehensions to apply a function to values in an iterable:\n\n\nCode\n# Function to evaluate\ndef capitalize_name(name):\n return name.upper()\n\n# Example: Mapping student names to capitalized names\nstudents = ['alice', 'bob', 'charlie']\ncapitalized_names = {name: capitalize_name(name) for name in students}\nprint(capitalized_names)\n\n\n{'alice': 'ALICE', 'bob': 'BOB', 'charlie': 'CHARLIE'}" }, { - "objectID": "course-materials/interactive-sessions/2a_getting_help.html#exploring-methods-available-for-objects", - "href": "course-materials/interactive-sessions/2a_getting_help.html#exploring-methods-available-for-objects", - "title": "Interactive Session 2A", - "section": "3. Exploring Methods Available for Objects", - "text": "3. Exploring Methods Available for Objects\n\nUsing dir()\n\nUse dir() to explore available methods for objects:\n\n\nCode\ndir(a)\ndir(b)\ndir(c)\n\n\n['__add__',\n '__class__',\n '__contains__',\n '__delattr__',\n '__dir__',\n '__doc__',\n '__eq__',\n '__format__',\n '__ge__',\n '__getattribute__',\n '__getitem__',\n '__getnewargs__',\n '__getstate__',\n '__gt__',\n '__hash__',\n '__init__',\n '__init_subclass__',\n '__iter__',\n '__le__',\n '__len__',\n '__lt__',\n '__mod__',\n '__mul__',\n '__ne__',\n '__new__',\n '__reduce__',\n '__reduce_ex__',\n '__repr__',\n '__rmod__',\n '__rmul__',\n '__setattr__',\n '__sizeof__',\n '__str__',\n '__subclasshook__',\n 'capitalize',\n 'casefold',\n 'center',\n 'count',\n 'encode',\n 'endswith',\n 'expandtabs',\n 'find',\n 'format',\n 'format_map',\n 'index',\n 'isalnum',\n 'isalpha',\n 'isascii',\n 'isdecimal',\n 'isdigit',\n 'isidentifier',\n 'islower',\n 'isnumeric',\n 'isprintable',\n 'isspace',\n 'istitle',\n 'isupper',\n 'join',\n 'ljust',\n 'lower',\n 'lstrip',\n 'maketrans',\n 'partition',\n 'removeprefix',\n 'removesuffix',\n 'replace',\n 'rfind',\n 'rindex',\n 'rjust',\n 'rpartition',\n 'rsplit',\n 'rstrip',\n 'split',\n 'splitlines',\n 'startswith',\n 'strip',\n 'swapcase',\n 'title',\n 
'translate',\n 'upper',\n 'zfill']\n\n\n\n\n\nUsing help() with Methods\n\nPick a method from the list returned by dir() and use help() to learn more about it:\n\n\nCode\nhelp(c.upper)\n\n\nHelp on built-in function upper:\n\nupper() method of builtins.str instance\n Return a copy of the string converted to uppercase.\n\n\n\n\n\n\nExploring Methods\n\nTry calling a method on your variables:\n\n\n'HELLO, WORLD!'\n\n\n\n\n```\n\nEnd interactive session 2A" + "objectID": "course-materials/cheatsheets/comprehensions.html#best-practices-for-using-comprehensions", + "href": "course-materials/cheatsheets/comprehensions.html#best-practices-for-using-comprehensions", + "title": "EDS 217 Cheatsheet", + "section": "Best Practices for Using Comprehensions", + "text": "Best Practices for Using Comprehensions\n\nKeep It Simple: Use comprehensions for simple transformations and filtering. For complex logic, consider using traditional loops for better readability.\nNested Comprehensions: While powerful, nested comprehensions can be hard to read. Use them sparingly and consider breaking down the logic into multiple steps if needed.\nReadability: Always prioritize code readability. If a comprehension is difficult to understand, it might be better to use a loop." }, { - "objectID": "course-materials/coding-colabs/4b_pandas_dataframes.html#introduction", - "href": "course-materials/coding-colabs/4b_pandas_dataframes.html#introduction", - "title": "Coding Colab", - "section": "Introduction", - "text": "Introduction\nIn this collaborative coding exercise, you’ll work with a partner to practice importing, cleaning, exploring, and analyzing DataFrames using pandas. 
You’ll be working with a dataset containing yearly visitor information about national parks in the United States.\nHelpful class CheatSheets:\nPandas DataFrames\nPandas PDF Cheat Sheet\nDataFrame Workflows" + "objectID": "course-materials/cheatsheets/comprehensions.html#additional-resources", + "href": "course-materials/cheatsheets/comprehensions.html#additional-resources", + "title": "EDS 217 Cheatsheet", + "section": "Additional Resources", + "text": "Additional Resources\n\nOfficial Python Documentation: List Comprehensions\nPython Dictionary Comprehensions: Dictionary Comprehensions\nPEP 202: PEP 202 - List Comprehensions" }, { - "objectID": "course-materials/coding-colabs/4b_pandas_dataframes.html#setup", - "href": "course-materials/coding-colabs/4b_pandas_dataframes.html#setup", - "title": "Coding Colab", - "section": "Setup", - "text": "Setup\nFirst, let’s import the necessary libraries and load our dataset.\n\n\nCode\nimport pandas as pd\nimport numpy as np\n\n# Load the dataset\nurl = \"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-09-17/national_parks.csv\"\nparks_df = pd.read_csv(url)" + "objectID": "course-materials/coding-colabs/3d_pandas_series.html", + "href": "course-materials/coding-colabs/3d_pandas_series.html", + "title": "Day 3: 🙌 Coding Colab", + "section": "", + "text": "In this collaborative coding exercise, we’ll explore Pandas Series, a fundamental data structure in the Pandas library. You’ll work together to create, manipulate, and analyze Series objects." }, { - "objectID": "course-materials/coding-colabs/4b_pandas_dataframes.html#task-1-data-exploration-and-cleaning", - "href": "course-materials/coding-colabs/4b_pandas_dataframes.html#task-1-data-exploration-and-cleaning", - "title": "Coding Colab", - "section": "Task 1: Data Exploration and Cleaning", - "text": "Task 1: Data Exploration and Cleaning\nWith your partner, explore the DataFrame and perform some initial cleaning. 
Create cells in your notebook that provide the following information\n\n\n\n\n\n\nTip\n\n\n\nUse print() statements and/or f-strings to create your output in a way that makes it easy to understand your results.\n\n\n\nHow many rows and columns does the DataFrame have?\nWhat are the column names?\nWhat data types are used in each column?\nAre there any missing values in the DataFrame?\nRemove the rows where year is Total (these are summary rows we don’t need for our analysis).\nConvert the year column to numeric type." + "objectID": "course-materials/coding-colabs/3d_pandas_series.html#introduction", + "href": "course-materials/coding-colabs/3d_pandas_series.html#introduction", + "title": "Day 3: 🙌 Coding Colab", + "section": "", + "text": "In this collaborative coding exercise, we’ll explore Pandas Series, a fundamental data structure in the Pandas library. You’ll work together to create, manipulate, and analyze Series objects." }, { - "objectID": "course-materials/coding-colabs/4b_pandas_dataframes.html#task-2-basic-filtering-and-analysis", - "href": "course-materials/coding-colabs/4b_pandas_dataframes.html#task-2-basic-filtering-and-analysis", - "title": "Coding Colab", - "section": "Task 2: Basic Filtering and Analysis", - "text": "Task 2: Basic Filtering and Analysis\nNow, let’s practice some basic filtering and analysis operations:\n\nCreate a new DataFrame containing only data for the years 2000-2015 and only data for National Parks (unit_type is National Park)\nFind the total number of visitors across all National Parks for each year from 2000-2015.\nCalculate the average yearly visitors for each National Park during the 2000-2015 period.\nIdentify the top 5 most visited National Parks (based on total visitors) during the 2000-2015 period." 
+ "objectID": "course-materials/coding-colabs/3d_pandas_series.html#resources", + "href": "course-materials/coding-colabs/3d_pandas_series.html#resources", + "title": "Day 3: 🙌 Coding Colab", + "section": "Resources", + "text": "Resources\nHere’s our course cheatsheet on pandas Series:\n\nPandas Series Cheatsheet\n\nFeel free to refer to this cheatsheet throughout the exercise if you need a quick reminder about syntax or functionality." }, { - "objectID": "course-materials/coding-colabs/4b_pandas_dataframes.html#task-3-thinking-in-dataframes", - "href": "course-materials/coding-colabs/4b_pandas_dataframes.html#task-3-thinking-in-dataframes", - "title": "Coding Colab", - "section": "Task 3: Thinking in DataFrames", - "text": "Task 3: Thinking in DataFrames\n\nIn 2016, a blog post from 538.com explored these data. Take a look at the graphics in the post that use our data and discuss with your partner what steps and functions you think would be necessary to filter, group, and aggregate your dataframe in order to make any of the plots. See if you can make “rough drafts” of any of them using the simple DataFrame.plot() function." + "objectID": "course-materials/coding-colabs/3d_pandas_series.html#setup", + "href": "course-materials/coding-colabs/3d_pandas_series.html#setup", + "title": "Day 3: 🙌 Coding Colab", + "section": "Setup", + "text": "Setup\nFirst, let’s import the necessary libraries and create a sample Series.\n\nimport pandas as pd\nimport numpy as np\n\n# Create a sample Series\nfruits = pd.Series(['apple', 'banana', 'cherry', 'date', 'elderberry'], name='Fruits')\nprint(fruits)\n\n0 apple\n1 banana\n2 cherry\n3 date\n4 elderberry\nName: Fruits, dtype: object" }, { - "objectID": "course-materials/coding-colabs/4b_pandas_dataframes.html#conclusion", - "href": "course-materials/coding-colabs/4b_pandas_dataframes.html#conclusion", - "title": "Coding Colab", - "section": "Conclusion", - "text": "Conclusion\nGreat job working through these exercises! 
You’ve practiced importing data, cleaning a dataset, exploring DataFrames, and performing various filtering and analysis operations using pandas. These skills are fundamental to data analysis in Python and will be valuable as you continue to work with more complex datasets." + "objectID": "course-materials/coding-colabs/3d_pandas_series.html#exercise-1-creating-a-series", + "href": "course-materials/coding-colabs/3d_pandas_series.html#exercise-1-creating-a-series", + "title": "Day 3: 🙌 Coding Colab", + "section": "Exercise 1: Creating a Series", + "text": "Exercise 1: Creating a Series\nWork together to create a Series representing the prices of the fruits in our fruits Series.\n\n# Your code here\n# Create a Series called 'prices' with the same index as 'fruits'\n# Use these prices: apple: $0.5, banana: $0.3, cherry: $1.0, date: $1.5, elderberry: $2.0" }, { - "objectID": "course-materials/live-coding/4d_data_import_export.html#overview", - "href": "course-materials/live-coding/4d_data_import_export.html#overview", - "title": "Live Coding Session 4D", - "section": "Overview", - "text": "Overview\nIn this session, we will be exploring data import using the read_csv() function in pandas. Live coding is a great way to learn programming as it allows you to see the process of writing code in real-time, including how to deal with unexpected issues and debug errors." + "objectID": "course-materials/coding-colabs/3d_pandas_series.html#exercise-2-series-operations", + "href": "course-materials/coding-colabs/3d_pandas_series.html#exercise-2-series-operations", + "title": "Day 3: 🙌 Coding Colab", + "section": "Exercise 2: Series Operations", + "text": "Exercise 2: Series Operations\nCollaborate to perform the following operations:\n\nCalculate the total price of all fruits.\nFind the most expensive fruit.\nApply a 10% discount to all fruits priced over $1.0.\n\n\n# Your code here\n# 1. Calculate the total price of all fruits\n# 2. Find the most expensive fruit\n# 3. 
Apply a 10% discount to all fruits priced over $1.0" }, { - "objectID": "course-materials/live-coding/4d_data_import_export.html#objectives", - "href": "course-materials/live-coding/4d_data_import_export.html#objectives", - "title": "Live Coding Session 4D", - "section": "Objectives", - "text": "Objectives\n\nUnderstand the fundamentals of flow control in Python.\nUse read_csv() options to handle different .csv file structures.\nLearn how to parse dates and handle missing data during import.\nLearn how to filter columns and handle large files.\n\nDevelop the ability to troubleshoot and debug in a live setting." + "objectID": "course-materials/coding-colabs/3d_pandas_series.html#exercise-3-series-analysis", + "href": "course-materials/coding-colabs/3d_pandas_series.html#exercise-3-series-analysis", + "title": "Day 3: 🙌 Coding Colab", + "section": "Exercise 3: Series Analysis", + "text": "Exercise 3: Series Analysis\nWork as a team to answer the following questions:\n\nWhat is the average price of the fruits?\nHow many fruits cost less than $1.0?\nWhat is the price range (difference between max and min prices)?\n\n\n# Your code here\n# 1. Calculate the average price of the fruits\n# 2. Count how many fruits cost less than $1.0\n# 3. Calculate the price range (difference between max and min prices)" }, { - "objectID": "course-materials/live-coding/4d_data_import_export.html#getting-started", - "href": "course-materials/live-coding/4d_data_import_export.html#getting-started", - "title": "Live Coding Session 4D", - "section": "Getting Started", - "text": "Getting Started\nTo get the most out of this session, please follow these guidelines:\nPrepare Your Environment: - Log into our server and start JupyterLab. - Open a new Jupyter notebook where you can write your own code as we go along. 
- Make sure to name the notebook something informative so you can refer back to it later.\n\nStep 1: Create a New Notebook\n\nOpen Jupyter Lab or Jupyter Notebook.\nCreate a new Python notebook.\nRename your notebook to pd_read_csv.ipynb.\n\n\n\nStep 2: Import Required Libraries\nIn the first cell of your notebook, import the necessary libraries:\n\nimport pandas as pd\nimport numpy as np\n\n\n\nStep 3: Set Up Data URLs\nTo ensure we’re all working with the same data, copy and paste the following URLs into a new code cell and run the cell (SHIFT-ENTER):\n\n# URLs for different CSV files we'll be using\nurl_basic = 'https://bit.ly/eds217-basic'\nurl_missing = 'https://bit.ly/eds217-missing'\nurl_dates = 'https://bit.ly/eds217-dates'\nurl_no_header = 'https://bit.ly/eds217-noheader'\nurl_tsv = 'https://bit.ly/eds217-tabs'\nurl_large = 'https://bit.ly/eds217-large'\n\n\n\nStep 4: Prepare Markdown Cells for Notes\nCreate several markdown cells throughout your notebook to take notes during the session. Here are some suggested headers:\n\nBasic Usage and Column Selection\nHandling Missing Data\nParsing Dates\nWorking with Files Without Headers\nWorking with Tab-Separated Values (TSV) Files\nHandling Large Files: Reading a Subset of Data\n\n\n\nStep 5: Create Code Cells for Each Topic\nUnder each markdown header, create empty code cells where you’ll write and execute code during the live session.\n\n\nStep 6: Final Preparations\n\nEnsure you have a stable internet connection to access the CSV files.\nHave the Pandas documentation page open in a separate tab for quick reference: https://pandas.pydata.org/docs/\n\n\n\nReady to Go!\nYou’re now set up and ready to follow along with the live coding session. Remember to actively code along and take notes in your markdown cells. Don’t hesitate to ask questions during the session!\nHappy coding!" 
+ "objectID": "course-materials/coding-colabs/3d_pandas_series.html#exercise-4-series-manipulation", + "href": "course-materials/coding-colabs/3d_pandas_series.html#exercise-4-series-manipulation", + "title": "Day 3: 🙌 Coding Colab", + "section": "Exercise 4: Series Manipulation", + "text": "Exercise 4: Series Manipulation\nCollaborate to perform these manipulations on the fruits and prices Series:\n\nAdd a new fruit ‘fig’ with a price of $1.2 to both Series using pd.concat\nRemove ‘banana’ from both Series.\nSort both Series by fruit name (alphabetically).\n\n\n# Your code here\n# 1. Add 'fig' to both Series (price: $1.2)\n# 2. Remove 'banana' from both Series\n# 3. Sort both Series alphabetically by fruit name" }, { - "objectID": "course-materials/live-coding/4d_data_import_export.html#session-format", - "href": "course-materials/live-coding/4d_data_import_export.html#session-format", - "title": "Live Coding Session 4D", - "section": "Session Format", - "text": "Session Format\n\nIntroduction\n\nBrief discussion about the topic and its importance in data science.\n\n\n\nDemonstration\n\nI will demonstrate code examples live. Follow along and write the code into your own Jupyter notebook.\n\n\n\nPractice\n\nYou will have the opportunity to try exercises on your own to apply what you’ve learned.\n\n\n\nQ&A\n\nWe will have a Q&A session at the end where you can ask specific questions about the code, concepts, or issues encountered during the session." + "objectID": "course-materials/coding-colabs/3d_pandas_series.html#conclusion", + "href": "course-materials/coding-colabs/3d_pandas_series.html#conclusion", + "title": "Day 3: 🙌 Coding Colab", + "section": "Conclusion", + "text": "Conclusion\nIn this collaborative exercise, you’ve practiced creating, manipulating, and analyzing Pandas Series. You’ve learned how to perform basic operations, apply conditions, and modify Series objects. These skills will be valuable as you work with more complex datasets in the future." 
}, { - "objectID": "course-materials/live-coding/4d_data_import_export.html#after-the-session", - "href": "course-materials/live-coding/4d_data_import_export.html#after-the-session", - "title": "Live Coding Session 4D", - "section": "After the Session", - "text": "After the Session\n\nReview your notes and try to replicate the exercises on your own.\nExperiment with the code by modifying parameters or adding new features to deepen your understanding.\nCheck out our class read_csv() cheatsheet." + "objectID": "course-materials/coding-colabs/3d_pandas_series.html#discussion-questions", + "href": "course-materials/coding-colabs/3d_pandas_series.html#discussion-questions", + "title": "Day 3: 🙌 Coding Colab", + "section": "Discussion Questions", + "text": "Discussion Questions\n\nWhat advantages does using a Pandas Series offer compared to using a Python list or dictionary?\nCan you think of a real-world scenario where you might use a Pandas Series instead of a DataFrame?\nWhat challenges did you face while working with Series in this exercise, and how did you overcome them?\n\nDiscuss these questions with your team and share your insights.\n\nEnd Coding Colab Session (Day 4)" }, { - "objectID": "course-materials/cheatsheets/functions.html", - "href": "course-materials/cheatsheets/functions.html", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "In Python, a function is defined using the def keyword, followed by the function name and parentheses () that may include parameters.\n\n\nCode\ndef function_name(parameters):\n # Function body\n return result\n\n\n\n\n\n\n\nCode\ndef celsius_to_fahrenheit(celsius):\n \"\"\"Convert Celsius to Fahrenheit.\"\"\"\n fahrenheit = (celsius * 9/5) + 32\n return fahrenheit\n\n\n\n\n\nCall a function by using its name followed by parentheses, and pass arguments if the function requires them.\n\n\nCode\ntemperature_celsius = 25\ntemperature_fahrenheit = celsius_to_fahrenheit(temperature_celsius)\nprint(temperature_fahrenheit) 
# Output: 77.0\n\n\n77.0\n\n\n\n\n\n\n\n\n\n\nCode\ndef kilometers_to_miles(kilometers):\n \"\"\"Convert kilometers to miles.\"\"\"\n miles = kilometers * 0.621371\n return miles\n\n# Usage\ndistance_km = 10\ndistance_miles = kilometers_to_miles(distance_km)\nprint(distance_miles) # Output: 6.21371\n\n\n6.21371\n\n\n\n\n\ndef mps_to_kmph(mps):\n \"\"\"Convert meters per second to kilometers per hour.\"\"\"\n kmph = mps * 3.6\n return kmph\n\n# Usage\nspeed_mps = 5\nspeed_kmph = mps_to_kmph(speed_mps)\nprint(speed_kmph) # Output: 18.0\n\n\n\n\n\n\nYou can return multiple values from a function by using a tuple.\nimport statistics\n\ndef calculate_mean_std(data):\n \"\"\"Calculate mean and standard deviation of a dataset.\"\"\"\n mean = statistics.mean(data)\n std_dev = statistics.stdev(data)\n return mean, std_dev\n\n# Usage\ndata = [12, 15, 20, 22, 25]\nmean, std_dev = calculate_mean_std(data)\nprint(f\"Mean: {mean}, Standard Deviation: {std_dev}\")\n\n\n\n\nYou can set default values for parameters, making them optional when calling the function.\n\n\ndef convert_temperature(temp, from_unit='C', to_unit='F'):\n \"\"\"Convert temperature between Celsius and Fahrenheit.\"\"\"\n if from_unit == 'C' and to_unit == 'F':\n return (temp * 9/5) + 32\n elif from_unit == 'F' and to_unit == 'C':\n return (temp - 32) * 5/9\n else:\n return temp # No conversion needed\n\n# Usage\ntemp_in_fahrenheit = convert_temperature(25) # Defaults to C to F\ntemp_in_celsius = convert_temperature(77, from_unit='F', to_unit='C')\nprint(temp_in_fahrenheit) # Output: 77.0\nprint(temp_in_celsius) # Output: 25.0\n\n\n\n\nYou can call a function using keyword arguments to make it clearer which arguments are being set, especially useful when many parameters are involved.\n# Call using keyword arguments\ntemp = convert_temperature(temp=25, from_unit='C', to_unit='F')\n\n\n\nA higher-order function is a function that can take other functions as arguments or return them as results.\n\n\ndef 
apply_conversion(conversion_func, data):\n \"\"\"Apply a conversion function to a list of data.\"\"\"\n return [conversion_func(value) for value in data]\n\n# Convert a list of temperatures from Celsius to Fahrenheit\ntemperatures_celsius = [0, 20, 30, 40]\ntemperatures_fahrenheit = apply_conversion(celsius_to_fahrenheit, temperatures_celsius)\nprint(temperatures_fahrenheit) # Output: [32.0, 68.0, 86.0, 104.0]\n\n\n\n\n\n\nDegree days are a measure of heat accumulation used to predict plant and animal development rates.\ndef calculate_degree_days(daily_temps, base_temp=10):\n \"\"\"Calculate degree days for a series of daily temperatures.\"\"\"\n degree_days = 0\n for temp in daily_temps:\n if temp > base_temp:\n degree_days += temp - base_temp\n return degree_days\n\n# Usage\ndaily_temps = [12, 15, 10, 18, 20, 7]\ndegree_days = calculate_degree_days(daily_temps)\nprint(degree_days) # Output: 35\n\n\n\n\nFunctions encapsulate reusable code logic and can simplify complex operations.\nParameters allow for input variability, while return values provide output.\nUse default parameters and keyword arguments to enhance flexibility and readability.\nHigher-order functions enable more abstract and powerful code structures." + "objectID": "course-materials/lectures/seaborn.html#philosophy-of-seaborn", + "href": "course-materials/lectures/seaborn.html#philosophy-of-seaborn", + "title": "Introduction to Seaborn", + "section": "Philosophy of Seaborn", + "text": "Philosophy of Seaborn\nSeaborn aims to make visualization a central part of exploring and understanding data.\nIts dataset-oriented plotting functions operate on dataframes and arrays containing whole datasets.\nIt tries to automatically perform semantic mapping and statistical aggregation to produce informative plots." 
}, { - "objectID": "course-materials/cheatsheets/functions.html#basics-of-functions", - "href": "course-materials/cheatsheets/functions.html#basics-of-functions", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "In Python, a function is defined using the def keyword, followed by the function name and parentheses () that may include parameters.\n\n\nCode\ndef function_name(parameters):\n # Function body\n return result\n\n\n\n\n\n\n\nCode\ndef celsius_to_fahrenheit(celsius):\n \"\"\"Convert Celsius to Fahrenheit.\"\"\"\n fahrenheit = (celsius * 9/5) + 32\n return fahrenheit\n\n\n\n\n\nCall a function by using its name followed by parentheses, and pass arguments if the function requires them.\n\n\nCode\ntemperature_celsius = 25\ntemperature_fahrenheit = celsius_to_fahrenheit(temperature_celsius)\nprint(temperature_fahrenheit) # Output: 77.0\n\n\n77.0" + "objectID": "course-materials/lectures/seaborn.html#main-ideas-in-seaborn", + "href": "course-materials/lectures/seaborn.html#main-ideas-in-seaborn", + "title": "Introduction to Seaborn", + "section": "Main Ideas in Seaborn", + "text": "Main Ideas in Seaborn\n\nIntegration with Pandas: Works well with Pandas data structures.\nBuilt-in Themes: Provides built-in themes for styling matplotlib graphics.\nColor Palettes: Offers a variety of color palettes to reveal patterns in the data.\nStatistical Estimation: Seaborn includes functions to fit and visualize linear regression models." 
}, { - "objectID": "course-materials/cheatsheets/functions.html#common-unit-conversions", - "href": "course-materials/cheatsheets/functions.html#common-unit-conversions", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "Code\ndef kilometers_to_miles(kilometers):\n \"\"\"Convert kilometers to miles.\"\"\"\n miles = kilometers * 0.621371\n return miles\n\n# Usage\ndistance_km = 10\ndistance_miles = kilometers_to_miles(distance_km)\nprint(distance_miles) # Output: 6.21371\n\n\n6.21371\n\n\n\n\n\ndef mps_to_kmph(mps):\n \"\"\"Convert meters per second to kilometers per hour.\"\"\"\n kmph = mps * 3.6\n return kmph\n\n# Usage\nspeed_mps = 5\nspeed_kmph = mps_to_kmph(speed_mps)\nprint(speed_kmph) # Output: 18.0" + "objectID": "course-materials/lectures/seaborn.html#major-features-of-seaborn", + "href": "course-materials/lectures/seaborn.html#major-features-of-seaborn", + "title": "Introduction to Seaborn", + "section": "Major Features of Seaborn", + "text": "Major Features of Seaborn\nSeaborn simplifies many aspects of creating complex visualizations in Python. Some of its major features include:\n\nFacetGrids and PairGrids: For plotting conditional relationships.\nFactorplot: For categorical variables.\nJointplot: For joint distributions.\nTime Series functionality: Through functions like tsplot." 
}, { - "objectID": "course-materials/cheatsheets/functions.html#handling-multiple-return-values", - "href": "course-materials/cheatsheets/functions.html#handling-multiple-return-values", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "You can return multiple values from a function by using a tuple.\nimport statistics\n\ndef calculate_mean_std(data):\n \"\"\"Calculate mean and standard deviation of a dataset.\"\"\"\n mean = statistics.mean(data)\n std_dev = statistics.stdev(data)\n return mean, std_dev\n\n# Usage\ndata = [12, 15, 20, 22, 25]\nmean, std_dev = calculate_mean_std(data)\nprint(f\"Mean: {mean}, Standard Deviation: {std_dev}\")" + "objectID": "course-materials/lectures/seaborn.html#using-seaborn", + "href": "course-materials/lectures/seaborn.html#using-seaborn", + "title": "Introduction to Seaborn", + "section": "Using seaborn", + "text": "Using seaborn\nimport seaborn as sns\nWhy sns?" }, { - "objectID": "course-materials/cheatsheets/functions.html#default-parameters", - "href": "course-materials/cheatsheets/functions.html#default-parameters", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "You can set default values for parameters, making them optional when calling the function.\n\n\ndef convert_temperature(temp, from_unit='C', to_unit='F'):\n \"\"\"Convert temperature between Celsius and Fahrenheit.\"\"\"\n if from_unit == 'C' and to_unit == 'F':\n return (temp * 9/5) + 32\n elif from_unit == 'F' and to_unit == 'C':\n return (temp - 32) * 5/9\n else:\n return temp # No conversion needed\n\n# Usage\ntemp_in_fahrenheit = convert_temperature(25) # Defaults to C to F\ntemp_in_celsius = convert_temperature(77, from_unit='F', to_unit='C')\nprint(temp_in_fahrenheit) # Output: 77.0\nprint(temp_in_celsius) # Output: 25.0" + "objectID": "course-materials/lectures/seaborn.html#theme-options", + "href": "course-materials/lectures/seaborn.html#theme-options", + "title": "Introduction to Seaborn", + "section": "Theme Options", + "text": 
"Theme Options\n# Set the theme to whitegrid\nsns.set_theme(style=\"whitegrid\")\n\ndarkgrid: The default theme. Background is a dark gray grid (not to be confused with a solid gray).\nwhitegrid: Similar to darkgrid but with a lighter background. This theme is particularly useful for plots with dense data points." }, { - "objectID": "course-materials/cheatsheets/functions.html#using-keyword-arguments", - "href": "course-materials/cheatsheets/functions.html#using-keyword-arguments", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "You can call a function using keyword arguments to make it clearer which arguments are being set, especially useful when many parameters are involved.\n# Call using keyword arguments\ntemp = convert_temperature(temp=25, from_unit='C', to_unit='F')" + "objectID": "course-materials/lectures/seaborn.html#themes-continued", + "href": "course-materials/lectures/seaborn.html#themes-continued", + "title": "Introduction to Seaborn", + "section": "Themes (continued)", + "text": "Themes (continued)\n\ndark: This theme provides a dark background without any grid lines. It’s suitable for presentations or where visuals are prioritized.\nwhite: Offers a clean, white background without grid lines. This is well in situations where the data and annotations need to stand out without any additional distraction.\nticks: This theme is similar to the white theme but adds ticks on the axes, which enhances the precision of interpreting the data." 
}, { - "objectID": "course-materials/cheatsheets/functions.html#higher-order-functions", - "href": "course-materials/cheatsheets/functions.html#higher-order-functions", - "title": "EDS 217 Cheatsheet", + "objectID": "course-materials/lectures/seaborn.html#getting-ready-to-seaborn", + "href": "course-materials/lectures/seaborn.html#getting-ready-to-seaborn", + "title": "Introduction to Seaborn", + "section": "Getting ready to Seaborn", + "text": "Getting ready to Seaborn\nImport the library and set a style\n\nimport seaborn as sns # (but now you know it should have been ssn 🤓)\nsns.set(style=\"darkgrid\") # This is the default, so skip it if wanted" + }, + { + "objectID": "course-materials/lectures/seaborn.html#conclusion", + "href": "course-materials/lectures/seaborn.html#conclusion", + "title": "Introduction to Seaborn", + "section": "Conclusion", + "text": "Conclusion\nSeaborn is a versatile and powerful tool for statistical data visualization in Python. Whether you need to visualize the distribution of a dataset, the relationship between multiple variables, or the dependencies between categorical data, Seaborn has a plot type to make your analysis more intuitive and insightful." 
+ }, + { + "objectID": "course-materials/coding-colabs/7c_visualizations.html", + "href": "course-materials/coding-colabs/7c_visualizations.html", + "title": "Day 7: 🙌 Coding Colab", "section": "", - "text": "A higher-order function is a function that can take other functions as arguments or return them as results.\n\n\ndef apply_conversion(conversion_func, data):\n \"\"\"Apply a conversion function to a list of data.\"\"\"\n return [conversion_func(value) for value in data]\n\n# Convert a list of temperatures from Celsius to Fahrenheit\ntemperatures_celsius = [0, 20, 30, 40]\ntemperatures_fahrenheit = apply_conversion(celsius_to_fahrenheit, temperatures_celsius)\nprint(temperatures_fahrenheit) # Output: [32.0, 68.0, 86.0, 104.0]" + "text": "In this collaborative coding exercise, you’ll work with a partner to explore a dataset using the seaborn library. You’ll focus on a workflow that includes:\n\nExploring distributions with histograms\nExamining correlations among variables\nInvestigating relationships more closely with regression plots and joint distribution plots\n\nWe’ll be using the Palmer Penguins dataset, which contains information about different penguin species, their physical characteristics, and the islands they inhabit." 
}, { - "objectID": "course-materials/cheatsheets/functions.html#practical-example-climate-data-analysis", - "href": "course-materials/cheatsheets/functions.html#practical-example-climate-data-analysis", - "title": "EDS 217 Cheatsheet", + "objectID": "course-materials/coding-colabs/7c_visualizations.html#introduction", + "href": "course-materials/coding-colabs/7c_visualizations.html#introduction", + "title": "Day 7: 🙌 Coding Colab", "section": "", - "text": "Degree days are a measure of heat accumulation used to predict plant and animal development rates.\ndef calculate_degree_days(daily_temps, base_temp=10):\n \"\"\"Calculate degree days for a series of daily temperatures.\"\"\"\n degree_days = 0\n for temp in daily_temps:\n if temp > base_temp:\n degree_days += temp - base_temp\n return degree_days\n\n# Usage\ndaily_temps = [12, 15, 10, 18, 20, 7]\ndegree_days = calculate_degree_days(daily_temps)\nprint(degree_days) # Output: 35\n\n\n\n\nFunctions encapsulate reusable code logic and can simplify complex operations.\nParameters allow for input variability, while return values provide output.\nUse default parameters and keyword arguments to enhance flexibility and readability.\nHigher-order functions enable more abstract and powerful code structures." + "text": "In this collaborative coding exercise, you’ll work with a partner to explore a dataset using the seaborn library. You’ll focus on a workflow that includes:\n\nExploring distributions with histograms\nExamining correlations among variables\nInvestigating relationships more closely with regression plots and joint distribution plots\n\nWe’ll be using the Palmer Penguins dataset, which contains information about different penguin species, their physical characteristics, and the islands they inhabit." 
}, { - "objectID": "course-materials/live-coding/5a_selecting_and_filtering.html#overview", - "href": "course-materials/live-coding/5a_selecting_and_filtering.html#overview", - "title": "Live Coding Session 5A", - "section": "Overview", - "text": "Overview\nIn this session, we will be exploring how to select and filter data from DataFrames." + "objectID": "course-materials/coding-colabs/7c_visualizations.html#setup", + "href": "course-materials/coding-colabs/7c_visualizations.html#setup", + "title": "Day 7: 🙌 Coding Colab", + "section": "Setup", + "text": "Setup\nFirst, let’s import the necessary libraries and load our dataset.\n\n\nCode\nimport pandas as pd\nimport seaborn as sns\nimport matplotlib.pyplot as plt\nimport numpy as np\n\n# Set the style for better-looking plots\nsns.set_style(\"whitegrid\")\n\n# Load the Palmer Penguins dataset\npenguins = sns.load_dataset(\"penguins\")\n\n# Display the first few rows and basic information about the dataset\nprint(penguins.head())\nprint(penguins.info())\n\n\n species island bill_length_mm bill_depth_mm flipper_length_mm \\\n0 Adelie Torgersen 39.1 18.7 181.0 \n1 Adelie Torgersen 39.5 17.4 186.0 \n2 Adelie Torgersen 40.3 18.0 195.0 \n3 Adelie Torgersen NaN NaN NaN \n4 Adelie Torgersen 36.7 19.3 193.0 \n\n body_mass_g sex \n0 3750.0 Male \n1 3800.0 Female \n2 3250.0 Female \n3 NaN NaN \n4 3450.0 Female \n<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 344 entries, 0 to 343\nData columns (total 7 columns):\n # Column Non-Null Count Dtype \n--- ------ -------------- ----- \n 0 species 344 non-null object \n 1 island 344 non-null object \n 2 bill_length_mm 342 non-null float64\n 3 bill_depth_mm 342 non-null float64\n 4 flipper_length_mm 342 non-null float64\n 5 body_mass_g 342 non-null float64\n 6 sex 333 non-null object \ndtypes: float64(4), object(3)\nmemory usage: 18.9+ KB\nNone" }, { - "objectID": "course-materials/live-coding/5a_selecting_and_filtering.html#objectives", - "href": 
"course-materials/live-coding/5a_selecting_and_filtering.html#objectives", - "title": "Live Coding Session 5A", - "section": "Objectives", - "text": "Objectives\n\nApply various indexing methods to select rows and columns in dataframes.\nUse boolean logic to filter data based on values\nDevelop the ability to troubleshoot and debug in a live setting." + "objectID": "course-materials/coding-colabs/7c_visualizations.html#task-1-exploring-distributions-with-histograms", + "href": "course-materials/coding-colabs/7c_visualizations.html#task-1-exploring-distributions-with-histograms", + "title": "Day 7: 🙌 Coding Colab", + "section": "Task 1: Exploring Distributions with Histograms", + "text": "Task 1: Exploring Distributions with Histograms\nLet’s start by exploring the distributions of various numerical variables in our dataset using histograms.\n\nCreate histograms for ‘bill_length_mm’, ‘bill_depth_mm’, ‘flipper_length_mm’, and ‘body_mass_g’.\nExperiment with different numbers of bins to see how it affects the visualization.\nTry using sns.histplot() with the ‘kde’ parameter set to True to overlay a kernel density estimate." 
}, { - "objectID": "course-materials/live-coding/5a_selecting_and_filtering.html#getting-started", - "href": "course-materials/live-coding/5a_selecting_and_filtering.html#getting-started", - "title": "Live Coding Session 5A", + "objectID": "course-materials/coding-colabs/7c_visualizations.html#task-2-examining-correlations", + "href": "course-materials/coding-colabs/7c_visualizations.html#task-2-examining-correlations", + "title": "Day 7: 🙌 Coding Colab", + "section": "Task 2: Examining Correlations", + "text": "Task 2: Examining Correlations\nNow, let’s look at the correlations between the numerical variables in our dataset using Seaborn’s built-in correlation plot.\n\nUse sns.pairplot() to create a grid of scatter plots for all numeric variables.\nModify the pairplot to show the species information using different colors.\nInterpret the pairplot: which variables seem to be most strongly correlated? Do you notice any patterns related to species?" + }, + { + "objectID": "course-materials/coding-colabs/7c_visualizations.html#task-3-investigating-relationships-with-regression-plots", + "href": "course-materials/coding-colabs/7c_visualizations.html#task-3-investigating-relationships-with-regression-plots", + "title": "Day 7: 🙌 Coding Colab", + "section": "Task 3: Investigating Relationships with Regression Plots", + "text": "Task 3: Investigating Relationships with Regression Plots\nLet’s dig deeper into the relationships between variables using regression plots.\n\nCreate a regression plot (sns.regplot) showing the relationship between ‘flipper_length_mm’ and ‘body_mass_g’.\nCreate another regplot showing the relationship between ‘bill_length_mm’ and ‘bill_depth_mm’.\nTry adding the ‘species’ information to one of these plots using different colors. Hint: You might want to use sns.lmplot for this." 
+ }, + { + "objectID": "course-materials/coding-colabs/7c_visualizations.html#task-4-joint-distribution-plots", + "href": "course-materials/coding-colabs/7c_visualizations.html#task-4-joint-distribution-plots", + "title": "Day 7: 🙌 Coding Colab", + "section": "Task 4: Joint Distribution Plots", + "text": "Task 4: Joint Distribution Plots\nFinally, let’s use joint distribution plots to examine both the relationship between two variables and their individual distributions.\n\nCreate a joint plot for ‘flipper_length_mm’ and ‘body_mass_g’.\nExperiment with different kind parameters in the joint plot (e.g., ‘scatter’, ‘kde’, ‘hex’).\nCreate another joint plot, this time for ‘bill_length_mm’ and ‘bill_depth_mm’, colored by species." + }, + { + "objectID": "course-materials/coding-colabs/7c_visualizations.html#bonus-challenge", + "href": "course-materials/coding-colabs/7c_visualizations.html#bonus-challenge", + "title": "Day 7: 🙌 Coding Colab", + "section": "Bonus Challenge", + "text": "Bonus Challenge\nIf you finish early, try this bonus challenge:\nCreate a correlation matrix heatmap using Seaborn’s sns.heatmap() function. This will provide a different view of the correlations between variables compared to the pairplot.\n\nCreate a correlation matrix using the numerical columns in the dataset.\n\n\n\n\n\n\n\nCreating correlation matricies in pandas\n\n\n\nPandas dataframes include two built-in methods that can be combined to quickly create a correlation matrix between all the numerical data in a dataframe.\n\n.select_dtypes() is a method that selects only the columns of a dataframe that match a type of data. Running the .select_dtypes(include=np.number) method on a dataframe will return a new dataframe that contains only the columns that have a numeric datatype.\n.corr() is a method that creates a correlation matrix between every column in a dataframe. 
For it to work, you need to make sure you only have numeric data in your dataframe, so chaining this method after the .select_dtypes() method will get you a complete correlation matrix in a single line of code!\n\n\n\n\nVisualize this correlation matrix using sns.heatmap().\nCustomize the heatmap by adding annotations and adjusting the colormap.\nCompare the insights from this heatmap with those from the pairplot. What additional information does each visualization provide?" + }, + { + "objectID": "course-materials/coding-colabs/7c_visualizations.html#conclusion", + "href": "course-materials/coding-colabs/7c_visualizations.html#conclusion", + "title": "Day 7: 🙌 Coding Colab", + "section": "Conclusion", + "text": "Conclusion\nYou’ve practiced using seaborn to explore a dataset through various visualization techniques. Often these visualizations can be very helpful at the start of a data exploration activity as they are fundamental to exploratory data analysis in Python. As such, they will be valuable as you continue to work with more complex datasets.\n\nEnd Coding Colab Session (Day 7)" + }, + { + "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#getting-started", + "href": "course-materials/interactive-sessions/2b_dictionaries.html#getting-started", + "title": "Interactive Session 2B", "section": "Getting Started", - "text": "Getting Started\nWe will be using the data stored in the csv at this url:\n\nurl = 'https://bit.ly/eds217-studentdata'\n\nTo get the most out of this session, please follow these guidelines:\n\nPrepare Your Environment:\n\nMake sure JupyterLab is up and running on your machine.\nOpen a new Jupyter notebook where you can write your own code as we go along.\nMake sure to name the notebook something informative so you can refer back to it later.\n\nSetup Your Notebook:\nBefore we begin the live coding session, please set up your Jupyter notebook with the following structure. 
This will help you organize your notes and code as we progress through the lesson.\n\nCreate a new Jupyter notebook.\nIn the first cell, create a markdown cell with the title of our session:\n\n# Basic Pandas Selection and Filtering\n\nBelow that, create markdown cells for each of the following topics we’ll cover. Leave empty code cells between each markdown cell where you’ll write your code during the session:\n\n# Introduction to Pandas Selection and Filtering\n\n## 1. Setup\n\n[Empty Code Cell]\n\n## 2. Basic Selection\n\n[Empty Code Cell]\n\n## 3. Filtering Based on Column Values\n\n### 3a. Single Condition Filtering\n\n[Empty Code Cell]\n\n### 3b. Multiple Conditions with Logical Operators\n\n[Empty Code Cell]\n\n### 3c. Using the filter command\n\n[Empty Code Cell]\n\n## 4. Combining Selection and Filtering\n\n[Empty Code Cell]\n\n## 5. Using .isin() for Multiple Values\n\n[Empty Code Cell]\n\n## 6. Filtering with String Methods\n\n[Empty Code Cell]\n\n## 7. Advanced Selection: .loc vs .iloc\n\n[Empty Code Cell]\n\n## Conclusion\nAs we progress through the live coding session, you’ll fill in the code cells with the examples we work on together.\nFeel free to add additional markdown cells for your own notes or observations throughout the session.\n\nBy setting up your notebook this way, you’ll have a clear structure to follow along with the lesson and easily reference specific topics later for review. Remember, you can always add more cells or modify the structure as needed during the session!\n\nParticipation:\n\nTry to code along with me during the session.\nFeel free to ask questions at any time. Remember, if you have a question, others probably do too!\n\nResources:\n\nI will be sharing snippets of code and notes. Make sure to take your own notes and save snippets in your notebook for future reference.\nCheck out our class data selection and filtering cheatsheet." 
+ "text": "Getting Started\nBefore we begin our interactive session, please follow these steps to set up your Jupyter Notebook:\n\nOpen JupyterLab and create a new notebook:\n\nClick on the + button in the top left corner\nSelect Python 3.10.0 from the Notebook options\n\nRename your notebook:\n\nRight-click on the Untitled.ipynb tab\nSelect “Rename”\nName your notebook with the format: Session_2B_Dictionaries.ipynb\n\nAdd a title cell:\n\nIn the first cell of your notebook, change the cell type to “Markdown”\nAdd the following content (replace the placeholders with the actual information):\n\n\n# Day 2: Session B - Dictionaries\n\n[Link to session webpage](https://eds-217-essential-python.github.io/course-materials/interactive-sessions/2b_dictionaries.html)\n\nDate: 09/04/2024\n\nAdd a code cell:\n\nBelow the title cell, add a new cell\nEnsure it’s set as a “Code” cell\nThis will be where you start writing your Python code for the session\n\nThroughout the session:\n\nTake notes in Markdown cells\nCopy or write code in Code cells\nRun cells to test your code\nAsk questions if you need clarification\n\n\n\n\n\n\n\n\nCaution\n\n\n\nRemember to save your work frequently by clicking the save icon or using the keyboard shortcut (Ctrl+S or Cmd+S).\n\n\nLet’s begin our interactive session!" }, { - "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html", - "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html", - "title": "Interactive Session 2C", - "section": "", - "text": "Objective:\nThis session aims to help you understand how to interpret error messages in Python. By generating errors in a controlled environment, you’ll learn how to read error reports, identify the source of the problem, and correct your code. 
This is an essential skill for debugging and improving your Python programming abilities.\nEstimated Time: 45-60 minutes" + "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#part-1-basic-concepts-with-species-lookup-table", + "href": "course-materials/interactive-sessions/2b_dictionaries.html#part-1-basic-concepts-with-species-lookup-table", + "title": "Interactive Session 2B", + "section": "Part 1: Basic Concepts with Species Lookup Table", + "text": "Part 1: Basic Concepts with Species Lookup Table\n\nIntroduction to Dictionaries\nDictionaries in Python are collections of key-value pairs that allow for efficient data storage and retrieval. Each key maps to a specific value, making dictionaries ideal for representing real-world data in a structured format.\nProbably the easiest mental model for thinking about structured data is a spreadsheet. You are all familiar with Excel spreadsheets, with their numbered rows and lettered columns. In the spreadsheet, data is often “structured” so that each row is an entry, and each column is perhaps a variable recorded for that entry.\n\n\n\nstructured-data" }, { - "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-1-introduction-to-python-errors", - "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-1-introduction-to-python-errors", - "title": "Interactive Session 2C", - "section": "Part 1: Introduction to Python Errors", - "text": "Part 1: Introduction to Python Errors\n\n1.1 Generating a Syntax Error\nIn Python, a syntax error occurs when the code you write doesn’t conform to the rules of the language.\n\nStep 1: Run the following code in a Jupyter notebook cell to generate a syntax error.\nprint(\"Hello World\nStep 2: Observe the error message. 
It should look something like this:\nFile \"<ipython-input-1>\", line 1\n print(\"Hello World\n ^\nSyntaxError: EOL while scanning string literal\nStep 3: Explanation: The error message indicates that the End Of Line (EOL) was reached while the string literal was still open. A string literal is what is created inside the open \" and close \". The caret (^) points to where Python expected the closing quote.\nStep 4: Fix the Error: Correct the code by adding the missing closing quotation mark.\nprint(\"Hello World\")" + "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#instructions", + "href": "course-materials/interactive-sessions/2b_dictionaries.html#instructions", + "title": "Interactive Session 2B", + "section": "Instructions", + "text": "Instructions\nWe will work through this material together, writing a new notebook as we go.\n\n\n\n\n\n\nNote\n\n\n\n🐍     This symbol designates an important note about Python structure, syntax, or another quirk.\n\n\n\n✏️   This symbol designates code you should add to your notebook and run." }, { - "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-2-name-errors-with-variables", - "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-2-name-errors-with-variables", - "title": "Interactive Session 2C", - "section": "Part 2: Name Errors with Variables", - "text": "Part 2: Name Errors with Variables\n\n2.1 Using an Undefined Variable\nA NameError occurs when you try to use a variable that hasn’t been defined.\n\nStep 1: Run the following code to generate a NameError.\nprint(variable)\nStep 2: Observe the error message.\nNameError: name 'variable' is not defined\nStep 3: Explanation: Python is telling you that the variable variable has not been defined. 
This means you are trying to use a variable that Python doesn’t recognize.\nStep 4: Fix the Error: Define the variable before using it.\nvariable = \"I'm now defined!\"\nprint(variable)\n\n\n\n\n\n\n\nCommon NameError patterns in Python\n\n\n\nA NameError often occurs when Python can’t find a variable or function you’re trying to use. This is usually because of:\n\nTypos in Function or Variable Names:\n\nIf you mistype a function or variable name, Python will raise a NameError because it doesn’t recognize the name.\nExample:\nprnt(\"Hello, World!\") # NameError: name 'prnt' is not defined\n\nFix: Correct the typo to print(\"Hello, World!\").\n\n\nUsing Literals as Variables:\n\nA related error occurs if you accidentally try to assign a value to a string or number literal as if it were a variable (strictly speaking, this one is a SyntaxError).\nExample:\n\"Hello\" = 5 # SyntaxError: cannot assign to literal\n\nFix: Make sure you’re using valid variable names and not trying to assign values to literals.\n\n\n\nRemember: Always double-check your spelling and ensure that you’re using variable names correctly!" + "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#dictionaries", + "href": "course-materials/interactive-sessions/2b_dictionaries.html#dictionaries", + "title": "Interactive Session 2B", + "section": "Dictionaries", + "text": "Dictionaries\n\nTLDR: Dictionaries are a very common collection type that allows data to be organized using a key:value framework. Because of the similarity between key:value pairs and many data structures (e.g. “lookup tables”), you will see Dictionaries quite a bit when working in Python.\n\nThe first collection we will look at today is the dictionary, or dict. This is one of the most powerful data structures in Python. 
It is a mutable collection, which means that it can be altered; however, its elements are referenced by their keys rather than by their position, so a dictionary cannot be indexed or sorted like a list (since Python 3.7, dictionaries do preserve insertion order).\nYou can create a dictionary using {}, providing both a key and a value, which are separated by a :." }, { - "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-3-type-errors-with-functions", - "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-3-type-errors-with-functions", - "title": "Interactive Session 2C", - "section": "Part 3: Type Errors with Functions", - "text": "Part 3: Type Errors with Functions\n\n3.1 Passing Incorrect Data Types\nA TypeError occurs when an operation or function is applied to an object of an inappropriate type.\n\nStep 1: Run the following code to generate a TypeError.\nnumber = 5\nprint(number + \"10\")\nStep 2: Observe the error message.\nTypeError: unsupported operand type(s) for +: 'int' and 'str'\nStep 3: Explanation: The error indicates that you are trying to add an integer (int) and a string (str), which is not allowed in Python.\nStep 4: Fix the Error: Convert the string \"10\" to an integer or the integer number to a string.\nprint(number + 10) # Correct approach 1\n\n# or\n\nprint(str(number) + \"10\") # Correct approach 2" + "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#creating-manipulating-dictionaries", + "href": "course-materials/interactive-sessions/2b_dictionaries.html#creating-manipulating-dictionaries", + "title": "Interactive Session 2B", + "section": "Creating & Manipulating Dictionaries", + "text": "Creating & Manipulating Dictionaries\nWe’ll start by creating a dictionary to store the common name of various species found in California’s coastal tidepools.\n\n✏️ Try it. 
Add the cell below to your notebook and run it.\n\n\n\nCode\n# Define a dictionary with species data containing latin names and corresponding common names.\nspecies_dict = {\n \"P ochraceus\": \"Ochre sea star\",\n \"M californianus\": \"California mussel\",\n \"H rufescens\": \"Red abalone\"\n}\n\n\n\n\n\n\n\nNote\n\n\n\n🐍 Note. The use of whitespace and indentation is important in Python. In the example above, the dictionary entries are indented relative to the brackets { and }. In addition, there is no space between the 'key' and the : for each entry. Finally, notice that there is a , following each dictionary entry. This pattern is the same as all of the other collection data types we've seen so far, including list, set, and tuple.\n\n\n\nAccessing elements in a dictionary\nAccessing an element in a dictionary is easy if you know what you are looking for.\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\nspecies_dict['M californianus']\n\n\n'California mussel'\n\n\n\nAdding a New Species\nBecause dictionaries are mutable, adding new entries is straightforward: you specify the key and the value it maps to.\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Adding a new entry for Leather star\nspecies_dict[\"D imbricata\"] = \"Leather star\"\n\n\n\n\nAccessing and Modifying Data\nAccessing data in a dictionary can be done directly by the key, and modifications are just as direct.\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Accessing a species by its latin name\nprint(\"Common name for P ochraceus:\", species_dict[\"P ochraceus\"])\n\n\nCommon name for P ochraceus: Ochre sea star\n\n\n\n✏️ Try it. 
Add the cell below to your notebook and run it.\n\n\n\nCode\n# Updating the common name for the Ochre sea star\nspecies_dict[\"P ochraceus\"] = \"Purple Starfish\"\nprint(\"Updated data for Pisaster ochraceus:\", species_dict[\"P ochraceus\"])\n\n\nUpdated data for Pisaster ochraceus: Purple Starfish\n\n\n\n\nRemoving a Dictionary Element\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Removing \"P ochraceus\"\ndel species_dict[\"P ochraceus\"]\nprint(f\"Deleted data for Pisaster ochraceus, new dictionary: {species_dict}\")\n\n\nDeleted data for Pisaster ochraceus, new dictionary: {'M californianus': 'California mussel', 'H rufescens': 'Red abalone', 'D imbricata': 'Leather star'}" }, { - "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-4-index-errors-with-lists", - "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-4-index-errors-with-lists", - "title": "Interactive Session 2C", - "section": "Part 4: Index Errors with Lists", - "text": "Part 4: Index Errors with Lists\n\n4.1 Accessing an Invalid Index\nAn IndexError occurs when you try to access an index that is out of the range of a list.\n\nStep 1: Run the following code to generate an IndexError.\nmy_list = [1, 2, 3]\nprint(my_list[5])\nStep 2: Observe the error message.\nIndexError: list index out of range\nStep 3: Explanation: Python is telling you that the index 5 is out of range for the list my_list, which only has indices 0, 1, 2.\nStep 4: Fix the Error: Access a valid index or use dynamic methods to avoid hardcoding indices.\nprint(my_list[2]) # Last valid index\n\n# or\n\nprint(my_list[-1]) # Access the last element using negative indexing" + "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#accessing-dictionary-keys-and-values", + "href": "course-materials/interactive-sessions/2b_dictionaries.html#accessing-dictionary-keys-and-values", + "title": "Interactive Session 2B", + 
"section": "Accessing dictionary keys and values", + "text": "Accessing dictionary keys and values\nEvery dictionary has builtin methods to retrieve its keys and values. These functions are called, appropriately, keys() and values()\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Accessing the dictionary keys:\nlatin_names = species_dict.keys()\nprint(f\"Dictionary keys (latin names): {latin_names}\")\n\n\nDictionary keys (latin names): dict_keys(['M californianus', 'H rufescens', 'D imbricata'])\n\n\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Accessing the dictionary values\ncommon_names = species_dict.values()\nprint(f\"Dictionary values (common names): {common_names}\")\n\n\nDictionary values (common names): dict_values(['California mussel', 'Red abalone', 'Leather star'])\n\n\n\n\n\n\n\n\nNote\n\n\n\n🐍 Note. The keys() and values() functions return a dict_key object and dict_values object, respectively. Each of these objects contains a list of either the keys or values. You can force the result of the keys() or values() function into a list by wrapping either one in a list() command." 
}, { - "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-5-attribute-errors", - "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-5-attribute-errors", - "title": "Interactive Session 2C", - "section": "Part 5: Attribute Errors", - "text": "Part 5: Attribute Errors\n\n5.1 Using Attributes Incorrectly\nAn AttributeError occurs when you try to access an attribute or method that doesn’t exist on the object.\n\nStep 1: Run the following code to generate an AttributeError.\nmy_string = \"Hello\"\nmy_string.append(\" World\")\nStep 2: Observe the error message.\nAttributeError: 'str' object has no attribute 'append'\nStep 3: Explanation: Python is telling you that the str object (a string) does not have an append method, which is a method for lists.\nStep 4: Fix the Error: Use string concatenation instead of append.\nmy_string = \"Hello\"\nmy_string = my_string + \" World\"\nprint(my_string)" + "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#looping-through-dictionaries", + "href": "course-materials/interactive-sessions/2b_dictionaries.html#looping-through-dictionaries", + "title": "Interactive Session 2B", + "section": "Looping through Dictionaries ", + "text": "Looping through Dictionaries \nPython has an efficient way to loop through all the keys and values of a dictionary at the same time. The items() method returns a tuple containing a (key, value) for each element in a dictionary. In practice this means that we can loop through a dictionary in the following way:\n\n\nCode\nmy_dict = {'name': 'Homer Simpson',\n 'occupation': 'nuclear engineer',\n 'address': '742 Evergreen Terrace',\n 'city': 'Springfield',\n 'state': ' ? '\n }\n\nfor key, value in my_dict.items():\n print(f\"{key.capitalize()}: {value}.\")\n\n\nName: Homer Simpson.\nOccupation: nuclear engineer.\nAddress: 742 Evergreen Terrace.\nCity: Springfield.\nState: ? .\n\n\n\n✏️ Try it. 
Add the cell below to your notebook and run it.\n\nAdd a new code cell and code to loop through the species_dict dictionary and print out a sentence providing the common name of each species (e.g. “The common name of M californianus is…”)." }, { - "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-6-tracing-errors-through-a-function-call-stack", - "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-6-tracing-errors-through-a-function-call-stack", - "title": "Interactive Session 2C", - "section": "Part 6: Tracing Errors Through a Function Call Stack", - "text": "Part 6: Tracing Errors Through a Function Call Stack\n\n6.1 Understanding a Complicated Error Stack Trace\nErrors can sometimes appear deep within a function call, triggered by code that was written earlier in your script. When this happens, understanding the stack trace (the sequence of function calls leading to the error) is crucial for identifying the root cause. In this part of the exercise, you’ll explore an example where an error in a plotting function arises from an earlier mistake in your code.\n\nStep 1: Run the following code, which attempts to plot a simple line graph using Matplotlib.\n\nimport matplotlib.pyplot as plt\n\ndef generate_plot(data):\n plt.plot(data)\n plt.show()\n\n# Step 2: Introduce an error\nmy_data = [1, 2, \"three\", 4, 5] # Mixing strings and integers in the list\n\n# Step 3: Call the function to generate the plot\ngenerate_plot(my_data)\n\nStep 2: Observe the error message.\n\nFile \"<ipython-input-1>\", line 5, in generate_plot\n plt.plot(data)\n...\nFile \"/path/to/matplotlib/lines.py\", line XYZ, in _xy_from_xy\n raise ValueError(\"some explanation about incompatible types\")\nValueError: could not convert string to float: 'three'\n\nStep 3: Explanation: This error occurs because the plot function in Matplotlib expects numerical data to plot. 
The error message points to a deeper issue in the lines.py file inside the Matplotlib library, but the actual problem originates from your my_data list, which includes a string (“three”) instead of a numeric value.\nStep 4: Trace the Error:\n\nThe error originates in the plt.plot(data) function call.\nMatplotlib’s internal functions (_xy_from_xy in this case) try to process the data but encounter an issue when they can’t convert the string “three” into a float.\n\nStep 5: Fix the Error: Correct the data by ensuring all elements are numeric.\nmy_data = [1, 2, 3, 4, 5] # Correcting the list to contain only integers\ngenerate_plot(my_data) # Now this will work without an error" + "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#accessing-un-assigned-elements-in-dictionaries", + "href": "course-materials/interactive-sessions/2b_dictionaries.html#accessing-un-assigned-elements-in-dictionaries", + "title": "Interactive Session 2B", + "section": "Accessing un-assigned elements in Dictionaries", + "text": "Accessing un-assigned elements in Dictionaries\nAttempting to retrieve an element of a dictionary that doesn’t exist is the same as requesting an index of a list that doesn’t exist - Python will raise an Exception. For example, if you attempt to retrieve the definition of a field that hasn’t been defined, then you get an error:\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\nspecies_dict[\"E dofleini\"]\nYou should get a KeyError exception:\nKeyError: ‘E dofleini’\nTo avoid getting an error when requesting an element from a dict, you can use the get() function. The get() function will return None if the element doesn’t exist:\n\n✏️ Try it. 
Add the cell below to your notebook and run it.\n\n\n\nCode\nspecies_description = species_dict.get(\"E dofleini\")\nprint(\"Accessing non-existent latin name, E dofleini:\\n\", species_description)\n\n\nAccessing non-existent latin name, E dofleini:\n None\n\n\nYou can also provide a second argument for get() to return if the item isn’t found:\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\nspecies_description = species_dict.get(\"E dofleini\", \"Species not found in dictionary\")\nprint(\"Accessing non-existent latin name, E dofleini:\\n\", species_description)\n\n\nAccessing non-existent latin name, E dofleini:\n Species not found in dictionary" }, { - "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-7-tracing-errors-in-jupyter-notebooks", - "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-7-tracing-errors-in-jupyter-notebooks", - "title": "Interactive Session 2C", - "section": "Part 7: Tracing Errors in Jupyter Notebooks", - "text": "Part 7: Tracing Errors in Jupyter Notebooks\nWhen you run code in a Jupyter Notebook, the Python interpreter refers to the code in the notebook cells as it generates a stack trace when an error occurs. Here’s how Jupyter Notebooks handle this:\n\nHow Jupyter Notebooks Generate Stack Traces\n\nCell Execution:\n\nEach time you run a cell in a Jupyter Notebook, the code in that cell is executed by the Python interpreter. The code from each cell is treated as part of a sequential script, but each cell is an individual execution block.\n\nInput Label:\n\nJupyter assigns each cell an input label, such as In [1]:, In [2]:, etc. This label is used to identify the specific cell where the code was executed.\n\nStack Trace Generation:\n\nWhen an error occurs, Python generates a stack trace that shows the sequence of function calls leading to the error. 
In a Jupyter Notebook, this stack trace includes references to the notebook cells that were executed.\nThe stack trace will point to the line number within the cell and the input label, such as In [2], indicating where in your notebook the error originated.\n\nExample Stack Trace in Jupyter:\n\nSuppose you have the following code in a cell labeled In [2]:\ndef divide(x, y):\n return x / y\n\ndivide(10, 0)\nRunning this code will generate a ZeroDivisionError, and the stack trace might look like this:\n---------------------------------------------------------------------------\nZeroDivisionError Traceback (most recent call last)\n<ipython-input-2-d7d8f8a6c1c1> in <module>\n 2 return x / y\n 3 \n----> 4 divide(10, 0)\n 5 \n\n<ipython-input-2-d7d8f8a6c1c1> in divide(x, y)\n 1 def divide(x, y):\n----> 2 return x / y\n 3 \n 4 divide(10, 0)\nExplanation:\n\nThe Traceback (most recent call last) shows the series of calls leading to the error.\nThe <ipython-input-2-d7d8f8a6c1c1> refers to the code in cell In [2].\nThe stack trace pinpoints the exact line where the error occurred within that cell.\n\n\nMultiple Cell References:\n\nIf your code calls functions defined in different cells, the stack trace will show references to multiple cells. For example, if a function is defined in one cell and then called in another, the stack trace will include both cells in the sequence of calls.\n\nLimitations:\n\nThe stack trace in Jupyter Notebooks is specific to the cells that have been executed. If you modify a cell and re-run it, the new code is associated with that cell’s input label, and previous stack traces will not reflect those changes.\n\n\n\n\nSummary:\nIn Jupyter Notebooks, stack traces refer to the specific cells (In [X]) where the code was executed. The stack trace will show you the input label of the cell and the line number where the error occurred, helping you to quickly locate and fix issues in your notebook. 
Understanding how Jupyter references your code in stack traces is crucial for effective debugging.\n\n\nGeneral Summary of Stack Traces\n\nWhat to Look For: In complex stack traces, start by looking at the error message itself, which often appears at the bottom of the stack. Work your way backward through the stack to identify where in your code the problem originated.\nTracing Function Calls: Understand how data flows through your functions. An error in a deeply nested function may often be triggered by an incorrect input or state set earlier in the code." + "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#summary-and-additional-resources", + "href": "course-materials/interactive-sessions/2b_dictionaries.html#summary-and-additional-resources", + "title": "Interactive Session 2B", + "section": "Summary and Additional Resources", + "text": "Summary and Additional Resources\nWe’ve explored the creation, modification, and application of dictionaries in Python, highlighting their utility in storing structured data. 
As you progress in Python, you’ll find dictionaries indispensable across various applications, from data analysis to machine learning.\nFor further study, consult the following resources: - Python’s Official Documentation on Dictionaries - Our class Dictionary Cheatsheet\n\n\nEnd interactive session 2B" }, { - "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#error-summary", - "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#error-summary", - "title": "Interactive Session 2C", - "section": "Error Summary", - "text": "Error Summary\n\nAlways read the error message carefully; it usually points directly to the problem.\nSyntaxErrors - and, to a lesser extent, NameErrors are often due to small mistakes like typos, missing parentheses, or missing quotes.\nTypeErrors often occur when trying to perform operations on incompatible data types.\nAttributeErrors occur when you are trying to use a method that doesn’t exist for an object. These can also show up due to typos in your code that make the interpreter think you are trying to call a method.\nWhile every error type has a specific meaning, always check your code for typos when trying to debug an error. Many typos do not prevent the interpreter from running your code and the eventual error caused by a typo might be hard to interpret!\n\n\nBy the end of this session, you should feel more comfortable identifying and fixing common Python errors. This skill is critical for debugging and developing more complex programs in the future.\n\n\nEnd interactive session 2C" + "objectID": "course-materials/coding-colabs/6b_advanced_data_manipulation.html", + "href": "course-materials/coding-colabs/6b_advanced_data_manipulation.html", + "title": "Day 6: 🙌 Coding Colab", + "section": "", + "text": "In this coding colab, you’ll analyze global temperature anomalies and CO2 concentration data. 
You’ll practice data manipulation, joining datasets, time series analysis, and visualization techniques." }, { - "objectID": "course-materials/live-coding/3a_control_flows.html#overview", - "href": "course-materials/live-coding/3a_control_flows.html#overview", - "title": "Live Coding Session 3A", - "section": "Overview", - "text": "Overview\nIn this session, we will be exploring Control Flows - if-elif, for, while and other ways of altering the flow of code execution. Live coding is a great way to learn programming as it allows you to see the process of writing code in real-time, including how to deal with unexpected issues and debug errors." + "objectID": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#introduction", + "href": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#introduction", + "title": "Day 6: 🙌 Coding Colab", + "section": "", + "text": "In this coding colab, you’ll analyze global temperature anomalies and CO2 concentration data. You’ll practice data manipulation, joining datasets, time series analysis, and visualization techniques." }, { - "objectID": "course-materials/live-coding/3a_control_flows.html#objectives", - "href": "course-materials/live-coding/3a_control_flows.html#objectives", - "title": "Live Coding Session 3A", - "section": "Objectives", - "text": "Objectives\n\nUnderstand the fundamentals of flow control in Python.\nApply if-elif-else constructions in practical examples.\nUse for and while loops to iterate through collections.\nDevelop the ability to troubleshoot and debug in a live setting." 
+ "objectID": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#learning-objectives", + "href": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#learning-objectives", + "title": "Day 6: 🙌 Coding Colab", + "section": "Learning Objectives", + "text": "Learning Objectives\nBy the end of this colab, you will be able to:\n\nLoad and preprocess time series data\nJoin datasets based on datetime indices\nPerform basic time series analysis and resampling\nApply data manipulation techniques to extract insights from environmental datasets" }, { - "objectID": "course-materials/live-coding/3a_control_flows.html#getting-started", - "href": "course-materials/live-coding/3a_control_flows.html#getting-started", - "title": "Live Coding Session 3A", - "section": "Getting Started", - "text": "Getting Started\nBefore we begin our interactive session, please follow these steps to set up your Jupyter Notebook:\n\nOpen JupyterLab and create a new notebook:\n\nClick on the + button in the top left corner\nSelect Python 3.10.0 from the Notebook options\n\nRename your notebook:\n\nRight-click on the Untitled.ipynb tab\nSelect “Rename”\nName your notebook with the format: Session_XY_Topic.ipynb (Replace X with the day number and Y with the session number)\n\nAdd a title cell:\n\nIn the first cell of your notebook, change the cell type to “Markdown”\nAdd the following content (replace the placeholders with the actual information):\n\n\n# Day X: Session Y - [Session Topic]\n\n[Link to session webpage]\n\nDate: [Current Date]\n\nSet up your notebook:\nPlease set up your Jupyter Notebook with the following structure. 
We’ll fill in the content together during our session.\n\n ### Introduction to Control Flows\n\n <insert code cell below> \n\n ### Conditionals\n\n #### Basic If Statement\n\n <insert code cell below> \n \n #### Adding Else\n\n <insert code cell below> \n \n #### Using Elif\n\n <insert code cell below> \n \n ### Loops\n\n #### For Loops\n\n <insert code cell below> \n \n #### While Loops\n\n <insert code cell below> \n \n ### Applying Control Flows in Data Science\n\n <insert code cell below> \n \n ### Conclusion\n\n <insert code cell below> \n \n\n\n\n\n\n\nCaution\n\n\n\nDon’t forget to save your work frequently by clicking the save icon or using the keyboard shortcut (Ctrl+S or Cmd+S).\n\n\nRemember, we’ll be coding together, so don’t worry about filling in the content now. Just set up the structure, and we’ll dive into the details during our session!\n\nParticipation:\n\nTry to code along with me during the session.\nFeel free to ask questions at any time. Remember, if you have a question, others probably do too!\n\nResources:\n\nI will be sharing snippets of code and notes. Make sure to take your own notes and save snippets in your notebook for future reference.\nCheck out our class control flows cheatsheet." 
+ "objectID": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#setup", + "href": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#setup", + "title": "Day 6: 🙌 Coding Colab", + "section": "Setup", + "text": "Setup\nLet’s start by importing necessary libraries and loading our datasets:\n\n\nCode\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\n\n# Load the temperature anomaly dataset\ntemp_url = \"https://bit.ly/monthly_temp\"\ntemp_df = pd.read_csv(temp_url, parse_dates=['Date'])\n\n# Load the CO2 concentration dataset\nco2_url = \"https://bit.ly/monthly_CO2\"\nco2_df = pd.read_csv(co2_url, parse_dates=['Date'])\n\nprint(\"Temperature data:\")\nprint(temp_df.head())\nprint(\"\\nCO2 data:\")\nprint(co2_df.head())\n\n\nTemperature data:\n Date MonthlyAnomaly\n0 1880-01-01 -0.20\n1 1880-02-01 -0.25\n2 1880-03-01 -0.09\n3 1880-04-01 -0.16\n4 1880-05-01 -0.09\n\nCO2 data:\n Date CO2Concentration\n0 1958-04-01 317.45\n1 1958-05-01 317.51\n2 1958-06-01 317.27\n3 1958-07-01 315.87\n4 1958-08-01 314.93" }, { - "objectID": "course-materials/live-coding/3a_control_flows.html#session-format", - "href": "course-materials/live-coding/3a_control_flows.html#session-format", - "title": "Live Coding Session 3A", - "section": "Session Format", - "text": "Session Format\n\nIntroduction\n\nA brief discussion about the topic in Python programming and its importance in data science.\n\n\n\nDemonstration\n\nI will demonstrate code examples live. Follow along and write the code into your own Jupyter notebook.\n\n\n\nPractice\n\nYou will have the opportunity to try exercises on your own to apply what you’ve learned.\n\n\n\nQ&A\n\nWe will have a Q&A session at the end where you can ask specific questions about the code, concepts, or issues encountered during the session." 
+ "objectID": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#task-1-data-preparation", + "href": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#task-1-data-preparation", + "title": "Day 6: 🙌 Coding Colab", + "section": "Task 1: Data Preparation", + "text": "Task 1: Data Preparation\n\nSet the ‘Date’ column as the index for both dataframes.\nEnsure that there are no missing values in either dataset." }, { - "objectID": "course-materials/live-coding/3a_control_flows.html#after-the-session", - "href": "course-materials/live-coding/3a_control_flows.html#after-the-session", - "title": "Live Coding Session 3A", - "section": "After the Session", - "text": "After the Session\n\nReview your notes and try to replicate the exercises on your own.\nExperiment with the code by modifying parameters or adding new features to deepen your understanding.\nCheck out our class flow control cheatsheet." + "objectID": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#task-2-joining-datasets", + "href": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#task-2-joining-datasets", + "title": "Day 6: 🙌 Coding Colab", + "section": "Task 2: Joining Datasets", + "text": "Task 2: Joining Datasets\n\nMerge the temperature and CO2 datasets based on their date index.\nHandle any missing values that may have been introduced by the merge.\nCreate some plots showing temperature anomalies and CO2 concentrations over time using pandas built-in plotting functions." 
}, { - "objectID": "course-materials/lectures/data_types.html#types-of-data-in-python", - "href": "course-materials/lectures/data_types.html#types-of-data-in-python", - "title": "Basic Data Types in Python", - "section": "Types of Data in Python", - "text": "Types of Data in Python\nPython categorizes data into two main types:\n\nValues: Singular items like numbers or strings.\nCollections: Groupings of values, like lists or dictionaries.\n\nMutable vs Immutable\n\nMutable: Objects whose content can be changed after creation.\nImmutable: Objects that cannot be altered after they are created." + "objectID": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#task-3-time-series-analysis", + "href": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#task-3-time-series-analysis", + "title": "Day 6: 🙌 Coding Colab", + "section": "Task 3: Time Series Analysis", + "text": "Task 3: Time Series Analysis\n\nResample the data to annual averages.\nCalculate the year-over-year change in temperature anomalies and CO2 concentrations.\nCreate a scatter plot (use the plt.scatter() function) of annual temperature anomalies vs CO2 concentrations." 
}, { - "objectID": "course-materials/lectures/data_types.html#overview-of-main-data-types", - "href": "course-materials/lectures/data_types.html#overview-of-main-data-types", - "title": "Basic Data Types in Python", - "section": "Overview of Main Data Types", - "text": "Overview of Main Data Types\n\n\n\n\n\n\n\n\nCategory\nMutable\nImmutable\n\n\n\n\nValues\n-\nint, float, complex, str\n\n\nCollections\nlist, dict, set, bytearray\ntuple, frozenset" + "objectID": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#task-4-seasonal-analysis", + "href": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#task-4-seasonal-analysis", + "title": "Day 6: 🙌 Coding Colab", + "section": "Task 4: Seasonal Analysis", + "text": "Task 4: Seasonal Analysis\n\nCreate a function to extract the season from a given date (hint: use the date.month attribute and if-elif-else to assign the season in your function).\nUse the function to create a new column called Season\nCalculate the average temperature anomaly and CO2 concentration for each season.\nCreate a box plot (use sns.boxplot) showing the distribution of temperature anomalies for each season." }, { - "objectID": "course-materials/lectures/data_types.html#numeric-types", - "href": "course-materials/lectures/data_types.html#numeric-types", - "title": "Basic Data Types in Python", - "section": "Numeric Types", - "text": "Numeric Types\nIntegers (int)\n\nUse: Counting, indexing, and more.\nConstruction: x = 5\nImmutable: Cannot change the value of x without creating a new int." 
+ "objectID": "course-materials/lectures/05_Drop_or_Impute.html#introduction", + "href": "course-materials/lectures/05_Drop_or_Impute.html#introduction", + "title": "EDS 217 - Lecture", + "section": "Introduction", + "text": "Introduction\n\nData cleaning is crucial in data analysis\nMissing data is a common challenge\nTwo main approaches:\n\nDropping missing data\nImputation\n\nUnderstanding the nature of missingness is key" + }, + { + "objectID": "course-materials/lectures/05_Drop_or_Impute.html#types-of-missing-data", + "href": "course-materials/lectures/05_Drop_or_Impute.html#types-of-missing-data", + "title": "EDS 217 - Lecture", + "section": "Types of Missing Data", + "text": "Types of Missing Data\n\n\nMissing Completely at Random (MCAR)\nMissing at Random (MAR)\nMissing Not at Random (MNAR)" + }, + { + "objectID": "course-materials/lectures/05_Drop_or_Impute.html#missing-completely-at-random-mcar", + "href": "course-materials/lectures/05_Drop_or_Impute.html#missing-completely-at-random-mcar", + "title": "EDS 217 - Lecture", + "section": "Missing Completely at Random (MCAR)", + "text": "Missing Completely at Random (MCAR)\n\nNo relationship between missingness and any values\nExample: Survey responses lost due to a computer glitch\nLeast problematic type of missing data\nDropping MCAR data is generally safe but reduces sample size" }, { - "objectID": "course-materials/lectures/data_types.html#numeric-types-continued", - "href": "course-materials/lectures/data_types.html#numeric-types-continued", - "title": "Basic Data Types in Python", - "section": "Numeric Types (continued)", - "text": "Numeric Types (continued)\nFloating-Point Numbers (float)\n\nUse: Representing real numbers for measurements, fractions, etc.\nConstruction: y = 3.14\nImmutable: Like integers, any change creates a new float." 
+ "objectID": "course-materials/lectures/05_Drop_or_Impute.html#mcar-example-assigning-nan-randomly", + "href": "course-materials/lectures/05_Drop_or_Impute.html#mcar-example-assigning-nan-randomly", + "title": "EDS 217 - Lecture", + "section": "MCAR Example (Assigning nan randomly)", + "text": "MCAR Example (Assigning nan randomly)\n\nimport pandas as pd\nimport numpy as np\n\n# Create sample data with MCAR\nnp.random.seed(42)\ndf = pd.DataFrame({'A': np.random.rand(100), 'B': np.random.rand(100)})\ndf.loc[np.random.choice(df.index, 10, replace=False), 'A'] = np.nan\nprint(df.isnull().sum())\n\nA 10\nB 0\ndtype: int64" }, { - "objectID": "course-materials/lectures/data_types.html#text-type", - "href": "course-materials/lectures/data_types.html#text-type", - "title": "Basic Data Types in Python", - "section": "Text Type", - "text": "Text Type\nStrings (str)\n\nUse: Handling textual data.\nConstruction: s = \"Data Science\"\nImmutable: Modifying s requires creating a new string." + "objectID": "course-materials/lectures/05_Drop_or_Impute.html#missing-at-random-mar", + "href": "course-materials/lectures/05_Drop_or_Impute.html#missing-at-random-mar", + "title": "EDS 217 - Lecture", + "section": "Missing at Random (MAR)", + "text": "Missing at Random (MAR)\n\nMissingness is related to other observed variables\nExample: Older participants more likely to skip income questions\nMore common in real-world datasets\nDropping MAR data can introduce bias" }, { - "objectID": "course-materials/lectures/data_types.html#sequence-types", - "href": "course-materials/lectures/data_types.html#sequence-types", - "title": "Basic Data Types in Python", - "section": "Sequence Types", - "text": "Sequence Types\nLists (list)\n\nUse: Storing an ordered collection of items.\nConstruction: my_list = [1, 2, 3]\nMutable: Items can be added, removed, or changed." 
+ "objectID": "course-materials/lectures/05_Drop_or_Impute.html#mar-example-assigning-nan-randomly-filtered-on-column-value", + "href": "course-materials/lectures/05_Drop_or_Impute.html#mar-example-assigning-nan-randomly-filtered-on-column-value", + "title": "EDS 217 - Lecture", + "section": "MAR Example (Assigning nan randomly, filtered on column value)", + "text": "MAR Example (Assigning nan randomly, filtered on column value)\n\n# Create sample data with MAR\nnp.random.seed(42)\ndf = pd.DataFrame({\n 'Age': np.random.randint(18, 80, 100),\n 'Income': np.random.randint(20000, 100000, 100)\n})\ndf.loc[df['Age'] > 60, 'Income'] = np.where(\n np.random.rand(len(df[df['Age'] > 60])) < 0.3, \n np.nan, \n df.loc[df['Age'] > 60, 'Income']\n)\nprint(df[df['Age'] > 60]['Income'].isnull().sum() / len(df[df['Age'] > 60]))\n\n0.2972972972972973" }, { - "objectID": "course-materials/lectures/data_types.html#sequence-types-continued", - "href": "course-materials/lectures/data_types.html#sequence-types-continued", - "title": "Basic Data Types in Python", - "section": "Sequence Types (continued)", - "text": "Sequence Types (continued)\nTuples (tuple)\n\nUse: Immutable lists. Often used where a fixed, unchangeable sequence is needed.\nConstruction: my_tuple = (1, 2, 3)\nImmutable: Cannot alter the contents once created." 
+ "objectID": "course-materials/lectures/05_Drop_or_Impute.html#missing-not-at-random-mnar", + "href": "course-materials/lectures/05_Drop_or_Impute.html#missing-not-at-random-mnar", + "title": "EDS 217 - Lecture", + "section": "Missing Not at Random (MNAR)", + "text": "Missing Not at Random (MNAR)\n\nMissingness is related to the missing values themselves\nExample: People with high incomes more likely to skip income questions\nMost problematic type of missing data\nNeither dropping nor simple imputation may be appropriate" }, { - "objectID": "course-materials/lectures/data_types.html#set-types", - "href": "course-materials/lectures/data_types.html#set-types", - "title": "Basic Data Types in Python", - "section": "Set Types", - "text": "Set Types\nSets (set)\n\nUse: Unique collection of items, great for membership testing, removing duplicates.\nConstruction: my_set = {1, 2, 3}\nMutable: Can add or remove items." + "objectID": "course-materials/lectures/05_Drop_or_Impute.html#dropping-missing-data", + "href": "course-materials/lectures/05_Drop_or_Impute.html#dropping-missing-data", + "title": "EDS 217 - Lecture", + "section": "Dropping Missing Data", + "text": "Dropping Missing Data\nPros:\n\n\nSimple and quick\nMaintains the distribution of complete cases\nAppropriate for MCAR data\n\n\nCons:\n\n\nReduces sample size\nCan introduce bias for MAR or MNAR data\nMay lose important information" }, { - "objectID": "course-materials/lectures/data_types.html#set-types-continued", - "href": "course-materials/lectures/data_types.html#set-types-continued", - "title": "Basic Data Types in Python", - "section": "Set Types (continued)", - "text": "Set Types (continued)\nFrozen Sets (frozenset)\n\nUse: Immutable version of sets.\nConstruction: my_frozenset = frozenset([1, 2, 3])\nImmutable: Safe for use as dictionary keys." 
+ "objectID": "course-materials/lectures/05_Drop_or_Impute.html#drop-example", + "href": "course-materials/lectures/05_Drop_or_Impute.html#drop-example", + "title": "EDS 217 - Lecture", + "section": "Drop Example", + "text": "Drop Example\n\n# Dropping missing data\ndf_dropped = df.dropna()\nprint(f\"Original shape: {df.shape}, After dropping: {df_dropped.shape}\")\n\nOriginal shape: (100, 2), After dropping: (89, 2)" }, { - "objectID": "course-materials/lectures/data_types.html#mapping-types", - "href": "course-materials/lectures/data_types.html#mapping-types", - "title": "Basic Data Types in Python", - "section": "Mapping Types", - "text": "Mapping Types\nDictionaries (dict)\n\nUse: Key-value pairs for fast lookup and data management.\nConstruction: my_dict = {'key': 'value'}\nMutable: Add, remove, or change associations." + "objectID": "course-materials/lectures/05_Drop_or_Impute.html#imputation", + "href": "course-materials/lectures/05_Drop_or_Impute.html#imputation", + "title": "EDS 217 - Lecture", + "section": "Imputation", + "text": "Imputation\nPros:\n\n\nPreserves sample size\nCan reduce bias for MAR data\nAllows use of all available information\n\n\nCons:\n\n\nCan introduce bias if done incorrectly\nMay underestimate variability\nCan be computationally intensive for complex methods" }, { - "objectID": "course-materials/lectures/data_types.html#conclusion", - "href": "course-materials/lectures/data_types.html#conclusion", - "title": "Basic Data Types in Python", - "section": "Conclusion", - "text": "Conclusion\nUnderstanding these basic types is crucial for data handling and manipulation in Python, especially in data science where the type of data dictates the analysis technique. As we move into more advanced Python we will get to know more complex data types.\nFor more information, you can always refer to the Python official documentation." 
+ "objectID": "course-materials/lectures/05_Drop_or_Impute.html#imputation-example", + "href": "course-materials/lectures/05_Drop_or_Impute.html#imputation-example", + "title": "EDS 217 - Lecture", + "section": "Imputation Example", + "text": "Imputation Example\n\n# Simple mean imputation\ndf_imputed = df.fillna(df.mean())\nprint(f\"Original missing: {df['Income'].isnull().sum()}, After imputation: {df_imputed['Income'].isnull().sum()}\")\n\nOriginal missing: 11, After imputation: 0" }, { - "objectID": "course-materials/interactive-sessions/1a_iPython_JupyterLab.html", - "href": "course-materials/interactive-sessions/1a_iPython_JupyterLab.html", - "title": "Interactive Session 1A", - "section": "", - "text": "This is a short exercise to introduce you to the IPython REPL within a JupyterLab session hosted on a Posit Workbench server. This exercise will help you become familiar with the interactive environment we will use throughout the class (and throughout your time in the MEDS program) as well as an introduction to some basic Python operations.\n\n\nExercise: Introduction to IPython REPL in JupyterLab\nObjective: Learn how to use the IPython REPL in JupyterLab for basic Python programming and explore some interactive features.\n\n\n\n\n\n\nNote\n\n\n\nThe Read-Eval-Print Loop (REPL) is an interactive programming environment that allows users to execute Python code line-by-line, providing immediate feedback and facilitating rapid testing, debugging, and learning.\n\n\n\n\nStep 1: Access JupyterLab\n\nLog in to the Posit Workbench Server\n\nOpen a web browser and go to workbench-1.bren.ucsb.edu.\nEnter your login credentials to access the server.\n\nSelect JupyterLab\n\nOnce logged in, click on the “New Session” button, and select “JupyterLab” from the list of options\n\n\nStart JupyterLab Session\n\nClick the “Start Session” button in the lower right of the modal window. 
You don’t need to edit either the Session Name or Cluster.\n\nWait for the Session to Launch\n\nYour browser will auto join the session as soon as the server starts it up.\n\n\n\n\n\nStep 2: Open the IPython REPL\nWhen the session launches you should see an interface that looks like this:\n\n\nStart a Terminal\n\nSelect “Terminal” from the list of available options in the Launcher pane. This will open a new terminal tab.\n\nLaunch IPython\n\nIn the terminal, type ipython and press Enter to start the IPython REPL.\n\n\n\n\n\nStep 3: Basic IPython Commands\nIn the IPython REPL, try the following commands to get familiar with the environment:\n\nBasic Arithmetic Operations\n\nCalculate the sum of two numbers:\n3 + 5\nMultiply two numbers:\n4 * 7\nDivide two numbers:\n10 / 2\n\nVariable Assignment\n\nAssign a value to a variable and use it in a calculation:\nx = 10\ny = 5\nresult = x * y\nresult\n\nBuilt-in Functions\n\nUse a built-in function to find the maximum of a list of numbers:\nnumbers = [3, 9, 1, 6, 2]\nmax(numbers)\n\nInteractive Help\n\nUse the help() function to get more information about a built-in function:\nhelp(print)\nUse the ? to get a quick description of an object or function:\nlen?\n\n\n\n\n\nStep 4: Explore IPython Features\n\nTab Completion\n\nStart typing a command or variable name and press Tab to auto-complete or view suggestions:\nnum # Press Tab here\n\nMagic Commands\n\nUse the %timeit magic command to time the execution of a statement:\n%timeit sum(range(1000))\n\nHistory\n\nView the command history using the %history magic command:\n%history\n\nClear the Console\n\nClear the current console session with:\n%clear\n\n\n\n\n\nStep 5: Exit the IPython REPL\n\nTo exit the IPython REPL, type exit() or press Ctrl+D.\n\n\n\n\nWrap-Up\nCongratulations! You have completed the introduction to the IPython REPL in JupyterLab. 
You learned how to perform basic operations, use interactive help, explore magic commands, and utilize IPython features.\nFeel free to explore more IPython functionalities or ask questions if you need further assistance.\n\n\nEnd interactive session 1A" + "objectID": "course-materials/lectures/05_Drop_or_Impute.html#imputation-methods", + "href": "course-materials/lectures/05_Drop_or_Impute.html#imputation-methods", + "title": "EDS 217 - Lecture", + "section": "Imputation Methods", + "text": "Imputation Methods\n\nSimple imputation:\n\nMean, median, mode\nLast observation carried forward (LOCF)\n\nAdvanced imputation:\n\nMultiple Imputation\nK-Nearest Neighbors (KNN)\nRegression imputation" }, { - "objectID": "course-materials/day8.html#class-materials", - "href": "course-materials/day8.html#class-materials", - "title": "Building a Python Data Science Workflow", - "section": "Class materials", - "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 8 / morning\nWorking on Final Data Science Project (all day)\n\n\n\nday 8 / afternoon" + "objectID": "course-materials/lectures/05_Drop_or_Impute.html#best-practices", + "href": "course-materials/lectures/05_Drop_or_Impute.html#best-practices", + "title": "EDS 217 - Lecture", + "section": "Best Practices", + "text": "Best Practices\n\n\nUnderstand your data and the missingness mechanism\nVisualize patterns of missingness\nConsider the impact on your analysis\nUse appropriate methods based on the type of missingness\nConduct sensitivity analyses\nDocument your approach and assumptions" }, { - "objectID": "course-materials/day8.html#end-of-day-practice", - "href": "course-materials/day8.html#end-of-day-practice", - "title": "Building a Python Data Science Workflow", - "section": "End-of-day practice", - "text": "End-of-day practice\nThere are no additional end-of-day tasks / activities today!" 
+ "objectID": "course-materials/lectures/05_Drop_or_Impute.html#conclusion", + "href": "course-materials/lectures/05_Drop_or_Impute.html#conclusion", + "title": "EDS 217 - Lecture", + "section": "Conclusion", + "text": "Conclusion\n\nUnderstanding the nature of missingness is crucial\nBoth dropping and imputation have pros and cons\nChoose the appropriate method based on:\n\nType of missingness (MCAR, MAR, MNAR)\nSample size\nAnalysis goals\n\nAlways document your approach and conduct sensitivity analyses" }, { - "objectID": "course-materials/day8.html#additional-resources", - "href": "course-materials/day8.html#additional-resources", - "title": "Building a Python Data Science Workflow", - "section": "Additional Resources", - "text": "Additional Resources" + "objectID": "course-materials/lectures/05_Drop_or_Impute.html#questions", + "href": "course-materials/lectures/05_Drop_or_Impute.html#questions", + "title": "EDS 217 - Lecture", + "section": "Questions?", + "text": "Questions?\nThank you for your attention!" }, { - "objectID": "course-materials/day6.html#class-materials", - "href": "course-materials/day6.html#class-materials", - "title": "Data Handling and Visualization, Day 1", - "section": "Class materials", - "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 6 / morning\n🐼 Grouping, joining, sorting, and applying\n🙌 Coding Colab: Data Manipulation\n\n\nday 6 / afternoon\n🐼 Working with dates\nEnd-of-day practice" + "objectID": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html", + "href": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html", + "title": "Day 2: 🙌 Coding Colab", + "section": "", + "text": "Before we begin, here are quick links to our course cheatsheets. 
These may be helpful during the exercise:\n\nPython Basics Cheatsheet\nList Cheatsheet\nDictionaries Cheatsheet\nSets Cheatsheet\n\nFeel free to refer to these cheatsheets throughout the exercise if you need a quick reminder about syntax or functionality." }, { - "objectID": "course-materials/day6.html#end-of-day-practice", - "href": "course-materials/day6.html#end-of-day-practice", - "title": "Data Handling and Visualization, Day 1", - "section": "End-of-day practice", - "text": "End-of-day practice\nComplete the following tasks / activities before heading home for the day!\n\n Day 6 Practice: 🕺 Eurovision Data Science 💃" + "objectID": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#quick-references", + "href": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#quick-references", + "title": "Day 2: 🙌 Coding Colab", + "section": "", + "text": "Before we begin, here are quick links to our course cheatsheets. These may be helpful during the exercise:\n\nPython Basics Cheatsheet\nList Cheatsheet\nDictionaries Cheatsheet\nSets Cheatsheet\n\nFeel free to refer to these cheatsheets throughout the exercise if you need a quick reminder about syntax or functionality." }, { - "objectID": "course-materials/day6.html#additional-resources", - "href": "course-materials/day6.html#additional-resources", - "title": "Data Handling and Visualization, Day 1", - "section": "Additional Resources", - "text": "Additional Resources" + "objectID": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#introduction-to-paired-programming-5-minutes", + "href": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#introduction-to-paired-programming-5-minutes", + "title": "Day 2: 🙌 Coding Colab", + "section": "Introduction to Paired Programming (5 minutes)", + "text": "Introduction to Paired Programming (5 minutes)\nWelcome to today’s Coding Colab! 
In this session, you’ll be working in pairs to explore and reinforce your understanding of lists and dictionaries, while also discovering the unique features of sets.\n\nBenefits of Paired Programming\n\nKnowledge sharing: Learn from each other’s experiences and approaches.\nImproved code quality: Catch errors earlier with two sets of eyes.\nEnhanced problem-solving: Discuss ideas for more creative solutions.\nSkill development: Improve communication and teamwork skills.\n\n\n\nHow to Make the Most of Paired Programming\n\nAssign roles: One person is the “driver” (typing), the other is the “navigator” (reviewing).\nSwitch roles regularly: Swap every 10-15 minutes to stay engaged.\nCommunicate clearly: Explain your thought process and ask questions.\nBe open to ideas: Listen to your partner’s suggestions.\nStay focused: Keep the conversation relevant to the task." }, { - "objectID": "course-materials/day4.html#class-materials", - "href": "course-materials/day4.html#class-materials", - "title": "Working with DataFrames in Pandas", - "section": "Class materials", - "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 4 / morning\n🐼 Intro to DataFrames\n🙌 Coding Colab: Working with DataFrames\n\n\nday 4 / afternoon\n🐼 DataFrame Workflows\n📝 Data Import/Export" + "objectID": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#exercise-overview", + "href": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#exercise-overview", + "title": "Day 2: 🙌 Coding Colab", + "section": "Exercise Overview", + "text": "Exercise Overview\nThis Coding Colab will reinforce your understanding of lists and dictionaries while introducing you to sets. You’ll work through a series of tasks, discussing and implementing solutions together." 
}, { - "objectID": "course-materials/day4.html#end-of-day-practice", - "href": "course-materials/day4.html#end-of-day-practice", - "title": "Working with DataFrames in Pandas", - "section": "End-of-day practice", - "text": "End-of-day practice\nComplete the following tasks / activities before heading home for the day!\n\n Day 4 Practice: Reading, Visualizing, and Exporting Data in Pandas" + "objectID": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#part-1-lists-and-dictionaries-review-15-minutes", + "href": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#part-1-lists-and-dictionaries-review-15-minutes", + "title": "Day 2: 🙌 Coding Colab", + "section": "Part 1: Lists and Dictionaries Review (15 minutes)", + "text": "Part 1: Lists and Dictionaries Review (15 minutes)\n\nTask 1: List Operations\nCreate a list of your favorite fruits and perform the following operations:\n\nCreate a list called fruits with at least 3 fruit names.\nAdd a new fruit to the end of the list.\nRemove the second fruit from the list.\nPrint the final list.\n\n\n\nTask 2: Dictionary Operations\nCreate a dictionary representing a simple inventory system:\n\nCreate a dictionary called inventory with at least 3 items and their quantities.\nAdd a new item to the inventory.\nUpdate the quantity of an existing item.\nPrint the final inventory." 
}, { - "objectID": "course-materials/day4.html#additional-resources", - "href": "course-materials/day4.html#additional-resources", - "title": "Working with DataFrames in Pandas", - "section": "Additional Resources", - "text": "Additional Resources" + "objectID": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#part-2-introducing-sets-15-minutes", + "href": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#part-2-introducing-sets-15-minutes", + "title": "Day 2: 🙌 Coding Colab", + "section": "Part 2: Introducing Sets (15 minutes)", + "text": "Part 2: Introducing Sets (15 minutes)\nHere’s a Sets Cheatsheet. Sets are a lot like lists, so take a look at the cheatsheet to see how they are created and manipulated!\n\nTask 3: Creating and Manipulating Sets\n\nCreate two sets: evens with even numbers from 2 to 10, and odds with odd numbers from 1 to 9.\nPrint both sets.\nFind and print the union of the two sets.\nFind and print the intersection of the two sets.\nAdd a new element to the evens set.\n\n\n\nTask 4: Combining Set Operations and List Operations\nUsing a set is a great way to remove duplicates in a list.\n\nCreate a list with some duplicates: numbers = [1, 2, 2, 3, 3, 3, 4, 4, 5]\nUse a set to remove duplicates.\nCreate a new list from the set.\nPrint the new list without duplicates" }, { - "objectID": "course-materials/day2.html#class-materials", - "href": "course-materials/day2.html#class-materials", - "title": "Python Data Collections", - "section": "Class materials", - "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 2 / morning\n🐍 Lists\n🐍 Dictionaries\n\n\nday 2 / afternoon\n🙌 Working with Lists, Dictionaries, and Sets\n📝 List and Dictionary Comprehensions" + "objectID": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#conclusion-and-discussion-10-minutes", + "href": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#conclusion-and-discussion-10-minutes", + 
"title": "Day 2: 🙌 Coding Colab", + "section": "Conclusion and Discussion (10 minutes)", + "text": "Conclusion and Discussion (10 minutes)\nAs a pair, discuss the following questions:\n\nWhat are the main differences between lists, dictionaries, and sets?\nIn what situations would you prefer to use a set over a list or dictionary?\nHow did working in pairs help you understand these concepts better?" }, { - "objectID": "course-materials/day2.html#end-of-day-practice", - "href": "course-materials/day2.html#end-of-day-practice", - "title": "Python Data Collections", - "section": "End-of-day practice", - "text": "End-of-day practice\nComplete the following tasks / activities before heading home for the day!\n\n Day 2 Practice: Python Data Structures Practice" + "objectID": "course-materials/live-coding/dictionaries.html", + "href": "course-materials/live-coding/dictionaries.html", + "title": "[Live Coding] Session 2B", + "section": "", + "text": "Introduction to Dictionaries (5 minutes)\nCreating and Accessing Dictionaries (10 minutes)\nManipulating Dictionaries (10 minutes)\nIterating Over Dictionaries (5 minutes)\nStoring Structured Data Using Dictionaries (10 minutes)\nPractical Application in Data Science (5 minutes)\n\n\n\n\n\n\nObjective: Introduce what dictionaries are and their importance in Python.\nKey Points:\n\nDefinition: Dictionaries are collections of key-value pairs.\nUnordered and indexed by keys, making data access fast and efficient.\n\nLive Code Example:\n\nexample_dict = {'name': 'Earth', 'moons': 1}\nprint(\"Example dictionary:\", example_dict)\n\nExample dictionary: {'name': 'Earth', 'moons': 1}\n\n\n\n\n\n\n\nObjective: Show how to create dictionaries using different methods and how to access elements.\nKey Points:\n\nCreating dictionaries using curly braces {} and the dict() constructor.\nAccessing values using keys, demonstrating safe access with .get().\n\nLive Code Example:\n\n# Creating a dictionary using dict()\nanother_dict = 
dict(name='Mars', moons=2)\nprint(\"Another dictionary (dict()):\", another_dict)\n\nanother_dict2 = {'name': 'Mars',\n 'moons': 2\n }\n\nprint(\"Another dictionary ({}):\", another_dict2)\nprint(\"Are they the same?\", another_dict==another_dict2)\n\n# Accessing elements\nprint(\"Temperature using get (no default):\", example_dict.get('temp'))\nprint(\"Temperature using get (with default):\", example_dict.get('temp', 'No temperature data'))\n\nAnother dictionary (dict()): {'name': 'Mars', 'moons': 2}\nAnother dictionary ({}): {'name': 'Mars', 'moons': 2}\nAre they the same? True\nTemperature using get (no default): None\nTemperature using get (with default): No temperature data\n\n\n\n\n\n\n\nObjective: Teach how to add, update, delete dictionary items.\nKey Points:\n\nAdding and updating by assigning values to keys.\nRemoving items using del and pop().\n\nLive Code Example:\n\n# Adding a new key-value pair\nanother_dict['atmosphere'] = 'thin'\nprint(\"Updated with atmosphere:\", another_dict)\n\n# Removing an entry using del\ndel another_dict['atmosphere']\nprint(\"After deletion:\", another_dict)\n\n# Removing an entry using pop\nmoons = another_dict.pop('moons', 'No moons key found')\nprint(\"Removed moons:\", moons)\nprint(\"After popping moons:\", another_dict)\n\nUpdated with atmosphere: {'name': 'Mars', 'moons': 2, 'atmosphere': 'thin'}\nAfter deletion: {'name': 'Mars', 'moons': 2}\nRemoved moons: 2\nAfter popping moons: {'name': 'Mars'}\n\n\n\n\n\n\n\nObjective: Explain how to iterate over dictionary keys, values, and key-value pairs.\nKey Points:\n\nUsing .keys(), .values(), and .items() for different iteration needs.\n\nLive Code Example:\n\n# Creating a new dictionary for iteration examples\niteration_dict = {'planet': 'Earth', 'moons': 1, 'orbit': 'Sun'}\n\n# Iterating over keys\nprint(\"Keys:\")\nfor key in iteration_dict.keys():\n print(f\"Key: {key}\")\n\n# Iterating over values\nprint(\"\\nValues:\")\nfor value in iteration_dict.values():\n 
print(f\"Value: {value}\")\n\n# Iterating over items\nprint(\"\\nKey-Value Pairs:\")\nfor key, value in iteration_dict.items():\n print(f\"{key}: {value}\")\n\nKeys:\nKey: planet\nKey: moons\nKey: orbit\n\nValues:\nValue: Earth\nValue: 1\nValue: Sun\n\nKey-Value Pairs:\nplanet: Earth\nmoons: 1\norbit: Sun\n\n\nAdditional Notes:\n\nThe dict.keys(), dict.values(), and dict.items() methods are used to return view objects that provide a dynamic view on the dictionary’s keys, values, and key-value pairs respectively.\nThese views are iterable and reflect changes to the dictionary, making them highly useful for looping and other operations that involve dictionary elements.\nWhat Each Function Returns\n\ndict.keys():\n\n\nReturns a view object displaying all the keys in the dictionary (default)\nUseful for iterating over keys or checking if certain keys exist within the dictionary.\n\n\ndict.values():\n\n\nReturns a view object that contains all the values in the dictionary.\nThis is helpful for operations that need to access every value, such as aggregations or conditions applied to dictionary values.\n\n\ndict.items():\n\n\nReturns a view object with tuples containing (key, value) pairs.\nExtremely useful for looping through both keys and values simultaneously, allowing operations that depend on both elements.\n\nThese methods are particularly useful in data analysis, data cleaning, or any task where data stored in dictionaries needs systematic processing.\nTo learn more about how these iterables can be utilized in Python, you can visit the official Python documentation on iterables and iterators: Python Iterables and Iterators Documentation\n\n\n\n\n\n\nObjective: Show how dictionaries can handle complex, structured data.\nKey Points:\n\nNested dictionaries and lists to create multi-dimensional data structures.\n\nLive Code Example:\n\n# Nested dictionary for environmental data\nenvironmental_data = {\n 'Location A': {'temperature': 19, 'conditions': ['sunny', 
'dry']},\n 'Location B': {'temperature': 22, 'conditions': ['rainy', 'humid']}\n}\nprint(\"Environmental data for Location A:\", environmental_data['Location A']['conditions'])\n\nEnvironmental data for Location A: ['sunny', 'dry']\n\n\n\n\n\n\n\nObjective: Demonstrate the use of dictionaries in data science for data aggregation.\nKey Points:\n\nUsing dictionaries to count occurrences and summarize data.\n\nLive Code Example:\n\nweather_log = ['sunny', 'rainy', 'sunny', 'cloudy', 'sunny', 'rainy']\nweather_count = {}\nfor condition in weather_log:\n weather_count[condition] = weather_count.get(condition, 0) + 1\nprint(\"Weather condition counts:\", weather_count)\n\nWeather condition counts: {'sunny': 3, 'rainy': 2, 'cloudy': 1}\n\n\n\n\n\n\n\n\nRecap: Highlight the flexibility and power of dictionaries in Python programming, especially for data manipulation and structured data operations." }, { - "objectID": "course-materials/day2.html#additional-resources", - "href": "course-materials/day2.html#additional-resources", - "title": "Python Data Collections", - "section": "Additional Resources", - "text": "Additional Resources" + "objectID": "course-materials/live-coding/dictionaries.html#session-outline", + "href": "course-materials/live-coding/dictionaries.html#session-outline", + "title": "[Live Coding] Session 2B", + "section": "", + "text": "Introduction to Dictionaries (5 minutes)\nCreating and Accessing Dictionaries (10 minutes)\nManipulating Dictionaries (10 minutes)\nIterating Over Dictionaries (5 minutes)\nStoring Structured Data Using Dictionaries (10 minutes)\nPractical Application in Data Science (5 minutes)\n\n\n\n\n\n\nObjective: Introduce what dictionaries are and their importance in Python.\nKey Points:\n\nDefinition: Dictionaries are collections of key-value pairs.\nUnordered and indexed by keys, making data access fast and efficient.\n\nLive Code Example:\n\nexample_dict = {'name': 'Earth', 'moons': 1}\nprint(\"Example dictionary:\", 
example_dict)\n\nExample dictionary: {'name': 'Earth', 'moons': 1}\n\n\n\n\n\n\n\nObjective: Show how to create dictionaries using different methods and how to access elements.\nKey Points:\n\nCreating dictionaries using curly braces {} and the dict() constructor.\nAccessing values using keys, demonstrating safe access with .get().\n\nLive Code Example:\n\n# Creating a dictionary using dict()\nanother_dict = dict(name='Mars', moons=2)\nprint(\"Another dictionary (dict()):\", another_dict)\n\nanother_dict2 = {'name': 'Mars',\n 'moons': 2\n }\n\nprint(\"Another dictionary ({}):\", another_dict2)\nprint(\"Are they the same?\", another_dict==another_dict2)\n\n# Accessing elements\nprint(\"Temperature using get (no default):\", example_dict.get('temp'))\nprint(\"Temperature using get (with default):\", example_dict.get('temp', 'No temperature data'))\n\nAnother dictionary (dict()): {'name': 'Mars', 'moons': 2}\nAnother dictionary ({}): {'name': 'Mars', 'moons': 2}\nAre they the same? True\nTemperature using get (no default): None\nTemperature using get (with default): No temperature data\n\n\n\n\n\n\n\nObjective: Teach how to add, update, delete dictionary items.\nKey Points:\n\nAdding and updating by assigning values to keys.\nRemoving items using del and pop().\n\nLive Code Example:\n\n# Adding a new key-value pair\nanother_dict['atmosphere'] = 'thin'\nprint(\"Updated with atmosphere:\", another_dict)\n\n# Removing an entry using del\ndel another_dict['atmosphere']\nprint(\"After deletion:\", another_dict)\n\n# Removing an entry using pop\nmoons = another_dict.pop('moons', 'No moons key found')\nprint(\"Removed moons:\", moons)\nprint(\"After popping moons:\", another_dict)\n\nUpdated with atmosphere: {'name': 'Mars', 'moons': 2, 'atmosphere': 'thin'}\nAfter deletion: {'name': 'Mars', 'moons': 2}\nRemoved moons: 2\nAfter popping moons: {'name': 'Mars'}\n\n\n\n\n\n\n\nObjective: Explain how to iterate over dictionary keys, values, and key-value pairs.\nKey 
Points:\n\nUsing .keys(), .values(), and .items() for different iteration needs.\n\nLive Code Example:\n\n# Creating a new dictionary for iteration examples\niteration_dict = {'planet': 'Earth', 'moons': 1, 'orbit': 'Sun'}\n\n# Iterating over keys\nprint(\"Keys:\")\nfor key in iteration_dict.keys():\n print(f\"Key: {key}\")\n\n# Iterating over values\nprint(\"\\nValues:\")\nfor value in iteration_dict.values():\n print(f\"Value: {value}\")\n\n# Iterating over items\nprint(\"\\nKey-Value Pairs:\")\nfor key, value in iteration_dict.items():\n print(f\"{key}: {value}\")\n\nKeys:\nKey: planet\nKey: moons\nKey: orbit\n\nValues:\nValue: Earth\nValue: 1\nValue: Sun\n\nKey-Value Pairs:\nplanet: Earth\nmoons: 1\norbit: Sun\n\n\nAdditional Notes:\n\nThe dict.keys(), dict.values(), and dict.items() methods are used to return view objects that provide a dynamic view on the dictionary’s keys, values, and key-value pairs respectively.\nThese views are iterable and reflect changes to the dictionary, making them highly useful for looping and other operations that involve dictionary elements.\nWhat Each Function Returns\n\ndict.keys():\n\n\nReturns a view object displaying all the keys in the dictionary (default)\nUseful for iterating over keys or checking if certain keys exist within the dictionary.\n\n\ndict.values():\n\n\nReturns a view object that contains all the values in the dictionary.\nThis is helpful for operations that need to access every value, such as aggregations or conditions applied to dictionary values.\n\n\ndict.items():\n\n\nReturns a view object with tuples containing (key, value) pairs.\nExtremely useful for looping through both keys and values simultaneously, allowing operations that depend on both elements.\n\nThese methods are particularly useful in data analysis, data cleaning, or any task where data stored in dictionaries needs systematic processing.\nTo learn more about how these iterables can be utilized in Python, you can visit the official Python 
documentation on iterables and iterators: Python Iterables and Iterators Documentation\n\n\n\n\n\n\nObjective: Show how dictionaries can handle complex, structured data.\nKey Points:\n\nNested dictionaries and lists to create multi-dimensional data structures.\n\nLive Code Example:\n\n# Nested dictionary for environmental data\nenvironmental_data = {\n 'Location A': {'temperature': 19, 'conditions': ['sunny', 'dry']},\n 'Location B': {'temperature': 22, 'conditions': ['rainy', 'humid']}\n}\nprint(\"Environmental data for Location A:\", environmental_data['Location A']['conditions'])\n\nEnvironmental data for Location A: ['sunny', 'dry']\n\n\n\n\n\n\n\nObjective: Demonstrate the use of dictionaries in data science for data aggregation.\nKey Points:\n\nUsing dictionaries to count occurrences and summarize data.\n\nLive Code Example:\n\nweather_log = ['sunny', 'rainy', 'sunny', 'cloudy', 'sunny', 'rainy']\nweather_count = {}\nfor condition in weather_log:\n weather_count[condition] = weather_count.get(condition, 0) + 1\nprint(\"Weather condition counts:\", weather_count)\n\nWeather condition counts: {'sunny': 3, 'rainy': 2, 'cloudy': 1}\n\n\n\n\n\n\n\n\nRecap: Highlight the flexibility and power of dictionaries in Python programming, especially for data manipulation and structured data operations." }, { - "objectID": "course-materials/cheatsheets/workflow_methods.html", - "href": "course-materials/cheatsheets/workflow_methods.html", + "objectID": "course-materials/cheatsheets/JupyterLab.html", + "href": "course-materials/cheatsheets/JupyterLab.html", "title": "EDS 217 Cheatsheet", "section": "", - "text": "This table maps commonly used pandas DataFrame methods to the steps in the course-specific data science workflow. 
Each method is linked to its official pandas documentation for easy reference.\n\n\n\nDataFrame Method —————————-\nImport\nExploration\nCleaning\nFiltering/ Selection\nTransforming\nSorting\nGrouping\nAggregating\nVisualizing\n\n\n\n\nread_csv()\n✓\n\n\n\n\n\n\n\n\n\n\nread_excel()\n✓\n\n\n\n\n\n\n\n\n\n\nhead()\n\n✓\n\n\n\n\n\n\n\n\n\ntail()\n\n✓\n\n\n\n\n\n\n\n\n\ninfo()\n\n✓\n✓\n\n\n\n\n\n\n\n\ndescribe()\n\n✓\n\n\n\n\n\n✓\n\n\n\ndtypes\n\n✓\n✓\n\n\n\n\n\n\n\n\nshape\n\n✓\n\n\n\n\n\n\n\n\n\ncolumns\n\n✓\n\n\n\n\n\n\n\n\n\nisnull()\n\n✓\n✓\n\n\n\n\n\n\n\n\nnotnull()\n\n✓\n✓\n\n\n\n\n\n\n\n\ndropna()\n\n\n✓\n\n✓\n\n\n\n\n\n\nfillna()\n\n\n✓\n\n✓\n\n\n\n\n\n\nreplace()\n\n\n✓\n\n✓\n\n\n\n\n\n\nastype()\n\n\n✓\n\n✓\n\n\n\n\n\n\nrename()\n\n\n✓\n\n✓\n\n\n\n\n\n\ndrop()\n\n\n✓\n✓\n✓\n\n\n\n\n\n\nduplicated()\n\n✓\n✓\n\n\n\n\n\n\n\n\ndrop_duplicates()\n\n\n✓\n\n✓\n\n\n\n\n\n\nvalue_counts()\n\n✓\n\n\n\n\n\n✓\n\n\n\nunique()\n\n✓\n\n\n\n\n\n\n\n\n\nnunique()\n\n✓\n\n\n\n\n\n✓\n\n\n\nsample()\n\n✓\n\n✓\n\n\n\n\n\n\n\ncorr()\n\n✓\n\n\n\n\n\n✓\n✓\n\n\ncov()\n\n✓\n\n\n\n\n\n✓\n\n\n\ngroupby()\n\n\n\n\n\n\n✓\n\n\n\n\nagg()\n\n\n\n\n\n\n✓\n✓\n\n\n\napply()\n\n\n\n\n✓\n\n\n\n\n\n\nmerge()\n\n\n\n\n✓\n\n\n\n\n\n\njoin()\n\n\n\n\n✓\n\n\n\n\n\n\nconcat()\n\n\n\n\n✓\n\n\n\n\n\n\npivot()\n\n\n\n\n✓\n\n\n\n\n\n\nmelt()\n\n\n\n\n✓\n\n\n\n\n\n\nsort_values()\n\n\n\n\n\n✓\n\n\n\n\n\nnlargest()\n\n\n\n✓\n\n✓\n\n\n\n\n\nnsmallest()\n\n\n\n✓\n\n✓\n\n\n\n\n\nquery()\n\n\n\n✓\n\n\n\n\n\n\n\neval()\n\n\n\n\n✓\n\n\n\n\n\n\ncut()\n\n\n\n\n✓\n\n\n\n\n\n\nqcut()\n\n\n\n\n✓\n\n\n\n\n\n\nget_dummies()\n\n\n\n\n✓\n\n\n\n\n\n\niloc[]\n\n\n\n✓\n\n\n\n\n\n\n\nloc[]\n\n\n\n✓\n\n\n\n\n\n\n\nplot()\n\n✓\n\n\n\n\n\n\n✓\n\n\n\nNote: This table includes some of the most commonly used DataFrame methods, but it’s not exhaustive. Some methods may be applicable to multiple steps depending on the specific use case." 
+ "text": "Before we can use the Variable Inspector in JupyterLab, we need to install the extension. Follow these steps:\n\nStart a new JupyterLab session in your web browser.\nClick on the “+” button in the top left corner to open the Launcher (it might already be opened).\nUnder “Other”, click on “Terminal” to open a new terminal session.\nIn the terminal, type the following command and press Enter:\npip install lckr-jupyterlab-variableinspector\nWait for the installation to complete. You should see a message indicating successful installation.\nOnce the installation is complete, you need to restart JupyterLab for the changes to take effect. To do this:\n\nSave all your open notebooks and files.\nClose all browser tabs with JupyterLab.\nLogin to https://workbench-1.bren.ucsb.edu again.\nRestart a JupyterLab session\n\n\nAfter restarting JupyterLab, the Variable Inspector extension should be available for use.\n\n\n\nNow that you have installed the Variable Inspector extension, here’s how to use it:\nOpen the Variable Inspector: - Menu: View > Activate Command Palette, then type “variable inspector” - Shortcut: Ctrl + Shift + I (Windows/Linux) or Cmd + Shift + I (Mac) - Right-click in an open notebook and select “Open Variable Inspector” (will be at the bottom of the list)\nThe Variable Inspector shows: - Variable names - Types - Values/shapes - Count (for collections)\n\n\n\n\n\n\nLimits to the Variable Inspector\n\n\n\nThe variable inspector is not suitable for use with large dataframes or large arrays. 
You should use standard commands like df.head(), df.tail(), df.info(), df.describe() to inspect large dataframes.\n\n\n\n\nCode\n# Example variables\nx = 5\ny = \"Hello\"\nz = [1, 2, 3]\n\n# These will appear in the Variable Inspector" }, { - "objectID": "course-materials/cheatsheets/workflow_methods.html#pandas-dataframe-methods-in-data-science-workflows", - "href": "course-materials/cheatsheets/workflow_methods.html#pandas-dataframe-methods-in-data-science-workflows", + "objectID": "course-materials/cheatsheets/JupyterLab.html#variable-inspection-in-jupyterlab", + "href": "course-materials/cheatsheets/JupyterLab.html#variable-inspection-in-jupyterlab", "title": "EDS 217 Cheatsheet", "section": "", - "text": "This table maps commonly used pandas DataFrame methods to the steps in the course-specific data science workflow. Each method is linked to its official pandas documentation for easy reference.\n\n\n\nDataFrame Method —————————-\nImport\nExploration\nCleaning\nFiltering/ Selection\nTransforming\nSorting\nGrouping\nAggregating\nVisualizing\n\n\n\n\nread_csv()\n✓\n\n\n\n\n\n\n\n\n\n\nread_excel()\n✓\n\n\n\n\n\n\n\n\n\n\nhead()\n\n✓\n\n\n\n\n\n\n\n\n\ntail()\n\n✓\n\n\n\n\n\n\n\n\n\ninfo()\n\n✓\n✓\n\n\n\n\n\n\n\n\ndescribe()\n\n✓\n\n\n\n\n\n✓\n\n\n\ndtypes\n\n✓\n✓\n\n\n\n\n\n\n\n\nshape\n\n✓\n\n\n\n\n\n\n\n\n\ncolumns\n\n✓\n\n\n\n\n\n\n\n\n\nisnull()\n\n✓\n✓\n\n\n\n\n\n\n\n\nnotnull()\n\n✓\n✓\n\n\n\n\n\n\n\n\ndropna()\n\n\n✓\n\n✓\n\n\n\n\n\n\nfillna()\n\n\n✓\n\n✓\n\n\n\n\n\n\nreplace()\n\n\n✓\n\n✓\n\n\n\n\n\n\nastype()\n\n\n✓\n\n✓\n\n\n\n\n\n\nrename()\n\n\n✓\n\n✓\n\n\n\n\n\n\ndrop()\n\n\n✓\n✓\n✓\n\n\n\n\n\n\nduplicated()\n\n✓\n✓\n\n\n\n\n\n\n\n\ndrop_duplicates()\n\n\n✓\n\n✓\n\n\n\n\n\n\nvalue_counts()\n\n✓\n\n\n\n\n\n✓\n\n\n\nunique()\n\n✓\n\n\n\n\n\n\n\n\n\nnunique()\n\n✓\n\n\n\n\n\n✓\n\n\n\nsample()\n\n✓\n\n✓\n\n\n\n\n\n\n\ncorr()\n\n✓\n\n\n\n\n\n✓\n✓\n\n\ncov()\n\n✓\n\n\n\n\n\n✓\n\n\n\ngroupby()\n\n\n\n\n\n\n✓\n\n\n\n\nagg()\n\n\n\n\n\n\n✓\n✓\n\n\n\napply()\n\
n\n\n\n✓\n\n\n\n\n\n\nmerge()\n\n\n\n\n✓\n\n\n\n\n\n\njoin()\n\n\n\n\n✓\n\n\n\n\n\n\nconcat()\n\n\n\n\n✓\n\n\n\n\n\n\npivot()\n\n\n\n\n✓\n\n\n\n\n\n\nmelt()\n\n\n\n\n✓\n\n\n\n\n\n\nsort_values()\n\n\n\n\n\n✓\n\n\n\n\n\nnlargest()\n\n\n\n✓\n\n✓\n\n\n\n\n\nnsmallest()\n\n\n\n✓\n\n✓\n\n\n\n\n\nquery()\n\n\n\n✓\n\n\n\n\n\n\n\neval()\n\n\n\n\n✓\n\n\n\n\n\n\ncut()\n\n\n\n\n✓\n\n\n\n\n\n\nqcut()\n\n\n\n\n✓\n\n\n\n\n\n\nget_dummies()\n\n\n\n\n✓\n\n\n\n\n\n\niloc[]\n\n\n\n✓\n\n\n\n\n\n\n\nloc[]\n\n\n\n✓\n\n\n\n\n\n\n\nplot()\n\n✓\n\n\n\n\n\n\n✓\n\n\n\nNote: This table includes some of the most commonly used DataFrame methods, but it’s not exhaustive. Some methods may be applicable to multiple steps depending on the specific use case." + "text": "Before we can use the Variable Inspector in JupyterLab, we need to install the extension. Follow these steps:\n\nStart a new JupyterLab session in your web browser.\nClick on the “+” button in the top left corner to open the Launcher (it might already be opened).\nUnder “Other”, click on “Terminal” to open a new terminal session.\nIn the terminal, type the following command and press Enter:\npip install lckr-jupyterlab-variableinspector\nWait for the installation to complete. You should see a message indicating successful installation.\nOnce the installation is complete, you need to restart JupyterLab for the changes to take effect. 
To do this:\n\nSave all your open notebooks and files.\nClose all browser tabs with JupyterLab.\nLogin to https://workbench-1.bren.ucsb.edu again.\nRestart a JupyterLab session\n\n\nAfter restarting JupyterLab, the Variable Inspector extension should be available for use.\n\n\n\nNow that you have installed the Variable Inspector extension, here’s how to use it:\nOpen the Variable Inspector: - Menu: View > Activate Command Palette, then type “variable inspector” - Shortcut: Ctrl + Shift + I (Windows/Linux) or Cmd + Shift + I (Mac) - Right-click in an open notebook and select “Open Variable Inspector” (will be at the bottom of the list)\nThe Variable Inspector shows: - Variable names - Types - Values/shapes - Count (for collections)\n\n\n\n\n\n\nLimits to the Variable Inspector\n\n\n\nThe variable inspector is not suitable for use with large dataframes or large arrays. You should use standard commands like df.head(), df.tail(), df.info(), df.describe() to inspect large dataframes.\n\n\n\n\nCode\n# Example variables\nx = 5\ny = \"Hello\"\nz = [1, 2, 3]\n\n# These will appear in the Variable Inspector" }, { - "objectID": "course-materials/cheatsheets/workflow_methods.html#key-takeaways", - "href": "course-materials/cheatsheets/workflow_methods.html#key-takeaways", + "objectID": "course-materials/cheatsheets/JupyterLab.html#essential-magic-commands", + "href": "course-materials/cheatsheets/JupyterLab.html#essential-magic-commands", "title": "EDS 217 Cheatsheet", - "section": "Key Takeaways", - "text": "Key Takeaways\n\nImport primarily involves reading data from various sources.\nExploration methods help understand the structure and content of the data.\nCleaning methods focus on handling missing data, duplicates, and data type issues.\nFiltering/Selection methods allow you to subset your data based on various conditions.\nTransforming methods cover a wide range of data manipulation tasks.\nSorting methods help arrange data in a specific order.\nGrouping is often a 
precursor to aggregation operations.\nAggregating methods compute summary statistics on data.\nVisualizing methods help create graphical representations of the data.\n\nRemember that the applicability of methods can vary depending on the specific project and dataset. This table serves as a general guide to help you navigate the pandas DataFrame methods in the context of your course’s data science workflow. The links to the official documentation provide more detailed information about each method’s usage and parameters." + "section": "Essential Magic Commands", + "text": "Essential Magic Commands\nMagic commands start with % (line magics) or %% (cell magics). Note that available magic commands may vary depending on your Jupyter environment and installed extensions.\n\nViewing Variables\n\n\nCode\n# List all variables\n%whos\n\n# List just variable names\n%who\n\n\nVariable Type Data/Info\n----------------------------------\nojs_define function <function ojs_define at 0x1a6303e20>\nx int 5\ny str Hello\nz list n=3\nojs_define x y z \n\n\n\n\nRunning Shell Commands\n\n\nCode\n# Run a shell command\n!echo \"Hello from the shell!\"\n\n# Capture output in a variable\nfiles = !ls\nprint(files)\n\n\nHello from the shell!\n['JupyterLab.qmd', 'JupyterLab.quarto_ipynb', 'Pandas_Cheat_Sheet.pdf', 'bar_plots.html', 'bar_plots.qmd', 'chart_customization.qmd', 'comprehensions.qmd', 'control_flows.qmd', 'data_aggregation.html', 'data_aggregation.qmd', 'data_cleaning.html', 'data_cleaning.qmd', 'data_grouping.qmd', 'data_merging.html', 'data_merging.qmd', 'data_selection.qmd', 'dictionaries.html', 'dictionaries.qmd', 'first_steps.html', 'first_steps.ipynb', 'functions.html', 'functions.qmd', 'lists.html', 'lists.qmd', 'matplotlib.html', 'matplotlib.qmd', 'numpy.html', 'numpy.qmd', 'ocean_temperatures.csv', 'output.csv', 'pandas_dataframes.qmd', 'pandas_series.qmd', 'print.html', 'print.qmd', 'random_numbers.qmd', 'read_csv.qmd', 'seaborn.qmd', 'sets.qmd', 'setting_up_python.html', 
'setting_up_python.qmd', 'timeseries.html', 'timeseries.qmd', 'workflow_methods.html', 'workflow_methods.qmd']" }, { - "objectID": "course-materials/cheatsheets/setting_up_python.html", - "href": "course-materials/cheatsheets/setting_up_python.html", + "objectID": "course-materials/cheatsheets/JupyterLab.html#useful-keyboard-shortcuts", + "href": "course-materials/cheatsheets/JupyterLab.html#useful-keyboard-shortcuts", "title": "EDS 217 Cheatsheet", - "section": "", - "text": "This guide will help you set up Python 3.10 and JupyterLab on your local machine using Miniconda. We’ll also install core data science libraries." + "section": "Useful Keyboard Shortcuts", + "text": "Useful Keyboard Shortcuts\n\n\n\n\n\n\n\n\nAction\nWindows/Linux\nMac\n\n\n\n\nRun cell (stay on cell)\nCtrl + Enter\nCmd + Enter\n\n\nRun cell and insert below\nAlt + Enter\nOption + Enter\n\n\nRun cell and select below\nShift + Enter\nShift + Enter\n\n\nEnter command mode\nEsc\nEsc\n\n\nEnter edit mode\nEnter\nEnter\n\n\nSave notebook\nCtrl + S\nCmd + S\n\n\nInsert cell above\nA (in command mode)\nA (in command mode)\n\n\nInsert cell below\nB (in command mode)\nB (in command mode)\n\n\nCut cell\nX (in command mode)\nX (in command mode)\n\n\nCopy cell\nC (in command mode)\nC (in command mode)\n\n\nPaste cell\nV (in command mode)\nV (in command mode)\n\n\nUndo cell action\nZ (in command mode)\nZ (in command mode)\n\n\nChange to code cell\nY (in command mode)\nY (in command mode)\n\n\nChange to markdown cell\nM (in command mode)\nM (in command mode)\n\n\nSplit cell at cursor\nCtrl + Shift + -\nCmd + Shift + -\n\n\nMerge selected cells\nShift + M (in command mode)\nShift + M (in command mode)\n\n\nToggle line numbers\nShift + L (in command mode)\nShift + L (in command mode)\n\n\nToggle output\nO (in command mode)\nO (in command mode)" }, { - "objectID": "course-materials/cheatsheets/setting_up_python.html#step-0-opening-a-terminal", - "href": 
"course-materials/cheatsheets/setting_up_python.html#step-0-opening-a-terminal", + "objectID": "course-materials/cheatsheets/JupyterLab.html#tips-for-beginners", + "href": "course-materials/cheatsheets/JupyterLab.html#tips-for-beginners", "title": "EDS 217 Cheatsheet", - "section": "Step 0: Opening a Terminal", - "text": "Step 0: Opening a Terminal\nBefore we begin, you’ll need to know how to open a terminal (command-line interface) on your operating system:\n\nFor Windows:\n\nPress the Windows key + R to open the Run dialog.\nType cmd and press Enter. Alternatively, search for “Command Prompt” in the Start menu.\n\n\n\nFor macOS:\n\nPress Command + Space to open Spotlight Search.\nType “Terminal” and press Enter. Alternatively, go to Applications > Utilities > Terminal.\n\n\n\nFor Linux:\n\nMost Linux distributions use Ctrl + Alt + T as a keyboard shortcut to open the terminal.\nYou can also search for “Terminal” in your distribution’s application menu." + "section": "Tips for Beginners", + "text": "Tips for Beginners\n\nUse Tab for code completion\nAdd ? after a function name for more detailed help (e.g., print?)\nUse dir() to see available attributes/methods (e.g., dir(str))\nUse the help() command to get information about functions and objects." 
+ }, + { + "objectID": "course-materials/cheatsheets/JupyterLab.html#resources-for-further-learning", + "href": "course-materials/cheatsheets/JupyterLab.html#resources-for-further-learning", + "title": "EDS 217 Cheatsheet", + "section": "Resources for Further Learning", + "text": "Resources for Further Learning\n\nJupyterLab Documentation\nIPython Documentation (for magic commands)\nJupyter Notebook Cheatsheet\nDataCamp JupyterLab Tutorial" }, { - "objectID": "course-materials/cheatsheets/setting_up_python.html#step-1-download-and-install-miniconda", - "href": "course-materials/cheatsheets/setting_up_python.html#step-1-download-and-install-miniconda", - "title": "EDS 217 Cheatsheet", - "section": "Step 1: Download and Install Miniconda", - "text": "Step 1: Download and Install Miniconda\n\nFor Windows:\n\nDownload the Miniconda installer for Windows from the official website.\nRun the installer and follow the prompts.\nDuring installation, make sure to add Miniconda to your PATH environment variable when prompted.\n\n\n\nFor macOS:\n\nDownload the Miniconda installer for macOS from the official website.\nOpen Terminal and navigate to the directory containing the downloaded file.\nRun the following command:\nbash Miniconda3-latest-MacOSX-x86_64.sh\nFollow the prompts and accept the license agreement.\n\n\n\nFor Linux:\n\nDownload the Miniconda installer for Linux from the official website.\nOpen a terminal and navigate to the directory containing the downloaded file.\nRun the following command:\nbash Miniconda3-latest-Linux-x86_64.sh\nFollow the prompts and accept the license agreement." 
+ "objectID": "course-materials/lectures/05_Session_1A.html#what-is-a-groupby-object", + "href": "course-materials/lectures/05_Session_1A.html#what-is-a-groupby-object", + "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", + "section": "What is a GroupBy Object?", + "text": "What is a GroupBy Object?\n\n\nCreated when you use the groupby() function in pandas\nA plan for splitting data into groups, not the result itself\nLazy evaluation: computations occur only when an aggregation method is called\nContains:\n\nReference to the original DataFrame\nColumns to group by\nInternal dictionary mapping group keys to row indices" }, { - "objectID": "course-materials/cheatsheets/setting_up_python.html#step-2-set-up-python-3.10-and-core-libraries", - "href": "course-materials/cheatsheets/setting_up_python.html#step-2-set-up-python-3.10-and-core-libraries", - "title": "EDS 217 Cheatsheet", - "section": "Step 2: Set up Python 3.10 and Core Libraries", - "text": "Step 2: Set up Python 3.10 and Core Libraries\nOpen a new terminal or command prompt window to ensure the Miniconda installation is recognized.\nRun the following commands:\nconda install python=3.10\nconda install jupyter jupyterlab numpy pandas matplotlib seaborn\nThis will install Python 3.10, JupyterLab, and the core data science libraries in your base environment." 
+ "objectID": "course-materials/lectures/05_Session_1A.html#structure-of-a-groupby-object", + "href": "course-materials/lectures/05_Session_1A.html#structure-of-a-groupby-object", + "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", + "section": "Structure of a GroupBy Object", + "text": "Structure of a GroupBy Object\n\n\nInternal dictionary structure:\n{\n group_key_1: [row_index_1, row_index_3, ...],\n group_key_2: [row_index_2, row_index_4, ...],\n ...\n}\nThis structure allows for efficient data access and aggregation\nActual data isn’t copied or split until necessary" }, { - "objectID": "course-materials/cheatsheets/setting_up_python.html#step-3-verify-installation", - "href": "course-materials/cheatsheets/setting_up_python.html#step-3-verify-installation", - "title": "EDS 217 Cheatsheet", - "section": "Step 3: Verify Installation", - "text": "Step 3: Verify Installation\n\nTo verify that Python 3.10 is installed, run:\npython --version\nTo launch JupyterLab, run:\njupyter lab\n\nThis should open JupyterLab in your default web browser. You can now create new notebooks and start coding!" 
+ "objectID": "course-materials/lectures/05_Session_1A.html#groupby-example", + "href": "course-materials/lectures/05_Session_1A.html#groupby-example", + "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", + "section": "GroupBy Example", + "text": "GroupBy Example\n\nimport pandas as pd\n\ndf = pd.DataFrame({\n 'Category': ['A', 'B', 'A', 'B', 'A', 'B'],\n 'Value': [1, 2, 3, 4, 5, 6]\n})\n\ngrouped = df.groupby('Category')\n# No computation yet!\n\nresult = grouped.sum() # Now we compute\nprint(result)\n\n Value\nCategory \nA 9\nB 12" }, { - "objectID": "course-materials/cheatsheets/setting_up_python.html#additional-notes", - "href": "course-materials/cheatsheets/setting_up_python.html#additional-notes", - "title": "EDS 217 Cheatsheet", - "section": "Additional Notes", - "text": "Additional Notes\n\nTo update Miniconda and installed packages in the future, use:\nconda update --all\nWhile we’re using the base environment for this quick setup, it’s generally a good practice to create separate environments for different projects. You can explore this concept later as you become more familiar with conda." 
+ "objectID": "course-materials/lectures/05_Session_1A.html#why-do-we-need-.copy", + "href": "course-materials/lectures/05_Session_1A.html#why-do-we-need-.copy", + "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", + "section": "Why Do We Need .copy()?", + "text": "Why Do We Need .copy()?\n\n\nMany pandas operations return views instead of copies\nViews are memory-efficient but can lead to unexpected modifications\n.copy() creates a new, independent object\nUse .copy() when you want to modify data without affecting the original" }, { - "objectID": "course-materials/cheatsheets/numpy.html", - "href": "course-materials/cheatsheets/numpy.html", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "import numpy as np\n\n\n\n\n\narr = np.array([1, 2, 3, 4, 5])\n\n\n\n# From Series\ns = pd.Series([1, 2, 3, 4, 5])\narr = s.to_numpy()\n\n# From DataFrame\ndf = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})\narr = df.to_numpy()\n\n\n\n\n\n\narr1 = np.array([1, 2, 3])\narr2 = np.array([4, 5, 6])\n\n# Addition\nresult = arr1 + arr2\n\n# Multiplication\nresult = arr1 * arr2\n\n# Division\nresult = arr1 / arr2\n\n\n\n# Square root\nsqrt_arr = np.sqrt(arr)\n\n# Exponential\nexp_arr = np.exp(arr)\n\n# Absolute value\nabs_arr = np.abs(arr)\n\n\n\n\n\n\n# Mean\nmean = np.mean(arr)\n\n# Median\nmedian = np.median(arr)\n\n# Standard deviation\nstd = np.std(arr)\n\n\n\n# Minimum\nmin_val = np.min(arr)\n\n# Maximum\nmax_val = np.max(arr)\n\n# Sum\ntotal = np.sum(arr)\n\n\n\n\n\n\narr = np.array([1, 2, 3, 4, 5, 6])\nreshaped = arr.reshape(2, 3)\n\n\n\ntransposed = arr.T\n\n\n\nflattened = arr.flatten()\n\n\n\n\n\n\n# Generate 5 random numbers between 0 and 1\nrandom_uniform = np.random.rand(5)\n\n# Generate 5 random integers between 1 and 10\nrandom_integers = np.random.randint(1, 11, 5)\n\n\n\nnp.random.seed(42) # For reproducibility\n\n\n\n\n\n\n# Check for NaN\nnp.isnan(arr)\n\n# Replace NaN with a value\nnp.nan_to_num(arr, nan=0.0)\n\n\n\n\n\n\n# 
Get unique values\nunique_values = np.unique(arr)\n\n# Get value counts (similar to pandas value_counts())\nvalues, counts = np.unique(arr, return_counts=True)\n\n\n\n# Similar to pandas' where, but returns an array\nresult = np.where(condition, x, y)\n\n\n\n# Concatenate arrays (similar to pd.concat())\nconcatenated = np.concatenate([arr1, arr2, arr3])\n\n\n\n\n\nPerformance: For large datasets, NumPy operations can be faster than pandas.\nMemory efficiency: NumPy arrays use less memory than pandas objects.\nSpecific mathematical operations: Some mathematical operations are more straightforward in NumPy.\nInterfacing with other libraries: Many scientific Python libraries use NumPy arrays.\n\nRemember, while these NumPy operations are useful, many have direct equivalents in pandas that work on Series and DataFrames. Always consider whether you can perform the operation directly in pandas before converting to NumPy arrays." + "objectID": "course-materials/lectures/05_Session_1A.html#views-vs.-copies-in-pandas", + "href": "course-materials/lectures/05_Session_1A.html#views-vs.-copies-in-pandas", + "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", + "section": "Views vs. Copies in Pandas", + "text": "Views vs. Copies in Pandas\n\n\nFiltering operations usually create views:\n\ndf[df['column'] > value]\ndf.loc[condition]\n\nSome operations create copies by default:\n\ndf.drop(columns=['col'])\ndf.dropna()\ndf.reset_index()\n\nBut it’s not always clear which operations do what!" 
}, { - "objectID": "course-materials/cheatsheets/numpy.html#importing-numpy", - "href": "course-materials/cheatsheets/numpy.html#importing-numpy", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "import numpy as np" + "objectID": "course-materials/lectures/05_Session_1A.html#when-to-use-.copy", + "href": "course-materials/lectures/05_Session_1A.html#when-to-use-.copy", + "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", + "section": "When to Use .copy()", + "text": "When to Use .copy()\n\n\nWhen assigning a slice of a DataFrame to a new variable\nBefore making changes to a DataFrame you want to keep separate\nIn functions where you don’t want to modify the input data\nWhen chaining operations and you’re unsure about view vs. copy behavior\nTo ensure you have an independent copy, regardless of the operation" }, { - "objectID": "course-materials/cheatsheets/numpy.html#creating-numpy-arrays", - "href": "course-materials/cheatsheets/numpy.html#creating-numpy-arrays", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "arr = np.array([1, 2, 3, 4, 5])\n\n\n\n# From Series\ns = pd.Series([1, 2, 3, 4, 5])\narr = s.to_numpy()\n\n# From DataFrame\ndf = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})\narr = df.to_numpy()" + "objectID": "course-materials/lectures/05_Session_1A.html#copy-example", + "href": "course-materials/lectures/05_Session_1A.html#copy-example", + "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", + "section": ".copy() Example", + "text": ".copy() Example\n\n# Filtering may return a view or a copy\ndf_view = df[df['Category'] == 'A']\ndf_view['Value'] += 10 # Risky: raises SettingWithCopyWarning and may not modify the original df\n\n# Using copy() creates an independent DataFrame\ndf_copy = df[df['Category'] == 'A'].copy()\ndf_copy['Value'] += 10 # This doesn't affect the original df\n\nprint(\"Original df:\")\nprint(df)\nprint(\"\\nModified copy:\")\nprint(df_copy)\n\nOriginal df:\n Category Value\n0 A 1\n1 B 2\n2 A 3\n3 B 4\n4 A 5\n5 
B 6\n\nModified copy:\n Category Value\n0 A 11\n2 A 13\n4 A 15" }, { - "objectID": "course-materials/cheatsheets/numpy.html#basic-array-operations", - "href": "course-materials/cheatsheets/numpy.html#basic-array-operations", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "arr1 = np.array([1, 2, 3])\narr2 = np.array([4, 5, 6])\n\n# Addition\nresult = arr1 + arr2\n\n# Multiplication\nresult = arr1 * arr2\n\n# Division\nresult = arr1 / arr2\n\n\n\n# Square root\nsqrt_arr = np.sqrt(arr)\n\n# Exponential\nexp_arr = np.exp(arr)\n\n# Absolute value\nabs_arr = np.abs(arr)" + "objectID": "course-materials/lectures/05_Session_1A.html#the-triple-constraint-dilemma", + "href": "course-materials/lectures/05_Session_1A.html#the-triple-constraint-dilemma", + "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", + "section": "The Triple Constraint Dilemma", + "text": "The Triple Constraint Dilemma\n\n\nIn software design, often you can only optimize two out of three:\n\nPerformance\nFlexibility\nEase of Use\n\nThis applies to data science tools like R and Python" }, { - "objectID": "course-materials/cheatsheets/numpy.html#statistical-operations", - "href": "course-materials/cheatsheets/numpy.html#statistical-operations", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "# Mean\nmean = np.mean(arr)\n\n# Median\nmedian = np.median(arr)\n\n# Standard deviation\nstd = np.std(arr)\n\n\n\n# Minimum\nmin_val = np.min(arr)\n\n# Maximum\nmax_val = np.max(arr)\n\n# Sum\ntotal = np.sum(arr)" + "objectID": "course-materials/lectures/05_Session_1A.html#r-vs-python-trade-offs", + "href": "course-materials/lectures/05_Session_1A.html#r-vs-python-trade-offs", + "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", + "section": "R vs Python: Trade-offs", + "text": "R vs Python: Trade-offs\n\n\nR\n\n✓✓ Ease of Use\n✓ General Flexibility\n✗ General Performance\n\n\nPython\n\n✓✓ Performance\n✓ General Flexibility\n✗ Ease of Use 
(for data tasks)" }, { - "objectID": "course-materials/cheatsheets/numpy.html#array-manipulation", - "href": "course-materials/cheatsheets/numpy.html#array-manipulation", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "arr = np.array([1, 2, 3, 4, 5, 6])\nreshaped = arr.reshape(2, 3)\n\n\n\ntransposed = arr.T\n\n\n\nflattened = arr.flatten()" + "objectID": "course-materials/lectures/05_Session_1A.html#r-strengths-and-limitations", + "href": "course-materials/lectures/05_Session_1A.html#r-strengths-and-limitations", + "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", + "section": "R: Strengths and Limitations", + "text": "R: Strengths and Limitations\n\n\nStrengths:\n\nIntuitive for statistical operations\nConsistent data manipulation with tidyverse\nExcellent for quick analyses and visualizations\n\nLimitations:\n\nCan be slower for very large datasets\nLess efficient memory usage (more frequent copying)\nLimited in general-purpose programming tasks" }, { - "objectID": "course-materials/cheatsheets/numpy.html#random-number-generation", - "href": "course-materials/cheatsheets/numpy.html#random-number-generation", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "# Generate 5 random numbers between 0 and 1\nrandom_uniform = np.random.rand(5)\n\n# Generate 5 random integers between 1 and 10\nrandom_integers = np.random.randint(1, 11, 5)\n\n\n\nnp.random.seed(42) # For reproducibility" + "objectID": "course-materials/lectures/05_Session_1A.html#python-strengths-and-limitations", + "href": "course-materials/lectures/05_Session_1A.html#python-strengths-and-limitations", + "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", + "section": "Python: Strengths and Limitations", + "text": "Python: Strengths and Limitations\n\n\nStrengths:\n\nEfficient for large-scale data processing\nVersatile for various programming tasks\nStrong in machine learning and deep learning\n\nLimitations:\n\nLess intuitive 
API for data manipulation (pandas)\nSteeper learning curve for data science tasks\nRequires more code for some statistical operations" }, { - "objectID": "course-materials/cheatsheets/numpy.html#working-with-missing-data", - "href": "course-materials/cheatsheets/numpy.html#working-with-missing-data", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "# Check for NaN\nnp.isnan(arr)\n\n# Replace NaN with a value\nnp.nan_to_num(arr, nan=0.0)" + "objectID": "course-materials/lectures/05_Session_1A.html#implications-for-data-science", + "href": "course-materials/lectures/05_Session_1A.html#implications-for-data-science", + "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", + "section": "Implications for Data Science", + "text": "Implications for Data Science\n\n\nR excels in statistical computing and quick analyses\nPython shines in large-scale data processing and diverse applications\nChoice depends on specific needs:\n\nProject scale\nPerformance requirements\nTeam expertise\nIntegration with other systems" }, { - "objectID": "course-materials/cheatsheets/numpy.html#useful-numpy-functions-for-pandas-users", - "href": "course-materials/cheatsheets/numpy.html#useful-numpy-functions-for-pandas-users", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "# Get unique values\nunique_values = np.unique(arr)\n\n# Get value counts (similar to pandas value_counts())\nvalues, counts = np.unique(arr, return_counts=True)\n\n\n\n# Similar to pandas' where, but returns an array\nresult = np.where(condition, x, y)\n\n\n\n# Concatenate arrays (similar to pd.concat())\nconcatenated = np.concatenate([arr1, arr2, arr3])" + "objectID": "course-materials/lectures/05_Session_1A.html#conclusion", + "href": "course-materials/lectures/05_Session_1A.html#conclusion", + "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", + "section": "Conclusion", + "text": "Conclusion\n\nUnderstanding these concepts helps in:\n\nChoosing 
the right tool for the job\nWriting efficient and correct code\nAppreciating the design decisions in data science tools\n\nBoth R and Python have their places in a data scientist’s toolkit\nConsider using both languages to leverage their respective strengths" }, { - "objectID": "course-materials/cheatsheets/numpy.html#when-to-use-numpy-with-pandas", - "href": "course-materials/cheatsheets/numpy.html#when-to-use-numpy-with-pandas", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "Performance: For large datasets, NumPy operations can be faster than pandas.\nMemory efficiency: NumPy arrays use less memory than pandas objects.\nSpecific mathematical operations: Some mathematical operations are more straightforward in NumPy.\nInterfacing with other libraries: Many scientific Python libraries use NumPy arrays.\n\nRemember, while these NumPy operations are useful, many have direct equivalents in pandas that work on Series and DataFrames. Always consider whether you can perform the operation directly in pandas before converting to NumPy arrays." + "objectID": "course-materials/lectures/05_Session_1A.html#questions", + "href": "course-materials/lectures/05_Session_1A.html#questions", + "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", + "section": "Questions?", + "text": "Questions?" 
}, { - "objectID": "course-materials/cheatsheets/lists.html", - "href": "course-materials/cheatsheets/lists.html", - "title": "EDS 217 Cheatsheet", + "objectID": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html", + "href": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html", + "title": "Interactive Session 1B", "section": "", - "text": "my_list = []\n\n\n\nmy_list = [1, 2, 3, 4, 5]\n\n\n\nmixed_list = [1, \"hello\", 3.14, True]\n\n\n\nnested_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]\n\n\n\n\n\n\nmy_list = [10, 20, 30, 40]\nprint(my_list[0]) # Output: 10\nprint(my_list[2]) # Output: 30\n\n\n\nprint(my_list[-1]) # Output: 40\n\n\n\nsublist = my_list[1:3] # Output: [20, 30]\n\n\n\n\n\n\nmy_list[1] = 25 # my_list becomes [10, 25, 30, 40]\n\n\n\nmy_list.append(50) # my_list becomes [10, 25, 30, 40, 50]\n\n\n\nmy_list.insert(1, 15) # my_list becomes [10, 15, 25, 30, 40, 50]\n\n\n\nmy_list.extend([60, 70]) # my_list becomes [10, 15, 25, 30, 40, 50, 60, 70]\n\n\n\n\n\n\nmy_list.remove(25) # my_list becomes [10, 15, 30, 40, 50, 60, 70]\n\n\n\ndel my_list[0] # my_list becomes [15, 30, 40, 50, 60, 70]\n\n\n\nlast_element = my_list.pop() # my_list becomes [15, 30, 40, 50, 60]\n\n\n\nelement = my_list.pop(2) # my_list becomes [15, 30, 50, 60]\n\n\n\n\n\n\nlength = len(my_list) # Output: 4\n\n\n\nis_in_list = 30 in my_list # Output: True\n\n\n\ncombined_list = my_list + [80, 90] # Output: [15, 30, 50, 60, 80, 90]\n\n\n\nrepeated_list = [1, 2, 3] * 3 # Output: [1, 2, 3, 1, 2, 3, 1, 2, 3]\n\n\n\n\n\n\nfor item in my_list:\n print(item)\n\n\n\nfor index, value in enumerate(my_list):\n print(f\"Index {index} has value {value}\")\n\n\n\n\n\n\nsquares = [x**2 for x in range(5)] # Output: [0, 1, 4, 9, 16]\n\n\n\nevens = [x for x in range(10) if x % 2 == 0] # Output: [0, 2, 4, 6, 8]\n\n\n\n\n\n\nmy_list.sort() # Sorts in place\n\n\n\nsorted_list = sorted(my_list) # Returns a sorted copy\n\n\n\nmy_list.reverse() # Reverses in place\n\n\n\ncount = 
my_list.count(30) # Output: 1\n\n\n\nindex = my_list.index(50) # Output: 2\n\n\n\n\n\n\n# Incorrect\nfor item in my_list:\n if item < 20:\n my_list.remove(item)\n\n# Correct (Using a copy)\nfor item in my_list[:]:\n if item < 20:\n my_list.remove(item)" + "text": "Now that you’ve seen the REPL in iPython, from now on in this class you will code in Jupyter Notebooks. Jupyter is an incredibly awesome and user-friendly integrated development environment (IDE). An IDE provides a place for data scientists to see and work with a bunch of different aspects of their work in a nice, organized interface." }, { - "objectID": "course-materials/cheatsheets/lists.html#creating-lists", - "href": "course-materials/cheatsheets/lists.html#creating-lists", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "my_list = []\n\n\n\nmy_list = [1, 2, 3, 4, 5]\n\n\n\nmixed_list = [1, \"hello\", 3.14, True]\n\n\n\nnested_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]" + "objectID": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#meet-jupyterlab", + "href": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#meet-jupyterlab", + "title": "Interactive Session 1B", + "section": "1. Meet JupyterLab", + "text": "1. Meet JupyterLab\nJupyterLab provides a nice user interface for data science, development, reporting, and collaboration (all of which you’ll learn about throughout the MEDS program) in one place.\n\nFeatures of JupyterLab as an IDE\n\nInteractive Computing: JupyterLab is designed primarily for interactive computing and data analysis. 
It supports live code execution, data visualization, and interactive widgets, which are key features of modern IDEs.\nMulti-language Support: While originally developed for Python, JupyterLab supports many other programming languages through the use of kernels, making it versatile for various programming tasks.\nRich Text Editing: It provides a rich text editor for creating and editing Jupyter Notebooks, which can contain both code and narrative text (Markdown), allowing for documentation and code to coexist.\nCode Execution: JupyterLab allows you to execute code cells and see the output immediately, making it suitable for testing and iterating on code quickly.\nFile Management: It includes a file manager for browsing and managing project files, similar to the file explorers found in traditional IDEs.\nExtensions and Customization: JupyterLab supports numerous extensions that can enhance its capabilities, such as version control integration, terminal access, and enhanced visualizations.\nIntegrated Tools: It has an integrated terminal, variable inspector, and other tools that are typically part of an IDE, providing a comprehensive environment for development.\n\n\n\nDifferences from Traditional IDEs\n\nFocus on Notebooks: Unlike many traditional IDEs that focus on scripting and full-scale software development, JupyterLab emphasizes the use of notebooks for exploratory data analysis and research.\nNon-linear Workflow: JupyterLab allows for a non-linear workflow, where users can execute cells out of order and iteratively modify and test code." 
}, { - "objectID": "course-materials/cheatsheets/lists.html#accessing-elements", - "href": "course-materials/cheatsheets/lists.html#accessing-elements", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "my_list = [10, 20, 30, 40]\nprint(my_list[0]) # Output: 10\nprint(my_list[2]) # Output: 30\n\n\n\nprint(my_list[-1]) # Output: 40\n\n\n\nsublist = my_list[1:3] # Output: [20, 30]" + "objectID": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#jupyterlab-interface", + "href": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#jupyterlab-interface", + "title": "Interactive Session 1B", + "section": "JupyterLab Interface", + "text": "JupyterLab Interface\n\nPrimary panes include the Main Work Area pane, Sidebar, and Menu Bar.\n\n\n\nAs you work, Jupyer Lab will add additional tabs/panes that contain figures and data inspectors, or even other file types. You can rearrange these panes to organize your workspace however you like.\n\nYou can check out the JupyterLab User Guide for tons of information and helpful tips!\nJupyterLab is a powerful interactive development environment (IDE) that allows you to work with Jupyter Notebooks, text editors, terminals, and other components in a single, integrated environment. It’s widely used in data science, scientific computing, and education." 
}, { - "objectID": "course-materials/cheatsheets/lists.html#modifying-lists", - "href": "course-materials/cheatsheets/lists.html#modifying-lists", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "my_list[1] = 25 # my_list becomes [10, 25, 30, 40]\n\n\n\nmy_list.append(50) # my_list becomes [10, 25, 30, 40, 50]\n\n\n\nmy_list.insert(1, 15) # my_list becomes [10, 15, 25, 30, 40, 50]\n\n\n\nmy_list.extend([60, 70]) # my_list becomes [10, 15, 25, 30, 40, 50, 60, 70]" + "objectID": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#getting-started-with-jupyter-notebooks-in-jupyterlab", + "href": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#getting-started-with-jupyter-notebooks-in-jupyterlab", + "title": "Interactive Session 1B", + "section": "Getting Started with Jupyter Notebooks in JupyterLab", + "text": "Getting Started with Jupyter Notebooks in JupyterLab\n\nCreating a New Notebook\n\nOpen JupyterLab: Once JupyterLab is running, you’ll see the JupyterLab interface with the file browser on the left.\nCreate a New Notebook:\n\nClick on the + button in the file browser to open a new Launcher tab.\nUnder the “Notebook” section, click “Python 3” to create a new Python notebook.\n\nRename the Notebook:\n\nClick on the notebook title (usually “Untitled”) at the top of the notebook interface.\nEnter a new name for your notebook and click “Rename”.\n\n\n\n\nUnderstanding the Notebook Interface\nThe Jupyter Notebook interface is divided into cells. 
There are two main types of cells:\n\nCode Cells: For writing and executing Python code.\nMarkdown Cells: For writing formatted text using Markdown syntax.\n\n\n\nWriting and Running Code\nLet’s start by writing some simple Python code in a code cell.\n\nAdd a Code Cell:\n\nClick inside the cell and start typing your Python code.\n\n\n\n# Simple Python code\nprint(\"Hello, Jupyter!\")\n\nRun the Code Cell:\n\nClick the “Run” button in the toolbar or press Shift + Enter to execute the code.\nThe output will be displayed directly below the cell.\n\n\n\n\nWriting Markdown\nMarkdown cells allow you to write formatted text. You can use Markdown to create headings, lists, links, and more.\n\nAdd a Markdown Cell:\n\nClick on the “+” button in the toolbar to add a new cell.\nChange the cell type to “Markdown” from the dropdown menu in the toolbar.\n\nWrite Markdown Text:\n\n# My First Markdown Cell\n\nThis is a simple example of a Markdown cell in JupyterLab.\n\n## Features of Markdown\n\n- **Bold Text**: Use `**text**` for **bold**.\n- **Italic Text**: Use `*text*` for *italic*.\n- **Lists**: Create bullet points using `-` or `*`.\n- **Links**: [JupyterLab Documentation](https://jupyterlab.readthedocs.io/)\n\nRender the Markdown:\n\nClick the “Run” button or press Shift + Enter to render the Markdown text.\n\n\n\n\nCombining Code and Markdown\nJupyter Notebooks are powerful because they allow you to combine code and markdown in a single document. 
This is useful for creating interactive tutorials, reports, and data analyses.\n\n\nRendering Images\nJupyter Notebooks can render images directly in the output cells, which is particularly useful for data visualization.\n\nExample: Displaying an Image\n\n\nCode\nfrom IPython.display import Image, display\n\n# Display an image\nimg_path = 'https://jupyterlab.readthedocs.io/en/stable/_images/interface-jupyterlab.png'\ndisplay(Image(url=img_path, width=700))\n\n\n\n\n\n\n\n\nInteractive Features\nJupyter Notebooks support interactive features, such as widgets, which enhance the interactivity of your notebooks.\n\nExample: Using Interactive Widgets\nWidgets allow users to interact with your code and visualize results dynamically.\n\n\nCode\nimport ipywidgets as widgets\n\n# Create a simple slider widget\nslider = widgets.IntSlider(value=50, min=0, max=100, step=1, description='Value:')\ndisplay(slider)\n\n\n\n\n\n\n\n\nSaving and Exporting Notebooks\n\nSave the Notebook:\n\nClick the save icon in the toolbar or press Ctrl + S (Cmd + S on macOS) to save your work.\n\nExport the Notebook:\n\nJupyterLab allows you to export notebooks to various formats, such as PDF or HTML. Go to File > Export Notebook As and choose your desired format." 
}, { - "objectID": "course-materials/cheatsheets/lists.html#removing-elements", - "href": "course-materials/cheatsheets/lists.html#removing-elements", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "my_list.remove(25) # my_list becomes [10, 15, 30, 40, 50, 60, 70]\n\n\n\ndel my_list[0] # my_list becomes [15, 30, 40, 50, 60, 70]\n\n\n\nlast_element = my_list.pop() # my_list becomes [15, 30, 40, 50, 60]\n\n\n\nelement = my_list.pop(2) # my_list becomes [15, 30, 50, 60]" + "objectID": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#tips-for-using-jupyterlab", + "href": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#tips-for-using-jupyterlab", + "title": "Interactive Session 1B", + "section": "Tips for Using JupyterLab", + "text": "Tips for Using JupyterLab\n\nKeyboard Shortcuts: Familiarize yourself with keyboard shortcuts to speed up your workflow. You can view shortcuts by clicking Help > Keyboard Shortcuts. You can also refer to our class Jupyter Keyboard Shortcuts Cheatsheet\nUsing the File Browser: Drag and drop files into the file browser to upload them to your workspace.\nUsing the Variable Inspector: The variable inspector shows variable names, types, values/shapes, and counts (for collections). 
Open the Variable Inspector using Menu: View > Activate Command Palette, then type “variable inspector.” Or use the keyboard shortcut: Ctrl + Shift + I (Windows/Linux) or Cmd + Shift + I (Mac)" }, { - "objectID": "course-materials/cheatsheets/lists.html#list-operations", - "href": "course-materials/cheatsheets/lists.html#list-operations", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "length = len(my_list) # Output: 4\n\n\n\nis_in_list = 30 in my_list # Output: True\n\n\n\ncombined_list = my_list + [80, 90] # Output: [15, 30, 50, 60, 80, 90]\n\n\n\nrepeated_list = [1, 2, 3] * 3 # Output: [1, 2, 3, 1, 2, 3, 1, 2, 3]" + "objectID": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#conclusion", + "href": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#conclusion", + "title": "Interactive Session 1B", + "section": "Conclusion", + "text": "Conclusion\nJupyterLab is a versatile tool that makes it easy to combine code, text, and visualizations in a single document. By mastering the basic functionality of Jupyter Notebooks, you can create powerful and interactive documents that enhance your data analysis and scientific computing tasks.\nFeel free to experiment with the code and markdown examples provided in this guide to deepen your understanding of JupyterLab. Happy coding!" 
}, { - "objectID": "course-materials/cheatsheets/lists.html#looping-through-lists", - "href": "course-materials/cheatsheets/lists.html#looping-through-lists", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "for item in my_list:\n print(item)\n\n\n\nfor index, value in enumerate(my_list):\n print(f\"Index {index} has value {value}\")" + "objectID": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#resources", + "href": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#resources", + "title": "Interactive Session 1B", + "section": "Resources", + "text": "Resources\nWe will get to know Jupyter Notebooks very well during the rest of this course, but here are even more resources you can use to learn and revisit:\n\nJupyter Notebook Gallery\nThere are many, many examples of textbooks, academic articles, journalism, analyses, and reports written in Jupyter Notebooks. Here is a link to a curated gallery containing many such examples. It’s worth exploring some of these just to get a sense of the diversity of applications and opportunities available using python and jupyter in data science!\n\n\nTutorials and Shortcourses\n\n1. Jupyter Notebook Documentation\n\nWebsite: Jupyter Documentation\nDescription: The official documentation for Jupyter Notebooks provides a comprehensive guide to installing, using, and customizing notebooks. It includes tutorials, tips, and examples to help you get started.\n\n\n\n2. Project Jupyter: Beginner’s Guide\n\nWebsite: Project Jupyter\nDescription: This page offers an interactive “Try Jupyter” experience, allowing you to run Jupyter Notebooks in the browser without installing anything locally. It is a great way to explore the basics of Jupyter in a hands-on manner.\n\n\n\n3. YouTube Tutorial Series by Corey Schafer\n\nVideo Playlist: Jupyter Notebooks Tutorial - Corey Schafer\nDescription: This YouTube series provides an in-depth introduction to Jupyter Notebooks. 
Corey Schafer covers installation, basic usage, and advanced features, making it easy to follow along and practice on your own.\n\n\n\n4. Jupyter Notebooks Beginner Guide - DataCamp\n\nWebsite: DataCamp Jupyter Notebook Tutorial\nDescription: This tutorial on DataCamp’s community blog offers a step-by-step guide to using Jupyter Notebooks for data science. It covers the basics and explores more advanced topics such as widgets and extensions.\n\n\n\n5. Real Python: Jupyter Notebook 101\n\nArticle: Jupyter Notebook 101\nDescription: This Real Python article introduces Jupyter Notebooks, covering installation, basic usage, and tips for using notebooks effectively. It is an excellent resource for Python developers who are new to Jupyter.\n\n\n\n6. Google Colab\n\nWebsite: Google Colab\nDescription: Google Colab is a free platform that lets you run Jupyter Notebooks in the cloud. You can find many tutorials and example notebooks on their site. For example, here is a link to a notebook they’ve created that includes many pandas snippets.\n\n\nEnd interactive session 1B" }, { - "objectID": "course-materials/cheatsheets/lists.html#list-comprehensions", - "href": "course-materials/cheatsheets/lists.html#list-comprehensions", - "title": "EDS 217 Cheatsheet", + "objectID": "course-materials/final_project.html", + "href": "course-materials/final_project.html", + "title": "Final Activity", "section": "", - "text": "squares = [x**2 for x in range(5)] # Output: [0, 1, 4, 9, 16]\n\n\n\nevens = [x for x in range(10) if x % 2 == 0] # Output: [0, 2, 4, 6, 8]" + "text": "In this final class activity, you will work in small groups (2-3) to develop an example data science workflow.\n\nImport Data\nExplore Data\nClean Data\nFilter Data\nSort Data\nTransform Data\nGroup Data\nAggregate Data\nVisualize Data" }, { - "objectID": "course-materials/cheatsheets/lists.html#list-methods", - "href": "course-materials/cheatsheets/lists.html#list-methods", - "title": "EDS 217 Cheatsheet", + 
"objectID": "course-materials/final_project.html#diy-python-data-scienceworkflow", + "href": "course-materials/final_project.html#diy-python-data-scienceworkflow", + "title": "Final Activity", "section": "", - "text": "my_list.sort() # Sorts in place\n\n\n\nsorted_list = sorted(my_list) # Returns a sorted copy\n\n\n\nmy_list.reverse() # Reverses in place\n\n\n\ncount = my_list.count(30) # Output: 1\n\n\n\nindex = my_list.index(50) # Output: 2" + "text": "In this final class activity, you will work in small groups (2-3) to develop a example data science workflow.\n\nImport Data\nExplore Data\nClean Data\nFilter Data\nSort Data\nTransform Data\nGroup Data\nAggregate Data\nVisualize Data" }, { - "objectID": "course-materials/cheatsheets/lists.html#common-list-pitfalls", - "href": "course-materials/cheatsheets/lists.html#common-list-pitfalls", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "# Incorrect\nfor item in my_list:\n if item < 20:\n my_list.remove(item)\n\n# Correct (Using a copy)\nfor item in my_list[:]:\n if item < 20:\n my_list.remove(item)" + "objectID": "course-materials/final_project.html#what-to-do", + "href": "course-materials/final_project.html#what-to-do", + "title": "Final Activity", + "section": "What to do", + "text": "What to do\nTo conduct this exercise, you should find a suitable dataset; it doesn’t need to be environmental data per se - be creative in your search! You should also focus on making a number of exploratory and analysis visualizations using seaborn. 
You should avoid planning any analyses that absolutely require mapping and focus on using only pandas, numpy, matplotlib, and seaborn libraries.\nYour final product will be a self-contained notebook that is well-documented with markdown and code comments that you will walk through as a presentation to the rest of the class on the final day.\nYour notebook should include each of the nine steps, even if you don’t need to do much in each of them.\n\n\n\n\n\n\nNote\n\n\n\nYou can include visualizations as part of your data exploration (step 2), or anywhere else it is helpful.\n\n\nAdditional figures and graphics are also welcome - you are encouraged to make your notebooks as engaging and visually interesting as possible." }, { - "objectID": "course-materials/cheatsheets/data_merging.html", - "href": "course-materials/cheatsheets/data_merging.html", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "To be added" + "objectID": "course-materials/final_project.html#syncing-your-data-to-github", + "href": "course-materials/final_project.html#syncing-your-data-to-github", + "title": "Final Activity", + "section": "Syncing your data to Github", + "text": "Syncing your data to Github\nHere are some directions for syncing your classwork with GitHub\n\nGeneral places to find fun data\nHere are some links to potential data resources that you can use to develop your analyses:\n\nKaggle\nData is Plural\nUS Data.gov\nZenodo\nR for Data Science\n\n\n\nOddly specific datasets\n\nCentral Park Squirrel Survey\nHarry Potter Characters Dataset\nSpotify Tracks\nLego Dataset" }, { - "objectID": "course-materials/cheatsheets/data_aggregation.html", - "href": "course-materials/cheatsheets/data_aggregation.html", - "title": "EDS 217 Cheatsheet", - "section": "", - "text": "To be added" + "objectID": "course-materials/final_project.html#using-google-drive-to-store-your-.csv-file.", + "href": "course-materials/final_project.html#using-google-drive-to-store-your-.csv-file.", + "title": 
"Final Activity", + "section": "Using Google Drive to store your .csv file.", + "text": "Using Google Drive to store your .csv file.\nOnce you’ve found a .csv file that you want to use, you should:\n\nSave your file to a google drive folder in your UCSB account.\nChange the sharing settings to allow anyone with a link to view your file.\nOpen the sharing dialog and copy the sharing link to your clipboard.\nUse the code below to download your file (you will need to add this code to the top of your notebook in the Import Data section)\n\n\n\n\n\n\n\nWarning\n\n\n\nFor this code to work on the workbench server, you will need to switch your kernel from 3.10.0 to 3.7.13. You can switch kernels by clicking on the kernel name in the upper right of your notebook.\n\n\n\n\nCode\nimport pandas as pd\nimport requests\n\ndef extract_file_id(url):\n \"\"\"Extract file id from Google Drive Sharing URL.\"\"\"\n return url.split(\"/\")[-2]\n\ndef df_from_gdrive_csv(url):\n \"\"\" Get the CSV file from a Google Drive Sharing URL.\"\"\"\n file_id = extract_file_id(url)\n URL = \"https://docs.google.com/uc?export=download\"\n session = requests.Session()\n response = session.get(URL, params={\"id\": file_id}, stream=True)\n return pd.read_csv(response.raw)\n\n# Example of how to use:\n# Note: your sharing link will be different, but should look like this:\nsharing_url = \"https://drive.google.com/file/d/1RlilHNG7BtvXT2Pm4OpgNvEjVJJZNaps/view?usp=share_link\"\ndf = df_from_gdrive_csv(sharing_url)\ndf.head()\n\n\n\n\n\n\n\n\n\ndate\nlocation\ntemperature\nsalinity\ndepth\n\n\n\n\n0\n2020-01-01\nPacific\n21.523585\nNaN\n200\n\n\n1\n2020-01-02\nPacific\n14.800079\n34.467264\n100\n\n\n2\n2020-01-03\nPacific\n23.752256\n35.016505\n100\n\n\n3\n2020-01-04\nPacific\n24.702824\n36.416944\n200\n\n\n4\n2020-01-05\nPacific\n10.244824\n35.807487\n1000" + }, + { + "objectID": "index.html#course-description", + "href": "index.html#course-description", + "title": "Python for Environmental Data 
Science", + "section": "Course Description", + "text": "Course Description\nProgramming skills are critical when working with, understanding, analyzing, and gleaning insights from environmental data. In the intensive EDS 217 course, students will develop fundamental skills in Python programming, data manipulation, and data visualization, specifically tailored for environmental data science applications.\nThe goal of EDS 217 (Python for Environmental Data Science) is to equip incoming MEDS students with the programming methods, skills, notation, and language commonly used in the python data science stack, which will be essential for their python-based data science courses and projects in the program as well as in their data science careers. By the end of the course, students should be able to:\n\nManipulate and analyze data using libraries like pandas and NumPy\nVisualize data using Matplotlib and Seaborn\nWrite, interpret, and debug Python scripts\nImplement basic algorithms for data processing\nUtilize logical operations, control flow, and functions in programming\nCollaborate with peers to solve group programming tasks, and communicate the process and results to the rest of the class" }, { - "objectID": "course-materials/lectures/lectures.html", - "href": "course-materials/lectures/lectures.html", - "title": "EDS 217 Lectures", - "section": "", - "text": "This page contains links to lecture materials for EDS 217.\n\nIntroduction to Python Data Science\n\n\nThe Zen of Python\n\n\nDebugging" + "objectID": "index.html#syncing-your-classwork-to-github", + "href": "index.html#syncing-your-classwork-to-github", + "title": "Python for Environmental Data Science", + "section": "Syncing your classwork to Github", + "text": "Syncing your classwork to Github\nHere are some directions for syncing your classwork with a GitHub repository" }, { - "objectID": "course-materials/lectures/04-next_steps.html", - "href": "course-materials/lectures/04-next_steps.html", - "title": "How 
to move from a beginner to a more advanced python user", - "section": "", - "text": "Taken from Talk Python to Me, Episode #427, with some modifications." + "objectID": "index.html#teaching-team", + "href": "index.html#teaching-team", + "title": "Python for Environmental Data Science", + "section": "Teaching Team", + "text": "Teaching Team\n\n\n\n\nInstructor\n\n\n\n\n\n\n\nKelly Caylor\nEmail: caylor@ucsb.edu\nLearn more: Bren profile\n\n\n\n\nTA\n\n\n\n\n\n\n\nAnna Boser\nEmail: anaboser@ucsb.edu\nLearn more: Bren profile" }, { - "objectID": "course-materials/lectures/04-next_steps.html#know-your-goals", - "href": "course-materials/lectures/04-next_steps.html#know-your-goals", - "title": "How to move from a beginner to a more advanced python user", - "section": "1. Know your goals", - "text": "1. Know your goals\n\nWhy are you learning python?\nWhy are you learning data science?" + "objectID": "course-materials/live-coding/4b_exploring_dataframes.html#overview", + "href": "course-materials/live-coding/4b_exploring_dataframes.html#overview", + "title": "Live Coding Session 4B", + "section": "Overview", + "text": "Overview\nIn this 45-minute session, we will explore the basics of pandas DataFrames - a fundamental data structure for data manipulation and analysis in Python. We’ll focus on essential operations that form the foundation of working with DataFrames." }, { - "objectID": "course-materials/lectures/04-next_steps.html#have-a-project-in-mind", - "href": "course-materials/lectures/04-next_steps.html#have-a-project-in-mind", - "title": "How to move from a beginner to a more advanced python user", - "section": "2. Have a project in mind", - "text": "2. 
Have a project in mind\n\nWhat do you want to do with it?\nUse python to solve a problem you are interested in solving.\nDon’t be afraid to work on personal projects.\n\n\nSome examples of my personal “problem-solving” projects\nBiobib - Python code to make my CV/Biobib from a google sheets/.csv file.\nTriumph - Python notebooks for a 1959 Triumph TR3A EV conversion project.\nStoplight - A simple python webapp for monitoring EDS217 course pace." + "objectID": "course-materials/live-coding/4b_exploring_dataframes.html#objectives", + "href": "course-materials/live-coding/4b_exploring_dataframes.html#objectives", + "title": "Live Coding Session 4B", + "section": "Objectives", + "text": "Objectives\n\nUnderstand the structure and basic properties of pandas DataFrames.\nLearn how to create and load DataFrames.\nApply methods for data selection and filtering.\nPerform basic data manipulation and analysis using DataFrames." }, { - "objectID": "course-materials/lectures/04-next_steps.html#dont-limit-your-learning-to-whats-needed-for-your-project", - "href": "course-materials/lectures/04-next_steps.html#dont-limit-your-learning-to-whats-needed-for-your-project", - "title": "How to move from a beginner to a more advanced python user", - "section": "3. Don’t limit your learning to what’s needed for your project", - "text": "3. 
Don’t limit your learning to what’s needed for your project\n\nLearn more than you need to know…\nMath: 3Blue1Brown\nPython Data Science: PyData\nData Visualization: Edward Tufte, Cole Nussbaumer-Knaflic, David McCandless\nBe curious about what’s possible, not just what’s necessary.\n…but try to use less than you think you need" + "objectID": "course-materials/live-coding/4b_exploring_dataframes.html#getting-started-5-minutes", + "href": "course-materials/live-coding/4b_exploring_dataframes.html#getting-started-5-minutes", + "title": "Live Coding Session 4B", + "section": "Getting Started (5 minutes)", + "text": "Getting Started (5 minutes)\n\nPrepare Your Environment:\n\nOpen JupyterLab and create a new notebook named “pandas_dataframes_intro”.\nDownload the sample dataset from here.\n\nParticipation:\n\nCode along with me during the session.\nAsk questions as we go - if you’re wondering about something, others probably are too!" }, { - "objectID": "course-materials/lectures/04-next_steps.html#read-good-code", - "href": "course-materials/lectures/04-next_steps.html#read-good-code", - "title": "How to move from a beginner to a more advanced python user", - "section": "4. Read good code", - "text": "4. Read good code\n\nLibraries and packages have great examples of code!\nRead the code (not just docs) of the packages you use.\n\nIt’s okay if you can’t understand it all. Often you can understand intent, but not what the code does. How would you have done it? Why did the author select a different approach?\n\nGithub is a great place to find code." + "objectID": "course-materials/live-coding/4b_exploring_dataframes.html#session-outline", + "href": "course-materials/live-coding/4b_exploring_dataframes.html#session-outline", + "title": "Live Coding Session 4B", + "section": "Session Outline", + "text": "Session Outline\n\n1. Introduction to pandas DataFrames (5 minutes)\n\nWhat are DataFrames?\nImporting pandas and creating a simple DataFrame\n\n\n\n2. 
Loading and Exploring Data (10 minutes)\n\nReading a CSV file into a DataFrame\nBasic DataFrame attributes and methods (shape, info, describe, head)\n\n\n\n3. Data Selection and Filtering (10 minutes)\n\nSelecting columns and rows\nBoolean indexing\n\n\n\n4. Basic Data Manipulation (10 minutes)\n\nAdding and removing columns\nHandling missing data\n\n\n\n5. Q&A and Wrap-up (5 minutes)\n\nAddress any questions\nRecap key points" }, { - "objectID": "course-materials/lectures/04-next_steps.html#know-your-tools", - "href": "course-materials/lectures/04-next_steps.html#know-your-tools", - "title": "How to move from a beginner to a more advanced python user", - "section": "5. Know your tools", - "text": "5. Know your tools\n\nLearn how to use your IDE (VSCode)\nLearn how to use your package manager (conda, mamba)\nLearn how to use your shell (bash, powershell, WSL)\nLearn how to use your version control system (git, Github Desktop)" + "objectID": "course-materials/live-coding/4b_exploring_dataframes.html#code-examples-well-cover", + "href": "course-materials/live-coding/4b_exploring_dataframes.html#code-examples-well-cover", + "title": "Live Coding Session 4B", + "section": "Code Examples We’ll Cover", + "text": "Code Examples We’ll Cover\nimport pandas as pd\n\n# Creating a DataFrame\ndf = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})\n\n# Loading data from CSV\ndf = pd.read_csv('sample_data.csv')\n\n# Basic exploration\nprint(df.shape)\ndf.info()\nprint(df.describe())\n\n# Selection and filtering\nselected_columns = df[['column1', 'column2']]\nfiltered_rows = df[df['column1'] > 5]\n\n# Data manipulation\ndf['new_column'] = df['column1'] * 2\ndf.dropna(inplace=True)" }, { - "objectID": "course-materials/lectures/04-next_steps.html#learn-how-to-test-your-code", - "href": "course-materials/lectures/04-next_steps.html#learn-how-to-test-your-code", - "title": "How to move from a beginner to a more advanced python user", - "section": "6. 
Learn how to test your code", - "text": "6. Learn how to test your code\n\nTesting code is part of writing code, and testing is a great way to learn!\nFocus on end-to-end (E2E) tests (rather than unit tests)\n\nUnit tests:\nDoes it work the way you expect it to (operation-centric)?\nEnd-to-end test:\nDoes it do what you want it to do (output-centric)?\n\n\nTesting for data science\nTesting with PyTest for data science" + "objectID": "course-materials/live-coding/4b_exploring_dataframes.html#after-the-session", + "href": "course-materials/live-coding/4b_exploring_dataframes.html#after-the-session", + "title": "Live Coding Session 4B", + "section": "After the Session", + "text": "After the Session\n\nReview your notes and try to replicate the exercises on your own.\nExperiment with the code using your own datasets.\nCheck out our class DataFrame cheatsheet for quick reference.\nFor more advanced features, explore the official pandas documentation." }, { - "objectID": "course-materials/lectures/04-next_steps.html#know-whats-good-enough-for-any-given-project", - "href": "course-materials/lectures/04-next_steps.html#know-whats-good-enough-for-any-given-project", - "title": "How to move from a beginner to a more advanced python user", - "section": "7. Know what’s good enough for any given project", - "text": "7. Know what’s good enough for any given project\n\nYou’re not writing code for a self-driving car or a pacemaker.\n\nDon’t over-engineer your code.\nDon’t over-optimize your code.\nSimple is better than complex." + "objectID": "course-materials/live-coding/2d_list_comprehensions.html", + "href": "course-materials/live-coding/2d_list_comprehensions.html", + "title": "Live Coding Session 2D", + "section": "", + "text": "In this session, we will be exploring List and Dictionary comprehensions together. 
Live coding is a great way to learn programming as it allows you to see the process of writing code in real-time, including how to deal with unexpected issues and debug errors." }, { - "objectID": "course-materials/lectures/04-next_steps.html#embrace-refactoring", - "href": "course-materials/lectures/04-next_steps.html#embrace-refactoring", - "title": "How to move from a beginner to a more advanced python user", - "section": "8. Embrace refactoring", - "text": "8. Embrace refactoring\nRefactoring is the process of changing your code without changing its behavior.\n\nShip of Theseus: If you replace every part of a ship, is it still the same ship?\n\n\nAs you learn more, you will find better ways to do things.\nDon’t be afraid to change your code.\nTests (especially end-to-end tests) help you refactor with confidence.\n“Code smells”… if it smells bad, it probably is bad.\n\nCode Smells\nComments can be a code smell; they can be a sign that your code is not clear enough." + "objectID": "course-materials/live-coding/2d_list_comprehensions.html#overview", + "href": "course-materials/live-coding/2d_list_comprehensions.html#overview", + "title": "Live Coding Session 2D", + "section": "", + "text": "In this session, we will be exploring List and Dictionary comprehensions together. Live coding is a great way to learn programming as it allows you to see the process of writing code in real-time, including how to deal with unexpected issues and debug errors." }, { - "objectID": "course-materials/lectures/04-next_steps.html#write-things-down", - "href": "course-materials/lectures/04-next_steps.html#write-things-down", - "title": "How to move from a beginner to a more advanced python user", - "section": "9. Write things down", - "text": "9. 
Write things down\n\nKeep an ideas notebook\n\nWrite down ideas for projects\nWrite down ideas for code\n\n\n\nWrite comments to yourself and others\n\n\nWrite documentation\n\nCode Documentation in Python\n\n\n\nWrite down questions (use your tools; github issues, etc…)" + "objectID": "course-materials/live-coding/2d_list_comprehensions.html#objectives", + "href": "course-materials/live-coding/2d_list_comprehensions.html#objectives", + "title": "Live Coding Session 2D", + "section": "Objectives", + "text": "Objectives\n\nUnderstand the fundamentals of comprehensions in Python.\nApply comprehensions in practical examples.\nDevelop the ability to troubleshoot and debug in a live setting." }, { - "objectID": "course-materials/lectures/04-next_steps.html#go-meet-people", - "href": "course-materials/lectures/04-next_steps.html#go-meet-people", - "title": "How to move from a beginner to a more advanced python user", - "section": "10. Go meet people!", - "text": "10. Go meet people!\n\nThe Python (and Data Science) community is great!\n\nGo to Python & Data Science meetups.\n\nCentral Coast Python\n\n\n\nGo to python and data science conferences.\n\nPyCon 2024 & 2025 will be in Pittsburgh, PA\nPyData (Conferences all over the world)\n\n\n\nGo to hackathons.\n\nSB Hacks (UCSB)\nMLH (Major League Hacking)\nHackathon.com (Hackathons all over the world)" + "objectID": "course-materials/live-coding/2d_list_comprehensions.html#overview-1", + "href": "course-materials/live-coding/2d_list_comprehensions.html#overview-1", + "title": "Live Coding Session 2D", + "section": "Overview", + "text": "Overview\nThis session introduces list and dictionary comprehensions, providing a comparison to traditional control flow methods. The goal is to help students understand the advantages of using comprehensions in Python and to practice writing their own.\nThe session is designed to be completed in 45 minutes, including setting up the notebook." 
}, { - "objectID": "course-materials/lectures/02_helpGPT.html", - "href": "course-materials/lectures/02_helpGPT.html", - "title": "Getting Help", - "section": "", - "text": "When you get an error, or an unexpected result, or you are not sure what to do…\n\n\n\nFinding help inside Python\nFinding help outside Python\n\n\n\n\nHow do we interrogate the data (and other objects) we encounter while coding?\n\nmy_var = 'some_unknown_thing'\nWhat is it?\n\nmy_var = 'some_unknown_thing'\ntype(my_var)\n\nstr\n\n\nThe type() command tells you what sort of thing an object is.\n\n\n\nHow do we interrogate the data (and other objects) we encounter while coding?\n\nmy_var = 'some_unknown_thing'\nWhat can I do with it?\n\nmy_var = ['my', 'list', 'of', 'things']\nmy_var = my_var + ['a', 'nother', 'list']\ndir(my_var)\n\n['__add__',\n '__class__',\n '__class_getitem__',\n '__contains__',\n '__delattr__',\n '__delitem__',\n '__dir__',\n '__doc__',\n '__eq__',\n '__format__',\n '__ge__',\n '__getattribute__',\n '__getitem__',\n '__gt__',\n '__hash__',\n '__iadd__',\n '__imul__',\n '__init__',\n '__init_subclass__',\n '__iter__',\n '__le__',\n '__len__',\n '__lt__',\n '__mul__',\n '__ne__',\n '__new__',\n '__reduce__',\n '__reduce_ex__',\n '__repr__',\n '__reversed__',\n '__rmul__',\n '__setattr__',\n '__setitem__',\n '__sizeof__',\n '__str__',\n '__subclasshook__',\n 'append',\n 'clear',\n 'copy',\n 'count',\n 'extend',\n 'index',\n 'insert',\n 'pop',\n 'remove',\n 'reverse',\n 'sort']\n\n\nThe dir() command tells you what attributes an object has.\n\n\n\n\n# using the dir command\nmy_list = ['a', 'b', 'c']\nlist(reversed(dir(my_list)))\nmy_list.sort?\n\nSignature: my_list.sort(*, key=None, reverse=False)\nDocstring:\nSort the list in ascending order and return None.\n\nThe sort is in-place (i.e. the list itself is modified) and stable (i.e. 
the\norder of two equal elements is maintained).\n\nIf a key function is given, apply it once to each list item and sort them,\nascending or descending, according to their function values.\n\nThe reverse flag can be set to sort in descending order.\nType: builtin_function_or_method\n\n\n\n\n\n__attributes__ are internal (or private) attributes associated with all python objects.\nThese are called “magic” or “dunder” methods.\ndunder → “double under” → __\n\n\n\nEverything in Python is an object, and every operation corresponds to a method.\n\n# __add__ and __mul__. __len__. (None). 2 Wrongs.\n\n3 + 4\n\n7\n\n\n\n\n\nGenerally, you will not have to worry about dunder methods.\nHere’s a shortcut function to look at only non-dunder methods\n\n\n\n\n\nYou can use the <tab> key in iPython (or Jupyter environments) to explore object methods. By default, only “public” (non-dunder) methods are returned.\n\n\n\nYou can usually just pause typing and VSCode will provide object introspection:\n\nstring = 'some letters'\n\n\n\n\n\nMost objects - especially packages and libraries - provide help documentation that can be accessed using the python helper function… called… help()\n\n# 3, help, str, soil...\nimport math\nhelp(math)\n\nHelp on module math:\n\nNAME\n math\n\nMODULE REFERENCE\n https://docs.python.org/3.10/library/math.html\n \n The following documentation is automatically generated from the Python\n source files. It may be incomplete, incorrect or include features that\n are considered implementation detail and may vary between Python\n implementations. 
When in doubt, consult the module reference at the\n location listed above.\n\nDESCRIPTION\n This module provides access to the mathematical functions\n defined by the C standard.\n\nFUNCTIONS\n acos(x, /)\n Return the arc cosine (measured in radians) of x.\n \n The result is between 0 and pi.\n \n acosh(x, /)\n Return the inverse hyperbolic cosine of x.\n \n asin(x, /)\n Return the arc sine (measured in radians) of x.\n \n The result is between -pi/2 and pi/2.\n \n asinh(x, /)\n Return the inverse hyperbolic sine of x.\n \n atan(x, /)\n Return the arc tangent (measured in radians) of x.\n \n The result is between -pi/2 and pi/2.\n \n atan2(y, x, /)\n Return the arc tangent (measured in radians) of y/x.\n \n Unlike atan(y/x), the signs of both x and y are considered.\n \n atanh(x, /)\n Return the inverse hyperbolic tangent of x.\n \n ceil(x, /)\n Return the ceiling of x as an Integral.\n \n This is the smallest integer >= x.\n \n comb(n, k, /)\n Number of ways to choose k items from n items without repetition and without order.\n \n Evaluates to n! / (k! * (n - k)!) when k <= n and evaluates\n to zero when k > n.\n \n Also called the binomial coefficient because it is equivalent\n to the coefficient of k-th term in polynomial expansion of the\n expression (1 + x)**n.\n \n Raises TypeError if either of the arguments are not integers.\n Raises ValueError if either of the arguments are negative.\n \n copysign(x, y, /)\n Return a float with the magnitude (absolute value) of x but the sign of y.\n \n On platforms that support signed zeros, copysign(1.0, -0.0)\n returns -1.0.\n \n cos(x, /)\n Return the cosine of x (measured in radians).\n \n cosh(x, /)\n Return the hyperbolic cosine of x.\n \n degrees(x, /)\n Convert angle x from radians to degrees.\n \n dist(p, q, /)\n Return the Euclidean distance between two points p and q.\n \n The points should be specified as sequences (or iterables) of\n coordinates. 
Both inputs must have the same dimension.\n \n Roughly equivalent to:\n sqrt(sum((px - qx) ** 2.0 for px, qx in zip(p, q)))\n \n erf(x, /)\n Error function at x.\n \n erfc(x, /)\n Complementary error function at x.\n \n exp(x, /)\n Return e raised to the power of x.\n \n expm1(x, /)\n Return exp(x)-1.\n \n This function avoids the loss of precision involved in the direct evaluation of exp(x)-1 for small x.\n \n fabs(x, /)\n Return the absolute value of the float x.\n \n factorial(x, /)\n Find x!.\n \n Raise a ValueError if x is negative or non-integral.\n \n floor(x, /)\n Return the floor of x as an Integral.\n \n This is the largest integer <= x.\n \n fmod(x, y, /)\n Return fmod(x, y), according to platform C.\n \n x % y may differ.\n \n frexp(x, /)\n Return the mantissa and exponent of x, as pair (m, e).\n \n m is a float and e is an int, such that x = m * 2.**e.\n If x is 0, m and e are both 0. Else 0.5 <= abs(m) < 1.0.\n \n fsum(seq, /)\n Return an accurate floating point sum of values in the iterable seq.\n \n Assumes IEEE-754 floating point arithmetic.\n \n gamma(x, /)\n Gamma function at x.\n \n gcd(*integers)\n Greatest Common Divisor.\n \n hypot(...)\n hypot(*coordinates) -> value\n \n Multidimensional Euclidean distance from the origin to a point.\n \n Roughly equivalent to:\n sqrt(sum(x**2 for x in coordinates))\n \n For a two dimensional point (x, y), gives the hypotenuse\n using the Pythagorean theorem: sqrt(x*x + y*y).\n \n For example, the hypotenuse of a 3/4/5 right triangle is:\n \n >>> hypot(3.0, 4.0)\n 5.0\n \n isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0)\n Determine whether two floating point numbers are close in value.\n \n rel_tol\n maximum difference for being considered \"close\", relative to the\n magnitude of the input values\n abs_tol\n maximum difference for being considered \"close\", regardless of the\n magnitude of the input values\n \n Return True if a is close in value to b, and False otherwise.\n \n For the values to be considered 
close, the difference between them\n must be smaller than at least one of the tolerances.\n \n -inf, inf and NaN behave similarly to the IEEE 754 Standard. That\n is, NaN is not close to anything, even itself. inf and -inf are\n only close to themselves.\n \n isfinite(x, /)\n Return True if x is neither an infinity nor a NaN, and False otherwise.\n \n isinf(x, /)\n Return True if x is a positive or negative infinity, and False otherwise.\n \n isnan(x, /)\n Return True if x is a NaN (not a number), and False otherwise.\n \n isqrt(n, /)\n Return the integer part of the square root of the input.\n \n lcm(*integers)\n Least Common Multiple.\n \n ldexp(x, i, /)\n Return x * (2**i).\n \n This is essentially the inverse of frexp().\n \n lgamma(x, /)\n Natural logarithm of absolute value of Gamma function at x.\n \n log(...)\n log(x, [base=math.e])\n Return the logarithm of x to the given base.\n \n If the base not specified, returns the natural logarithm (base e) of x.\n \n log10(x, /)\n Return the base 10 logarithm of x.\n \n log1p(x, /)\n Return the natural logarithm of 1+x (base e).\n \n The result is computed in a way which is accurate for x near zero.\n \n log2(x, /)\n Return the base 2 logarithm of x.\n \n modf(x, /)\n Return the fractional and integer parts of x.\n \n Both results carry the sign of x and are floats.\n \n nextafter(x, y, /)\n Return the next floating-point value after x towards y.\n \n perm(n, k=None, /)\n Number of ways to choose k items from n items without repetition and with order.\n \n Evaluates to n! / (n - k)! 
when k <= n and evaluates\n to zero when k > n.\n \n If k is not specified or is None, then k defaults to n\n and the function returns n!.\n \n Raises TypeError if either of the arguments are not integers.\n Raises ValueError if either of the arguments are negative.\n \n pow(x, y, /)\n Return x**y (x to the power of y).\n \n prod(iterable, /, *, start=1)\n Calculate the product of all the elements in the input iterable.\n \n The default start value for the product is 1.\n \n When the iterable is empty, return the start value. This function is\n intended specifically for use with numeric values and may reject\n non-numeric types.\n \n radians(x, /)\n Convert angle x from degrees to radians.\n \n remainder(x, y, /)\n Difference between x and the closest integer multiple of y.\n \n Return x - n*y where n*y is the closest integer multiple of y.\n In the case where x is exactly halfway between two multiples of\n y, the nearest even value of n is used. The result is always exact.\n \n sin(x, /)\n Return the sine of x (measured in radians).\n \n sinh(x, /)\n Return the hyperbolic sine of x.\n \n sqrt(x, /)\n Return the square root of x.\n \n tan(x, /)\n Return the tangent of x (measured in radians).\n \n tanh(x, /)\n Return the hyperbolic tangent of x.\n \n trunc(x, /)\n Truncates the Real x to the nearest Integral toward 0.\n \n Uses the __trunc__ magic method.\n \n ulp(x, /)\n Return the value of the least significant bit of the float x.\n\nDATA\n e = 2.718281828459045\n inf = inf\n nan = nan\n pi = 3.141592653589793\n tau = 6.283185307179586\n\nFILE\n /Users/kellycaylor/mambaforge/envs/eds217_2023/lib/python3.10/lib-dynload/math.cpython-310-darwin.so\n\n\n\n\n\n\n\nIn the iPython shell (or the Jupyter Notebook/Jupyter Lab environment), you can also access the help() command using ?.\n\nmath\n\n<module 'math' from '/Users/kellycaylor/mambaforge/envs/eds217_2023/lib/python3.10/lib-dynload/math.cpython-310-darwin.so'>\n\n\n\n\n\nIn the iPython shell (or the Jupyter 
Notebook/Jupyter Lab environment) you can use ?? to see the actual source code of python code\n\n\n\n?? only shows source code for python functions that aren’t compiled to C code. Otherwise, it will show the same information as ?\n\n\n\n\n\n\n\n\n\n\nThe print command is the most commonly used tool for beginners to understand errors.\n# This code generates a `TypeError` that \n# x is not the right kind of variable.\ndo_something(x) \nThe print command is the most commonly used debugging tool for beginners.\n\n\n\nPython has a string format called f-strings. These are strings that are prefixed with an f character and allow in-line variable substitution.\n\n# print using c-style format statements\nx = 3.45\nprint(f\"x = {x}\")\n\nx = 3.45\n\n\n\ndef do_something(x):\n x = x / 2 \n return x\n\n# This code generates a `TypeError` that \n# x is not the right kind of variable.\nx = 'f'\n# Check and see what is X?\nprint(\n f\"calling do_something() with x={x}\" # Python f-string\n)\n\ndo_something(x) \n\ncalling do_something() with x=f\n\n\n\n---------------------------------------------------------------------------\nTypeError Traceback (most recent call last)\nCell In[13], line 13\n 8 # Check and see what is X?\n 9 print(\n 10 f\"calling do_something() with x={x}\" # Python f-string\n 11 )\n---> 13 do_something(x)\n\nCell In[13], line 2, in do_something(x)\n 1 def do_something(x):\n----> 2 x = x / 2 \n 3 return x\n\nTypeError: unsupported operand type(s) for /: 'str' and 'int'\n\n\n\n\n\n\nAs of Fall 2002: - O’Reilly Books (Requires UCSB login) - My O’Reilly pdf library: https://bit.ly/eds-217-books (Requires UCSB login)\nAs of Fall, 2022: - Python Docs - Stack Overflow - Talk Python - Ask Python\nAs of Fall, 2023:\nLLMs.\n\nChatGPT - Need $ for GPT-4; GPT-3.X is a fine debugger, but not always a great programmer.\nGitHub Copilot - Should be able to get a free student account. Works great in VSCode; we will set this up together later in the course." 
+ "objectID": "course-materials/live-coding/2d_list_comprehensions.html#setting-up-your-notebook-5-minutes", + "href": "course-materials/live-coding/2d_list_comprehensions.html#setting-up-your-notebook-5-minutes", + "title": "Live Coding Session 2D", + "section": "1. Setting Up Your Notebook (5 minutes)", + "text": "1. Setting Up Your Notebook (5 minutes)\nGoal: Start by having students set up their Jupyter notebook with markdown headers. This helps organize the session into distinct sections, making it easier for them to follow along and refer back to their work later.\n\nInstructions:\n\nCreate a new Jupyter notebook or open an existing one for this session.\nAdd markdown cells with the following headers, using ## for each header.\nPlace code cells between the headers where you’ll write and execute your code.\n\n\n\nHeader Texts:\n\nFirst markdown cell:\n## Review: Traditional Control Flow Approaches\nSecond markdown cell:\n## Introduction to List Comprehensions\nThird markdown cell:\n## Introduction to Dictionary Comprehensions\nFourth markdown cell:\n## Using Conditional Logic in Comprehensions\nFifth markdown cell:\n## Summary and Best Practices\nSixth markdown cell:\n## Reflections" }, { - "objectID": "course-materials/lectures/02_helpGPT.html#finding-help", - "href": "course-materials/lectures/02_helpGPT.html#finding-help", - "title": "Getting Help", - "section": "", - "text": "When you get an error, or an unexpected result, or you are not sure what to do…\n\n\n\nFinding help inside Python\nFinding help outside Python\n\n\n\n\nHow do we interrogate the data (and other objects) we encounter while coding?\n\nmy_var = 'some_unknown_thing'\nWhat is it?\n\nmy_var = 'some_unknown_thing'\ntype(my_var)\n\nstr\n\n\nThe type() command tells you what sort of thing an object is.\n\n\n\nHow do we interrogate the data (and other objects) we encounter while coding?\n\nmy_var = 'some_unknown_thing'\nWhat can I do with it?\n\nmy_var = ['my', 'list', 'of', 'things']\nmy_var = 
my_var + ['a', 'nother', 'list']\ndir(my_var)\n\n['__add__',\n '__class__',\n '__class_getitem__',\n '__contains__',\n '__delattr__',\n '__delitem__',\n '__dir__',\n '__doc__',\n '__eq__',\n '__format__',\n '__ge__',\n '__getattribute__',\n '__getitem__',\n '__gt__',\n '__hash__',\n '__iadd__',\n '__imul__',\n '__init__',\n '__init_subclass__',\n '__iter__',\n '__le__',\n '__len__',\n '__lt__',\n '__mul__',\n '__ne__',\n '__new__',\n '__reduce__',\n '__reduce_ex__',\n '__repr__',\n '__reversed__',\n '__rmul__',\n '__setattr__',\n '__setitem__',\n '__sizeof__',\n '__str__',\n '__subclasshook__',\n 'append',\n 'clear',\n 'copy',\n 'count',\n 'extend',\n 'index',\n 'insert',\n 'pop',\n 'remove',\n 'reverse',\n 'sort']\n\n\nThe dir() command tells you what attributes an object has.\n\n\n\n\n# using the dir command\nmy_list = ['a', 'b', 'c']\nlist(reversed(dir(my_list)))\nmy_list.sort?\n\nSignature: my_list.sort(*, key=None, reverse=False)\nDocstring:\nSort the list in ascending order and return None.\n\nThe sort is in-place (i.e. the list itself is modified) and stable (i.e. the\norder of two equal elements is maintained).\n\nIf a key function is given, apply it once to each list item and sort them,\nascending or descending, according to their function values.\n\nThe reverse flag can be set to sort in descending order.\nType: builtin_function_or_method\n\n\n\n\n\n__attributes__ are internal (or private) attributes associated with all python objects.\nThese are called “magic” or “dunder” methods.\ndunder → “double under” → __\n\n\n\nEverything in Python is an object, and every operation corresponds to a method.\n\n# __add__ and __mul__. __len__. (None). 2 Wrongs.\n\n3 + 4\n\n7\n\n\n\n\n\nGenerally, you will not have to worry about dunder methods.\nHere’s a shortcut function to look at only non-dunder methods\n\n\n\n\n\nYou can use the <tab> key in iPython (or Jupyter environments) to explore object methods. 
By default, only “public” (non-dunder) methods are returned.\n\n\n\nYou can usually just pause typing and VSCode will provide object introspection:\n\nstring = 'some letters'\n\n\n\n\n\nMost objects - especially packages and libraries - provide help documentation that can be accessed using the python helper function… called… help()\n\n# 3, help, str, soil...\nimport math\nhelp(math)\n\nHelp on module math:\n\nNAME\n math\n\nMODULE REFERENCE\n https://docs.python.org/3.10/library/math.html\n \n The following documentation is automatically generated from the Python\n source files. It may be incomplete, incorrect or include features that\n are considered implementation detail and may vary between Python\n implementations. When in doubt, consult the module reference at the\n location listed above.\n\nDESCRIPTION\n This module provides access to the mathematical functions\n defined by the C standard.\n\nFUNCTIONS\n acos(x, /)\n Return the arc cosine (measured in radians) of x.\n \n The result is between 0 and pi.\n \n acosh(x, /)\n Return the inverse hyperbolic cosine of x.\n \n asin(x, /)\n Return the arc sine (measured in radians) of x.\n \n The result is between -pi/2 and pi/2.\n \n asinh(x, /)\n Return the inverse hyperbolic sine of x.\n \n atan(x, /)\n Return the arc tangent (measured in radians) of x.\n \n The result is between -pi/2 and pi/2.\n \n atan2(y, x, /)\n Return the arc tangent (measured in radians) of y/x.\n \n Unlike atan(y/x), the signs of both x and y are considered.\n \n atanh(x, /)\n Return the inverse hyperbolic tangent of x.\n \n ceil(x, /)\n Return the ceiling of x as an Integral.\n \n This is the smallest integer >= x.\n \n comb(n, k, /)\n Number of ways to choose k items from n items without repetition and without order.\n \n Evaluates to n! / (k! * (n - k)!) 
when k <= n and evaluates\n to zero when k > n.\n \n Also called the binomial coefficient because it is equivalent\n to the coefficient of k-th term in polynomial expansion of the\n expression (1 + x)**n.\n \n Raises TypeError if either of the arguments are not integers.\n Raises ValueError if either of the arguments are negative.\n \n copysign(x, y, /)\n Return a float with the magnitude (absolute value) of x but the sign of y.\n \n On platforms that support signed zeros, copysign(1.0, -0.0)\n returns -1.0.\n \n cos(x, /)\n Return the cosine of x (measured in radians).\n \n cosh(x, /)\n Return the hyperbolic cosine of x.\n \n degrees(x, /)\n Convert angle x from radians to degrees.\n \n dist(p, q, /)\n Return the Euclidean distance between two points p and q.\n \n The points should be specified as sequences (or iterables) of\n coordinates. Both inputs must have the same dimension.\n \n Roughly equivalent to:\n sqrt(sum((px - qx) ** 2.0 for px, qx in zip(p, q)))\n \n erf(x, /)\n Error function at x.\n \n erfc(x, /)\n Complementary error function at x.\n \n exp(x, /)\n Return e raised to the power of x.\n \n expm1(x, /)\n Return exp(x)-1.\n \n This function avoids the loss of precision involved in the direct evaluation of exp(x)-1 for small x.\n \n fabs(x, /)\n Return the absolute value of the float x.\n \n factorial(x, /)\n Find x!.\n \n Raise a ValueError if x is negative or non-integral.\n \n floor(x, /)\n Return the floor of x as an Integral.\n \n This is the largest integer <= x.\n \n fmod(x, y, /)\n Return fmod(x, y), according to platform C.\n \n x % y may differ.\n \n frexp(x, /)\n Return the mantissa and exponent of x, as pair (m, e).\n \n m is a float and e is an int, such that x = m * 2.**e.\n If x is 0, m and e are both 0. 
Else 0.5 <= abs(m) < 1.0.\n \n fsum(seq, /)\n Return an accurate floating point sum of values in the iterable seq.\n \n Assumes IEEE-754 floating point arithmetic.\n \n gamma(x, /)\n Gamma function at x.\n \n gcd(*integers)\n Greatest Common Divisor.\n \n hypot(...)\n hypot(*coordinates) -> value\n \n Multidimensional Euclidean distance from the origin to a point.\n \n Roughly equivalent to:\n sqrt(sum(x**2 for x in coordinates))\n \n For a two dimensional point (x, y), gives the hypotenuse\n using the Pythagorean theorem: sqrt(x*x + y*y).\n \n For example, the hypotenuse of a 3/4/5 right triangle is:\n \n >>> hypot(3.0, 4.0)\n 5.0\n \n isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0)\n Determine whether two floating point numbers are close in value.\n \n rel_tol\n maximum difference for being considered \"close\", relative to the\n magnitude of the input values\n abs_tol\n maximum difference for being considered \"close\", regardless of the\n magnitude of the input values\n \n Return True if a is close in value to b, and False otherwise.\n \n For the values to be considered close, the difference between them\n must be smaller than at least one of the tolerances.\n \n -inf, inf and NaN behave similarly to the IEEE 754 Standard. That\n is, NaN is not close to anything, even itself. 
inf and -inf are\n only close to themselves.\n \n isfinite(x, /)\n Return True if x is neither an infinity nor a NaN, and False otherwise.\n \n isinf(x, /)\n Return True if x is a positive or negative infinity, and False otherwise.\n \n isnan(x, /)\n Return True if x is a NaN (not a number), and False otherwise.\n \n isqrt(n, /)\n Return the integer part of the square root of the input.\n \n lcm(*integers)\n Least Common Multiple.\n \n ldexp(x, i, /)\n Return x * (2**i).\n \n This is essentially the inverse of frexp().\n \n lgamma(x, /)\n Natural logarithm of absolute value of Gamma function at x.\n \n log(...)\n log(x, [base=math.e])\n Return the logarithm of x to the given base.\n \n If the base not specified, returns the natural logarithm (base e) of x.\n \n log10(x, /)\n Return the base 10 logarithm of x.\n \n log1p(x, /)\n Return the natural logarithm of 1+x (base e).\n \n The result is computed in a way which is accurate for x near zero.\n \n log2(x, /)\n Return the base 2 logarithm of x.\n \n modf(x, /)\n Return the fractional and integer parts of x.\n \n Both results carry the sign of x and are floats.\n \n nextafter(x, y, /)\n Return the next floating-point value after x towards y.\n \n perm(n, k=None, /)\n Number of ways to choose k items from n items without repetition and with order.\n \n Evaluates to n! / (n - k)! when k <= n and evaluates\n to zero when k > n.\n \n If k is not specified or is None, then k defaults to n\n and the function returns n!.\n \n Raises TypeError if either of the arguments are not integers.\n Raises ValueError if either of the arguments are negative.\n \n pow(x, y, /)\n Return x**y (x to the power of y).\n \n prod(iterable, /, *, start=1)\n Calculate the product of all the elements in the input iterable.\n \n The default start value for the product is 1.\n \n When the iterable is empty, return the start value. 
This function is\n intended specifically for use with numeric values and may reject\n non-numeric types.\n \n radians(x, /)\n Convert angle x from degrees to radians.\n \n remainder(x, y, /)\n Difference between x and the closest integer multiple of y.\n \n Return x - n*y where n*y is the closest integer multiple of y.\n In the case where x is exactly halfway between two multiples of\n y, the nearest even value of n is used. The result is always exact.\n \n sin(x, /)\n Return the sine of x (measured in radians).\n \n sinh(x, /)\n Return the hyperbolic sine of x.\n \n sqrt(x, /)\n Return the square root of x.\n \n tan(x, /)\n Return the tangent of x (measured in radians).\n \n tanh(x, /)\n Return the hyperbolic tangent of x.\n \n trunc(x, /)\n Truncates the Real x to the nearest Integral toward 0.\n \n Uses the __trunc__ magic method.\n \n ulp(x, /)\n Return the value of the least significant bit of the float x.\n\nDATA\n e = 2.718281828459045\n inf = inf\n nan = nan\n pi = 3.141592653589793\n tau = 6.283185307179586\n\nFILE\n /Users/kellycaylor/mambaforge/envs/eds217_2023/lib/python3.10/lib-dynload/math.cpython-310-darwin.so\n\n\n\n\n\n\n\nIn the iPython shell (or the Jupyter Notebook/Jupyter Lab environment), you can also access the help() command using ?.\n\nmath\n\n<module 'math' from '/Users/kellycaylor/mambaforge/envs/eds217_2023/lib/python3.10/lib-dynload/math.cpython-310-darwin.so'>\n\n\n\n\n\nIn the iPython shell (or the Jupyter Notebook/Jupyter Lab environment) you can use ?? to see the actual source code of python code\n\n\n\n?? only shows source code for python functions that aren’t compiled to C code. 
Otherwise, it will show the same information as ?\n\n\n\n\n\n\n\n\n\n\nThe print command is the most commonly used tool for beginners to understand errors.\n# This code generates a `TypeError` that \n# x is not the right kind of variable.\ndo_something(x) \nThe print command is the most commonly used debugging tool for beginners.\n\n\n\nPython has a string format called f-strings. These are strings that are prefixed with an f character and allow in-line variable substitution.\n\n# print using c-style format statements\nx = 3.45\nprint(f\"x = {x}\")\n\nx = 3.45\n\n\n\ndef do_something(x):\n x = x / 2 \n return x\n\n# This code generates a `TypeError` that \n# x is not the right kind of variable.\nx = 'f'\n# Check and see what is X?\nprint(\n f\"calling do_something() with x={x}\" # Python f-string\n)\n\ndo_something(x) \n\ncalling do_something() with x=f\n\n\n\n---------------------------------------------------------------------------\nTypeError Traceback (most recent call last)\nCell In[13], line 13\n 8 # Check and see what is X?\n 9 print(\n 10 f\"calling do_something() with x={x}\" # Python f-string\n 11 )\n---> 13 do_something(x)\n\nCell In[13], line 2, in do_something(x)\n 1 def do_something(x):\n----> 2 x = x / 2 \n 3 return x\n\nTypeError: unsupported operand type(s) for /: 'str' and 'int'\n\n\n\n\n\n\nAs of Fall 2002: - O’Reilly Books (Requires UCSB login) - My O’Reilly pdf library: https://bit.ly/eds-217-books (Requires UCSB login)\nAs of Fall, 2022: - Python Docs - Stack Overflow - Talk Python - Ask Python\nAs of Fall, 2023:\nLLMs.\n\nChatGPT - Need $ for GPT-4; GPT-3.X is a fine debugger, but not always a great programmer.\nGitHub Copilot - Should be able to get a free student account. Works great in VSCode; we will set this up together later in the course." 
+ "objectID": "course-materials/live-coding/2d_list_comprehensions.html#session-format", + "href": "course-materials/live-coding/2d_list_comprehensions.html#session-format", + "title": "Live Coding Session 2D", + "section": "Session Format", + "text": "Session Format\n\nIntroduction\n\nBrief discussion about the topic and its importance in data science.\n\n\n\nDemonstration\n\nI will demonstrate code examples live. Follow along and write the code into your own Jupyter notebook.\n\n\n\nPractice\n\nYou will have the opportunity to try exercises on your own to apply what you’ve learned.\n\n\n\nQ&A\n\nWe will have a Q&A session at the end where you can ask specific questions about the code, concepts, or issues encountered during the session." }, { - "objectID": "course-materials/lectures/02_helpGPT.html#how-to-move-from-a-beginner-to-a-more-advanced-python-user", - "href": "course-materials/lectures/02_helpGPT.html#how-to-move-from-a-beginner-to-a-more-advanced-python-user", - "title": "Getting Help", - "section": "How to move from a beginner to a more `advanced` python user", - "text": "How to move from a beginner to a more `advanced` python user\nTaken from Talk Python to Me, Episode #427, with some modifications." 
}, { - "objectID": "course-materials/lectures/02_helpGPT.html#know-your-goals", - "href": "course-materials/lectures/02_helpGPT.html#know-your-goals", - "title": "Getting Help", - "section": "Know your goals", - "text": "Know your goals\n\nWhy are you learning python?\nWhy are you learning data science?" + "objectID": "course-materials/interactive-sessions/interactive-session-git.html", + "href": "course-materials/interactive-sessions/interactive-session-git.html", + "title": "Sidebar", + "section": "", + "text": "Welcome to git (xkcd)" }, { - "objectID": "course-materials/lectures/02_helpGPT.html#have-a-project-in-mind", - "href": "course-materials/lectures/02_helpGPT.html#have-a-project-in-mind", - "title": "Getting Help", - "section": "Have a project in mind", - "text": "Have a project in mind\n\nWhat do you want to do with it?\nUse python to solve a problem you are interested in solving.\nDon’t be afraid to work on personal projects.\n\n\nSome examples of my personal “problem-solving” projects\nBiobib - Python code to make my CV/Biobib from a google sheets/.csv file.\nTriumph - Python notebooks for a 1959 Triumph TR3A EV conversion project.\nStoplight - A simple python webapp for monitoring EDS217 course pace." + "objectID": "course-materials/interactive-sessions/interactive-session-git.html#setting-up-git-for-collaborating-with-notebooks", + "href": "course-materials/interactive-sessions/interactive-session-git.html#setting-up-git-for-collaborating-with-notebooks", + "title": "Sidebar", + "section": "1. Setting up git for collaborating with Notebooks", + "text": "1. Setting up git for collaborating with Notebooks\n\nCleaning Up Jupyter Notebook Files Before Committing to Git\nIn data science workflows, particularly when collaborating using Jupyter Notebooks, it’s important to maintain a clean and efficient Git repository. 
This guide will help you set up your environment to automatically remove outputs from .ipynb files before committing them, which improves collaboration and reduces repository size.\n\n\nWhy Clean Up .ipynb Files?\n\nReduced Size: Outputs can bloat file sizes, making repositories larger and slower to clone.\nFewer Conflicts: Output cells can cause merge conflicts when multiple people edit the same file.\nEncouraged Reproducibility: Keeping notebooks free of outputs encourages others to run the notebooks themselves.\n\n\n\nStep-by-Step Setup\n\nStep 1: Check if jq is Installed\n\nOpen Terminal: Access your terminal application.\nCheck for jq: Run the following command to see if jq is installed and check its version:\njq --version\nVerify Version: Ensure the output is jq-1.5 or higher. If jq is installed and the version is at least 1.5, you can proceed to the next steps. If not, see the installation note below.\n\n\n\nStep 2: Configure Git to Use a Global Attributes File\n\nOpen ~/.gitconfig: Use nano to edit this file:\nnano ~/.gitconfig\nAdd the Configuration: Copy and paste the following lines:\n[core]\nattributesfile = ~/.gitattributes_global\n\n[filter \"nbstrip_full\"]\nclean = \"jq --indent 1 \\\n '(.cells[] | select(has(\\\"outputs\\\")) | .outputs) = [] \\\n | (.cells[] | select(has(\\\"execution_count\\\")) | .execution_count) = null \\\n | .metadata = {\\\"language_info\\\": {\\\"name\\\": \\\"python\\\", \\\"pygments_lexer\\\": \\\"ipython3\\\"}} \\\n | .cells[].metadata = {} \\\n '\"\nsmudge = cat\nrequired = true\nSave and Exit: Press CTRL + X, then Y, and Enter to save the file.\n\n\n\nStep 3: Create a Global Git Attributes File\n\nOpen ~/.gitattributes_global: Use nano to edit this file:\nnano ~/.gitattributes_global\nAdd the Following Line:\n*.ipynb filter=nbstrip_full\nSave and Exit: Press CTRL + X, then Y, and Enter.\n\n\n\n\nHow This Works\n\nfilter \"nbstrip_full\": This filter uses the jq command to strip outputs and reset execution counts in 
.ipynb files.\nclean: Removes outputs when files are staged for commit.\nsmudge: Ensures the original content is restored upon checkout.\nrequired: Enforces the use of the filter for the specified files.\n\n\n\nBenefits for Python Environmental Data Science Workflows\n\nEfficiency: Smaller files mean faster repository operations.\nCollaboration: Fewer conflicts facilitate teamwork.\nReproducibility: Encourages consistent execution across environments." }, { - "objectID": "course-materials/lectures/02_helpGPT.html#dont-limit-your-learning-to-whats-needed-for-your-project", - "href": "course-materials/lectures/02_helpGPT.html#dont-limit-your-learning-to-whats-needed-for-your-project", - "title": "Getting Help", - "section": "Don’t limit your learning to what’s needed for your project", - "text": "Don’t limit your learning to what’s needed for your project\n\nLearn more than you need to know…\n…but try to use less than you think you need" + "objectID": "course-materials/interactive-sessions/interactive-session-git.html#optional-installing-jq", + "href": "course-materials/interactive-sessions/interactive-session-git.html#optional-installing-jq", + "title": "Sidebar", + "section": "Optional: Installing jq", + "text": "Optional: Installing jq\nIf jq is not installed or needs to be updated, follow these instructions for your operating system.\n\nWindows\n\nDownload jq:\n\nVisit the jq downloads page and download the Windows executable (jq-win64.exe).\n\nAdd to PATH:\n\nMove the jq-win64.exe to a directory included in your system’s PATH or rename it to jq.exe and place it in C:\\Windows\\System32.\n\nVerify Installation:\n\nOpen Command Prompt and run jq --version to ensure it’s correctly installed.\n\n\n\n\nmacOS\n\nUsing Homebrew:\n\nHomebrew is a package manager for macOS that simplifies the installation of software. It’s widely used for installing command-line tools and other utilities. 
If you don’t have Homebrew installed, you can follow the instructions on the Homebrew website.\nOnce Homebrew is installed, open Terminal and run the following command to install jq:\nbrew install jq\n\nVerify Installation:\n\nRun jq --version to confirm it is installed and at least version 1.5.\n\n\nBy following these steps, you ensure that your Jupyter Notebook files remain clean and efficient within your Git repositories, enhancing collaboration and reproducibility in your workflows.\n\n\n\nAdditional Note\nFor Linux users, you can typically install jq using your package manager, such as apt on Debian-based systems or yum on Red Hat-based systems:\n# Debian-based systems\nsudo apt-get install jq\n\n# Red Hat-based systems\nsudo yum install jq\n\n\nEnd interactive session 2A" }, { - "objectID": "course-materials/lectures/02_helpGPT.html#read-good-code", - "href": "course-materials/lectures/02_helpGPT.html#read-good-code", - "title": "Getting Help", - "section": "Read good code", - "text": "Read good code\n\nLibraries and packages have great examples of code.\nRead the code (not just docs) of the packages you use.\nGithub is a great place to find code." 
+ "objectID": "course-materials/day9.html#class-materials", + "href": "course-materials/day9.html#class-materials", + "title": "Building a Python Data Science Workflow", + "section": "Class materials", + "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 9 / morning\nWorking on Final Data Science Project (all morning)\n\n\n\nday 9 / afternoon\nData Science Project Presentations (all afternoon)" }, { - "objectID": "course-materials/lectures/02_helpGPT.html#know-your-tools", - "href": "course-materials/lectures/02_helpGPT.html#know-your-tools", - "title": "Getting Help", - "section": "Know your tools", - "text": "Know your tools\n\nLearn how to use your IDE (VSCode)\nLearn how to use your package manager (conda)\nLearn how to use your shell (bash)\nLearn how to use your version control system (git)\n\n\nLearn how to test your code\n\nTesting is part of programming.\nTesting is a great way to learn.\nFocus on end-to-end (E2E) tests (rather than unit tests)\n\nUnit tests:\nDoes it work the way you expect it to (operation-centric)?\nEnd-to-end test:\nDoes it do what you want it to do (output-centric)?" 
+ "objectID": "course-materials/day9.html#syncing-your-classwork-to-github", + "href": "course-materials/day9.html#syncing-your-classwork-to-github", + "title": "Building a Python Data Science Workflow", + "section": "Syncing your classwork to Github", + "text": "Syncing your classwork to Github\nHere are some directions for syncing your classwork with a GitHub repository" }, { - "objectID": "course-materials/lectures/02_helpGPT.html#know-whats-good-enough-for-any-given-project", - "href": "course-materials/lectures/02_helpGPT.html#know-whats-good-enough-for-any-given-project", - "title": "Getting Help", - "section": "Know what’s good enough for any given project", - "text": "Know what’s good enough for any given project\n\nYou’re not writing code for a self-driving car or a pacemaker.\n\nDon’t over-engineer your code.\nDon’t over-optimize your code.\nSimple is better than complex." + "objectID": "course-materials/day9.html#end-of-day-practice", + "href": "course-materials/day9.html#end-of-day-practice", + "title": "Building a Python Data Science Workflow", + "section": "End-of-day practice", + "text": "End-of-day practice\nEnd of Class! Congratulations!!" }, { - "objectID": "course-materials/lectures/02_helpGPT.html#embrace-refactoring", - "href": "course-materials/lectures/02_helpGPT.html#embrace-refactoring", - "title": "Getting Help", - "section": "Embrace refactoring", - "text": "Embrace refactoring\nRefactoring is the process of changing your code without changing its behavior.\n\nShip of Theseus: If you replace every part of a ship, is it still the same ship?\n\n\nAs you learn more, you will find better ways to do things.\nDon’t be afraid to change your code.\nTests (especially end-to-end tests) help you refactor with confidence.\n“Code smells”… if it smells bad, it probably is bad.\n\nCode Smells\nComments can be a code smell; they can be a sign that your code is not clear enough." 
+ "objectID": "course-materials/day9.html#additional-resources", + "href": "course-materials/day9.html#additional-resources", + "title": "Building a Python Data Science Workflow", + "section": "Additional Resources", + "text": "Additional Resources" }, { - "objectID": "course-materials/lectures/02_helpGPT.html#write-things-down", - "href": "course-materials/lectures/02_helpGPT.html#write-things-down", - "title": "Getting Help", - "section": "Write things down", - "text": "Write things down\n\nKeep an ideas notebook\n\nWrite down ideas for projects\nWrite down ideas for code\n\n\n\nWrite comments to yourself and others\n\n\nWrite documentation\n\n\nWrite down questions (use your tools; github issues, etc.)" + "objectID": "course-materials/day7.html#class-materials", + "href": "course-materials/day7.html#class-materials", + "title": "Data Handling and Visualization, Day 2", + "section": "Class materials", + "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 7 / morning\n📊 Data visualization (Part I)\n📊 Data visualization (Part II)\n\n\nday 7 / afternoon\n🙌 Coding Colab: Exploring data through visualizations" }, { - "objectID": "course-materials/lectures/02_helpGPT.html#go-meet-people", - "href": "course-materials/lectures/02_helpGPT.html#go-meet-people", - "title": "Getting Help", - "section": "Go meet people!", - "text": "Go meet people!\n\nThe Python (and Data Science) community is great!\n\nGo to Python & Data Science meetups.\n\nCentral Coast Python\n\n\n\nGo to python and data science conferences.\n\nPyCon 2024 & 2025 will be in Pittsburgh, PA\nPyData (Conferences all over the world)\n\n\n\nGo to hackathons.\n\nSB Hacks (UCSB)\nMLH (Major League Hacking)\nHackathon.com (Hackathons all over the world)" + "objectID": "course-materials/day7.html#end-of-day-practice", + "href": "course-materials/day7.html#end-of-day-practice", + "title": "Data Handling and Visualization, Day 2", + "section": "End-of-day practice", + "text": 
"End-of-day practice\nComplete the following tasks / activities before heading home for the day!\n\n Day 7 Practice: 🌲 USDA Plant Hardiness Zones 🌴" }, { - "objectID": "course-materials/lectures/00_intro_to_python.html", - "href": "course-materials/lectures/00_intro_to_python.html", - "title": "Lecture 1 - Intro to Python and Environmental Data Science", - "section": "", - "text": "Course Webpage: https://eds-217-essential-python.github.io\n\n\n\n\ndata_science.jpg\n\n\n\n\n\nenvironmental_data_science.jpg\n\n\n\n\n\n🐍 What Python?\n❓ Why Python?\n💻 How Python?\n\n\n“Python is powerful… and fast; plays well with others; runs everywhere; is friendly & easy to learn; is Open.”\n\n\n\n\nPython is a general-purpose, object-oriented programming language that emphasizes code readability through its generous use of white space. Released in 1989, Python is easy to learn and a favorite of programmers and developers.\n\n\n(Python, C, C++, Java, Javascript, R, Pascal) - Take less time to write - Shorter and easier to read - Portable, meaning that they can run on different kinds of computers with few or no modifications.\nThe engine that translates and runs Python is called the Python Interpreter\n\n\"\"\" \nEntering code into this notebook cell \nand pressing [SHIFT-ENTER] will cause the \npython interpreter to execute the code\n\"\"\"\n\n \nprint(\"Hello world!\")\nprint(\"[from this notebook cell]\")\n\nHello world!\n[from this notebook cell]\n\n\n\n\"\"\"\nAlternatively, you can run \nany python script file (.py file)\nso long as it contains valid\npython code.\n\"\"\"\n!python hello_world.py\n\nHello world!\n[from hello_world.py]\n\n\n\n \n\n\n\n\nNatural languages are the languages that people speak. They are not designed (although they are subjected to various degrees of “order”) and evolve naturally.\nFormal languages are languages that are designed by people for specific applications. 
- Mathematical Notation \\(E=mc^2\\) - Chemical Notation: \\(\\text{H}_2\\text{O}\\)\nProgramming languages are formal languages that have been designed to express computations.\nParsing: The process of figuring out what the structure of a sentence or statement is (in a natural language you do this subconsciously).\nFormal Languages have strict syntax for tokens and structure:\n\nMathematical syntax error: \\(E=\\$m🦆_2\\) (bad tokens & bad structure)\nChemical syntax error: \\(\\text{G}_3\\text{Z}\\) (bad tokens, but structure is okay)\n\n\n\n\nAmbiguity: Natural languages are full of ambiguity, which people parse using contextual clues. Formal languages are nearly or completely unambiguous; any statement has exactly one meaning, regardless of context.\nRedundancy: In order to make up for ambiguity, natural languages employ lots of redundancy. Formal languages are less redundant and more concise.\nLiteralness: Formal languages mean exactly what they say. Natural languages employ idioms and metaphors.\n\nThe inherent differences between familiar natural languages and unfamiliar formal languages create one of the greatest challenges in learning to code.\n\n\n\n\npoetry: Words are used for sound and meaning. Ambiguity is common and often deliberate.\nprose: The literal meaning of words is important, and the structure contributes meaning. Amenable to analysis but still often ambiguous.\nprogram: Meaning is unambiguous and literal, and can be understood entirely by analysis of the tokens and structure.\n\n\n\n\nFormal languages are very dense, so it takes longer to read them.\nStructure is very important, so it is usually not a good idea to read from top to bottom, left to right. Instead, learn to parse the program in your head, identifying the tokens and interpreting the structure.\nDetails matter. 
Little things like spelling errors and bad punctuation, which you can get away with in natural languages, will make a big difference in a formal language.\n\n\n\n\n\n\nIBM: R vs. Python\nPython is a multi-purpose language with a readable syntax that’s easy to learn. Programmers use Python to delve into data analysis or use machine learning in scalable production environments.\nR is built by statisticians and leans heavily into statistical models and specialized analytics. Data scientists use R for deep statistical analysis, supported by just a few lines of code and beautiful data visualizations.\nIn general, R is better for initial exploratory analyses, statistical analyses, and data visualization.\nIn general, Python is better for working with APIs, writing maintainable, production-ready code, working with a diverse array of data, and building machine learning or AI workflows.\nBoth languages can do anything. Most data science teams use both languages. (and others too.. Matlab, Javascript, Go, Fortran, etc…)\n\nfrom IPython.lib.display import YouTubeVideo\nYouTubeVideo('GVvfNgszdU0')\n\n\n \n \n\n\n\n\nAnaconda State of Data Science\nData from 2021: \n\n\n\n\nThe data are available here…\nBut, unfortunately, they changed the format of the responses concerning language use between 2022 and 2023. But we can take look at the 2022 data…\nLet’s do some python data science!\n\n# First, we need to gather our tools\nimport pandas as pd # This is the most common data science package used in python!\nimport matplotlib.pyplot as plt # This is the most widely-used plotting package.\n\nimport requests # This package helps us make https requests \nimport io # This package is good at handling input/output streams\n\n\n# Here's the url for the 2022 data. 
It has a similar structure to the 2021 data, so we can compare them.\nurl = \"https://static.anaconda.cloud/content/Anaconda_2022_State_of_Data_Science_+Raw_Data.csv\"\nresponse = requests.get(url)\n\n\n# Read the response into a dataframe, using the io.StringIO function to feed the response.txt.\n# Also, skip the first three rows\ndf = pd.read_csv(io.StringIO(response.text), skiprows=3)\n\n# Our very first dataframe!\ndf.head()\n\n# Jupyter notebook cells only output the last value requested...\n\n\n\n\n\n\n\n\nIn which country is your primary residence?\nWhich of the following age groups best describes you?\nWhat is the highest level of education you've achieved?\nGender: How do you identify? - Selected Choice\nThe organization I work for is best classified as a:\nWhat is your primary role? - Selected Choice\nFor how many years have you been in your current role?\nWhat position did you hold prior to this? - Selected Choice\nHow would you rate your job satisfaction in your current role?\nWhat would cause you to leave your current employer for a new job? Please select the top option besides pay/benefits. - Selected Choice\n...\nWhat should an AutoML tool do for data scientists? Please drag answers to rank from most important to least important. (1=most important) - Help choose the best model types to solve specific problems\nWhat should an AutoML tool do for data scientists? Please drag answers to rank from most important to least important. (1=most important) - Speed up the ML pipeline by automating certain workflows (data cleaning, etc.)\nWhat should an AutoML tool do for data scientists? Please drag answers to rank from most important to least important. (1=most important) - Tune the model once performance (such as accuracy, etc.) starts to degrade\nWhat should an AutoML tool do for data scientists? Please drag answers to rank from most important to least important. 
(1=most important) - Other (please indicate)\nWhat do you think is the biggest problem in the data science/AI/ML space today? - Selected Choice\nWhat tools and resources do you feel are lacking for data scientists who want to learn and develop their skills? (Select all that apply). - Selected Choice\nHow do you typically learn about new tools and topics relevant to your role? (Select all that apply). - Selected Choice\nWhat are you most hoping to see from the data science industry this year? - Selected Choice\nWhat do you believe is the biggest challenge in the open-source community today? - Selected Choice\nHave supply chain disruption problems, such as the ongoing chip shortage, impacted your access to computing resources?\n\n\n\n\n0\nUnited States\n26-41\nDoctoral degree\nMale\nEducational institution\nData Scientist\n1-2 years\nData Scientist\nVery satisfied\nMore flexibility with my work hours\n...\n4.0\n2.0\n5.0\n6.0\nA reduction in job opportunities caused by aut...\nHands-on projects,Mentorship opportunities\nReading technical books, blogs, newsletters, a...\nFurther innovation in the open-source data sci...\nUndermanagement\nNo\n\n\n1\nUnited States\n42-57\nDoctoral degree\nMale\nCommercial (for-profit) entity\nProduct Manager\n5-6 years\nNaN\nVery satisfied\nMore responsibility/opportunity for career adv...\n...\n2.0\n5.0\n4.0\n6.0\nSocial impacts from bias in data and models\nTailored learning paths\nFree video content (e.g. 
YouTube)\nMore specialized data science hardware\nPublic trust\nYes\n\n\n2\nIndia\n18-25\nBachelor's degree\nFemale\nEducational institution\nData Scientist\nNaN\nNaN\nNaN\nNaN\n...\n1.0\n4.0\n2.0\n6.0\nA reduction in job opportunities caused by aut...\nHands-on projects,Mentorship opportunities\nReading technical books, blogs, newsletters, a...\nFurther innovation in the open-source data sci...\nUndermanagement\nI'm not sure\n\n\n3\nUnited States\n42-57\nBachelor's degree\nMale\nCommercial (for-profit) entity\nProfessor/Instructor/Researcher\n10+ years\nNaN\nModerately satisfied\nMore responsibility/opportunity for career adv...\n...\n1.0\n5.0\n4.0\n6.0\nSocial impacts from bias in data and models\nHands-on projects\nReading technical books, blogs, newsletters, a...\nNew optimized models that allow for more compl...\nTalent shortage\nNo\n\n\n4\nSingapore\n18-25\nHigh School or equivalent\nMale\nNaN\nStudent\nNaN\nNaN\nNaN\nNaN\n...\n4.0\n2.0\n3.0\n6.0\nSocial impacts from bias in data and models\nCommunity engagement and learning platforms,Ta...\nReading technical books, blogs, newsletters, a...\nFurther innovation in the open-source data sci...\nUndermanagement\nYes\n\n\n\n\n5 rows × 120 columns\n\n\n\n\n\n# Jupyter notebook cells only output the last value... unless you use print commands!\nprint(f'Number of survey responses: {len(df)}')\nprint(f'Number of survey questions: {len(df.columns)}') \n\nNumber of survey responses: 3493\nNumber of survey questions: 120\n\n\n\n# 1. Filter the dataframe to only the questions about programming language usage, and \nfiltered_df = df.filter(like='How often do you use the following languages?').copy() # Use copy to force python to make a new copy of the data, not just a reference to a subset.\n\n# 2. 
Rename the columns to just be the programming languages, without the question preamble\nfiltered_df.rename(columns=lambda x: x.split('-')[-1].strip() if '-' in x else x, inplace=True)\n\nprint(filtered_df.columns)\n\nIndex(['Python', 'R', 'Java', 'JavaScript', 'C/C++', 'C#', 'Julia', 'HTML/CSS',\n 'Bash/Shell', 'SQL', 'Go', 'PHP', 'Rust', 'TypeScript',\n 'Other (please indicate below)'],\n dtype='object')\n\n\n\n# Show the unique values of the `Python` column\nprint(filtered_df['Python'].unique())\n\n['Frequently' 'Sometimes' 'Always' 'Never' 'Rarely' nan]\n\n\n\n# Calculate the percentage of each response for each language\npercentage_df = filtered_df.apply(lambda x: x.value_counts(normalize=True).fillna(0) * 100).transpose()\n\n# Remove the last row, which is the \"Other\" category\npercentage_df = percentage_df[:-1]\n\n# Sort the DataFrame based on the 'Always' responses\nsorted_percentage_df = percentage_df.sort_values(by='Always', ascending=True)\n\n\n# Let's get ready to plot the 2022 data...\nfrom IPython.display import display\n\n# We are going to use the display command to update our figure over multiple cells. 
\n# This usually isn't necessary, but it's helpful here to see how each set of commands updates the figure\n\n# Define the custom order for plotting\norder = ['Always', 'Frequently', 'Sometimes', 'Rarely', 'Never']\n\ncolors = {\n 'Always': (8/255, 40/255, 81/255), # Replace R1, G1, B1 with the RGB values for 'Dark Blue'\n 'Frequently': (12/255, 96/255, 152/255), # Replace R2, G2, B2 with the RGB values for 'Light Ocean Blue'\n 'Sometimes': (16/255, 146/255, 136/255), # and so on...\n 'Rarely': (11/255, 88/255, 73/255),\n 'Never': (52/255, 163/255, 32/255)\n}\n\n\n# Make the plot\nfig, ax = plt.subplots(figsize=(10, 7))\nsorted_percentage_df[order].plot(kind='barh', stacked=True, ax=ax, color=[colors[label] for label in order])\nax.set_xlabel('Percentage')\nax.set_title('Frequency of Language Usage, 2022',y=1.05)\n\nplt.show() # This command draws our figure. \n\n\n\n\n\n\n\n\n\n# Add labels across the top, like in the original graph\n\n# Get the patches for the top-most bar\nnum_languages = len(sorted_percentage_df)\n\npatches = ax.patches[num_languages-1::num_languages]\n# Calculate the cumulative width of the patches for the top-most bar\ncumulative_widths = [0] * len(order)\nwidths = [patch.get_width() for patch in patches]\nfor i, width in enumerate(widths):\n cumulative_widths[i] = width + (cumulative_widths[i-1] if i > 0 else 0)\n\n\n\n# Add text labels above the bars\nfor i, (width, label) in enumerate(zip(cumulative_widths, order)):\n # Get the color of the current bar segment\n # Calculate the position for the text label\n position = width - (patches[i].get_width() / 2)\n # Add the text label to the plot\n # Adjust the y-coordinate for the text label\n y_position = len(sorted_percentage_df) - 0.3 # Adjust the 0.3 value as needed\n ax.text(position, y_position, label, ha='center', color=colors[label], fontweight='bold')\n\n# Remove the legend\nax.legend().set_visible(False)\n\n#plt.show()\ndisplay(fig) # This command shows our updated figure (we can't 
re-use \"plt.show()\")\n\n\n\n\n\n\n\n\n\n# Add percentage values inside each patch\nfor patch in ax.patches:\n # Get the width and height of the patch\n width, height = patch.get_width(), patch.get_height()\n \n # Calculate the position for the text label\n x = patch.get_x() + width / 2\n y = patch.get_y() + height / 2\n \n # Get the percentage value for the current patch\n percentage = \"{:.0f}%\".format(width)\n \n # Add the text label to the plot\n ax.text(x, y, percentage, ha='center', va='center', color='white', fontweight='bold')\n\ndisplay(fig) # Let's see those nice text labels!\n\n\n\n\n\n\n\n\n\n# Clean up the figure to remove spines and unnecessary labels/ticks, etc..\n\n# Remove x-axis label\nax.set_xlabel('')\n\n# Remove the spines\nax.spines['top'].set_visible(False)\nax.spines['right'].set_visible(False)\nax.spines['bottom'].set_visible(False)\nax.spines['left'].set_visible(False)\n\n# Remove the y-axis tick marks\nax.tick_params(axis='y', which='both', length=0)\n\n# Remove the x-axis tick marks and labels\nax.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)\n\ndisplay(fig) # Now 100% less visually cluttered!" 
+ "objectID": "course-materials/day7.html#additional-resources", + "href": "course-materials/day7.html#additional-resources", + "title": "Data Handling and Visualization, Day 2", + "section": "Additional Resources", + "text": "Additional Resources" }, { - "objectID": "course-materials/lectures/00_intro_to_python.html#lecture-agenda", - "href": "course-materials/lectures/00_intro_to_python.html#lecture-agenda", - "title": "Lecture 1 - Intro to Python and Environmental Data Science", - "section": "", - "text": "🐍 What Python?\n❓ Why Python?\n💻 How Python?\n\n\n“Python is powerful… and fast; plays well with others; runs everywhere; is friendly & easy to learn; is Open.”" + "objectID": "course-materials/day5.html#class-materials", + "href": "course-materials/day5.html#class-materials", + "title": "Transforming Data in Pandas", + "section": "Class materials", + "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 5 / morning\n📝 Selecting and Filtering in Pandas\n🐼 Cleaning Data\n\n\nday 5 / afternoon\n🙌 Coding Colab: Cleaning DataFrames\nEnd-of-day practice" + }, + { + "objectID": "course-materials/day5.html#end-of-day-practice", + "href": "course-materials/day5.html#end-of-day-practice", + "title": "Transforming Data in Pandas", + "section": "End-of-day practice", + "text": "End-of-day practice\nComplete the following tasks / activities before heading home for the day!\n\n Day 5 Practice: 🍌 Analyzing the “Banana Index” 🍌" }, { - "objectID": "course-materials/lectures/00_intro_to_python.html#what-is-python", - "href": "course-materials/lectures/00_intro_to_python.html#what-is-python", - "title": "Lecture 1 - Intro to Python and Environmental Data Science", - "section": "", - "text": "Python is a general-purpose, object-oriented programming language that emphasizes code readability through its generous use of white space. 
Released in 1989, Python is easy to learn and a favorite of programmers and developers.\n\n\n(Python, C, C++, Java, Javascript, R, Pascal) - Take less time to write - Shorter and easier to read - Portable, meaning that they can run on different kinds of computers with few or no modifications.\nThe engine that translates and runs Python is called the Python Interpreter\n\n\"\"\" \nEntering code into this notebook cell \nand pressing [SHIFT-ENTER] will cause the \npython interpreter to execute the code\n\"\"\"\n\n \nprint(\"Hello world!\")\nprint(\"[from this notebook cell]\")\n\nHello world!\n[from this notebook cell]\n\n\n\n\"\"\"\nAlternatively, you can run \nany python script file (.py file)\nso long as it contains valid\npython code.\n\"\"\"\n!python hello_world.py\n\nHello world!\n[from hello_world.py]\n\n\n\n \n\n\n\n\nNatural languages are the languages that people speak. They are not designed (although they are subjected to various degrees of “order”) and evolve naturally.\nFormal languages are languages that are designed by people for specific applications. - Mathematical Notation \\(E=mc^2\\) - Chemical Notation: \\(\\text{H}_2\\text{O}\\)\nProgramming languages are formal languages that have been designed to express computations.\nParsing: The process of figuring out what the structure of a sentence or statement is (in a natural language you do this subconsciously).\nFormal Languages have strict syntax for tokens and structure:\n\nMathematical syntax error: \\(E=\\$m🦆_2\\) (bad tokens & bad structure)\nChemical syntax error: \\(\\text{G}_3\\text{Z}\\) (bad tokens, but structure is okay)\n\n\n\n\nAmbiguity: Natural languages are full of ambiguity, which people parse using contextual clues. Formal languages are nearly or completely unambiguous; any statement has exactly one meaning, regardless of context.\nRedundancy: In order to make up for ambiguity, natural languages employ lots of redundancy. 
Formal languages are less redundant and more concise.\nLiteralness: Formal languages mean exactly what they say. Natural languages employ idioms and metaphors.\n\nThe inherent differences between familiar natural languages and unfamiliar formal languages create one of the greatest challenges in learning to code.\n\n\n\n\npoetry: Words are used for sound and meaning. Ambiguity is common and often deliberate.\nprose: The literal meaning of words is important, and the structure contributes meaning. Amenable to analysis but still often ambiguous.\nprogram: Meaning is unambiguous and literal, and can be understood entirely by analysis of the tokens and structure.\n\n\n\n\nFormal languages are very dense, so it takes longer to read them.\nStructure is very important, so it is usually not a good idea to read from top to bottom, left to right. Instead, learn to parse the program in your head, identifying the tokens and interpreting the structure.\nDetails matter. Little things like spelling errors and bad punctuation, which you can get away with in natural languages, will make a big difference in a formal language." + "objectID": "course-materials/day5.html#additional-resources", + "href": "course-materials/day5.html#additional-resources", + "title": "Transforming Data in Pandas", + "section": "Additional Resources", + "text": "Additional Resources" }, { - "objectID": "course-materials/lectures/00_intro_to_python.html#why-python", - "href": "course-materials/lectures/00_intro_to_python.html#why-python", - "title": "Lecture 1 - Intro to Python and Environmental Data Science", - "section": "", - "text": "IBM: R vs. Python\nPython is a multi-purpose language with a readable syntax that’s easy to learn. Programmers use Python to delve into data analysis or use machine learning in scalable production environments.\nR is built by statisticians and leans heavily into statistical models and specialized analytics. 
Data scientists use R for deep statistical analysis, supported by just a few lines of code and beautiful data visualizations.\nIn general, R is better for initial exploratory analyses, statistical analyses, and data visualization.\nIn general, Python is better for working with APIs, writing maintainable, production-ready code, working with a diverse array of data, and building machine learning or AI workflows.\nBoth languages can do anything. Most data science teams use both languages. (and others too.. Matlab, Javascript, Go, Fortran, etc…)\n\nfrom IPython.lib.display import YouTubeVideo\nYouTubeVideo('GVvfNgszdU0')\n\n\n \n \n\n\n\n\nAnaconda State of Data Science\nData from 2021:" + "objectID": "course-materials/day3.html#class-materials", + "href": "course-materials/day3.html#class-materials", + "title": "Control and Comprehensions", + "section": "Class materials", + "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 3 / morning\n📝 Control Flows\n🙌 Coding Colab: Control flows and data science\n\n\nday 3 / afternoon\n🐼 Intro to Arrays and Series\n🙌 Coding Colab: Working with Pandas Series" }, { - "objectID": "course-materials/lectures/00_intro_to_python.html#what-about-2023-data", - "href": "course-materials/lectures/00_intro_to_python.html#what-about-2023-data", - "title": "Lecture 1 - Intro to Python and Environmental Data Science", - "section": "", - "text": "The data are available here…\nBut, unfortunately, they changed the format of the responses concerning language use between 2022 and 2023. 
But we can take look at the 2022 data…\nLet’s do some python data science!\n\n# First, we need to gather our tools\nimport pandas as pd # This is the most common data science package used in python!\nimport matplotlib.pyplot as plt # This is the most widely-used plotting package.\n\nimport requests # This package helps us make https requests \nimport io # This package is good at handling input/output streams\n\n\n# Here's the url for the 2022 data. It has a similar structure to the 2021 data, so we can compare them.\nurl = \"https://static.anaconda.cloud/content/Anaconda_2022_State_of_Data_Science_+Raw_Data.csv\"\nresponse = requests.get(url)\n\n\n# Read the response into a dataframe, using the io.StringIO function to feed the response.txt.\n# Also, skip the first three rows\ndf = pd.read_csv(io.StringIO(response.text), skiprows=3)\n\n# Our very first dataframe!\ndf.head()\n\n# Jupyter notebook cells only output the last value requested...\n\n\n\n\n\n\n\n\nIn which country is your primary residence?\nWhich of the following age groups best describes you?\nWhat is the highest level of education you've achieved?\nGender: How do you identify? - Selected Choice\nThe organization I work for is best classified as a:\nWhat is your primary role? - Selected Choice\nFor how many years have you been in your current role?\nWhat position did you hold prior to this? - Selected Choice\nHow would you rate your job satisfaction in your current role?\nWhat would cause you to leave your current employer for a new job? Please select the top option besides pay/benefits. - Selected Choice\n...\nWhat should an AutoML tool do for data scientists? Please drag answers to rank from most important to least important. (1=most important) - Help choose the best model types to solve specific problems\nWhat should an AutoML tool do for data scientists? Please drag answers to rank from most important to least important. 
(1=most important) - Speed up the ML pipeline by automating certain workflows (data cleaning, etc.)\nWhat should an AutoML tool do for data scientists? Please drag answers to rank from most important to least important. (1=most important) - Tune the model once performance (such as accuracy, etc.) starts to degrade\nWhat should an AutoML tool do for data scientists? Please drag answers to rank from most important to least important. (1=most important) - Other (please indicate)\nWhat do you think is the biggest problem in the data science/AI/ML space today? - Selected Choice\nWhat tools and resources do you feel are lacking for data scientists who want to learn and develop their skills? (Select all that apply). - Selected Choice\nHow do you typically learn about new tools and topics relevant to your role? (Select all that apply). - Selected Choice\nWhat are you most hoping to see from the data science industry this year? - Selected Choice\nWhat do you believe is the biggest challenge in the open-source community today? - Selected Choice\nHave supply chain disruption problems, such as the ongoing chip shortage, impacted your access to computing resources?\n\n\n\n\n0\nUnited States\n26-41\nDoctoral degree\nMale\nEducational institution\nData Scientist\n1-2 years\nData Scientist\nVery satisfied\nMore flexibility with my work hours\n...\n4.0\n2.0\n5.0\n6.0\nA reduction in job opportunities caused by aut...\nHands-on projects,Mentorship opportunities\nReading technical books, blogs, newsletters, a...\nFurther innovation in the open-source data sci...\nUndermanagement\nNo\n\n\n1\nUnited States\n42-57\nDoctoral degree\nMale\nCommercial (for-profit) entity\nProduct Manager\n5-6 years\nNaN\nVery satisfied\nMore responsibility/opportunity for career adv...\n...\n2.0\n5.0\n4.0\n6.0\nSocial impacts from bias in data and models\nTailored learning paths\nFree video content (e.g. 
YouTube)\nMore specialized data science hardware\nPublic trust\nYes\n\n\n2\nIndia\n18-25\nBachelor's degree\nFemale\nEducational institution\nData Scientist\nNaN\nNaN\nNaN\nNaN\n...\n1.0\n4.0\n2.0\n6.0\nA reduction in job opportunities caused by aut...\nHands-on projects,Mentorship opportunities\nReading technical books, blogs, newsletters, a...\nFurther innovation in the open-source data sci...\nUndermanagement\nI'm not sure\n\n\n3\nUnited States\n42-57\nBachelor's degree\nMale\nCommercial (for-profit) entity\nProfessor/Instructor/Researcher\n10+ years\nNaN\nModerately satisfied\nMore responsibility/opportunity for career adv...\n...\n1.0\n5.0\n4.0\n6.0\nSocial impacts from bias in data and models\nHands-on projects\nReading technical books, blogs, newsletters, a...\nNew optimized models that allow for more compl...\nTalent shortage\nNo\n\n\n4\nSingapore\n18-25\nHigh School or equivalent\nMale\nNaN\nStudent\nNaN\nNaN\nNaN\nNaN\n...\n4.0\n2.0\n3.0\n6.0\nSocial impacts from bias in data and models\nCommunity engagement and learning platforms,Ta...\nReading technical books, blogs, newsletters, a...\nFurther innovation in the open-source data sci...\nUndermanagement\nYes\n\n\n\n\n5 rows × 120 columns\n\n\n\n\n\n# Jupyter notebook cells only output the last value... unless you use print commands!\nprint(f'Number of survey responses: {len(df)}')\nprint(f'Number of survey questions: {len(df.columns)}') \n\nNumber of survey responses: 3493\nNumber of survey questions: 120\n\n\n\n# 1. Filter the dataframe to only the questions about programming language usage, and \nfiltered_df = df.filter(like='How often do you use the following languages?').copy() # Use copy to force python to make a new copy of the data, not just a reference to a subset.\n\n# 2. 
Rename the columns to just be the programming languages, without the question preamble\nfiltered_df.rename(columns=lambda x: x.split('-')[-1].strip() if '-' in x else x, inplace=True)\n\nprint(filtered_df.columns)\n\nIndex(['Python', 'R', 'Java', 'JavaScript', 'C/C++', 'C#', 'Julia', 'HTML/CSS',\n 'Bash/Shell', 'SQL', 'Go', 'PHP', 'Rust', 'TypeScript',\n 'Other (please indicate below)'],\n dtype='object')\n\n\n\n# Show the unique values of the `Python` column\nprint(filtered_df['Python'].unique())\n\n['Frequently' 'Sometimes' 'Always' 'Never' 'Rarely' nan]\n\n\n\n# Calculate the percentage of each response for each language\npercentage_df = filtered_df.apply(lambda x: x.value_counts(normalize=True).fillna(0) * 100).transpose()\n\n# Remove the last row, which is the \"Other\" category\npercentage_df = percentage_df[:-1]\n\n# Sort the DataFrame based on the 'Always' responses\nsorted_percentage_df = percentage_df.sort_values(by='Always', ascending=True)\n\n\n# Let's get ready to plot the 2022 data...\nfrom IPython.display import display\n\n# We are going to use the display command to update our figure over multiple cells. 
\n# This usually isn't necessary, but it's helpful here to see how each set of commands updates the figure\n\n# Define the custom order for plotting\norder = ['Always', 'Frequently', 'Sometimes', 'Rarely', 'Never']\n\ncolors = {\n 'Always': (8/255, 40/255, 81/255), # Replace R1, G1, B1 with the RGB values for 'Dark Blue'\n 'Frequently': (12/255, 96/255, 152/255), # Replace R2, G2, B2 with the RGB values for 'Light Ocean Blue'\n 'Sometimes': (16/255, 146/255, 136/255), # and so on...\n 'Rarely': (11/255, 88/255, 73/255),\n 'Never': (52/255, 163/255, 32/255)\n}\n\n\n# Make the plot\nfig, ax = plt.subplots(figsize=(10, 7))\nsorted_percentage_df[order].plot(kind='barh', stacked=True, ax=ax, color=[colors[label] for label in order])\nax.set_xlabel('Percentage')\nax.set_title('Frequency of Language Usage, 2022',y=1.05)\n\nplt.show() # This command draws our figure. \n\n\n\n\n\n\n\n\n\n# Add labels across the top, like in the original graph\n\n# Get the patches for the top-most bar\nnum_languages = len(sorted_percentage_df)\n\npatches = ax.patches[num_languages-1::num_languages]\n# Calculate the cumulative width of the patches for the top-most bar\ncumulative_widths = [0] * len(order)\nwidths = [patch.get_width() for patch in patches]\nfor i, width in enumerate(widths):\n cumulative_widths[i] = width + (cumulative_widths[i-1] if i > 0 else 0)\n\n\n\n# Add text labels above the bars\nfor i, (width, label) in enumerate(zip(cumulative_widths, order)):\n # Get the color of the current bar segment\n # Calculate the position for the text label\n position = width - (patches[i].get_width() / 2)\n # Add the text label to the plot\n # Adjust the y-coordinate for the text label\n y_position = len(sorted_percentage_df) - 0.3 # Adjust the 0.3 value as needed\n ax.text(position, y_position, label, ha='center', color=colors[label], fontweight='bold')\n\n# Remove the legend\nax.legend().set_visible(False)\n\n#plt.show()\ndisplay(fig) # This command shows our updated figure (we can't 
re-use \"plt.show()\")\n\n\n\n\n\n\n\n\n\n# Add percentage values inside each patch\nfor patch in ax.patches:\n # Get the width and height of the patch\n width, height = patch.get_width(), patch.get_height()\n \n # Calculate the position for the text label\n x = patch.get_x() + width / 2\n y = patch.get_y() + height / 2\n \n # Get the percentage value for the current patch\n percentage = \"{:.0f}%\".format(width)\n \n # Add the text label to the plot\n ax.text(x, y, percentage, ha='center', va='center', color='white', fontweight='bold')\n\ndisplay(fig) # Let's see those nice text labels!\n\n\n\n\n\n\n\n\n\n# Clean up the figure to remove spines and unnecessary labels/ticks, etc.\n\n# Remove x-axis label\nax.set_xlabel('')\n\n# Remove the spines\nax.spines['top'].set_visible(False)\nax.spines['right'].set_visible(False)\nax.spines['bottom'].set_visible(False)\nax.spines['left'].set_visible(False)\n\n# Remove the y-axis tick marks\nax.tick_params(axis='y', which='both', length=0)\n\n# Remove the x-axis tick marks and labels\nax.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)\n\ndisplay(fig) # Now 100% less visually cluttered!" 
+ "objectID": "course-materials/day3.html#end-of-day-practice", + "href": "course-materials/day3.html#end-of-day-practice", + "title": "Control and Comprehensions", + "section": "End-of-day practice", + "text": "End-of-day practice\nComplete the following tasks / activities before heading home for the day!\n\n Day 3 Practice: Using Pandas Series for Data Analysis" }, { - "objectID": "course-materials/lectures/00_intro_to_python.html#the-end", - "href": "course-materials/lectures/00_intro_to_python.html#the-end", - "title": "Lecture 1 - Intro to Python and Environmental Data Science", - "section": "The End", - "text": "The End" + "objectID": "course-materials/day3.html#additional-resources", + "href": "course-materials/day3.html#additional-resources", + "title": "Control and Comprehensions", + "section": "Additional Resources", + "text": "Additional Resources" }, { - "objectID": "course-materials/cheatsheets/first_steps.html", - "href": "course-materials/cheatsheets/first_steps.html", - "title": "Python Basics Cheat Sheet", - "section": "", - "text": "Variables are containers for storing data values.\nPython has no command for declaring a variable: it is created the moment you first assign a value to it.\n\nx = 5\nname = \"Alice\"\n\n\n\n\nPython has various data types including:\n\nint (integer): A whole number, positive or negative, without decimals.\nfloat (floating point number): A number, positive or negative, containing one or more decimals.\nstr (string): A sequence of characters in quotes.\nbool (boolean): Represents True or False.\n\n\nage = 30 # int\ntemperature = 20.5 # float\nname = \"Bob\" # str\nis_valid = True # bool\n\n\n\n\n\n\n\nUsed with numeric values to perform common mathematical operations:\n\n\n\n\nOperator\nDescription\n\n\n\n\n+\nAddition\n\n\n-\nSubtraction\n\n\n*\nMultiplication\n\n\n/\nDivision\n\n\n%\nModulus\n\n\n**\nExponentiation\n\n\n//\nFloor division\n\n\n\n\n\n\nx = 10\ny = 3\nprint(x + y) # 13\nprint(x - y) # 7\nprint(x * y) # 
30\nprint(x / y) # 3.3333\nprint(x % y) # 1\nprint(x ** y) # 1000\nprint(x // y) # 3\n\n\n\n\nUsed to combine conditional statements:\n\n\n\n\nOperator\nDescription\n\n\n\n\nand\nReturns True if both statements are true\n\n\nor\nReturns True if one of the statements is true\n\n\nnot\nReverse the result, returns False if the result is true\n\n\n\n\n\n\nx = True\ny = False\nprint(x and y) # False\nprint(x or y) # True\nprint(not x) # False\n\n\n\n\nStrings in Python are surrounded by either single quotation marks, or double quotation marks.\n\nhello = \"Hello\"\nworld = 'World'\nprint(hello + \" \" + world) # Hello World\n\nStrings can be indexed with the first character having index 0.\n\na = \"Hello, World!\"\nprint(a[1]) # e\n\nSlicing strings:\n\nb = \"Hello, World!\"\nprint(b[2:5]) # llo\n\n\n\n\n# This is a comment\nprint(\"Hello, World!\") # Prints Hello, World!\nThis cheatsheet covers the very basics to get you started with Python. Experiment with these concepts to understand them better!" 
+ "objectID": "course-materials/day1.html#class-materials", + "href": "course-materials/day1.html#class-materials", + "title": "Intro to Python and JupyterLab", + "section": "Class materials", + "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 1 / morning\nPython and Data Science\n⚒️ Meet JupyterLab ⚒️ Coding in Jupyter Notebooks\n\n\n\n\n\n\n\nday 1 / afternoon\n🐍 Exploring Data Types and Methods\n🐍 Variables & Operators" }, { - "objectID": "course-materials/cheatsheets/first_steps.html#variables-and-data-types", - "href": "course-materials/cheatsheets/first_steps.html#variables-and-data-types", - "title": "Python Basics Cheat Sheet", - "section": "", - "text": "Variables are containers for storing data values.\nPython has no command for declaring a variable: it is created the moment you first assign a value to it.\n\nx = 5\nname = \"Alice\"\n\n\n\n\nPython has various data types including:\n\nint (integer): A whole number, positive or negative, without decimals.\nfloat (floating point number): A number, positive or negative, containing one or more decimals.\nstr (string): A sequence of characters in quotes.\nbool (boolean): Represents True or False.\n\n\nage = 30 # int\ntemperature = 20.5 # float\nname = \"Bob\" # str\nis_valid = True # bool" + "objectID": "course-materials/day1.html#end-of-day-practice", + "href": "course-materials/day1.html#end-of-day-practice", + "title": "Intro to Python and JupyterLab", + "section": "End-of-day practice", + "text": "End-of-day practice\nOur last task today is to work through an example of a Python data science workflow. 
This exercise “skips ahead” to content we will be learning later in the course, but provides a preview of where we are headed and how easy it is to do data science in python!\n\n Day 1 Practice: Example Python Data Science Workflow" }, { - "objectID": "course-materials/cheatsheets/first_steps.html#basic-operations", - "href": "course-materials/cheatsheets/first_steps.html#basic-operations", - "title": "Python Basics Cheat Sheet", - "section": "", - "text": "Used with numeric values to perform common mathematical operations:\n\n\n\n\nOperator\nDescription\n\n\n\n\n+\nAddition\n\n\n-\nSubtraction\n\n\n*\nMultiplication\n\n\n/\nDivision\n\n\n%\nModulus\n\n\n**\nExponentiation\n\n\n//\nFloor division\n\n\n\n\n\n\nx = 10\ny = 3\nprint(x + y) # 13\nprint(x - y) # 7\nprint(x * y) # 30\nprint(x / y) # 3.3333\nprint(x % y) # 1\nprint(x ** y) # 1000\nprint(x // y) # 3\n\n\n\n\nUsed to combine conditional statements:\n\n\n\n\nOperator\nDescription\n\n\n\n\nand\nReturns True if both statements are true\n\n\nor\nReturns True if one of the statements is true\n\n\nnot\nReverse the result, returns False if the result is true\n\n\n\n\n\n\nx = True\ny = False\nprint(x and y) # False\nprint(x or y) # True\nprint(not x) # False\n\n\n\n\nStrings in Python are surrounded by either single quotation marks, or double quotation marks.\n\nhello = \"Hello\"\nworld = 'World'\nprint(hello + \" \" + world) # Hello World\n\nStrings can be indexed with the first character having index 0.\n\na = \"Hello, World!\"\nprint(a[1]) # e\n\nSlicing strings:\n\nb = \"Hello, World!\"\nprint(b[2:5]) # llo" + "objectID": "course-materials/day1.html#additional-resources", + "href": "course-materials/day1.html#additional-resources", + "title": "Intro to Python and JupyterLab", + "section": "Additional Resources", + "text": "Additional Resources\nNA" }, { - "objectID": "course-materials/cheatsheets/first_steps.html#printing-and-commenting", - "href": 
"course-materials/cheatsheets/first_steps.html#printing-and-commenting", - "title": "Python Basics Cheat Sheet", + "objectID": "course-materials/cheatsheets/timeseries.html", + "href": "course-materials/cheatsheets/timeseries.html", + "title": "EDS 217 Cheatsheet", "section": "", - "text": "# This is a comment\nprint(\"Hello, World!\") # Prints Hello, World!\nThis cheatsheet covers the very basics to get you started with Python. Experiment with these concepts to understand them better!" + "text": "To be added" }, { - "objectID": "course-materials/coding-colabs/6b_preprocess.html", - "href": "course-materials/coding-colabs/6b_preprocess.html", - "title": "", + "objectID": "course-materials/cheatsheets/print.html", + "href": "course-materials/cheatsheets/print.html", + "title": "EDS 217 Cheatsheet", "section": "", - "text": "import pandas as pd\n\n# Load the CO2 dataset\nco2_url = \"https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_mm_mlo.csv\"\nco2_df = pd.read_csv(co2_url, comment='#', header=1, \n names=['Year', 'Month', 'DecimalDate', 'MonthlyAverage', \n 'Deseasonalized', 'DaysInMonth', 'StdDev', 'Uncertainty'])\n\n\nco2_df.head()\n\n\n\n\n\n\n\n\nYear\nMonth\nDecimalDate\nMonthlyAverage\nDeseasonalized\nDaysInMonth\nStdDev\nUncertainty\n\n\n\n\n0\n1958\n4\n1958.2877\n317.45\n315.16\n-1\n-9.99\n-0.99\n\n\n1\n1958\n5\n1958.3699\n317.51\n314.69\n-1\n-9.99\n-0.99\n\n\n2\n1958\n6\n1958.4548\n317.27\n315.15\n-1\n-9.99\n-0.99\n\n\n3\n1958\n7\n1958.5370\n315.87\n315.20\n-1\n-9.99\n-0.99\n\n\n4\n1958\n8\n1958.6219\n314.93\n316.21\n-1\n-9.99\n-0.99\n\n\n\n\n\n\n\n\n\n# Convert Year and Month to datetime\nco2_df['Date'] = pd.to_datetime(co2_df['Year'].astype(str) + '-' + co2_df['Month'].astype(str) + '-01')\n\n# Select only the Date and MonthlyAverage columns\nco2_df = co2_df[['Date', 'MonthlyAverage']].rename(columns={'MonthlyAverage': 'CO2Concentration'})\n\n# Sort by date and reset index\nco2_df = co2_df.sort_values('Date').reset_index(drop=True)\n\n# Save to 
CSV\nco2_df.to_csv('monthly_co2_concentration.csv', index=False)\n\nprint(co2_df.head())\n\n Date CO2Concentration\n0 1958-04-01 317.45\n1 1958-05-01 317.51\n2 1958-06-01 317.27\n3 1958-07-01 315.87\n4 1958-08-01 314.93" + "text": "The print() function outputs the specified message to the screen. It is often used for debugging to display the values of variables and program status during code execution.\n\n\nprint(\"Hello, World!\")\n\n\n\nx = 10\ny = 20\nprint(x)\nprint(y)" }, { - "objectID": "course-materials/lectures/01_the_zen_of_python.html", - "href": "course-materials/lectures/01_the_zen_of_python.html", - "title": "The Zen of Python", + "objectID": "course-materials/cheatsheets/print.html#basic-usage-of-print", + "href": "course-materials/cheatsheets/print.html#basic-usage-of-print", + "title": "EDS 217 Cheatsheet", "section": "", - "text": "# What is the Zen of Python??\nimport this\n\nThe Zen of Python, by Tim Peters\n\nBeautiful is better than ugly.\nExplicit is better than implicit.\nSimple is better than complex.\nComplex is better than complicated.\nFlat is better than nested.\nSparse is better than dense.\nReadability counts.\nSpecial cases aren't special enough to break the rules.\nAlthough practicality beats purity.\nErrors should never pass silently.\nUnless explicitly silenced.\nIn the face of ambiguity, refuse the temptation to guess.\nThere should be one-- and preferably only one --obvious way to do it.\nAlthough that way may not be obvious at first unless you're Dutch.\nNow is better than never.\nAlthough never is often better than *right* now.\nIf the implementation is hard to explain, it's a bad idea.\nIf the implementation is easy to explain, it may be a good idea.\nNamespaces are one honking great idea -- let's do more of those!" + "text": "The print() function outputs the specified message to the screen. 
It is often used for debugging to display the values of variables and program status during code execution.\n\n\nprint(\"Hello, World!\")\n\n\n\nx = 10\ny = 20\nprint(x)\nprint(y)" }, { - "objectID": "course-materials/lectures/01_the_zen_of_python.html#python-errors", - "href": "course-materials/lectures/01_the_zen_of_python.html#python-errors", - "title": "The Zen of Python", - "section": "Python Errors", - "text": "Python Errors\nThere are two types of errors in Python: SyntaxErrors and Exceptions.\n\nSyntaxErrors\nA SyntaxError happens when the Python language interpreter (the parser) detects an incorrectly formatted statement.\nThis code is trying to divide two numbers, but there are mismatched parentheses. What happens when we run it?\n>>> print( 5 / 4 ))\n\nprint( 5 / 4 ))\n\n\n Cell In[1], line 1\n print( 5 / 4 ))\n ^\nSyntaxError: unmatched ')'\n\n\n\n\nWhen python says SyntaxError, you should read this as I don't know what you want me to do!?\nOften the error includes some indication of where the problem is, although this indication can sometimes be misleading if the detection occurs far away from the syntax problem that created the error. Often the interpreter will attempt to explain what the problem is!\n\n\nExceptions\nAn Exception happens when the code you have written violates the Python language specification.\nThis code is trying to divide zero by zero. Its syntax is correct. 
But what happens when we run it?\n>>> print( 0 / 0 )\n\ntry:\n print( 0 / 0 ) \nexcept ZeroDivisionError:\n print(f\"It didn't work because you tried to divide by zero\")\n\nIt didn't work because you tried to divide by zero\n\n\nWhen python says anything other than SyntaxError, you should read this as You are asking to do something I can't do\nIn this case, the ZeroDivisionError is raised because the Python language specification does not allow for division by zero.\n\n\nTypes of Exceptions\nPython has a lot of built-in Errors that correspond to the definition of the Python language.\nA few common Exceptions you will see include TypeError, IndexError, and KeyError.\n\n\nTypeError\nA TypeError is raised when you try to perform a valid method on an inappropriate data type.\n\n# TypeError Examples:\n'a' + 3\n\n\n\nIndexError\nAn IndexError is raised when you try to access an undefined element of a sequence. Sequences are structured data types whose elements are stored in a specific order. A list is an example of a sequence.\n\n# IndexError Example:\nmy_list = ['a', 'b', 'c', 'd']\nmy_list[4]\n\n\n---------------------------------------------------------------------------\nIndexError Traceback (most recent call last)\nCell In[14], line 3\n 1 # IndexError Example:\n 2 my_list = ['a', 'b', 'c', 'd']\n----> 3 my_list[4]\n\nIndexError: list index out of range\n\n\n\n\n\nKeyError\nA KeyError is raised when you try to access a dictionary key that does not exist.\n\n# KeyError Examples:\n\nmy_dict = {'column_1': 'definition 1', 'another_word': 'a second definition'}\nmy_dict['column1']\n\n\n---------------------------------------------------------------------------\nKeyError Traceback (most recent call last)\nCell In[16], line 4\n 1 # KeyError Examples:\n 3 my_dict = {'column_1': 'definition 1', 'another_word': 'a second definition'}\n----> 4 my_dict['column1']\n\nKeyError: 'column1'\n\n\n\n\n\nDeciphering Tracebacks\nWhen an exception is raised in python the 
interpreter generates a “Traceback” that shows where and why the error occurred. Generally, the REPL has the most detailed Traceback information, although Jupyter Notebooks and iPython interactive shells also provide the necessary information to debug any exception.\n\n# defining a function\ndef multiply(num1, num2):\n result = num1 * num2\n print(results)\n \n# calling the function\nmultiply(10, 2)\n\n\n---------------------------------------------------------------------------\nNameError Traceback (most recent call last)\nCell In[17], line 7\n 4 print(results)\n 6 # calling the function\n----> 7 multiply(10, 2)\n\nCell In[17], line 4, in multiply(num1, num2)\n 2 def multiply(num1, num2):\n 3 result = num1 * num2\n----> 4 print(results)\n\nNameError: name 'results' is not defined\n\n\n\n\n## The End" + "objectID": "course-materials/cheatsheets/print.html#combining-text-and-variables", + "href": "course-materials/cheatsheets/print.html#combining-text-and-variables", + "title": "EDS 217 Cheatsheet", + "section": "Combining Text and Variables", + "text": "Combining Text and Variables\nYou can combine text and variables in the print() function to make the output more informative. Here are 4 different ways:\n\nThis is a note.\n\n\nMake sure to check your data types before processing.\n\n\n1. Using Comma Separation\nname = \"Alice\"\nage = 30\nprint(\"Name:\", name, \"Age:\", age)\n\n\n2. 
Using String Formatting\n\nf-string (Formatted String Literal) - Python 3.6+ [PREFERRED]\nname = \"Bob\"\nage = 25\nprint(f\"Name: {name}, Age: {age}\")\n\n\nformat() Method [CAN BE USEFUL IN COMPLICATED PRINT STATEMENTS]\nname = \"Carol\"\nage = 22\nprint(\"Name: {}, Age: {}\".format(name, age))\n\n\nOld % formatting [NOT RECOMMENDED]\nname = \"Dave\"\nage = 28\nprint(\"Name: %s, Age: %d\" % (name, age))" }, { - "objectID": "course-materials/lectures/03-debugging.html", - "href": "course-materials/lectures/03-debugging.html", - "title": "Debugging with the VS code debugger", - "section": "", - "text": "debug.png\nWhen your code doesn’t work as expected, you might: 1. Use print statements 1. Ask ChatGPT what’s wrong with your code\nPrint statements can be annoying to put all over the place and sometimes ChatGPT doesn’t know. So what if you could step into your code and run it line by line, interactively, to figure out what was wrong? This is what VS code’s debugger does for you." + "objectID": "course-materials/cheatsheets/print.html#debugging-with-print", + "href": "course-materials/cheatsheets/print.html#debugging-with-print", + "title": "EDS 217 Cheatsheet", + "section": "Debugging with Print", + "text": "Debugging with Print\nUse print() to display intermediate values in your code to understand how data changes step by step.\n\nExample: Debugging a Loop\nfor i in range(5):\n print(f\"Current value of i: {i}\")\n\n\nExample: Checking Function Outputs\ndef add(a, b):\n result = a + b\n print(f\"Adding {a} + {b} = {result}\")\n return result\n\nadd(5, 3)" }, { - "objectID": "course-materials/lectures/03-debugging.html#setup", - "href": "course-materials/lectures/03-debugging.html#setup", - "title": "Debugging with the VS code debugger", - "section": "Setup", - "text": "Setup\nGet the VS code python extension. 1. Go to the left hand bar in VS code and click “Extensions” 1. Search “Python” 1. 
Install the extension called “Python” by Microsoft\nNote: VS code has a known bug where sometimes the Debug this cell option disappears. If this happens, you unfortunately need to restart VS code. This is not an issue if debugging .py files though, which is likely where you’ll end up using the debugger the most anyway." + "objectID": "course-materials/cheatsheets/matplotlib.html", + "href": "course-materials/cheatsheets/matplotlib.html", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "Tip\n\n\n\nFor 🤯 inspiration + 👩‍💻 code, check out the Python Graph Gallery" }, { - "objectID": "course-materials/lectures/03-debugging.html#using-the-debugger", - "href": "course-materials/lectures/03-debugging.html#using-the-debugger", - "title": "Debugging with the VS code debugger", - "section": "Using the debugger", - "text": "Using the debugger\nWe will practice using the debugger with practice 2-1, problem 4.\nFirst, let’s check out how the debugger works by placing a breakpoint on the first line of the cell where we define variables and stepping through it.\nSee how the variables appear under your Run and Debug tab, and try using the Debug Console to print or manipulate the variables as you step through the code.\n\n# Import numpy for mean calculations\nimport numpy as np\n\ntempF_2015 = [61.628, 61.7, 61.808, 61.448, 61.52, 61.538, 61.394, 61.52, 61.61, 62.042, 61.988, 62.168]\ntempF_2016 = [62.186, 62.546, 62.528, 62.06, 61.79, 61.52, 61.61, 61.916, 61.718, 61.682, 61.736, 61.628]\ntempF_2017 = [61.916, 62.132, 62.168, 61.772, 61.718, 61.376, 61.556, 61.646, 61.466, 61.7, 61.664, 61.754]\ntempF_2018 = [61.556, 61.61, 61.664, 61.682, 61.556, 61.466, 61.556, 61.448, 61.52, 61.916, 61.556, 61.718]\ntempF_2019 = [61.754, 61.79, 62.186, 61.898, 61.61, 61.7, 61.772, 61.79, 61.754, 61.898, 61.862, 62.042]\ntempF_2020 = [62.186, 62.312, 62.186, 62.114, 61.898, 61.736, 61.7, 61.646, 61.862, 61.664, 62.06, 61.538]\ntempF_2021 = [61.538, 61.232, 61.664, 61.43, 61.484, 
61.592, 61.736, 61.556, 61.736, 61.88, 61.772, 61.61]\n\n# List of yearly lists (may or may not be useful)\ntempF_list = [tempF_2015, tempF_2016, tempF_2017, tempF_2018, tempF_2019, tempF_2020, tempF_2021]\n\n# List of years (probably useful)\nyears = [2015, 2016, 2017, 2018, 2019, 2020, 2021]\n\nBeing able to walk through code line by line is especially helpful when you can step into loops or functions that would otherwise require print statements to see what is happening inside.\nTry placing a breakpoint on the first line and seeing how the variables change when you step through this for loop.\n\nfor tempF_year,year in zip(tempF_list,years):\n print(f\"{year} temperatures: {tempF_year}\")\n\n[61.628, 61.7, 61.808, 61.448, 61.52, 61.538, 61.394, 61.52, 61.61, 62.042, 61.988, 62.168]\n2015 temperatures: [61.628, 61.7, 61.808, 61.448, 61.52, 61.538, 61.394, 61.52, 61.61, 62.042, 61.988, 62.168]\n2016 temperatures: [62.186, 62.546, 62.528, 62.06, 61.79, 61.52, 61.61, 61.916, 61.718, 61.682, 61.736, 61.628]\n2017 temperatures: [61.916, 62.132, 62.168, 61.772, 61.718, 61.376, 61.556, 61.646, 61.466, 61.7, 61.664, 61.754]\n2018 temperatures: [61.556, 61.61, 61.664, 61.682, 61.556, 61.466, 61.556, 61.448, 61.52, 61.916, 61.556, 61.718]\n2019 temperatures: [61.754, 61.79, 62.186, 61.898, 61.61, 61.7, 61.772, 61.79, 61.754, 61.898, 61.862, 62.042]\n2020 temperatures: [62.186, 62.312, 62.186, 62.114, 61.898, 61.736, 61.7, 61.646, 61.862, 61.664, 62.06, 61.538]\n2021 temperatures: [61.538, 61.232, 61.664, 61.43, 61.484, 61.592, 61.736, 61.556, 61.736, 61.88, 61.772, 61.61]\n\n\nNow let’s try debugging some code. Suppose your friend has written some code to solve problem 4a but is running into an error. Let’s try using the debugger to fix it:\nPractice 2-1 problem 4a: Calculate the monthly global temperature anomalies (deviation from the mean) in °C for 2015-2021. The mean global land-ocean surface temperature calculated over the 20th century was 15.6°C.\n\n# 4a. 
Monthly temperature anomalies\n\n# Initialize a list of yearly lists of anomalies in °C\nanomC_list = [] \n\n# Iterate through yearly lists\nfor tempsF in tempF_list:\n # Empty list for a single year\n anomC = []\n # Iterate through a single year\n for tempF in tempsF:\n # Convert temp from F to C\n tempC = tempF - 32.0 * (5/9)\n # Calculate temp anomaly in °C\n anomC = tempC - 15.6\n # Append to anomC\n anomC.append(round(anomC,2))\n # Add list of anomalies in °C to anomC_list (list of lists)\n anomC_list.append(anomC)\n\nfor anom_list,year in zip(anomC_list,years):\n print(f\"{year} anomalies: {anom_list}\")\n\n28.25022222222222\n\n\n\n---------------------------------------------------------------------------\nAttributeError Traceback (most recent call last)\nCell In[2], line 17\n 15 anomC = tempC - 15.6\n 16 # Append to anomC\n---> 17 anomC.append(round(anomC,2))\n 18 # Add list of anomalies in °C to anomC_list (list of lists)\n 19 anomC_list.append(anomC)\n\nAttributeError: 'float' object has no attribute 'append'\n\n\n\nNow you try it on your own. Try using the debugger to debug the following answer to question 4b:\nCreate a new list with the mean monthly global surface temperature anomalies in °C for 2015-2021 (i.e. calculate the mean temperature anomaly for each month and put these values in a list).\n\n# 4b. 
Mean monthly temperature anomaly, 2015-2021\n\n# Empty list for monthly mean temperature anomalies\nmonthly_means = []\n# Set up generic counter loop with 12 iterations\nfor i in range(12 + 1):\n # Generate list of temperature anomalies for each month by extracting the ith value from each sublist.\n monthly = []\n # generate a list of anomalies for month i with anomalies from all years in it\n for anom in anomC_list:\n # get the ith anomaly (ith month) from each year list\n monthly.append(anoms[i + 1])\n # Calculate mean for month i\n monthly_mean = np.sum(monthly)\n # Add mean for month i to list of means\n monthly_means.append(round(monthly_means,4))\n\n# Print list of mean monthly temperature anomalies, 2015-2018.\nprint(f\"monthly_means: {monthly_means}\")" + "objectID": "course-materials/cheatsheets/matplotlib.html#basic-setup", + "href": "course-materials/cheatsheets/matplotlib.html#basic-setup", + "title": "EDS 217 Cheatsheet", + "section": "Basic Setup", + "text": "Basic Setup\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport numpy as np\n\n# Set style for seaborn (optional)\nsns.set_style(\"whitegrid\")" }, { - "objectID": "course-materials/lectures/99_dry_vs_wet.html", - "href": "course-materials/lectures/99_dry_vs_wet.html", - "title": "EDS 217, Lecture 4: DRY 🏜 vs. WET 🌊", - "section": "", - "text": "dry.jpg" + "objectID": "course-materials/cheatsheets/matplotlib.html#creating-a-figure-and-axes", + "href": "course-materials/cheatsheets/matplotlib.html#creating-a-figure-and-axes", + "title": "EDS 217 Cheatsheet", + "section": "Creating a Figure and Axes", + "text": "Creating a Figure and Axes\n# Create a new figure and axis\nfig, ax = plt.subplots(figsize=(10, 6))\n\n# Create multiple subplots\nfig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))" }, { - "objectID": "course-materials/lectures/99_dry_vs_wet.html#dry-vs.-wet", - "href": "course-materials/lectures/99_dry_vs_wet.html#dry-vs.-wet", - "title": "EDS 217, Lecture 4: DRY 🏜 vs. 
WET 🌊", - "section": "DRY vs. WET", - "text": "DRY vs. WET\nIf DRY means “Don’t Repeat Yourself”… then WET means “Write Every Time”, or “We Enjoy Typing”\nDon’t write WET code!\n\nHow to DRY out your code\nWe write DRY code - or we DRY out WET code - through a combination of abstraction and normalization." + "objectID": "course-materials/cheatsheets/matplotlib.html#common-plot-types", + "href": "course-materials/cheatsheets/matplotlib.html#common-plot-types", + "title": "EDS 217 Cheatsheet", + "section": "Common Plot Types", + "text": "Common Plot Types\n\nLine Plot\nx = np.linspace(0, 10, 100)\ny = np.sin(x)\nax.plot(x, y, label='sin(x)')\n\n\nScatter Plot\nx = np.random.rand(50)\ny = np.random.rand(50)\nplt.scatter(x, y, alpha=0.5)\n\n\nBar Plot\ncategories = ['A', 'B', 'C', 'D']\nvalues = [3, 7, 2, 5]\nplt.bar(categories, values)\n\n\nHistogram\ndata = np.random.randn(1000)\nax.hist(data, bins=30, edgecolor='black')" }, { - "objectID": "course-materials/lectures/99_dry_vs_wet.html#abstraction", - "href": "course-materials/lectures/99_dry_vs_wet.html#abstraction", - "title": "EDS 217, Lecture 4: DRY 🏜 vs. WET 🌊", - "section": "Abstraction", - "text": "Abstraction\nThe “principle of abstraction” aims to reduce duplication of information (usually code) in a program whenever it is practical to do so:\n“Each significant piece of functionality in a program should be implemented in just one place in the source code. Where similar functions are carried out by distinct pieces of code, it is generally beneficial to combine them into one by abstracting out the varying parts.”\nBenjamin C. Pierce - Types and Programming Languages\n\nAbstraction Example\nThe easiest way to understand abstraction is to see it in action. 
Here’s an example that you are already familiar with; determining the energy emitted by an object as a function of its temperature:\n\\(Q = \\epsilon \\sigma T^4\\)\nwhere \\(\\epsilon\\) is an object’s emissivity, \\(\\sigma\\) is the Stefan-Boltzmann constant, and \\(T\\) is temperature in degrees Kelvin.\n\n\nAbstraction Example\nWe might write the following code to determine \\(Q\\):\n\n# How much energy is emitted by an object at a certain temperature?\nε = 1 # emissivity [-]\nσ = 5.67e-8 # stefan-boltzmann constant [W/m^2/K^4]\nT_C = 40 # temperature [deg-C]\n\nQ = ε * σ * (T_C+273.15)**4\nprint(Q)\n\n\n\nAbstraction Example\nBut this code is going to get very WET very fast.\n\n# How much energy is emitted by an object at a certain temperature?\nε = 1 # emissivity [-]\nσ = 5.67e-8 # stefan-boltzmann constant [W/m^2/K^4]\nT_C = 40 # temperature [deg-C]\n\nQ = ε * σ * (T_C+273.15)**4\n\n# New T value? Different epsilon? What about a bunch of T values?\nT_2 = 30\n\nQ2 = ε * σ * (T_2+273.15)**4\n\n\n\n\nAbstraction Example\nHere’s a DRY version obtained using abstraction:\n\n# energy.py contains a function to calculate Q from T \nfrom energy import Q \n\nT = 40 # deg-C\nE = Q(T, unit='C')\n\n\n\nAbstraction Summary, Part 1\n\nWe keep our code DRY by using abstraction. In addition to functions, python also provides Classes as another important way to create abstractions.\nFunctions and Classes are the subject of tomorrow’s exercise.\n\n\n\nAbstraction Summary, Part 2\n\nIn general, the process of keeping code DRY through successive layers of abstraction is known as re-factoring.\nThe “Rule of Three” states that you should probably consider refactoring (i.e. adding abstraction) whenever you find your code doing the same thing three times or more." 
+ "objectID": "course-materials/cheatsheets/matplotlib.html#customizing-plots", + "href": "course-materials/cheatsheets/matplotlib.html#customizing-plots", + "title": "EDS 217 Cheatsheet", + "section": "Customizing Plots", + "text": "Customizing Plots\n\nLabels and Title\nax.set_xlabel('X-axis label')\nax.set_ylabel('Y-axis label')\nax.set_title('Plot Title')\n\n\nLegend\nax.legend()\n\n\nAxis Limits\nax.set_xlim(0, 10)\nax.set_ylim(-1, 1)\n\n\nGrid\nax.grid(True, linestyle='--', alpha=0.7)\n\n\nTicks\nax.set_xticks([0, 2, 4, 6, 8, 10])\nax.set_yticks([-1, -0.5, 0, 0.5, 1])" }, { - "objectID": "course-materials/lectures/99_dry_vs_wet.html#normalization", - "href": "course-materials/lectures/99_dry_vs_wet.html#normalization", - "title": "EDS 217, Lecture 4: DRY 🏜 vs. WET 🌊", - "section": "Normalization", - "text": "Normalization\nNormalization is the process of structuring data in order to reduce redundancy and improve integrity." + "objectID": "course-materials/cheatsheets/matplotlib.html#color-and-style", + "href": "course-materials/cheatsheets/matplotlib.html#color-and-style", + "title": "EDS 217 Cheatsheet", + "section": "Color and Style", + "text": "Color and Style\n\nChanging Colors\nax.plot(x, y, color='r') # 'r' for red\nax.scatter(x, y, c='blue')\n\n\nLine Styles\nax.plot(x, y, linestyle='--') # dashed line\nax.plot(x, y, ls=':') # dotted line\n\n\nMarker Styles\nax.plot(x, y, marker='o') # circles\nax.plot(x, y, marker='s') # squares" }, { - "objectID": "course-materials/lectures/99_dry_vs_wet.html#normalization-1", - "href": "course-materials/lectures/99_dry_vs_wet.html#normalization-1", - "title": "EDS 217, Lecture 4: DRY 🏜 vs. WET 🌊", - "section": "Normalization", - "text": "Normalization\nSome of the key principles of Normalization include:\n\nAll data have a Primary Key, which uniquely identifies a record. Usually, in python, this key is called an Index.\nAtomic columns, meaning entries contain a single value. 
This means no collections should appear as elements within a data table. (i.e. “cells” in structured data should not contain lists!)\nNo transitive dependencies. This means that there should not be implicit associations between columns within data tables.\n\n\nPrimary Keys\nThis form of normalization is easy to obtain, as the idea of an Index is embedded in almost any Python data structure, and a core component of data structures within pandas, which is the most popular data science library in python (coming next week!).\n\n\nPrimary Keys\n\n# All DataFrames in pandas are created with an index (i.e. unique primary key)\nimport pandas as pd\naverage_high_temps = [18.3, 18.3, 18.9, 20.6, 21.1, 21.7,\n 23.9, 24.4, 23.9, 22.8, 20.6, 18.3]\nsb_high_temp = pd.DataFrame(\n average_high_temps, # This list will become a single column of values\n columns=['Average_High_Temperature'] # This is the name of the column\n) # NOTE: use sb_high_temp.head()\n#sb_high_temp.index = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']\nsb_high_temp.head()\n\n\n\nAtomic Columns\nThe idea of atomic columns is that each element in a data structure should contain a unique value. This requirement is harder to obtain and you will sometimes violate it.\n\n# import pandas as pd\naverage_high_temps = [18.3, 18.3, 18.9, 20.6, 21.1, 21.7, 23.9, 24.4, 23.9, 22.8, 20.6, 18.3]\naverage_rainfall = [110.7, 119.1, 74.2, 31.5, 8.4, 2.3, 0.5, 1.3, 3.6, 22.9, 45.5, 77.2]\n\n# THIS DATAFRAME IS NOT ATOMIC. EACH ELEMENT IN THE COLUMN IS A LIST.\nsb_climate = pd.DataFrame([\n [average_high_temps, # The first column will contain a list.\n average_rainfall]], # The second column will also contain a list.\n columns=['Monthly Average Temp', 'Monthly Average Rainfall'] # Column names\n)\nsb_climate.head()\n\n\n\nAtomic Columns\nThe idea of atomic columns is that each element in a data structure should contain a unique value. 
This requirement is harder to obtain and you will sometimes violate it.\n\nimport pandas as pd\naverage_high_temps = [18.3, 18.3, 18.9, 20.6, 21.1, 21.7, 23.9, 24.4, 23.9, 22.8, 20.6, 18.3]\naverage_rainfall = [110.7, 119.1, 74.2, 31.5, 8.4, 2.3, 0.5, 1.3, 3.6, 22.9, 45.5, 77.2]\n\n# THIS DATAFRAME IS ATOMIC. EACH ELEMENT IN THE COLUMN IS A SINGLE VALUE.\nsb_climate = pd.DataFrame({ # Using a dict to create the data frame.\n 'Average_High_Temperature':average_high_temps, # This is the first column\n 'Average_Rainfall':average_rainfall # This is the second column\n})\nsb_climate.index = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']\nsb_climate.head()\n\n\n\nTransitive Dependencies\nThe idea of transitive dependencies is the inclusion of multiple associated attributes within the same data structure.\n\nTransitive dependencies make updating data very difficult, but they can be helpful in analyzing data.\nSo we should only introduce them in data that we will not be editing.\n\nUsually environmental data, and especially timeseries, are rarely modified after creation. 
So we don’t need to worry as much about these dependencies.\nFor example, contrast a data record of “temperatures through time” to a data record of “user contacts in a social network”.\n\n\nTransitive Dependencies\nThe idea of transitive dependencies is the inclusion of multiple associated attributes within the same data structure.\n\nimport pandas as pd\naverage_high_temps = [18.3, 18.3, 18.9, 20.6, 21.1, 21.7, 23.9, 24.4, 23.9, 22.8, 20.6, 18.3]\naverage_rainfall = [110.7, 119.1, 74.2, 31.5, 8.4, 2.3, 0.5, 1.3, 3.6, 22.9, 45.5, 77.2]\n\n# TRANSITIVE ASSOCIATIONS EXIST BETWEEN MONTHS AND SEASONS IN THIS DATAFRAME:\nmonth = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']\nseason = ['Winter', 'Winter', 'Spring', 'Spring', 'Spring', 'Summer', 'Summer', 'Summer', 'Fall', 'Fall', 'Fall', 'Winter']\nsb_climate = pd.DataFrame({ # Using a dict to create the data frame.\n 'Month': month, # Adding month as the first column of the data frame\n 'Season': season, # Adding the season for each month (this is a transitive dependency)\n 'Avg_High_Temp':average_high_temps, # This is the third column\n 'Avg_Rain':average_rainfall # This is the fourth column\n})\nsb_climate.head()\n\n\n\nNormalization Summary\nIn general, for data analysis, basic normalization is handled for you.\n\nFor read only data with fixed associations, a lack of normalization is manageable.\nHowever, many analyses are easier if you structure your data in ways that are as normalized as possible.\nIf you are collecting data then it is important to develop an organization structure that is normalized." 
+ "objectID": "course-materials/cheatsheets/matplotlib.html#saving-and-displaying", + "href": "course-materials/cheatsheets/matplotlib.html#saving-and-displaying", + "title": "EDS 217 Cheatsheet", + "section": "Saving and Displaying", + "text": "Saving and Displaying\n\nSaving the Figure\nplt.savefig('my_plot.png', dpi=300, bbox_inches='tight')\n\n\nDisplaying the Plot\nplt.show()" }, { - "objectID": "course-materials/lectures/99_dry_vs_wet.html#the-end", - "href": "course-materials/lectures/99_dry_vs_wet.html#the-end", - "title": "EDS 217, Lecture 4: DRY 🏜 vs. WET 🌊", - "section": "The End", - "text": "The End" + "objectID": "course-materials/cheatsheets/matplotlib.html#useful-tips-for-seaborn-users", + "href": "course-materials/cheatsheets/matplotlib.html#useful-tips-for-seaborn-users", + "title": "EDS 217 Cheatsheet", + "section": "Useful Tips for Seaborn Users", + "text": "Useful Tips for Seaborn Users\n\nUse plt.subplots() to create custom layouts that Seaborn doesn’t provide.\nAccess the underlying Matplotlib Axes object in Seaborn plots:\ng = sns.scatterplot(x='x', y='y', data=df)\ng.set_xlabel('Custom X Label')\nCombine Seaborn and Matplotlib in the same figure:\nfig, ax = plt.subplots()\nsns.scatterplot(x='x', y='y', data=df, ax=ax)\nax.plot(x, y, color='r', linestyle='--')\nUse Matplotlib’s plt.tight_layout() to automatically adjust subplot parameters for better spacing.\n\nRemember, most Seaborn functions return a Matplotlib Axes object, allowing you to further customize your plots using Matplotlib functions." 
}, { - "objectID": "cheatsheets.html", - "href": "cheatsheets.html", - "title": "Cheatsheets", + "objectID": "course-materials/cheatsheets/dictionaries.html", + "href": "course-materials/cheatsheets/dictionaries.html", + "title": "EDS 217 Cheatsheet", "section": "", - "text": "Setting up Python\nJupyterLab Variable Inspection and Shortcuts" + "text": "my_dict = {}\n\n\n\nmy_dict = {\"name\": \"Alice\", \"age\": 30}\n\n\n\nmixed_dict = {\"number\": 42, \"text\": \"hello\", \"list\": [1, 2, 3], \"flag\": True}\n\n\n\nnested_dict = {\n \"person1\": {\"name\": \"Alice\", \"age\": 25},\n \"person2\": {\"name\": \"Bob\", \"age\": 30}\n}" }, { - "objectID": "cheatsheets.html#setup-tools", - "href": "cheatsheets.html#setup-tools", - "title": "Cheatsheets", + "objectID": "course-materials/cheatsheets/dictionaries.html#creating-dictionaries", + "href": "course-materials/cheatsheets/dictionaries.html#creating-dictionaries", + "title": "EDS 217 Cheatsheet", "section": "", - "text": "Setting up Python\nJupyterLab Variable Inspection and Shortcuts" + "text": "my_dict = {}\n\n\n\nmy_dict = {\"name\": \"Alice\", \"age\": 30}\n\n\n\nmixed_dict = {\"number\": 42, \"text\": \"hello\", \"list\": [1, 2, 3], \"flag\": True}\n\n\n\nnested_dict = {\n \"person1\": {\"name\": \"Alice\", \"age\": 25},\n \"person2\": {\"name\": \"Bob\", \"age\": 30}\n}" }, { - "objectID": "cheatsheets.html#python-basics", - "href": "cheatsheets.html#python-basics", - "title": "Cheatsheets", - "section": "🐍 Python Basics", - "text": "🐍 Python Basics\n\nPython Basics\nPython Functions\nLists\nDictionaries\nSets\nControl Flows\nList and Dictionary Comprehensions\nprint()" + "objectID": "course-materials/cheatsheets/dictionaries.html#accessing-elements", + "href": "course-materials/cheatsheets/dictionaries.html#accessing-elements", + "title": "EDS 217 Cheatsheet", + "section": "2. Accessing Elements", + "text": "2. 
Accessing Elements\n\n2.1 Access by Key\nprint(my_dict[\"name\"]) # Output: Alice\n\n\n2.2 Safely Access Using .get()\nprint(my_dict.get(\"name\")) # Output: Alice\nprint(my_dict.get(\"profession\", \"Unknown\")) # Output: Unknown if not present" }, { - "objectID": "cheatsheets.html#numpy", - "href": "cheatsheets.html#numpy", - "title": "Cheatsheets", - "section": "🔢 NumPy", - "text": "🔢 NumPy\n\nNumpy Basics\nNumpy Random Number Generation" + "objectID": "course-materials/cheatsheets/dictionaries.html#modifying-dictionaries", + "href": "course-materials/cheatsheets/dictionaries.html#modifying-dictionaries", + "title": "EDS 217 Cheatsheet", + "section": "3. Modifying Dictionaries", + "text": "3. Modifying Dictionaries\n\n3.1 Add or Update an Element\nmy_dict[\"profession\"] = \"Engineer\" # Adds a new key or updates if exists\n\n\n3.2 Remove Elements\ndel my_dict[\"age\"] # Removes the key 'age'\nprofession = my_dict.pop(\"profession\", \"No profession found\") # Removes and returns\nmy_dict.clear() # Clears all elements" }, { - "objectID": "cheatsheets.html#pandas", - "href": "cheatsheets.html#pandas", - "title": "Cheatsheets", - "section": "🐼 Pandas", - "text": "🐼 Pandas\n\nPandas 1-pager PDF\nPandas Series\nPandas DataFrames\nPandas DataFrame Methods and Data Science Workflows\nread_csv()\nData Cleaning\nData Filtering & Selection\nData Grouping & Aggregation\nMerging and Joining Data\nWorking with Timeseries" + "objectID": "course-materials/cheatsheets/dictionaries.html#dictionary-operations", + "href": "course-materials/cheatsheets/dictionaries.html#dictionary-operations", + "title": "EDS 217 Cheatsheet", + "section": "4. Dictionary Operations", + "text": "4. 
Dictionary Operations\n\n4.1 Check if Key Exists\n\"name\" in my_dict # Returns True if 'name' is a key\n\n\n4.2 Iterate Through Keys, Values, or Items\nfor key in my_dict.keys():\n print(key)\nfor value in my_dict.values():\n print(value)\nfor key, value in my_dict.items():\n print(f\"{key}: {value}\")\n\n\n4.3 Dictionary Comprehensions\nsquared = {x: x**2 for x in range(5)} # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}\n\n\n4.4 Merge Dictionaries\ndict1 = {\"name\": \"Alice\", \"age\": 25}\ndict2 = {\"city\": \"New York\", \"age\": 30}\nmerged = {**dict1, **dict2} # Python 3.5+ method" }, { - "objectID": "cheatsheets.html#data-visualization", - "href": "cheatsheets.html#data-visualization", - "title": "Cheatsheets", - "section": "📊 Data Visualization", - "text": "📊 Data Visualization\n\nSeaborn Basics\nMatplotlib Basics\nplt.bar() [Matplotlib]\nChart Customization" + "objectID": "course-materials/cheatsheets/dictionaries.html#common-dictionary-methods", + "href": "course-materials/cheatsheets/dictionaries.html#common-dictionary-methods", + "title": "EDS 217 Cheatsheet", + "section": "5. Common Dictionary Methods", + "text": "5. 
Common Dictionary Methods\n\n5.1 Get Dictionary Length\nlen(my_dict) # Returns the number of key-value pairs\n\n\n5.2 Copy a Dictionary\nnew_dict = my_dict.copy() # Creates a shallow copy of the dictionary\n\n\n5.3 Get All Keys or Values\nall_keys = list(my_dict.keys())\nall_values = list(my_dict.values())\n\n\n5.4 Update Dictionary\nmy_dict.update({\"age\": 26, \"city\": \"Boston\"}) # Updates and adds multiple keys\n\n\n5.5 Set Default Value for Key\nmy_dict.setdefault(\"age\", 29) # Sets 'age' to 29 if key is not present" }, - { - "objectID": "course-materials/cheatsheets/bar_plots.html", - "href": "course-materials/cheatsheets/bar_plots.html", + { + "objectID": "course-materials/cheatsheets/dictionaries.html#common-dictionary-pitfalls", + "href": "course-materials/cheatsheets/dictionaries.html#common-dictionary-pitfalls", "title": "EDS 217 Cheatsheet", - "section": "", - "text": "Step 1: Import Libraries\nFirst, you need to import the necessary libraries. We’ll use matplotlib.pyplot for plotting.\nimport matplotlib.pyplot as plt\n\n\nStep 2: Prepare Your Data\nCreate lists or arrays for the categories (x-axis) and their corresponding values (y-axis). Here’s a simple example:\ncategories = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']\nvalues = [-22.89, -20.7, -20.69, -11.76, -0.8, 8.59, 11.22, 7.23, -0.11, -10.54, -18.34, -21.44]\n\n\nStep 3: Create the Bar Plot\nUse the bar() function from pyplot to create the bar plot. 
Pass the categories and values as arguments.\nplt.bar(categories, values)\n\n\nStep 4: Add Labels and Title\nYou can enhance your plot by adding a title and labels for the x-axis and y-axis.\nplt.xlabel('Categories')\nplt.ylabel('Values')\nplt.title('Simple Bar Plot')\n\n\nStep 5: Display the Plot\nFinally, use plt.show() to display the plot.\nplt.show()\n\n\nComplete Code Example\nHere’s the complete code to create a simple bar plot:\nimport matplotlib.pyplot as plt\n\n# Data for the plot\ncategories = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']\nvalues = [-22.89, -20.7, -20.69, -11.76, -0.8, 8.59, 11.22, 7.23, -0.11, -10.54, -18.34, -21.44]\n\n# Create the bar plot\nplt.bar(categories, values)\n\n# Add labels and title\nplt.xlabel('Months')\nplt.ylabel('Average Temperature, deg-C')\nplt.title('Toolik Lake LTER Average Temperatures, 2008-2019')\n\n# Display the plot\nplt.show()\n\n\nCustomizing the Bar Plot\nYou can further customize your bar plot with additional options:\n\nColor: Set the color of the bars using the color parameter.\nplt.bar(categories, values, color='skyblue')\nWidth: Adjust the width of the bars using the width parameter.\nplt.bar(categories, values, width=0.5)\nAdd Grid: Make the plot easier to read by adding a grid.\nplt.grid(axis='y', linestyle='--', alpha=0.7)\nHorizontal Bar Plot: Use barh() for horizontal bar plots.\nplt.barh(categories, values, color='skyblue')\n\n\n\nExample with Customizations\nimport matplotlib.pyplot as plt\n\n# Data for the plot\ncategories = ['Category A', 'Category B', 'Category C', 'Category D']\nvalues = [23, 17, 35, 29]\n\n# Create the bar plot with customizations\nplt.bar(categories, values, color='skyblue', width=0.5)\n\n# Add labels and title\nplt.xlabel('Categories')\nplt.ylabel('Values')\nplt.title('Simple Bar Plot')\n\n# Add grid\nplt.grid(axis='y', linestyle='--', alpha=0.7)\n\n# Display the plot\nplt.show()" + "section": "6. Common Dictionary Pitfalls", + "text": "6. 
Common Dictionary Pitfalls\n\n6.1 Avoid Modifying a Dictionary While Iterating\n# Incorrect\nfor key in my_dict:\n if key.startswith('a'):\n del my_dict[key]\n\n# Correct (Using a copy of keys)\nfor key in list(my_dict.keys()):\n if key.startswith('a'):\n del my_dict[key]" }, { "objectID": "course-materials/cheatsheets/data_cleaning.html", @@ -2016,1103 +2107,1110 @@ "text": "Resources for More Information\n\nPandas Documentation\n10 Minutes to Pandas\nPandas Cheat Sheet\n\nRemember, these are just some of the most common operations for cleaning DataFrames. As you become more comfortable with pandas, you’ll discover many more powerful functions and methods to help you clean and manipulate your data effectively." }, { - "objectID": "course-materials/cheatsheets/dictionaries.html", - "href": "course-materials/cheatsheets/dictionaries.html", + "objectID": "course-materials/cheatsheets/bar_plots.html", + "href": "course-materials/cheatsheets/bar_plots.html", "title": "EDS 217 Cheatsheet", "section": "", - "text": "my_dict = {}\n\n\n\nmy_dict = {\"name\": \"Alice\", \"age\": 30}\n\n\n\nmixed_dict = {\"number\": 42, \"text\": \"hello\", \"list\": [1, 2, 3], \"flag\": True}\n\n\n\nnested_dict = {\n \"person1\": {\"name\": \"Alice\", \"age\": 25},\n \"person2\": {\"name\": \"Bob\", \"age\": 30}\n}" + "text": "Step 1: Import Libraries\nFirst, you need to import the necessary libraries. We’ll use matplotlib.pyplot for plotting.\nimport matplotlib.pyplot as plt\n\n\nStep 2: Prepare Your Data\nCreate lists or arrays for the categories (x-axis) and their corresponding values (y-axis). Here’s a simple example:\ncategories = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']\nvalues = [-22.89, -20.7, -20.69, -11.76, -0.8, 8.59, 11.22, 7.23, -0.11, -10.54, -18.34, -21.44]\n\n\nStep 3: Create the Bar Plot\nUse the bar() function from pyplot to create the bar plot. 
Pass the categories and values as arguments.\nplt.bar(categories, values)\n\n\nStep 4: Add Labels and Title\nYou can enhance your plot by adding a title and labels for the x-axis and y-axis.\nplt.xlabel('Categories')\nplt.ylabel('Values')\nplt.title('Simple Bar Plot')\n\n\nStep 5: Display the Plot\nFinally, use plt.show() to display the plot.\nplt.show()\n\n\nComplete Code Example\nHere’s the complete code to create a simple bar plot:\nimport matplotlib.pyplot as plt\n\n# Data for the plot\ncategories = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']\nvalues = [-22.89, -20.7, -20.69, -11.76, -0.8, 8.59, 11.22, 7.23, -0.11, -10.54, -18.34, -21.44]\n\n# Create the bar plot\nplt.bar(categories, values)\n\n# Add labels and title\nplt.xlabel('Months')\nplt.ylabel('Average Temperature, deg-C')\nplt.title('Toolik Lake LTER Average Temperatures, 2008-2019')\n\n# Display the plot\nplt.show()\n\n\nCustomizing the Bar Plot\nYou can further customize your bar plot with additional options:\n\nColor: Set the color of the bars using the color parameter.\nplt.bar(categories, values, color='skyblue')\nWidth: Adjust the width of the bars using the width parameter.\nplt.bar(categories, values, width=0.5)\nAdd Grid: Make the plot easier to read by adding a grid.\nplt.grid(axis='y', linestyle='--', alpha=0.7)\nHorizontal Bar Plot: Use barh() for horizontal bar plots.\nplt.barh(categories, values, color='skyblue')\n\n\n\nExample with Customizations\nimport matplotlib.pyplot as plt\n\n# Data for the plot\ncategories = ['Category A', 'Category B', 'Category C', 'Category D']\nvalues = [23, 17, 35, 29]\n\n# Create the bar plot with customizations\nplt.bar(categories, values, color='skyblue', width=0.5)\n\n# Add labels and title\nplt.xlabel('Categories')\nplt.ylabel('Values')\nplt.title('Simple Bar Plot')\n\n# Add grid\nplt.grid(axis='y', linestyle='--', alpha=0.7)\n\n# Display the plot\nplt.show()" }, { - "objectID": 
"course-materials/cheatsheets/dictionaries.html#creating-dictionaries", - "href": "course-materials/cheatsheets/dictionaries.html#creating-dictionaries", - "title": "EDS 217 Cheatsheet", + "objectID": "cheatsheets.html", + "href": "cheatsheets.html", + "title": "Cheatsheets", "section": "", - "text": "my_dict = {}\n\n\n\nmy_dict = {\"name\": \"Alice\", \"age\": 30}\n\n\n\nmixed_dict = {\"number\": 42, \"text\": \"hello\", \"list\": [1, 2, 3], \"flag\": True}\n\n\n\nnested_dict = {\n \"person1\": {\"name\": \"Alice\", \"age\": 25},\n \"person2\": {\"name\": \"Bob\", \"age\": 30}\n}" + "text": "Setting up Python\nJupyterLab Variable Inspection and Shortcuts" }, { - "objectID": "course-materials/cheatsheets/dictionaries.html#accessing-elements", - "href": "course-materials/cheatsheets/dictionaries.html#accessing-elements", - "title": "EDS 217 Cheatsheet", - "section": "2. Accessing Elements", - "text": "2. Accessing Elements\n\n2.1 Access by Key\nprint(my_dict[\"name\"]) # Output: Alice\n\n\n2.2 Safely Access Using .get()\nprint(my_dict.get(\"name\")) # Output: Alice\nprint(my_dict.get(\"profession\", \"Unknown\")) # Output: Unknown if not present" + "objectID": "cheatsheets.html#setup-tools", + "href": "cheatsheets.html#setup-tools", + "title": "Cheatsheets", + "section": "", + "text": "Setting up Python\nJupyterLab Variable Inspection and Shortcuts" }, { - "objectID": "course-materials/cheatsheets/dictionaries.html#modifying-dictionaries", - "href": "course-materials/cheatsheets/dictionaries.html#modifying-dictionaries", - "title": "EDS 217 Cheatsheet", - "section": "3. Modifying Dictionaries", - "text": "3. 
Modifying Dictionaries\n\n3.1 Add or Update an Element\nmy_dict[\"profession\"] = \"Engineer\" # Adds a new key or updates if exists\n\n\n3.2 Remove Elements\ndel my_dict[\"age\"] # Removes the key 'age'\nprofession = my_dict.pop(\"profession\", \"No profession found\") # Removes and returns\nmy_dict.clear() # Clears all elements" + "objectID": "cheatsheets.html#python-basics", + "href": "cheatsheets.html#python-basics", + "title": "Cheatsheets", + "section": "🐍 Python Basics", + "text": "🐍 Python Basics\n\nPython Basics\nPython Functions\nLists\nDictionaries\nSets\nControl Flows\nList and Dictionary Comprehensions\nprint()" }, { - "objectID": "course-materials/cheatsheets/dictionaries.html#dictionary-operations", - "href": "course-materials/cheatsheets/dictionaries.html#dictionary-operations", - "title": "EDS 217 Cheatsheet", - "section": "4. Dictionary Operations", - "text": "4. Dictionary Operations\n\n4.1 Check if Key Exists\n\"name\" in my_dict # Returns True if 'name' is a key\n\n\n4.2 Iterate Through Keys, Values, or Items\nfor key in my_dict.keys():\n print(key)\nfor value in my_dict.values():\n print(value)\nfor key, value in my_dict.items():\n print(f\"{key}: {value}\")\n\n\n4.3 Dictionary Comprehensions\nsquared = {x: x**2 for x in range(5)} # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}\n\n\n4.4 Merge Dictionaries\ndict1 = {\"name\": \"Alice\", \"age\": 25}\ndict2 = {\"city\": \"New York\", \"age\": 30}\nmerged = {**dict1, **dict2} # Python 3.5+ method" + "objectID": "cheatsheets.html#numpy", + "href": "cheatsheets.html#numpy", + "title": "Cheatsheets", + "section": "🔢 NumPy", + "text": "🔢 NumPy\n\nNumpy Basics\nNumpy Random Number Generation" }, { - "objectID": "course-materials/cheatsheets/dictionaries.html#common-dictionary-methods", - "href": "course-materials/cheatsheets/dictionaries.html#common-dictionary-methods", - "title": "EDS 217 Cheatsheet", - "section": "5. Common Dictionary Methods", - "text": "5. 
Common Dictionary Methods\n\n5.1 Get Dictionary Length\nlen(my_dict) # Returns the number of key-value pairs\n\n\n5.2 Copy a Dictionary\nnew_dict = my_dict.copy() # Creates a shallow copy of the dictionary\n\n\n5.3 Get All Keys or Values\nall_keys = list(my_dict.keys())\nall_values = list(my_dict.values())\n\n\n5.4 Update Dictionary\nmy_dict.update({\"age\": 26, \"city\": \"Boston\"}) # Updates and adds multiple keys\n\n\n5.5 Set Default Value for Key\nmy_dict.setdefault(\"age\", 29) # Sets 'age' to 29 if key is not present" + "objectID": "cheatsheets.html#pandas", + "href": "cheatsheets.html#pandas", + "title": "Cheatsheets", + "section": "🐼 Pandas", + "text": "🐼 Pandas\n\nPandas 1-pager PDF\nPandas Series\nPandas DataFrames\nPandas DataFrame Methods and Data Science Workflows\nread_csv()\nData Cleaning\nData Filtering & Selection\nData Grouping & Aggregation\nMerging and Joining Data\nWorking with Timeseries" }, { - "objectID": "course-materials/cheatsheets/dictionaries.html#common-dictionary-pitfalls", - "href": "course-materials/cheatsheets/dictionaries.html#common-dictionary-pitfalls", - "title": "EDS 217 Cheatsheet", - "section": "6. Common Dictionary Pitfalls", - "text": "6. 
Common Dictionary Pitfalls\n\n6.1 Avoid Modifying a Dictionary While Iterating\n# Incorrect\nfor key in my_dict:\n if key.startswith('a'):\n del my_dict[key]\n\n# Correct (Using a copy of keys)\nfor key in list(my_dict.keys()):\n if key.startswith('a'):\n del my_dict[key]" + "objectID": "cheatsheets.html#data-visualization", + "href": "cheatsheets.html#data-visualization", + "title": "Cheatsheets", + "section": "📊 Data Visualization", + "text": "📊 Data Visualization\n\nSeaborn Basics\nMatplotlib Basics\nplt.bar() [Matplotlib]\nChart Customization" }, { - "objectID": "course-materials/cheatsheets/matplotlib.html", - "href": "course-materials/cheatsheets/matplotlib.html", - "title": "EDS 217 Cheatsheet", + "objectID": "course-materials/lectures/99_dry_vs_wet.html", + "href": "course-materials/lectures/99_dry_vs_wet.html", + "title": "EDS 217, Lecture 4: DRY 🏜 vs. WET 🌊", "section": "", - "text": "Tip\n\n\n\nFor 🤯 inspiration + 👩‍💻 code, check out the Python Graph Gallery" + "text": "dry.jpg" }, { - "objectID": "course-materials/cheatsheets/matplotlib.html#basic-setup", - "href": "course-materials/cheatsheets/matplotlib.html#basic-setup", - "title": "EDS 217 Cheatsheet", - "section": "Basic Setup", - "text": "Basic Setup\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport numpy as np\n\n# Set style for seaborn (optional)\nsns.set_style(\"whitegrid\")" + "objectID": "course-materials/lectures/99_dry_vs_wet.html#dry-vs.-wet", + "href": "course-materials/lectures/99_dry_vs_wet.html#dry-vs.-wet", + "title": "EDS 217, Lecture 4: DRY 🏜 vs. WET 🌊", + "section": "DRY vs. WET", + "text": "DRY vs. WET\nIf DRY means “Don’t Repeat Yourself”… then WET means “Write Every Time”, or “We Enjoy Typing”\nDon’t write WET code!\n\nHow to DRY out your code\nWe write DRY code - or we DRY out WET code - through a combination of abstraction and normalization." 
}, { - "objectID": "course-materials/cheatsheets/matplotlib.html#creating-a-figure-and-axes", - "href": "course-materials/cheatsheets/matplotlib.html#creating-a-figure-and-axes", - "title": "EDS 217 Cheatsheet", - "section": "Creating a Figure and Axes", - "text": "Creating a Figure and Axes\n# Create a new figure and axis\nfig, ax = plt.subplots(figsize=(10, 6))\n\n# Create multiple subplots\nfig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))" + "objectID": "course-materials/lectures/99_dry_vs_wet.html#abstraction", + "href": "course-materials/lectures/99_dry_vs_wet.html#abstraction", + "title": "EDS 217, Lecture 4: DRY 🏜 vs. WET 🌊", + "section": "Abstraction", + "text": "Abstraction\nThe “principle of abstraction” aims to reduce duplication of information (usually code) in a program whenever it is practical to do so:\n“Each significant piece of functionality in a program should be implemented in just one place in the source code. Where similar functions are carried out by distinct pieces of code, it is generally beneficial to combine them into one by abstracting out the varying parts.”\nBenjamin C. Pierce - Types and Programming Languages\n\nAbstraction Example\nThe easiest way to understand abstraction is to see it in action. 
Here’s an example that you are already familiar with; determining the energy emitted by an object as a function of its temperature:\n\\(Q = \\epsilon \\sigma T^4\\)\nwhere \\(\\epsilon\\) is an object’s emissivity, \\(\\sigma\\) is the Stefan-Boltzmann constant, and \\(T\\) is temperature in Kelvin.\n\n\nAbstraction Example\nWe might write the following code to determine \\(Q\\):\n\n# How much energy is emitted by an object at a certain temperature?\nε = 1 # emissivity [-]\nσ = 5.67e-8 # stefan-boltzmann constant [W/m^2/K^4]\nT_C = 40 # temperature [deg-C]\n\nQ = ε * σ * (T_C+273.15)**4\nprint(Q)\n\n\n\nAbstraction Example\nBut this code is going to get very WET very fast.\n\n# How much energy is emitted by an object at a certain temperature?\nε = 1 # emissivity [-]\nσ = 5.67e-8 # stefan-boltzmann constant [W/m^2/K^4]\nT_C = 40 # temperature [deg-C]\n\nQ = ε * σ * (T_C+273.15)**4\n\n# New T value? Different epsilon? What about a bunch of T values?\nT_2 = 30\n\nQ2 = ε * σ * (T_2+273.15)**4\n\n\n\n\nAbstraction Example\nHere’s a DRY version obtained using abstraction:\n\n# energy.py contains a function to calculate Q from T \nfrom energy import Q \n\nT = 40 # deg-C\nE = Q(T, unit='C')\n\n\n\nAbstraction Summary, Part 1\n\nWe keep our code DRY by using abstraction. In addition to functions, python also provides Classes as another important way to create abstractions.\nFunctions and Classes are the subject of tomorrow’s exercise.\n\n\n\nAbstraction Summary, Part 2\n\nIn general, the process of keeping code DRY through successive layers of abstraction is known as re-factoring.\nThe “Rule of Three” states that you should probably consider refactoring (i.e. adding abstraction) whenever you find your code doing the same thing three times or more." 
}, { - "objectID": "course-materials/cheatsheets/matplotlib.html#common-plot-types", - "href": "course-materials/cheatsheets/matplotlib.html#common-plot-types", - "title": "EDS 217 Cheatsheet", - "section": "Common Plot Types", - "text": "Common Plot Types\n\nLine Plot\nx = np.linspace(0, 10, 100)\ny = np.sin(x)\nax.plot(x, y, label='sin(x)')\n\n\nScatter Plot\nx = np.random.rand(50)\ny = np.random.rand(50)\nplt.scatter(x, y, alpha=0.5)\n\n\nBar Plot\ncategories = ['A', 'B', 'C', 'D']\nvalues = [3, 7, 2, 5]\nplt.bar(categories, values)\n\n\nHistogram\ndata = np.random.randn(1000)\nax.hist(data, bins=30, edgecolor='black')" + "objectID": "course-materials/lectures/99_dry_vs_wet.html#normalization", + "href": "course-materials/lectures/99_dry_vs_wet.html#normalization", + "title": "EDS 217, Lecture 4: DRY 🏜 vs. WET 🌊", + "section": "Normalization", + "text": "Normalization\nNormalization is the process of structuring data in order to reduce redundancy and improve integrity." }, { - "objectID": "course-materials/cheatsheets/matplotlib.html#customizing-plots", - "href": "course-materials/cheatsheets/matplotlib.html#customizing-plots", - "title": "EDS 217 Cheatsheet", - "section": "Customizing Plots", - "text": "Customizing Plots\n\nLabels and Title\nax.set_xlabel('X-axis label')\nax.set_ylabel('Y-axis label')\nax.set_title('Plot Title')\n\n\nLegend\nax.legend()\n\n\nAxis Limits\nax.set_xlim(0, 10)\nax.set_ylim(-1, 1)\n\n\nGrid\nax.grid(True, linestyle='--', alpha=0.7)\n\n\nTicks\nax.set_xticks([0, 2, 4, 6, 8, 10])\nax.set_yticks([-1, -0.5, 0, 0.5, 1])" + "objectID": "course-materials/lectures/99_dry_vs_wet.html#normalization-1", + "href": "course-materials/lectures/99_dry_vs_wet.html#normalization-1", + "title": "EDS 217, Lecture 4: DRY 🏜 vs. WET 🌊", + "section": "Normalization", + "text": "Normalization\nSome of the key principles of Normalization include:\n\nAll data have a Primary Key, which uniquely identifies a record. 
Usually, in python, this key is called an Index.\nAtomic columns, meaning entries contain a single value. This means no collections should appear as elements within a data table. (i.e. “cells” in structured data should not contain lists!)\nNo transitive dependencies. This means that there should not be implicit associations between columns within data tables.\n\n\nPrimary Keys\nThis form of normalization is easy to obtain, as the idea of an Index is embedded in almost any Python data structure, and a core component of data structures within pandas, which is the most popular data science library in python (coming next week!).\n\n\nPrimary Keys\n\n# All DataFrames in pandas are created with an index (i.e. unique primary key)\nimport pandas as pd\naverage_high_temps = [18.3, 18.3, 18.9, 20.6, 21.1, 21.7,\n 23.9, 24.4, 23.9, 22.8, 20.6, 18.3]\nsb_high_temp = pd.DataFrame(\n average_high_temps, # This list will become a single column of values\n columns=['Average_High_Temperature'] # This is the name of the column\n) # NOTE: use sb_high_temp.head() py->month_list\n#sb_high_temp.index = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']\nsb_high_temp.head()\n\n\n\nAtomic Columns\nThe idea of atomic columns is that each element in a data structure should contain a single value. This requirement is harder to obtain and you will sometimes violate it.\n\n# import pandas as pd\naverage_high_temps = [18.3, 18.3, 18.9, 20.6, 21.1, 21.7, 23.9, 24.4, 23.9, 22.8, 20.6, 18.3]\naverage_rainfall = [110.7, 119.1, 74.2, 31.5, 8.4, 2.3, 0.5, 1.3, 3.6, 22.9, 45.5, 77.2]\n\n# THIS DATAFRAME IS NOT ATOMIC. 
EACH ELEMENT IN THE COLUMN IS A LIST.\nsb_climate = pd.DataFrame([\n [average_high_temps, # The first column will contain a list.\n average_rainfall]], # The second column will also contain a list.\n columns=['Monthly Average Temp', 'Monthly Average Rainfall'] # Column names\n)\nsb_climate.head()\n\n\n\nAtomic Columns\nThe idea of atomic columns is that each element in a data structure should contain a single value. This requirement is harder to obtain and you will sometimes violate it.\n\nimport pandas as pd\naverage_high_temps = [18.3, 18.3, 18.9, 20.6, 21.1, 21.7, 23.9, 24.4, 23.9, 22.8, 20.6, 18.3]\naverage_rainfall = [110.7, 119.1, 74.2, 31.5, 8.4, 2.3, 0.5, 1.3, 3.6, 22.9, 45.5, 77.2]\n\n# THIS DATAFRAME IS ATOMIC. EACH ELEMENT IN THE COLUMN IS A SINGLE VALUE.\nsb_climate = pd.DataFrame({ # Using a dict to create the data frame.\n 'Average_High_Temperature':average_high_temps, # This is the first column\n 'Average_Rainfall':average_rainfall # This is the second column\n})\nsb_climate.index = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']\nsb_climate.head()\n\n\n\nTransitive Dependencies\nThe idea of transitive dependencies is the inclusion of multiple associated attributes within the same data structure.\n\nTransitive dependencies make updating data very difficult, but they can be helpful in analyzing data.\nSo we should only introduce them in data that we will not be editing.\n\nUsually environmental data, and especially timeseries, are rarely modified after creation. 
So we don’t need to worry as much about these dependencies.\nFor example, contrast a data record of “temperatures through time” to a data record of “user contacts in a social network”.\n\n\nTransitive Dependencies\nThe idea of transitive dependencies is the inclusion of multiple associated attributes within the same data structure.\n\nimport pandas as pd\naverage_high_temps = [18.3, 18.3, 18.9, 20.6, 21.1, 21.7, 23.9, 24.4, 23.9, 22.8, 20.6, 18.3]\naverage_rainfall = [110.7, 119.1, 74.2, 31.5, 8.4, 2.3, 0.5, 1.3, 3.6, 22.9, 45.5, 77.2]\n\n# TRANSITIVE ASSOCIATIONS EXIST BETWEEN MONTHS AND SEASONS IN THIS DATAFRAME:\nmonth = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']\nseason = ['Winter', 'Winter', 'Spring', 'Spring', 'Spring', 'Summer', 'Summer', 'Summer', 'Fall', 'Fall', 'Fall', 'Winter']\nsb_climate = pd.DataFrame({ # Using a dict to create the data frame.\n 'Month': month, # Adding month as the first column of the data frame\n 'Season': season, # Adding the season for each month (this is a transitive dependency)\n 'Avg_High_Temp':average_high_temps, # This is the third column\n 'Avg_Rain':average_rainfall # This is the fourth column\n})\nsb_climate.head()\n\n\n\nNormalization Summary\nIn general, for data analysis, basic normalization is handled for you.\n\nFor read only data with fixed associations, a lack of normalization is manageable.\nHowever, many analyses are easier if you structure your data in ways that are as normalized as possible.\nIf you are collecting data then it is important to develop an organization structure that is normalized." 
}, { - "objectID": "course-materials/cheatsheets/matplotlib.html#color-and-style", - "href": "course-materials/cheatsheets/matplotlib.html#color-and-style", - "title": "EDS 217 Cheatsheet", - "section": "Color and Style", - "text": "Color and Style\n\nChanging Colors\nax.plot(x, y, color='r') # 'r' for red\nax.scatter(x, y, c='blue')\n\n\nLine Styles\nax.plot(x, y, linestyle='--') # dashed line\nax.plot(x, y, ls=':') # dotted line\n\n\nMarker Styles\nax.plot(x, y, marker='o') # circles\nax.plot(x, y, marker='s') # squares" + "objectID": "course-materials/lectures/99_dry_vs_wet.html#the-end", + "href": "course-materials/lectures/99_dry_vs_wet.html#the-end", + "title": "EDS 217, Lecture 4: DRY 🏜 vs. WET 🌊", + "section": "The End", + "text": "The End" }, { - "objectID": "course-materials/cheatsheets/matplotlib.html#saving-and-displaying", - "href": "course-materials/cheatsheets/matplotlib.html#saving-and-displaying", - "title": "EDS 217 Cheatsheet", - "section": "Saving and Displaying", - "text": "Saving and Displaying\n\nSaving the Figure\nplt.savefig('my_plot.png', dpi=300, bbox_inches='tight')\n\n\nDisplaying the Plot\nplt.show()" + "objectID": "course-materials/lectures/03-debugging.html", + "href": "course-materials/lectures/03-debugging.html", + "title": "Debugging with the VS code debugger", + "section": "", + "text": "debug.png\nWhen your code doesn’t work as expected, you might: 1. Use print statements 1. Ask ChatGPT what’s wrong with your code\nPrint statements can be annoying to put all over the place and sometimes ChatGPT doesn’t know. So what if you could step into your code and run it line by line, interactively, to figure out what was wrong? This is what VS code’s debugger does for you." 
}, { - "objectID": "course-materials/cheatsheets/matplotlib.html#useful-tips-for-seaborn-users", - "href": "course-materials/cheatsheets/matplotlib.html#useful-tips-for-seaborn-users", - "title": "EDS 217 Cheatsheet", - "section": "Useful Tips for Seaborn Users", - "text": "Useful Tips for Seaborn Users\n\nUse plt.subplots() to create custom layouts that Seaborn doesn’t provide.\nAccess the underlying Matplotlib Axes object in Seaborn plots:\ng = sns.scatterplot(x='x', y='y', data=df)\ng.set_xlabel('Custom X Label')\nCombine Seaborn and Matplotlib in the same figure:\nfig, ax = plt.subplots()\nsns.scatterplot(x='x', y='y', data=df, ax=ax)\nax.plot(x, y, color='r', linestyle='--')\nUse Matplotlib’s plt.tight_layout() to automatically adjust subplot parameters for better spacing.\n\nRemember, most Seaborn functions return a Matplotlib Axes object, allowing you to further customize your plots using Matplotlib functions." + "objectID": "course-materials/lectures/03-debugging.html#setup", + "href": "course-materials/lectures/03-debugging.html#setup", + "title": "Debugging with the VS code debugger", + "section": "Setup", + "text": "Setup\nGet the VS code python extension. 1. Go to the left hand bar in VS code and click “Extensions” 1. Search “Python” 1. Install the extension called “Python” by Microsoft\nNote: VS code has a known bug where sometimes the Debug this cell option disappears. If this happens, you unfortunately need to restart VS code. This is not an issue if debugging .py files though, which is likely where you’ll end up using the debugger the most anyways." 
}, { - "objectID": "course-materials/cheatsheets/print.html", - "href": "course-materials/cheatsheets/print.html", - "title": "EDS 217 Cheatsheet", + "objectID": "course-materials/lectures/03-debugging.html#using-the-debugger", + "href": "course-materials/lectures/03-debugging.html#using-the-debugger", + "title": "Debugging with the VS code debugger", + "section": "Using the debugger", + "text": "Using the debugger\nWe will practice using the debugger with practice 2-1, problem 4.\nFirst, let’s check out how the debugger works by placing a breakpoint on the first line of the cell where we define variables and stepping through it.\nSee how the variables appear under your Run and Debug tab, and try using the Debug Console to print or manipulate the variables as you step through the code.\n\n# Import numpy for mean calculations\nimport numpy as np\n\ntempF_2015 = [61.628, 61.7, 61.808, 61.448, 61.52, 61.538, 61.394, 61.52, 61.61, 62.042, 61.988, 62.168]\ntempF_2016 = [62.186, 62.546, 62.528, 62.06, 61.79, 61.52, 61.61, 61.916, 61.718, 61.682, 61.736, 61.628]\ntempF_2017 = [61.916, 62.132, 62.168, 61.772, 61.718, 61.376, 61.556, 61.646, 61.466, 61.7, 61.664, 61.754]\ntempF_2018 = [61.556, 61.61, 61.664, 61.682, 61.556, 61.466, 61.556, 61.448, 61.52, 61.916, 61.556, 61.718]\ntempF_2019 = [61.754, 61.79, 62.186, 61.898, 61.61, 61.7, 61.772, 61.79, 61.754, 61.898, 61.862, 62.042]\ntempF_2020 = [62.186, 62.312, 62.186, 62.114, 61.898, 61.736, 61.7, 61.646, 61.862, 61.664, 62.06, 61.538]\ntempF_2021 = [61.538, 61.232, 61.664, 61.43, 61.484, 61.592, 61.736, 61.556, 61.736, 61.88, 61.772, 61.61]\n\n# List of yearly lists (may or may not be useful)\ntempF_list = [tempF_2015, tempF_2016, tempF_2017, tempF_2018, tempF_2019, tempF_2020, tempF_2021]\n\n# List of years (probably useful)\nyears = [2015, 2016, 2017, 2018, 2019, 2020, 2021]\n\nBeing able to walk through code line by line is especially helpful when you can step into loops or functions that you would otherwise need 
print statements to see what is happening inside of.\nTry placing a breakpoint on the first line and seeing how the variables change when you step through this for loop.\n\nfor tempF_year,year in zip(tempF_list,years):\n print(f\"{year} temperatures: {tempF_year}\")\n\n[61.628, 61.7, 61.808, 61.448, 61.52, 61.538, 61.394, 61.52, 61.61, 62.042, 61.988, 62.168]\n2015 temperatures: [61.628, 61.7, 61.808, 61.448, 61.52, 61.538, 61.394, 61.52, 61.61, 62.042, 61.988, 62.168]\n2016 temperatures: [62.186, 62.546, 62.528, 62.06, 61.79, 61.52, 61.61, 61.916, 61.718, 61.682, 61.736, 61.628]\n2017 temperatures: [61.916, 62.132, 62.168, 61.772, 61.718, 61.376, 61.556, 61.646, 61.466, 61.7, 61.664, 61.754]\n2018 temperatures: [61.556, 61.61, 61.664, 61.682, 61.556, 61.466, 61.556, 61.448, 61.52, 61.916, 61.556, 61.718]\n2019 temperatures: [61.754, 61.79, 62.186, 61.898, 61.61, 61.7, 61.772, 61.79, 61.754, 61.898, 61.862, 62.042]\n2020 temperatures: [62.186, 62.312, 62.186, 62.114, 61.898, 61.736, 61.7, 61.646, 61.862, 61.664, 62.06, 61.538]\n2021 temperatures: [61.538, 61.232, 61.664, 61.43, 61.484, 61.592, 61.736, 61.556, 61.736, 61.88, 61.772, 61.61]\n\n\nNow let’s try debugging some code. Suppose your friend has written some code to solve problem 4a but is running into an error. Let’s try using the debugger to fix it:\nPractice 2-1 problem 4a: Calculate the monthly global temperature anomalies (deviation from the mean) in °C for 2015-2021. The mean global land-ocean surface temperature calculated over the 20th century was 15.6°C.\n\n# 4a. 
Monthly temperature anomalies\n\n# Innitialize a list of yearly lists of anomalies in °C\nanomC_list = [] \n\n# Iterate through yearly lists\nfor tempsF in tempF_list:\n # Empty list for a single year\n anomC = []\n # Iterate through a single year\n for tempF in tempsF:\n # Convert temp from F to C\n tempC = tempF - 32.0 * (5/9)\n # Calculate temp anomaly in °C\n anomC = tempC - 15.6\n # Append to anomC\n anomC.append(round(anomC,2))\n # Add list of anomalies in °C to anomC_list (list of lists)\n anomC_list.append(anomC)\n\nfor anom_list,year in zip(anomC_list,years):\n print(f\"{year} anomalies: {anom_list}\")\n\n28.25022222222222\n\n\n\n---------------------------------------------------------------------------\nAttributeError Traceback (most recent call last)\nCell In[2], line 17\n 15 anomC = tempC - 15.6\n 16 # Append to anomC\n---> 17 anomC.append(round(anomC,2))\n 18 # Add list of anomalies in °C to anomC_list (list of lists)\n 19 anomC_list.append(anomC)\n\nAttributeError: 'float' object has no attribute 'append'\n\n\n\nNow you try it on your own. Try using the debugger to debug the following answer to question 4b:\nCreate a new list with the mean monthly global surface temperature anomalies in °C for 2015-2023 (i.e. calculate the mean temperature anomaly for each month and put these values in a list).\n\n# 4b. 
Mean monthly temperature anomaly, 2015-2021\n\n# Empty list for monthly mean temperature anomalies\nmonthly_means = []\n# Set up generic counter loop with 12 iterations\nfor i in range(12 + 1):\n # Generate list of temperature anomalies for each month by extracting the ith value from each sublist.\n monthly = []\n # generate a list of anomalies for month i with anomalies from all years in it\n for anom in anomC_list:\n # get the ith anomaly (ith month) from each year list\n monthly.append(anoms[i + 1])\n # Calculate mean for month i\n monthly_mean = np.sum(monthly)\n # Add mean for month i to list of means\n monthly_means.append(round(monthly_means,4))\n\n# Print list of mean monthly temperature anomalies, 2015-2018.\nprint(f\"monthly_means: {monthly_means}\")" + }, + { + "objectID": "course-materials/lectures/01_the_zen_of_python.html", + "href": "course-materials/lectures/01_the_zen_of_python.html", + "title": "The Zen of Python", "section": "", - "text": "The print() function outputs the specified message to the screen. 
It is often used for debugging to display the values of variables and program status during code execution.\n\n\nprint(\"Hello, World!\")\n\n\n\nx = 10\ny = 20\nprint(x)\nprint(y)" + "text": "# What is the Zen of Python??\nimport this\n\nThe Zen of Python, by Tim Peters\n\nBeautiful is better than ugly.\nExplicit is better than implicit.\nSimple is better than complex.\nComplex is better than complicated.\nFlat is better than nested.\nSparse is better than dense.\nReadability counts.\nSpecial cases aren't special enough to break the rules.\nAlthough practicality beats purity.\nErrors should never pass silently.\nUnless explicitly silenced.\nIn the face of ambiguity, refuse the temptation to guess.\nThere should be one-- and preferably only one --obvious way to do it.\nAlthough that way may not be obvious at first unless you're Dutch.\nNow is better than never.\nAlthough never is often better than *right* now.\nIf the implementation is hard to explain, it's a bad idea.\nIf the implementation is easy to explain, it may be a good idea.\nNamespaces are one honking great idea -- let's do more of those!" }, { - "objectID": "course-materials/cheatsheets/print.html#basic-usage-of-print", - "href": "course-materials/cheatsheets/print.html#basic-usage-of-print", - "title": "EDS 217 Cheatsheet", + "objectID": "course-materials/lectures/01_the_zen_of_python.html#python-errors", + "href": "course-materials/lectures/01_the_zen_of_python.html#python-errors", + "title": "The Zen of Python", + "section": "Python Errors", + "text": "Python Errors\nThere are two types of errors in Python: SyntaxErrors and Exceptions.\n\nSyntaxErrors\nA SyntaxError happens when the Python language interpreter (the parser) detects an incorrectly formatted statement.\nThis code is trying to divide two numbers, but there are mismatched parentheses. 
What happens when we run it?\n>>> print( 5 / 4 ))\n\nprint( 5 / 4 ))\n\n\n Cell In[1], line 1\n print( 5 / 4 ))\n ^\nSyntaxError: unmatched ')'\n\n\n\n\nWhen python says SyntaxError, you should read this as I don't know what you want me to do!?\nOften the error includes some indication of where the problem is, although this indication can sometimes be misleading if the detection occurs far away from the syntax problem that created the error. Often the interpreter will attempt to explain what the problem is!\n\n\nExceptions\nAn Exception happens when the code you have written violates the Python language specification.\nThis code is trying to divide zero by zero. Its syntax is correct. But what happens when we run it?\n>>> print( 0 / 0 )\n\ntry:\n print( 0 / 0 ) \nexcept ZeroDivisionError:\n print(f\"It didn't work because you tried to divide by zero\")\n\nIt didn't work because you tried to divide by zero\n\n\nWhen python says anything other than SyntaxError, you should read this as You are asking to do something I can't do\nIn this case, the ZeroDivisionError is raised because the Python language specification does not allow for division by zero.\n\n\nTypes of Exceptions\nPython has a lot of builtin Errors that correspond to the definition of the Python language.\nA few common Exceptions you will see include TypeError, IndexError, and KeyError.\n\n\nTypeError\nA TypeError is raised when you try to perform a valid operation on an inappropriate data type.\n\n# TypeError Examples:\n'a' + 3\n\n\n\nIndexError\nAn IndexError is raised when you try to access an undefined element of a sequence. Sequences are structured data types whose elements are stored in a specific order. 
A list is an example of a sequence.\n\n# IndexError Example:\nmy_list = ['a', 'b', 'c', 'd']\nmy_list[4]\n\n\n---------------------------------------------------------------------------\nIndexError Traceback (most recent call last)\nCell In[14], line 3\n 1 # IndexError Example:\n 2 my_list = ['a', 'b', 'c', 'd']\n----> 3 my_list[4]\n\nIndexError: list index out of range\n\n\n\n\n\nKeyError\nA KeyError is raised when you try to access a key that does not exist in a dictionary (or other mapping).\n\n# KeyError Examples:\n\nmy_dict = {'column_1': 'definition 1', 'another_word': 'a second definition'}\nmy_dict['column1']\n\n\n---------------------------------------------------------------------------\nKeyError Traceback (most recent call last)\nCell In[16], line 4\n 1 # KeyError Examples:\n 3 my_dict = {'column_1': 'definition 1', 'another_word': 'a second definition'}\n----> 4 my_dict['column1']\n\nKeyError: 'column1'\n\n\n\n\n\nDeciphering Tracebacks\nWhen an exception is raised in python the interpreter generates a “Traceback” that shows where and why the error occurred. 
Generally, the REPL has most detailed Traceback information, although Jupyter Notebooks and iPython interactive shells also provide necessary information to debug any exception.\n\n# defining a function\ndef multiply(num1, num2):\n result = num1 * num2\n print(results)\n \n# calling the function\nmultiply(10, 2)\n\n\n---------------------------------------------------------------------------\nNameError Traceback (most recent call last)\nCell In[17], line 7\n 4 print(results)\n 6 # calling the function\n----> 7 multiply(10, 2)\n\nCell In[17], line 4, in multiply(num1, num2)\n 2 def multiply(num1, num2):\n 3 result = num1 * num2\n----> 4 print(results)\n\nNameError: name 'results' is not defined\n\n\n\n\n## The End" + }, + { + "objectID": "course-materials/coding-colabs/6b_preprocess.html", + "href": "course-materials/coding-colabs/6b_preprocess.html", + "title": "", "section": "", - "text": "The print() function outputs the specified message to the screen. It is often used for debugging to display the values of variables and program status during code execution.\n\n\nprint(\"Hello, World!\")\n\n\n\nx = 10\ny = 20\nprint(x)\nprint(y)" + "text": "import pandas as pd\n\n# Load the CO2 dataset\nco2_url = \"https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_mm_mlo.csv\"\nco2_df = pd.read_csv(co2_url, comment='#', header=1, \n names=['Year', 'Month', 'DecimalDate', 'MonthlyAverage', \n 'Deseasonalized', 'DaysInMonth', 'StdDev', 'Uncertainty'])\n\n\nco2_df.head()\n\n\n\n\n\n\n\n\nYear\nMonth\nDecimalDate\nMonthlyAverage\nDeseasonalized\nDaysInMonth\nStdDev\nUncertainty\n\n\n\n\n0\n1958\n4\n1958.2877\n317.45\n315.16\n-1\n-9.99\n-0.99\n\n\n1\n1958\n5\n1958.3699\n317.51\n314.69\n-1\n-9.99\n-0.99\n\n\n2\n1958\n6\n1958.4548\n317.27\n315.15\n-1\n-9.99\n-0.99\n\n\n3\n1958\n7\n1958.5370\n315.87\n315.20\n-1\n-9.99\n-0.99\n\n\n4\n1958\n8\n1958.6219\n314.93\n316.21\n-1\n-9.99\n-0.99\n\n\n\n\n\n\n\n\n\n# Convert Year and Month to datetime\nco2_df['Date'] = 
pd.to_datetime(co2_df['Year'].astype(str) + '-' + co2_df['Month'].astype(str) + '-01')\n\n# Select only the Date and MonthlyAverage columns\nco2_df = co2_df[['Date', 'MonthlyAverage']].rename(columns={'MonthlyAverage': 'CO2Concentration'})\n\n# Sort by date and reset index\nco2_df = co2_df.sort_values('Date').reset_index(drop=True)\n\n# Save to CSV\nco2_df.to_csv('monthly_co2_concentration.csv', index=False)\n\nprint(co2_df.head())\n\n Date CO2Concentration\n0 1958-04-01 317.45\n1 1958-05-01 317.51\n2 1958-06-01 317.27\n3 1958-07-01 315.87\n4 1958-08-01 314.93" }, { - "objectID": "course-materials/cheatsheets/print.html#combining-text-and-variables", - "href": "course-materials/cheatsheets/print.html#combining-text-and-variables", - "title": "EDS 217 Cheatsheet", - "section": "Combining Text and Variables", - "text": "Combining Text and Variables\nYou can combine text and variables in the print() function to make the output more informative. Here are 4 different ways:\n\nThis is a note.\n\n\nMake sure to check your data types before processing.\n\n\n1. Using Comma Separation\nname = \"Alice\"\nage = 30\nprint(\"Name:\", name, \"Age:\", age)\n\n\n2. 
Using String Formatting\n\nf-string (Formatted String Literal) - Python 3.6+ [PREFERRED]\nname = \"Bob\"\nage = 25\nprint(f\"Name: {name}, Age: {age}\")\n\n\nformat() Method [CAN BE USEFUL IN COMPLICATED PRINT STATEMENTS]\nname = \"Carol\"\nage = 22\nprint(\"Name: {}, Age: {}\".format(name, age))\n\n\nOld % formatting [NOT RECOMMENDED]\nname = \"Dave\"\nage = 28\nprint(\"Name: %s, Age: %d\" % (name, age))" + "objectID": "course-materials/cheatsheets/first_steps.html", + "href": "course-materials/cheatsheets/first_steps.html", + "title": "Python Basics Cheat Sheet", + "section": "", + "text": "Variables are containers for storing data values.\nPython has no command for declaring a variable: it is created the moment you first assign a value to it.\n\nx = 5\nname = \"Alice\"\n\n\n\n\nPython has various data types including:\n\nint (integer): A whole number, positive or negative, without decimals.\nfloat (floating point number): A number, positive or negative, containing one or more decimals.\nstr (string): A sequence of characters in quotes.\nbool (boolean): Represents True or False.\n\n\nage = 30 # int\ntemperature = 20.5 # float\nname = \"Bob\" # str\nis_valid = True # bool\n\n\n\n\n\n\n\nUsed with numeric values to perform common mathematical operations:\n\n\n\n\nOperator\nDescription\n\n\n\n\n+\nAddition\n\n\n-\nSubtraction\n\n\n*\nMultiplication\n\n\n/\nDivision\n\n\n%\nModulus\n\n\n**\nExponentiation\n\n\n//\nFloor division\n\n\n\n\n\n\nx = 10\ny = 3\nprint(x + y) # 13\nprint(x - y) # 7\nprint(x * y) # 30\nprint(x / y) # 3.3333\nprint(x % y) # 1\nprint(x ** y) # 1000\nprint(x // y) # 3\n\n\n\n\nUsed to combine conditional statements:\n\n\n\n\nOperator\nDescription\n\n\n\n\nand\nReturns True if both statements are true\n\n\nor\nReturns True if one of the statements is true\n\n\nnot\nReverse the result, returns False if the result is true\n\n\n\n\n\n\nx = True\ny = False\nprint(x and y) # False\nprint(x or y) # True\nprint(not x) # False\n\n\n\n\nStrings in Python 
are surrounded by either single quotation marks, or double quotation marks.\n\nhello = \"Hello\"\nworld = 'World'\nprint(hello + \" \" + world) # Hello World\n\nStrings can be indexed with the first character having index 0.\n\na = \"Hello, World!\"\nprint(a[1]) # e\n\nSlicing strings:\n\nb = \"Hello, World!\"\nprint(b[2:5]) # llo\n\n\n\n\n# This is a comment\nprint(\"Hello, World!\") # Prints Hello, World!\nThis cheatsheet covers the very basics to get you started with Python. Experiment with these concepts to understand them better!" }, { - "objectID": "course-materials/cheatsheets/print.html#debugging-with-print", - "href": "course-materials/cheatsheets/print.html#debugging-with-print", - "title": "EDS 217 Cheatsheet", - "section": "Debugging with Print", - "text": "Debugging with Print\nUse print() to display intermediate values in your code to understand how data changes step by step.\n\nExample: Debugging a Loop\nfor i in range(5):\n print(f\"Current value of i: {i}\")\n\n\nExample: Checking Function Outputs\ndef add(a, b):\n result = a + b\n print(f\"Adding {a} + {b} = {result}\")\n return result\n\nadd(5, 3)" + "objectID": "course-materials/cheatsheets/first_steps.html#variables-and-data-types", + "href": "course-materials/cheatsheets/first_steps.html#variables-and-data-types", + "title": "Python Basics Cheat Sheet", + "section": "", + "text": "Variables are containers for storing data values.\nPython has no command for declaring a variable: it is created the moment you first assign a value to it.\n\nx = 5\nname = \"Alice\"\n\n\n\n\nPython has various data types including:\n\nint (integer): A whole number, positive or negative, without decimals.\nfloat (floating point number): A number, positive or negative, containing one or more decimals.\nstr (string): A sequence of characters in quotes.\nbool (boolean): Represents True or False.\n\n\nage = 30 # int\ntemperature = 20.5 # float\nname = \"Bob\" # str\nis_valid = True # bool" }, { - "objectID": 
"course-materials/cheatsheets/timeseries.html", - "href": "course-materials/cheatsheets/timeseries.html", - "title": "EDS 217 Cheatsheet", + "objectID": "course-materials/cheatsheets/first_steps.html#basic-operations", + "href": "course-materials/cheatsheets/first_steps.html#basic-operations", + "title": "Python Basics Cheat Sheet", "section": "", - "text": "To be added" + "text": "Used with numeric values to perform common mathematical operations:\n\n\n\n\nOperator\nDescription\n\n\n\n\n+\nAddition\n\n\n-\nSubtraction\n\n\n*\nMultiplication\n\n\n/\nDivision\n\n\n%\nModulus\n\n\n**\nExponentiation\n\n\n//\nFloor division\n\n\n\n\n\n\nx = 10\ny = 3\nprint(x + y) # 13\nprint(x - y) # 7\nprint(x * y) # 30\nprint(x / y) # 3.3333\nprint(x % y) # 1\nprint(x ** y) # 1000\nprint(x // y) # 3\n\n\n\n\nUsed to combine conditional statements:\n\n\n\n\nOperator\nDescription\n\n\n\n\nand\nReturns True if both statements are true\n\n\nor\nReturns True if one of the statements is true\n\n\nnot\nReverse the result, returns False if the result is true\n\n\n\n\n\n\nx = True\ny = False\nprint(x and y) # False\nprint(x or y) # True\nprint(not x) # False\n\n\n\n\nStrings in Python are surrounded by either single quotation marks, or double quotation marks.\n\nhello = \"Hello\"\nworld = 'World'\nprint(hello + \" \" + world) # Hello World\n\nStrings can be indexed with the first character having index 0.\n\na = \"Hello, World!\"\nprint(a[1]) # e\n\nSlicing strings:\n\nb = \"Hello, World!\"\nprint(b[2:5]) # llo" }, { - "objectID": "course-materials/day1.html#class-materials", - "href": "course-materials/day1.html#class-materials", - "title": "Intro to Python and JupyterLab", - "section": "Class materials", - "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 1 / morning\nPython and Data Science\n⚒️ Meet JupyterLab ⚒️ Coding in Jupyter Notebooks\n\n\n\n\n\n\n\nday 1 / afternoon\n🐍 Exploring Data Types and Methods\n🐍 Variables & Operators" + "objectID": 
"course-materials/cheatsheets/first_steps.html#printing-and-commenting", + "href": "course-materials/cheatsheets/first_steps.html#printing-and-commenting", + "title": "Python Basics Cheat Sheet", + "section": "", + "text": "# This is a comment\nprint(\"Hello, World!\") # Prints Hello, World!\nThis cheatsheet covers the very basics to get you started with Python. Experiment with these concepts to understand them better!" }, { - "objectID": "course-materials/day1.html#end-of-day-practice", - "href": "course-materials/day1.html#end-of-day-practice", - "title": "Intro to Python and JupyterLab", - "section": "End-of-day practice", - "text": "End-of-day practice\nOur last task today is to work through an example of a Python data science workflow. This exercise “skips ahead” to content we will be learning later in the course, but provides a preview of where we are headed and how easy it is to do data science in python!\n\n Day 1 Practice: Example Python Data Science Workflow" + "objectID": "course-materials/lectures/00_intro_to_python.html", + "href": "course-materials/lectures/00_intro_to_python.html", + "title": "Lecture 1 - Intro to Python and Environmental Data Science", + "section": "", + "text": "Course Webpage: https://eds-217-essential-python.github.io\n\n\n\n\ndata_science.jpg\n\n\n\n\n\nenvironmental_data_science.jpg\n\n\n\n\n\n🐍 What Python?\n❓ Why Python?\n💻 How Python?\n\n\n“Python is powerful… and fast; plays well with others; runs everywhere; is friendly & easy to learn; is Open.”\n\n\n\n\nPython is a general-purpose, object-oriented programming language that emphasizes code readability through its generous use of white space. 
Released in 1989, Python is easy to learn and a favorite of programmers and developers.\n\n\n(Python, C, C++, Java, Javascript, R, Pascal) - Take less time to write - Shorter and easier to read - Portable, meaning that they can run on different kinds of computers with few or no modifications.\nThe engine that translates and runs Python is called the Python Interpreter\n\n\"\"\" \nEntering code into this notebook cell \nand pressing [SHIFT-ENTER] will cause the \npython interpreter to execute the code\n\"\"\"\n\n \nprint(\"Hello world!\")\nprint(\"[from this notebook cell]\")\n\nHello world!\n[from this notebook cell]\n\n\n\n\"\"\"\nAlternatively, you can run \nany python script file (.py file)\nso long as it contains valid\npython code.\n\"\"\"\n!python hello_world.py\n\nHello world!\n[from hello_world.py]\n\n\n\n \n\n\n\n\nNatural languages are the languages that people speak. They are not designed (although they are subjected to various degrees of “order”) and evolve naturally.\nFormal languages are languages that are designed by people for specific applications. - Mathematical Notation \\(E=mc^2\\) - Chemical Notation: \\(\\text{H}_2\\text{O}\\)\nProgramming languages are formal languages that have been designed to express computations.\nParsing: The process of figuring out what the structure of a sentence or statement is (in a natural language you do this subconsciously).\nFormal Languages have strict syntax for tokens and structure:\n\nMathematical syntax error: \\(E=\\$m🦆_2\\) (bad tokens & bad structure)\nChemical syntax error: \\(\\text{G}_3\\text{Z}\\) (bad tokens, but structure is okay)\n\n\n\n\n\nAmbiguity: Natural languages are full of ambiguity, which people parse using contextual clues. Formal languages are nearly or completely unambiguous; any statement has exactly one meaning, regardless of context.\nRedundancy: In order to make up for ambiguity, natural languages employ lots of redundancy. 
Formal languages are less redundant and more concise.\nLiteralness: Formal languages mean exactly what they say. Natural languages employ idioms and metaphors.\n\nThe inherent differences between familiar natural languages and unfamiliar formal languages create one of the greatest challenges in learning to code.\n\n\n\n\npoetry: Words are used for sound and meaning. Ambiguity is common and often deliberate.\nprose: The literal meaning of words is important, and the structure contributes meaning. Amenable to analysis but still often ambiguous.\nprogram: Meaning is unambiguous and literal, and can be understood entirely by analysis of the tokens and structure.\n\n\n\n\nFormal languages are very dense, so it takes longer to read them.\nStructure is very important, so it is usually not a good idea to read from top to bottom, left to right. Instead, learn to parse the program in your head, identifying the tokens and interpreting the structure.\nDetails matter. Little things like spelling errors and bad punctuation, which you can get away with in natural languages, will make a big difference in a formal language.\n\n\n\n\n\nIBM: R vs. Python\nPython is a multi-purpose language with a readable syntax that’s easy to learn. Programmers use Python to delve into data analysis or use machine learning in scalable production environments.\nR is built by statisticians and leans heavily into statistical models and specialized analytics. Data scientists use R for deep statistical analysis, supported by just a few lines of code and beautiful data visualizations.\nIn general, R is better for initial exploratory analyses, statistical analyses, and data visualization.\nIn general, Python is better for working with APIs, writing maintainable, production-ready code, working with a diverse array of data, and building machine learning or AI workflows.\nBoth languages can do anything. Most data science teams use both languages. (and others too.. 
Matlab, Javascript, Go, Fortran, etc…)\n\nfrom IPython.lib.display import YouTubeVideo\nYouTubeVideo('GVvfNgszdU0')\n\n\n \n \n\n\n\n\nAnaconda State of Data Science\nData from 2021: \n\n\n\n\nThe data are available here…\nBut, unfortunately, they changed the format of the responses concerning language use between 2022 and 2023. But we can take a look at the 2022 data…\nLet’s do some python data science!\n\n# First, we need to gather our tools\nimport pandas as pd # This is the most common data science package used in python!\nimport matplotlib.pyplot as plt # This is the most widely-used plotting package.\n\nimport requests # This package helps us make https requests \nimport io # This package is good at handling input/output streams\n\n\n# Here's the url for the 2022 data. It has a similar structure to the 2021 data, so we can compare them.\nurl = \"https://static.anaconda.cloud/content/Anaconda_2022_State_of_Data_Science_+Raw_Data.csv\"\nresponse = requests.get(url)\n\n\n# Read the response into a dataframe, using the io.StringIO function to feed the response.txt.\n# Also, skip the first three rows\ndf = pd.read_csv(io.StringIO(response.text), skiprows=3)\n\n# Our very first dataframe!\ndf.head()\n\n# Jupyter notebook cells only output the last value requested...\n\n\n\n\n\n\n\n\nIn which country is your primary residence?\nWhich of the following age groups best describes you?\nWhat is the highest level of education you've achieved?\nGender: How do you identify? - Selected Choice\nThe organization I work for is best classified as a:\nWhat is your primary role? - Selected Choice\nFor how many years have you been in your current role?\nWhat position did you hold prior to this? - Selected Choice\nHow would you rate your job satisfaction in your current role?\nWhat would cause you to leave your current employer for a new job? Please select the top option besides pay/benefits. - Selected Choice\n...\nWhat should an AutoML tool do for data scientists? 
Please drag answers to rank from most important to least important. (1=most important) - Help choose the best model types to solve specific problems\nWhat should an AutoML tool do for data scientists? Please drag answers to rank from most important to least important. (1=most important) - Speed up the ML pipeline by automating certain workflows (data cleaning, etc.)\nWhat should an AutoML tool do for data scientists? Please drag answers to rank from most important to least important. (1=most important) - Tune the model once performance (such as accuracy, etc.) starts to degrade\nWhat should an AutoML tool do for data scientists? Please drag answers to rank from most important to least important. (1=most important) - Other (please indicate)\nWhat do you think is the biggest problem in the data science/AI/ML space today? - Selected Choice\nWhat tools and resources do you feel are lacking for data scientists who want to learn and develop their skills? (Select all that apply). - Selected Choice\nHow do you typically learn about new tools and topics relevant to your role? (Select all that apply). - Selected Choice\nWhat are you most hoping to see from the data science industry this year? - Selected Choice\nWhat do you believe is the biggest challenge in the open-source community today? 
- Selected Choice\nHave supply chain disruption problems, such as the ongoing chip shortage, impacted your access to computing resources?\n\n\n\n\n0\nUnited States\n26-41\nDoctoral degree\nMale\nEducational institution\nData Scientist\n1-2 years\nData Scientist\nVery satisfied\nMore flexibility with my work hours\n...\n4.0\n2.0\n5.0\n6.0\nA reduction in job opportunities caused by aut...\nHands-on projects,Mentorship opportunities\nReading technical books, blogs, newsletters, a...\nFurther innovation in the open-source data sci...\nUndermanagement\nNo\n\n\n1\nUnited States\n42-57\nDoctoral degree\nMale\nCommercial (for-profit) entity\nProduct Manager\n5-6 years\nNaN\nVery satisfied\nMore responsibility/opportunity for career adv...\n...\n2.0\n5.0\n4.0\n6.0\nSocial impacts from bias in data and models\nTailored learning paths\nFree video content (e.g. YouTube)\nMore specialized data science hardware\nPublic trust\nYes\n\n\n2\nIndia\n18-25\nBachelor's degree\nFemale\nEducational institution\nData Scientist\nNaN\nNaN\nNaN\nNaN\n...\n1.0\n4.0\n2.0\n6.0\nA reduction in job opportunities caused by aut...\nHands-on projects,Mentorship opportunities\nReading technical books, blogs, newsletters, a...\nFurther innovation in the open-source data sci...\nUndermanagement\nI'm not sure\n\n\n3\nUnited States\n42-57\nBachelor's degree\nMale\nCommercial (for-profit) entity\nProfessor/Instructor/Researcher\n10+ years\nNaN\nModerately satisfied\nMore responsibility/opportunity for career adv...\n...\n1.0\n5.0\n4.0\n6.0\nSocial impacts from bias in data and models\nHands-on projects\nReading technical books, blogs, newsletters, a...\nNew optimized models that allow for more compl...\nTalent shortage\nNo\n\n\n4\nSingapore\n18-25\nHigh School or equivalent\nMale\nNaN\nStudent\nNaN\nNaN\nNaN\nNaN\n...\n4.0\n2.0\n3.0\n6.0\nSocial impacts from bias in data and models\nCommunity engagement and learning platforms,Ta...\nReading technical books, blogs, newsletters, a...\nFurther innovation in 
the open-source data sci...\nUndermanagement\nYes\n\n\n\n\n5 rows × 120 columns\n\n\n\n\n\n# Jupyter notebook cells only output the last value... unless you use print commands!\nprint(f'Number of survey responses: {len(df)}')\nprint(f'Number of survey questions: {len(df.columns)}') \n\nNumber of survey responses: 3493\nNumber of survey questions: 120\n\n\n\n# 1. Filter the dataframe to only the questions about programming language usage, and \nfiltered_df = df.filter(like='How often do you use the following languages?').copy() # Use copy to force python to make a new copy of the data, not just a reference to a subset.\n\n# 2. Rename the columns to just be the programming languages, without the question preamble\nfiltered_df.rename(columns=lambda x: x.split('-')[-1].strip() if '-' in x else x, inplace=True)\n\nprint(filtered_df.columns)\n\nIndex(['Python', 'R', 'Java', 'JavaScript', 'C/C++', 'C#', 'Julia', 'HTML/CSS',\n 'Bash/Shell', 'SQL', 'Go', 'PHP', 'Rust', 'TypeScript',\n 'Other (please indicate below)'],\n dtype='object')\n\n\n\n# Show the unique values of the `Python` column\nprint(filtered_df['Python'].unique())\n\n['Frequently' 'Sometimes' 'Always' 'Never' 'Rarely' nan]\n\n\n\n# Calculate the percentage of each response for each language\npercentage_df = filtered_df.apply(lambda x: x.value_counts(normalize=True).fillna(0) * 100).transpose()\n\n# Remove the last row, which is the \"Other\" category\npercentage_df = percentage_df[:-1]\n\n# Sort the DataFrame based on the 'Always' responses\nsorted_percentage_df = percentage_df.sort_values(by='Always', ascending=True)\n\n\n# Let's get ready to plot the 2022 data...\nfrom IPython.display import display\n\n# We are going to use the display command to update our figure over multiple cells. 
\n# This usually isn't necessary, but it's helpful here to see how each set of commands updates the figure\n\n# Define the custom order for plotting\norder = ['Always', 'Frequently', 'Sometimes', 'Rarely', 'Never']\n\ncolors = {\n 'Always': (8/255, 40/255, 81/255), # Replace R1, G1, B1 with the RGB values for 'Dark Blue'\n 'Frequently': (12/255, 96/255, 152/255), # Replace R2, G2, B2 with the RGB values for 'Light Ocean Blue'\n 'Sometimes': (16/255, 146/255, 136/255), # and so on...\n 'Rarely': (11/255, 88/255, 73/255),\n 'Never': (52/255, 163/255, 32/255)\n}\n\n\n# Make the plot\nfig, ax = plt.subplots(figsize=(10, 7))\nsorted_percentage_df[order].plot(kind='barh', stacked=True, ax=ax, color=[colors[label] for label in order])\nax.set_xlabel('Percentage')\nax.set_title('Frequency of Language Usage, 2022',y=1.05)\n\nplt.show() # This command draws our figure. \n\n\n\n\n\n\n\n\n\n# Add labels across the top, like in the original graph\n\n# Get the patches for the top-most bar\nnum_languages = len(sorted_percentage_df)\n\npatches = ax.patches[num_languages-1::num_languages]\n# Calculate the cumulative width of the patches for the top-most bar\ncumulative_widths = [0] * len(order)\nwidths = [patch.get_width() for patch in patches]\nfor i, width in enumerate(widths):\n cumulative_widths[i] = width + (cumulative_widths[i-1] if i > 0 else 0)\n\n\n\n# Add text labels above the bars\nfor i, (width, label) in enumerate(zip(cumulative_widths, order)):\n # Get the color of the current bar segment\n # Calculate the position for the text label\n position = width - (patches[i].get_width() / 2)\n # Add the text label to the plot\n # Adjust the y-coordinate for the text label\n y_position = len(sorted_percentage_df) - 0.3 # Adjust the 0.3 value as needed\n ax.text(position, y_position, label, ha='center', color=colors[label], fontweight='bold')\n\n# Remove the legend\nax.legend().set_visible(False)\n\n#plt.show()\ndisplay(fig) # This command shows our updated figure (we can't 
re-use \"plt.show()\")\n\n\n\n\n\n\n\n\n\n# Add percentage values inside each patch\nfor patch in ax.patches:\n # Get the width and height of the patch\n width, height = patch.get_width(), patch.get_height()\n \n # Calculate the position for the text label\n x = patch.get_x() + width / 2\n y = patch.get_y() + height / 2\n \n # Get the percentage value for the current patch\n percentage = \"{:.0f}%\".format(width)\n \n # Add the text label to the plot\n ax.text(x, y, percentage, ha='center', va='center', color='white', fontweight='bold')\n\ndisplay(fig) # Let's see those nice text labels!\n\n\n\n\n\n\n\n\n\n# Clean up the figure to remove spines and unnecessary labels/ticks, etc..\n\n# Remove x-axis label\nax.set_xlabel('')\n\n# Remove the spines\nax.spines['top'].set_visible(False)\nax.spines['right'].set_visible(False)\nax.spines['bottom'].set_visible(False)\nax.spines['left'].set_visible(False)\n\n# Remove the y-axis tick marks\nax.tick_params(axis='y', which='both', length=0)\n\n# Remove the x-axis tick marks and labels\nax.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)\n\ndisplay(fig) # Now 100% less visually cluttered!" 
}, { - "objectID": "course-materials/day1.html#additional-resources", - "href": "course-materials/day1.html#additional-resources", - "title": "Intro to Python and JupyterLab", - "section": "Additional Resources", - "text": "Additional Resources\nNA" + "objectID": "course-materials/lectures/00_intro_to_python.html#lecture-agenda", + "href": "course-materials/lectures/00_intro_to_python.html#lecture-agenda", + "title": "Lecture 1 - Intro to Python and Environmental Data Science", + "section": "", + "text": "🐍 What Python?\n❓ Why Python?\n💻 How Python?\n\n\n“Python is powerful… and fast; plays well with others; runs everywhere; is friendly & easy to learn; is Open.”" }, { - "objectID": "course-materials/day3.html#class-materials", - "href": "course-materials/day3.html#class-materials", - "title": "Control and Comprehensions", - "section": "Class materials", - "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 3 / morning\n📝 Control Flows\n🙌 Coding Colab: Control flows and data science\n\n\nday 3 / afternoon\n🐼 Intro to Arrays and Series\n🙌 Coding Colab: Working with Pandas Series" + "objectID": "course-materials/lectures/00_intro_to_python.html#what-is-python", + "href": "course-materials/lectures/00_intro_to_python.html#what-is-python", + "title": "Lecture 1 - Intro to Python and Environmental Data Science", + "section": "", + "text": "Python is a general-purpose, object-oriented programming language that emphasizes code readability through its generous use of white space. 
Released in 1989, Python is easy to learn and a favorite of programmers and developers.\n\n\n(Python, C, C++, Java, Javascript, R, Pascal) - Take less time to write - Shorter and easier to read - Portable, meaning that they can run on different kinds of computers with few or no modifications.\nThe engine that translates and runs Python is called the Python Interpreter\n\n\"\"\" \nEntering code into this notebook cell \nand pressing [SHIFT-ENTER] will cause the \npython interpreter to execute the code\n\"\"\"\n\n \nprint(\"Hello world!\")\nprint(\"[from this notebook cell]\")\n\nHello world!\n[from this notebook cell]\n\n\n\n\"\"\"\nAlternatively, you can run \nany python script file (.py file)\nso long as it contains valid\npython code.\n\"\"\"\n!python hello_world.py\n\nHello world!\n[from hello_world.py]\n\n\n\n \n\n\n\n\nNatural languages are the languages that people speak. They are not designed (although they are subjected to various degrees of “order”) and evolve naturally.\nFormal languages are languages that are designed by people for specific applications. - Mathematical Notation \\(E=mc^2\\) - Chemical Notation: \\(\\text{H}_2\\text{O}\\)\nProgramming languages are formal languages that have been designed to express computations.\nParsing: The process of figuring out what the structure of a sentence or statement is (in a natural language you do this subconsciously).\nFormal Languages have strict syntax for tokens and structure:\n\nMathematical syntax error: \\(E=\\$m🦆_2\\) (bad tokens & bad structure)\nChemical syntax error: \\(\\text{G}_3\\text{Z}\\) (bad tokens, but structure is okay)\n\n\n\n\n\nAmbiguity: Natural languages are full of ambiguity, which people parse using contextual clues. Formal languages are nearly or completely unambiguous; any statement has exactly one meaning, regardless of context.\nRedundancy: In order to make up for ambiguity, natural languages employ lots of redundancy. 
Formal languages are less redundant and more concise.\nLiteralness: Formal languages mean exactly what they say. Natural languages employ idioms and metaphors.\n\nThe inherent differences between familiar natural languages and unfamiliar formal languages create one of the greatest challenges in learning to code.\n\n\n\n\npoetry: Words are used for sound and meaning. Ambiguity is common and often deliberate.\nprose: The literal meaning of words is important, and the structure contributes meaning. Amenable to analysis but still often ambiguous.\nprogram: Meaning is unambiguous and literal, and can be understood entirely by analysis of the tokens and structure.\n\n\n\n\nFormal languages are very dense, so it takes longer to read them.\nStructure is very important, so it is usually not a good idea to read from top to bottom, left to right. Instead, learn to parse the program in your head, identifying the tokens and interpreting the structure.\nDetails matter. Little things like spelling errors and bad punctuation, which you can get away with in natural languages, will make a big difference in a formal language." }, { - "objectID": "course-materials/day3.html#end-of-day-practice", - "href": "course-materials/day3.html#end-of-day-practice", - "title": "Control and Comprehensions", - "section": "End-of-day practice", - "text": "End-of-day practice\nComplete the following tasks / activities before heading home for the day!\n\n Day 3 Practice: Using Pandas Series for Data Analysis" + "objectID": "course-materials/lectures/00_intro_to_python.html#why-python", + "href": "course-materials/lectures/00_intro_to_python.html#why-python", + "title": "Lecture 1 - Intro to Python and Environmental Data Science", + "section": "", + "text": "IBM: R vs. Python\nPython is a multi-purpose language with a readable syntax that’s easy to learn. 
Programmers use Python to delve into data analysis or use machine learning in scalable production environments.\nR is built by statisticians and leans heavily into statistical models and specialized analytics. Data scientists use R for deep statistical analysis, supported by just a few lines of code and beautiful data visualizations.\nIn general, R is better for initial exploratory analyses, statistical analyses, and data visualization.\nIn general, Python is better for working with APIs, writing maintainable, production-ready code, working with a diverse array of data, and building machine learning or AI workflows.\nBoth languages can do anything. Most data science teams use both languages. (and others too.. Matlab, Javascript, Go, Fortran, etc…)\n\nfrom IPython.lib.display import YouTubeVideo\nYouTubeVideo('GVvfNgszdU0')\n\n\n \n \n\n\n\n\nAnaconda State of Data Science\nData from 2021:" }, { - "objectID": "course-materials/day3.html#additional-resources", - "href": "course-materials/day3.html#additional-resources", - "title": "Control and Comprehensions", - "section": "Additional Resources", - "text": "Additional Resources" + "objectID": "course-materials/lectures/00_intro_to_python.html#what-about-2023-data", + "href": "course-materials/lectures/00_intro_to_python.html#what-about-2023-data", + "title": "Lecture 1 - Intro to Python and Environmental Data Science", + "section": "", + "text": "The data are available here…\nBut, unfortunately, they changed the format of the responses concerning language use between 2022 and 2023. 
But we can take a look at the 2022 data…\nLet’s do some python data science!\n\n# First, we need to gather our tools\nimport pandas as pd # This is the most common data science package used in python!\nimport matplotlib.pyplot as plt # This is the most widely-used plotting package.\n\nimport requests # This package helps us make https requests \nimport io # This package is good at handling input/output streams\n\n\n# Here's the url for the 2022 data. It has a similar structure to the 2021 data, so we can compare them.\nurl = \"https://static.anaconda.cloud/content/Anaconda_2022_State_of_Data_Science_+Raw_Data.csv\"\nresponse = requests.get(url)\n\n\n# Read the response into a dataframe, using the io.StringIO function to feed the response.txt.\n# Also, skip the first three rows\ndf = pd.read_csv(io.StringIO(response.text), skiprows=3)\n\n# Our very first dataframe!\ndf.head()\n\n# Jupyter notebook cells only output the last value requested...\n\n\n\n\n\n\n\n\nIn which country is your primary residence?\nWhich of the following age groups best describes you?\nWhat is the highest level of education you've achieved?\nGender: How do you identify? - Selected Choice\nThe organization I work for is best classified as a:\nWhat is your primary role? - Selected Choice\nFor how many years have you been in your current role?\nWhat position did you hold prior to this? - Selected Choice\nHow would you rate your job satisfaction in your current role?\nWhat would cause you to leave your current employer for a new job? Please select the top option besides pay/benefits. - Selected Choice\n...\nWhat should an AutoML tool do for data scientists? Please drag answers to rank from most important to least important. (1=most important) - Help choose the best model types to solve specific problems\nWhat should an AutoML tool do for data scientists? Please drag answers to rank from most important to least important. 
(1=most important) - Speed up the ML pipeline by automating certain workflows (data cleaning, etc.)\nWhat should an AutoML tool do for data scientists? Please drag answers to rank from most important to least important. (1=most important) - Tune the model once performance (such as accuracy, etc.) starts to degrade\nWhat should an AutoML tool do for data scientists? Please drag answers to rank from most important to least important. (1=most important) - Other (please indicate)\nWhat do you think is the biggest problem in the data science/AI/ML space today? - Selected Choice\nWhat tools and resources do you feel are lacking for data scientists who want to learn and develop their skills? (Select all that apply). - Selected Choice\nHow do you typically learn about new tools and topics relevant to your role? (Select all that apply). - Selected Choice\nWhat are you most hoping to see from the data science industry this year? - Selected Choice\nWhat do you believe is the biggest challenge in the open-source community today? - Selected Choice\nHave supply chain disruption problems, such as the ongoing chip shortage, impacted your access to computing resources?\n\n\n\n\n0\nUnited States\n26-41\nDoctoral degree\nMale\nEducational institution\nData Scientist\n1-2 years\nData Scientist\nVery satisfied\nMore flexibility with my work hours\n...\n4.0\n2.0\n5.0\n6.0\nA reduction in job opportunities caused by aut...\nHands-on projects,Mentorship opportunities\nReading technical books, blogs, newsletters, a...\nFurther innovation in the open-source data sci...\nUndermanagement\nNo\n\n\n1\nUnited States\n42-57\nDoctoral degree\nMale\nCommercial (for-profit) entity\nProduct Manager\n5-6 years\nNaN\nVery satisfied\nMore responsibility/opportunity for career adv...\n...\n2.0\n5.0\n4.0\n6.0\nSocial impacts from bias in data and models\nTailored learning paths\nFree video content (e.g. 
YouTube)\nMore specialized data science hardware\nPublic trust\nYes\n\n\n2\nIndia\n18-25\nBachelor's degree\nFemale\nEducational institution\nData Scientist\nNaN\nNaN\nNaN\nNaN\n...\n1.0\n4.0\n2.0\n6.0\nA reduction in job opportunities caused by aut...\nHands-on projects,Mentorship opportunities\nReading technical books, blogs, newsletters, a...\nFurther innovation in the open-source data sci...\nUndermanagement\nI'm not sure\n\n\n3\nUnited States\n42-57\nBachelor's degree\nMale\nCommercial (for-profit) entity\nProfessor/Instructor/Researcher\n10+ years\nNaN\nModerately satisfied\nMore responsibility/opportunity for career adv...\n...\n1.0\n5.0\n4.0\n6.0\nSocial impacts from bias in data and models\nHands-on projects\nReading technical books, blogs, newsletters, a...\nNew optimized models that allow for more compl...\nTalent shortage\nNo\n\n\n4\nSingapore\n18-25\nHigh School or equivalent\nMale\nNaN\nStudent\nNaN\nNaN\nNaN\nNaN\n...\n4.0\n2.0\n3.0\n6.0\nSocial impacts from bias in data and models\nCommunity engagement and learning platforms,Ta...\nReading technical books, blogs, newsletters, a...\nFurther innovation in the open-source data sci...\nUndermanagement\nYes\n\n\n\n\n5 rows × 120 columns\n\n\n\n\n\n# Jupyter notebook cells only output the last value... unless you use print commands!\nprint(f'Number of survey responses: {len(df)}')\nprint(f'Number of survey questions: {len(df.columns)}') \n\nNumber of survey responses: 3493\nNumber of survey questions: 120\n\n\n\n# 1. Filter the dataframe to only the questions about programming language usage, and \nfiltered_df = df.filter(like='How often do you use the following languages?').copy() # Use copy to force python to make a new copy of the data, not just a reference to a subset.\n\n# 2. 
Rename the columns to just be the programming languages, without the question preamble\nfiltered_df.rename(columns=lambda x: x.split('-')[-1].strip() if '-' in x else x, inplace=True)\n\nprint(filtered_df.columns)\n\nIndex(['Python', 'R', 'Java', 'JavaScript', 'C/C++', 'C#', 'Julia', 'HTML/CSS',\n 'Bash/Shell', 'SQL', 'Go', 'PHP', 'Rust', 'TypeScript',\n 'Other (please indicate below)'],\n dtype='object')\n\n\n\n# Show the unique values of the `Python` column\nprint(filtered_df['Python'].unique())\n\n['Frequently' 'Sometimes' 'Always' 'Never' 'Rarely' nan]\n\n\n\n# Calculate the percentage of each response for each language\npercentage_df = filtered_df.apply(lambda x: x.value_counts(normalize=True).fillna(0) * 100).transpose()\n\n# Remove the last row, which is the \"Other\" category\npercentage_df = percentage_df[:-1]\n\n# Sort the DataFrame based on the 'Always' responses\nsorted_percentage_df = percentage_df.sort_values(by='Always', ascending=True)\n\n\n# Let's get ready to plot the 2022 data...\nfrom IPython.display import display\n\n# We are going to use the display command to update our figure over multiple cells. 
\n# This usually isn't necessary, but it's helpful here to see how each set of commands updates the figure\n\n# Define the custom order for plotting\norder = ['Always', 'Frequently', 'Sometimes', 'Rarely', 'Never']\n\ncolors = {\n 'Always': (8/255, 40/255, 81/255), # Replace R1, G1, B1 with the RGB values for 'Dark Blue'\n 'Frequently': (12/255, 96/255, 152/255), # Replace R2, G2, B2 with the RGB values for 'Light Ocean Blue'\n 'Sometimes': (16/255, 146/255, 136/255), # and so on...\n 'Rarely': (11/255, 88/255, 73/255),\n 'Never': (52/255, 163/255, 32/255)\n}\n\n\n# Make the plot\nfig, ax = plt.subplots(figsize=(10, 7))\nsorted_percentage_df[order].plot(kind='barh', stacked=True, ax=ax, color=[colors[label] for label in order])\nax.set_xlabel('Percentage')\nax.set_title('Frequency of Language Usage, 2022',y=1.05)\n\nplt.show() # This command draws our figure. \n\n\n\n\n\n\n\n\n\n# Add labels across the top, like in the original graph\n\n# Get the patches for the top-most bar\nnum_languages = len(sorted_percentage_df)\n\npatches = ax.patches[num_languages-1::num_languages]\n# Calculate the cumulative width of the patches for the top-most bar\ncumulative_widths = [0] * len(order)\nwidths = [patch.get_width() for patch in patches]\nfor i, width in enumerate(widths):\n cumulative_widths[i] = width + (cumulative_widths[i-1] if i > 0 else 0)\n\n\n\n# Add text labels above the bars\nfor i, (width, label) in enumerate(zip(cumulative_widths, order)):\n # Get the color of the current bar segment\n # Calculate the position for the text label\n position = width - (patches[i].get_width() / 2)\n # Add the text label to the plot\n # Adjust the y-coordinate for the text label\n y_position = len(sorted_percentage_df) - 0.3 # Adjust the 0.3 value as needed\n ax.text(position, y_position, label, ha='center', color=colors[label], fontweight='bold')\n\n# Remove the legend\nax.legend().set_visible(False)\n\n#plt.show()\ndisplay(fig) # This command shows our updated figure (we can't 
re-use \"plt.show()\")\n\n\n\n\n\n\n\n\n\n# Add percentage values inside each patch\nfor patch in ax.patches:\n # Get the width and height of the patch\n width, height = patch.get_width(), patch.get_height()\n \n # Calculate the position for the text label\n x = patch.get_x() + width / 2\n y = patch.get_y() + height / 2\n \n # Get the percentage value for the current patch\n percentage = \"{:.0f}%\".format(width)\n \n # Add the text label to the plot\n ax.text(x, y, percentage, ha='center', va='center', color='white', fontweight='bold')\n\ndisplay(fig) # Let's see those nice text labels!\n\n\n\n\n\n\n\n\n\n# Clean up the figure to remove spines and unnecessary labels/ticks, etc.\n\n# Remove x-axis label\nax.set_xlabel('')\n\n# Remove the spines\nax.spines['top'].set_visible(False)\nax.spines['right'].set_visible(False)\nax.spines['bottom'].set_visible(False)\nax.spines['left'].set_visible(False)\n\n# Remove the y-axis tick marks\nax.tick_params(axis='y', which='both', length=0)\n\n# Remove the x-axis tick marks and labels\nax.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)\n\ndisplay(fig) # Now 100% less visually cluttered!"
}, { - "objectID": "course-materials/day5.html#class-materials", - "href": "course-materials/day5.html#class-materials", - "title": "Transforming Data in Pandas", - "section": "Class materials", - "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 5 / morning\n📝 Selecting and Filtering in Pandas\n🐼 Cleaning Data\n\n\nday 5 / afternoon\n🙌 Coding Colab: Cleaning DataFrames\nEnd-of-day practice" + "objectID": "course-materials/lectures/00_intro_to_python.html#the-end", + "href": "course-materials/lectures/00_intro_to_python.html#the-end", + "title": "Lecture 1 - Intro to Python and Environmental Data Science", + "section": "The End", + "text": "The End" }, { - "objectID": "course-materials/day5.html#end-of-day-practice", - "href": "course-materials/day5.html#end-of-day-practice", - "title": "Transforming Data in Pandas", - "section": "End-of-day practice", - "text": "End-of-day practice\nComplete the following tasks / activities before heading home for the day!\n\n Day 5 Practice: 🍌 Analyzing the “Banana Index” 🍌" + "objectID": "course-materials/lectures/02_helpGPT.html", + "href": "course-materials/lectures/02_helpGPT.html", + "title": "Getting Help", + "section": "", + "text": "When you get an error, or an unexpected result, or you are not sure what to do…\n\n\n\nFinding help inside Python\nFinding help outside Python\n\n\n\n\nHow do we interrogate the data (and other objects) we encounter while coding?\n\nmy_var = 'some_unknown_thing'\nWhat is it?\n\nmy_var = 'some_unknown_thing'\ntype(my_var)\n\nstr\n\n\nThe type() command tells you what sort of thing an object is.\n\n\n\nHow do we interrogate the data (and other objects) we encounter while coding?\n\nmy_var = 'some_unknown_thing'\nWhat can I do with it?\n\nmy_var = ['my', 'list', 'of', 'things']\nmy_var = my_var + ['a', 'nother', 'list']\ndir(my_var)\n\n['__add__',\n '__class__',\n '__class_getitem__',\n '__contains__',\n '__delattr__',\n '__delitem__',\n '__dir__',\n 
'__doc__',\n '__eq__',\n '__format__',\n '__ge__',\n '__getattribute__',\n '__getitem__',\n '__gt__',\n '__hash__',\n '__iadd__',\n '__imul__',\n '__init__',\n '__init_subclass__',\n '__iter__',\n '__le__',\n '__len__',\n '__lt__',\n '__mul__',\n '__ne__',\n '__new__',\n '__reduce__',\n '__reduce_ex__',\n '__repr__',\n '__reversed__',\n '__rmul__',\n '__setattr__',\n '__setitem__',\n '__sizeof__',\n '__str__',\n '__subclasshook__',\n 'append',\n 'clear',\n 'copy',\n 'count',\n 'extend',\n 'index',\n 'insert',\n 'pop',\n 'remove',\n 'reverse',\n 'sort']\n\n\nThe dir() command tells you what attributes an object has.\n\n\n\n\n# using the dir command\nmy_list = ['a', 'b', 'c']\nlist(reversed(dir(my_list)))\nmy_list.sort?\n\nSignature: my_list.sort(*, key=None, reverse=False)\nDocstring:\nSort the list in ascending order and return None.\n\nThe sort is in-place (i.e. the list itself is modified) and stable (i.e. the\norder of two equal elements is maintained).\n\nIf a key function is given, apply it once to each list item and sort them,\nascending or descending, according to their function values.\n\nThe reverse flag can be set to sort in descending order.\nType: builtin_function_or_method\n\n\n\n\n\n__attributes__ are internal (or private) attributes associated with all python objects.\nThese are called “magic” or “dunder” methods.\ndunder → “double under” → __\n\n\n\nEverything in Python is an object, and every operation corresponds to a method.\n\n# __add__ and __mul__. __len__. (None). 2 Wrongs.\n\n3 + 4\n\n7\n\n\n\n\n\nGenerally, you will not have to worry about dunder methods.\nHere’s a shortcut function to look at only non-dunder methods\n\n\n\n\n\nYou can use the <tab> key in iPython (or Jupyter environments) to explore object methods. 
By default, only “public” (non-dunder) methods are returned.\n\n\n\nYou can usually just pause typing and VSCode will provide object introspection:\n\nstring = 'some letters'\n\n\n\n\n\nMost objects - especially packages and libraries - provide help documentation that can be accessed using the python helper function… called… help()\n\n# 3, help, str, soil...\nimport math\nhelp(math)\n\nHelp on module math:\n\nNAME\n math\n\nMODULE REFERENCE\n https://docs.python.org/3.10/library/math.html\n \n The following documentation is automatically generated from the Python\n source files. It may be incomplete, incorrect or include features that\n are considered implementation detail and may vary between Python\n implementations. When in doubt, consult the module reference at the\n location listed above.\n\nDESCRIPTION\n This module provides access to the mathematical functions\n defined by the C standard.\n\nFUNCTIONS\n acos(x, /)\n Return the arc cosine (measured in radians) of x.\n \n The result is between 0 and pi.\n \n acosh(x, /)\n Return the inverse hyperbolic cosine of x.\n \n asin(x, /)\n Return the arc sine (measured in radians) of x.\n \n The result is between -pi/2 and pi/2.\n \n asinh(x, /)\n Return the inverse hyperbolic sine of x.\n \n atan(x, /)\n Return the arc tangent (measured in radians) of x.\n \n The result is between -pi/2 and pi/2.\n \n atan2(y, x, /)\n Return the arc tangent (measured in radians) of y/x.\n \n Unlike atan(y/x), the signs of both x and y are considered.\n \n atanh(x, /)\n Return the inverse hyperbolic tangent of x.\n \n ceil(x, /)\n Return the ceiling of x as an Integral.\n \n This is the smallest integer >= x.\n \n comb(n, k, /)\n Number of ways to choose k items from n items without repetition and without order.\n \n Evaluates to n! / (k! * (n - k)!) 
when k <= n and evaluates\n to zero when k > n.\n \n Also called the binomial coefficient because it is equivalent\n to the coefficient of k-th term in polynomial expansion of the\n expression (1 + x)**n.\n \n Raises TypeError if either of the arguments are not integers.\n Raises ValueError if either of the arguments are negative.\n \n copysign(x, y, /)\n Return a float with the magnitude (absolute value) of x but the sign of y.\n \n On platforms that support signed zeros, copysign(1.0, -0.0)\n returns -1.0.\n \n cos(x, /)\n Return the cosine of x (measured in radians).\n \n cosh(x, /)\n Return the hyperbolic cosine of x.\n \n degrees(x, /)\n Convert angle x from radians to degrees.\n \n dist(p, q, /)\n Return the Euclidean distance between two points p and q.\n \n The points should be specified as sequences (or iterables) of\n coordinates. Both inputs must have the same dimension.\n \n Roughly equivalent to:\n sqrt(sum((px - qx) ** 2.0 for px, qx in zip(p, q)))\n \n erf(x, /)\n Error function at x.\n \n erfc(x, /)\n Complementary error function at x.\n \n exp(x, /)\n Return e raised to the power of x.\n \n expm1(x, /)\n Return exp(x)-1.\n \n This function avoids the loss of precision involved in the direct evaluation of exp(x)-1 for small x.\n \n fabs(x, /)\n Return the absolute value of the float x.\n \n factorial(x, /)\n Find x!.\n \n Raise a ValueError if x is negative or non-integral.\n \n floor(x, /)\n Return the floor of x as an Integral.\n \n This is the largest integer <= x.\n \n fmod(x, y, /)\n Return fmod(x, y), according to platform C.\n \n x % y may differ.\n \n frexp(x, /)\n Return the mantissa and exponent of x, as pair (m, e).\n \n m is a float and e is an int, such that x = m * 2.**e.\n If x is 0, m and e are both 0. 
Else 0.5 <= abs(m) < 1.0.\n \n fsum(seq, /)\n Return an accurate floating point sum of values in the iterable seq.\n \n Assumes IEEE-754 floating point arithmetic.\n \n gamma(x, /)\n Gamma function at x.\n \n gcd(*integers)\n Greatest Common Divisor.\n \n hypot(...)\n hypot(*coordinates) -> value\n \n Multidimensional Euclidean distance from the origin to a point.\n \n Roughly equivalent to:\n sqrt(sum(x**2 for x in coordinates))\n \n For a two dimensional point (x, y), gives the hypotenuse\n using the Pythagorean theorem: sqrt(x*x + y*y).\n \n For example, the hypotenuse of a 3/4/5 right triangle is:\n \n >>> hypot(3.0, 4.0)\n 5.0\n \n isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0)\n Determine whether two floating point numbers are close in value.\n \n rel_tol\n maximum difference for being considered \"close\", relative to the\n magnitude of the input values\n abs_tol\n maximum difference for being considered \"close\", regardless of the\n magnitude of the input values\n \n Return True if a is close in value to b, and False otherwise.\n \n For the values to be considered close, the difference between them\n must be smaller than at least one of the tolerances.\n \n -inf, inf and NaN behave similarly to the IEEE 754 Standard. That\n is, NaN is not close to anything, even itself. 
inf and -inf are\n only close to themselves.\n \n isfinite(x, /)\n Return True if x is neither an infinity nor a NaN, and False otherwise.\n \n isinf(x, /)\n Return True if x is a positive or negative infinity, and False otherwise.\n \n isnan(x, /)\n Return True if x is a NaN (not a number), and False otherwise.\n \n isqrt(n, /)\n Return the integer part of the square root of the input.\n \n lcm(*integers)\n Least Common Multiple.\n \n ldexp(x, i, /)\n Return x * (2**i).\n \n This is essentially the inverse of frexp().\n \n lgamma(x, /)\n Natural logarithm of absolute value of Gamma function at x.\n \n log(...)\n log(x, [base=math.e])\n Return the logarithm of x to the given base.\n \n If the base not specified, returns the natural logarithm (base e) of x.\n \n log10(x, /)\n Return the base 10 logarithm of x.\n \n log1p(x, /)\n Return the natural logarithm of 1+x (base e).\n \n The result is computed in a way which is accurate for x near zero.\n \n log2(x, /)\n Return the base 2 logarithm of x.\n \n modf(x, /)\n Return the fractional and integer parts of x.\n \n Both results carry the sign of x and are floats.\n \n nextafter(x, y, /)\n Return the next floating-point value after x towards y.\n \n perm(n, k=None, /)\n Number of ways to choose k items from n items without repetition and with order.\n \n Evaluates to n! / (n - k)! when k <= n and evaluates\n to zero when k > n.\n \n If k is not specified or is None, then k defaults to n\n and the function returns n!.\n \n Raises TypeError if either of the arguments are not integers.\n Raises ValueError if either of the arguments are negative.\n \n pow(x, y, /)\n Return x**y (x to the power of y).\n \n prod(iterable, /, *, start=1)\n Calculate the product of all the elements in the input iterable.\n \n The default start value for the product is 1.\n \n When the iterable is empty, return the start value. 
This function is\n intended specifically for use with numeric values and may reject\n non-numeric types.\n \n radians(x, /)\n Convert angle x from degrees to radians.\n \n remainder(x, y, /)\n Difference between x and the closest integer multiple of y.\n \n Return x - n*y where n*y is the closest integer multiple of y.\n In the case where x is exactly halfway between two multiples of\n y, the nearest even value of n is used. The result is always exact.\n \n sin(x, /)\n Return the sine of x (measured in radians).\n \n sinh(x, /)\n Return the hyperbolic sine of x.\n \n sqrt(x, /)\n Return the square root of x.\n \n tan(x, /)\n Return the tangent of x (measured in radians).\n \n tanh(x, /)\n Return the hyperbolic tangent of x.\n \n trunc(x, /)\n Truncates the Real x to the nearest Integral toward 0.\n \n Uses the __trunc__ magic method.\n \n ulp(x, /)\n Return the value of the least significant bit of the float x.\n\nDATA\n e = 2.718281828459045\n inf = inf\n nan = nan\n pi = 3.141592653589793\n tau = 6.283185307179586\n\nFILE\n /Users/kellycaylor/mambaforge/envs/eds217_2023/lib/python3.10/lib-dynload/math.cpython-310-darwin.so\n\n\n\n\n\n\n\nIn the iPython shell (or the Jupyter Notebook/Jupyter Lab environment), you can also access the help() command using ?.\n\nmath\n\n<module 'math' from '/Users/kellycaylor/mambaforge/envs/eds217_2023/lib/python3.10/lib-dynload/math.cpython-310-darwin.so'>\n\n\n\n\n\nIn the iPython shell (or the Jupyter Notebook/Jupyter Lab environment) you can use ?? to see the actual source code of Python functions\n\n\n\n?? only shows source code for python functions that aren’t compiled to C code.
Otherwise, it will show the same information as ?\n\n\n\n\n\n\n\n\n\n\nThe print command is the most commonly used tool for beginners to understand errors\n# This code generates a `TypeError` that \n# x is not the right kind of variable.\ndo_something(x) \nThe print command is the most commonly used debugging tool for beginners.\n\n\n\nPython has a string format called f-strings. These are strings that are prefixed with an f character and allow in-line variable substitution.\n\n# print using an f-string\nx = 3.45\nprint(f\"x = {x}\")\n\nx = 3.45\n\n\n\ndef do_something(x):\n x = x / 2 \n return x\n\n# This code generates a `TypeError` that \n# x is not the right kind of variable.\nx = 'f'\n# Check and see what is X?\nprint(\n f\"calling do_something() with x={x}\" # Python f-string\n)\n\ndo_something(x) \n\ncalling do_something() with x=f\n\n\n\n---------------------------------------------------------------------------\nTypeError Traceback (most recent call last)\nCell In[13], line 13\n 8 # Check and see what is X?\n 9 print(\n 10 f\"calling do_something() with x={x}\" # Python f-string\n 11 )\n---> 13 do_something(x)\n\nCell In[13], line 2, in do_something(x)\n 1 def do_something(x):\n----> 2 x = x / 2 \n 3 return x\n\nTypeError: unsupported operand type(s) for /: 'str' and 'int'\n\n\n\n\n\n\nAs of Fall, 2002: - O’Reilly Books (Requires UCSB login) - My O’Reilly pdf library: https://bit.ly/eds-217-books (Requires UCSB login)\nAs of Fall, 2022: - Python Docs - Stack Overflow - Talk Python - Ask Python\nAs of Fall, 2023:\nLLMs.\n\nChatGPT - Need $ for GPT-4, 3.X fine debugger, but not always a great programmer.\nGitHub Copilot - Should be able to get a free student account. Works great in VSCode; we will set this up together later in the course."
}, { - "objectID": "course-materials/day5.html#additional-resources", - "href": "course-materials/day5.html#additional-resources", - "title": "Transforming Data in Pandas", - "section": "Additional Resources", - "text": "Additional Resources" + "objectID": "course-materials/lectures/02_helpGPT.html#finding-help", + "href": "course-materials/lectures/02_helpGPT.html#finding-help", + "title": "Getting Help", + "section": "", + "text": "When you get an error, or an unexpected result, or you are not sure what to do…\n\n\n\nFinding help inside Python\nFinding help outside Python\n\n\n\n\nHow do we interrogate the data (and other objects) we encounter while coding?\n\nmy_var = 'some_unknown_thing'\nWhat is it?\n\nmy_var = 'some_unknown_thing'\ntype(my_var)\n\nstr\n\n\nThe type() command tells you what sort of thing an object is.\n\n\n\nHow do we interrogate the data (and other objects) we encounter while coding?\n\nmy_var = 'some_unknown_thing'\nWhat can I do with it?\n\nmy_var = ['my', 'list', 'of', 'things']\nmy_var = my_var + ['a', 'nother', 'list']\ndir(my_var)\n\n['__add__',\n '__class__',\n '__class_getitem__',\n '__contains__',\n '__delattr__',\n '__delitem__',\n '__dir__',\n '__doc__',\n '__eq__',\n '__format__',\n '__ge__',\n '__getattribute__',\n '__getitem__',\n '__gt__',\n '__hash__',\n '__iadd__',\n '__imul__',\n '__init__',\n '__init_subclass__',\n '__iter__',\n '__le__',\n '__len__',\n '__lt__',\n '__mul__',\n '__ne__',\n '__new__',\n '__reduce__',\n '__reduce_ex__',\n '__repr__',\n '__reversed__',\n '__rmul__',\n '__setattr__',\n '__setitem__',\n '__sizeof__',\n '__str__',\n '__subclasshook__',\n 'append',\n 'clear',\n 'copy',\n 'count',\n 'extend',\n 'index',\n 'insert',\n 'pop',\n 'remove',\n 'reverse',\n 'sort']\n\n\nThe dir() command tells you what attributes an object has.\n\n\n\n\n# using the dir command\nmy_list = ['a', 'b', 'c']\nlist(reversed(dir(my_list)))\nmy_list.sort?\n\nSignature: my_list.sort(*, key=None, reverse=False)\nDocstring:\nSort 
the list in ascending order and return None.\n\nThe sort is in-place (i.e. the list itself is modified) and stable (i.e. the\norder of two equal elements is maintained).\n\nIf a key function is given, apply it once to each list item and sort them,\nascending or descending, according to their function values.\n\nThe reverse flag can be set to sort in descending order.\nType: builtin_function_or_method\n\n\n\n\n\n__attributes__ are internal (or private) attributes associated with all python objects.\nThese are called “magic” or “dunder” methods.\ndunder → “double under” → __\n\n\n\nEverything in Python is an object, and every operation corresponds to a method.\n\n# __add__ and __mul__. __len__. (None). 2 Wrongs.\n\n3 + 4\n\n7\n\n\n\n\n\nGenerally, you will not have to worry about dunder methods.\nHere’s a shortcut function to look at only non-dunder methods\n\n\n\n\n\nYou can use the <tab> key in iPython (or Jupyter environments) to explore object methods. By default, only “public” (non-dunder) methods are returned.\n\n\n\nYou can usually just pause typing and VSCode will provide object introspection:\n\nstring = 'some letters'\n\n\n\n\n\nMost objects - especially packages and libraries - provide help documentation that can be accessed using the python helper function… called… help()\n\n# 3, help, str, soil...\nimport math\nhelp(math)\n\nHelp on module math:\n\nNAME\n math\n\nMODULE REFERENCE\n https://docs.python.org/3.10/library/math.html\n \n The following documentation is automatically generated from the Python\n source files. It may be incomplete, incorrect or include features that\n are considered implementation detail and may vary between Python\n implementations. 
When in doubt, consult the module reference at the\n location listed above.\n\nDESCRIPTION\n This module provides access to the mathematical functions\n defined by the C standard.\n\nFUNCTIONS\n acos(x, /)\n Return the arc cosine (measured in radians) of x.\n \n The result is between 0 and pi.\n \n acosh(x, /)\n Return the inverse hyperbolic cosine of x.\n \n asin(x, /)\n Return the arc sine (measured in radians) of x.\n \n The result is between -pi/2 and pi/2.\n \n asinh(x, /)\n Return the inverse hyperbolic sine of x.\n \n atan(x, /)\n Return the arc tangent (measured in radians) of x.\n \n The result is between -pi/2 and pi/2.\n \n atan2(y, x, /)\n Return the arc tangent (measured in radians) of y/x.\n \n Unlike atan(y/x), the signs of both x and y are considered.\n \n atanh(x, /)\n Return the inverse hyperbolic tangent of x.\n \n ceil(x, /)\n Return the ceiling of x as an Integral.\n \n This is the smallest integer >= x.\n \n comb(n, k, /)\n Number of ways to choose k items from n items without repetition and without order.\n \n Evaluates to n! / (k! * (n - k)!) when k <= n and evaluates\n to zero when k > n.\n \n Also called the binomial coefficient because it is equivalent\n to the coefficient of k-th term in polynomial expansion of the\n expression (1 + x)**n.\n \n Raises TypeError if either of the arguments are not integers.\n Raises ValueError if either of the arguments are negative.\n \n copysign(x, y, /)\n Return a float with the magnitude (absolute value) of x but the sign of y.\n \n On platforms that support signed zeros, copysign(1.0, -0.0)\n returns -1.0.\n \n cos(x, /)\n Return the cosine of x (measured in radians).\n \n cosh(x, /)\n Return the hyperbolic cosine of x.\n \n degrees(x, /)\n Convert angle x from radians to degrees.\n \n dist(p, q, /)\n Return the Euclidean distance between two points p and q.\n \n The points should be specified as sequences (or iterables) of\n coordinates. 
Both inputs must have the same dimension.\n \n Roughly equivalent to:\n sqrt(sum((px - qx) ** 2.0 for px, qx in zip(p, q)))\n \n erf(x, /)\n Error function at x.\n \n erfc(x, /)\n Complementary error function at x.\n \n exp(x, /)\n Return e raised to the power of x.\n \n expm1(x, /)\n Return exp(x)-1.\n \n This function avoids the loss of precision involved in the direct evaluation of exp(x)-1 for small x.\n \n fabs(x, /)\n Return the absolute value of the float x.\n \n factorial(x, /)\n Find x!.\n \n Raise a ValueError if x is negative or non-integral.\n \n floor(x, /)\n Return the floor of x as an Integral.\n \n This is the largest integer <= x.\n \n fmod(x, y, /)\n Return fmod(x, y), according to platform C.\n \n x % y may differ.\n \n frexp(x, /)\n Return the mantissa and exponent of x, as pair (m, e).\n \n m is a float and e is an int, such that x = m * 2.**e.\n If x is 0, m and e are both 0. Else 0.5 <= abs(m) < 1.0.\n \n fsum(seq, /)\n Return an accurate floating point sum of values in the iterable seq.\n \n Assumes IEEE-754 floating point arithmetic.\n \n gamma(x, /)\n Gamma function at x.\n \n gcd(*integers)\n Greatest Common Divisor.\n \n hypot(...)\n hypot(*coordinates) -> value\n \n Multidimensional Euclidean distance from the origin to a point.\n \n Roughly equivalent to:\n sqrt(sum(x**2 for x in coordinates))\n \n For a two dimensional point (x, y), gives the hypotenuse\n using the Pythagorean theorem: sqrt(x*x + y*y).\n \n For example, the hypotenuse of a 3/4/5 right triangle is:\n \n >>> hypot(3.0, 4.0)\n 5.0\n \n isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0)\n Determine whether two floating point numbers are close in value.\n \n rel_tol\n maximum difference for being considered \"close\", relative to the\n magnitude of the input values\n abs_tol\n maximum difference for being considered \"close\", regardless of the\n magnitude of the input values\n \n Return True if a is close in value to b, and False otherwise.\n \n For the values to be considered 
close, the difference between them\n must be smaller than at least one of the tolerances.\n \n -inf, inf and NaN behave similarly to the IEEE 754 Standard. That\n is, NaN is not close to anything, even itself. inf and -inf are\n only close to themselves.\n \n isfinite(x, /)\n Return True if x is neither an infinity nor a NaN, and False otherwise.\n \n isinf(x, /)\n Return True if x is a positive or negative infinity, and False otherwise.\n \n isnan(x, /)\n Return True if x is a NaN (not a number), and False otherwise.\n \n isqrt(n, /)\n Return the integer part of the square root of the input.\n \n lcm(*integers)\n Least Common Multiple.\n \n ldexp(x, i, /)\n Return x * (2**i).\n \n This is essentially the inverse of frexp().\n \n lgamma(x, /)\n Natural logarithm of absolute value of Gamma function at x.\n \n log(...)\n log(x, [base=math.e])\n Return the logarithm of x to the given base.\n \n If the base not specified, returns the natural logarithm (base e) of x.\n \n log10(x, /)\n Return the base 10 logarithm of x.\n \n log1p(x, /)\n Return the natural logarithm of 1+x (base e).\n \n The result is computed in a way which is accurate for x near zero.\n \n log2(x, /)\n Return the base 2 logarithm of x.\n \n modf(x, /)\n Return the fractional and integer parts of x.\n \n Both results carry the sign of x and are floats.\n \n nextafter(x, y, /)\n Return the next floating-point value after x towards y.\n \n perm(n, k=None, /)\n Number of ways to choose k items from n items without repetition and with order.\n \n Evaluates to n! / (n - k)! 
when k <= n and evaluates\n to zero when k > n.\n \n If k is not specified or is None, then k defaults to n\n and the function returns n!.\n \n Raises TypeError if either of the arguments are not integers.\n Raises ValueError if either of the arguments are negative.\n \n pow(x, y, /)\n Return x**y (x to the power of y).\n \n prod(iterable, /, *, start=1)\n Calculate the product of all the elements in the input iterable.\n \n The default start value for the product is 1.\n \n When the iterable is empty, return the start value. This function is\n intended specifically for use with numeric values and may reject\n non-numeric types.\n \n radians(x, /)\n Convert angle x from degrees to radians.\n \n remainder(x, y, /)\n Difference between x and the closest integer multiple of y.\n \n Return x - n*y where n*y is the closest integer multiple of y.\n In the case where x is exactly halfway between two multiples of\n y, the nearest even value of n is used. The result is always exact.\n \n sin(x, /)\n Return the sine of x (measured in radians).\n \n sinh(x, /)\n Return the hyperbolic sine of x.\n \n sqrt(x, /)\n Return the square root of x.\n \n tan(x, /)\n Return the tangent of x (measured in radians).\n \n tanh(x, /)\n Return the hyperbolic tangent of x.\n \n trunc(x, /)\n Truncates the Real x to the nearest Integral toward 0.\n \n Uses the __trunc__ magic method.\n \n ulp(x, /)\n Return the value of the least significant bit of the float x.\n\nDATA\n e = 2.718281828459045\n inf = inf\n nan = nan\n pi = 3.141592653589793\n tau = 6.283185307179586\n\nFILE\n /Users/kellycaylor/mambaforge/envs/eds217_2023/lib/python3.10/lib-dynload/math.cpython-310-darwin.so\n\n\n\n\n\n\n\nIn the iPython shell (or the Jupyter Notebook/Jupyter Lab environment), you can also access the help() command using ?.\n\nmath\n\n<module 'math' from '/Users/kellycaylor/mambaforge/envs/eds217_2023/lib/python3.10/lib-dynload/math.cpython-310-darwin.so'>\n\n\n\n\n\nIn the iPython shell (or the Jupyter 
Notebook/Jupyter Lab environment) you can use ?? to see the actual source code of Python functions\n\n\n\n?? only shows source code for python functions that aren’t compiled to C code. Otherwise, it will show the same information as ?\n\n\n\n\n\n\n\n\n\n\nThe print command is the most commonly used tool for beginners to understand errors\n# This code generates a `TypeError` that \n# x is not the right kind of variable.\ndo_something(x) \nThe print command is the most commonly used debugging tool for beginners.\n\n\n\nPython has a string format called f-strings. These are strings that are prefixed with an f character and allow in-line variable substitution.\n\n# print using an f-string\nx = 3.45\nprint(f\"x = {x}\")\n\nx = 3.45\n\n\n\ndef do_something(x):\n x = x / 2 \n return x\n\n# This code generates a `TypeError` that \n# x is not the right kind of variable.\nx = 'f'\n# Check and see what is X?\nprint(\n f\"calling do_something() with x={x}\" # Python f-string\n)\n\ndo_something(x) \n\ncalling do_something() with x=f\n\n\n\n---------------------------------------------------------------------------\nTypeError Traceback (most recent call last)\nCell In[13], line 13\n 8 # Check and see what is X?\n 9 print(\n 10 f\"calling do_something() with x={x}\" # Python f-string\n 11 )\n---> 13 do_something(x)\n\nCell In[13], line 2, in do_something(x)\n 1 def do_something(x):\n----> 2 x = x / 2 \n 3 return x\n\nTypeError: unsupported operand type(s) for /: 'str' and 'int'\n\n\n\n\n\n\nAs of Fall, 2002: - O’Reilly Books (Requires UCSB login) - My O’Reilly pdf library: https://bit.ly/eds-217-books (Requires UCSB login)\nAs of Fall, 2022: - Python Docs - Stack Overflow - Talk Python - Ask Python\nAs of Fall, 2023:\nLLMs.\n\nChatGPT - Need $ for GPT-4, 3.X fine debugger, but not always a great programmer.\nGitHub Copilot - Should be able to get a free student account. Works great in VSCode; we will set this up together later in the course."
}, { - "objectID": "course-materials/day7.html#class-materials", - "href": "course-materials/day7.html#class-materials", - "title": "Data Handling and Visualization, Day 2", - "section": "Class materials", - "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 7 / morning\n📊 Data visualization (Part I)\n📊 Data visualization (Part II)\n\n\nday 7 / afternoon\n🙌 Coding Colab: Exploring data through visualizations" + "objectID": "course-materials/lectures/02_helpGPT.html#how-to-move-from-a-beginner-to-a-more-advanced-python-user", + "href": "course-materials/lectures/02_helpGPT.html#how-to-move-from-a-beginner-to-a-more-advanced-python-user", + "title": "Getting Help", + "section": "How to move from a beginner to a more `advanced`` python user", + "text": "How to move from a beginner to a more `advanced`` python user\nTaken from Talk Python to Me, Episode #427, with some modifications." }, { - "objectID": "course-materials/day7.html#end-of-day-practice", - "href": "course-materials/day7.html#end-of-day-practice", - "title": "Data Handling and Visualization, Day 2", - "section": "End-of-day practice", - "text": "End-of-day practice\nComplete the following tasks / activities before heading home for the day!\n\n Day 7 Practice: 🌲 USDA Plant Hardiness Zones 🌴" + "objectID": "course-materials/lectures/02_helpGPT.html#know-your-goals", + "href": "course-materials/lectures/02_helpGPT.html#know-your-goals", + "title": "Getting Help", + "section": "Know your goals", + "text": "Know your goals\n\nWhy are you learning python?\nWhy are you learning data science?" 
}, { - "objectID": "course-materials/day7.html#additional-resources", - "href": "course-materials/day7.html#additional-resources", - "title": "Data Handling and Visualization, Day 2", - "section": "Additional Resources", - "text": "Additional Resources" + "objectID": "course-materials/lectures/02_helpGPT.html#have-a-project-in-mind", + "href": "course-materials/lectures/02_helpGPT.html#have-a-project-in-mind", + "title": "Getting Help", + "section": "Have a project in mind", + "text": "Have a project in mind\n\nWhat do you want to do with it?\nUse python to solve a problem you are interested in solving.\nDon’t be afraid to work on personal projects.\n\n\nSome examples of my personal “problem-solving” projects\nBiobib - Python code to make my CV/Biobib from a google sheets/.csv file.\nTriumph - Python notebooks for a 1959 Triumph TR3A EV conversion project.\nStoplight - A simple python webapp for monitoring EDS217 course pace." }, { - "objectID": "course-materials/day9.html#class-materials", - "href": "course-materials/day9.html#class-materials", - "title": "Building a Python Data Science Workflow", - "section": "Class materials", - "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 9 / morning\nWorking on Final Data Science Project (all morning)\n\n\n\nday 9 / afternoon\nData Science Project Presentations (all afternoon)" + "objectID": "course-materials/lectures/02_helpGPT.html#dont-limit-your-learning-to-whats-needed-for-your-project", + "href": "course-materials/lectures/02_helpGPT.html#dont-limit-your-learning-to-whats-needed-for-your-project", + "title": "Getting Help", + "section": "Don’t limit your learning to what’s needed for your project", + "text": "Don’t limit your learning to what’s needed for your project\n\nLearn more than you need to know…\n…but try to use less than you think you need" }, { - "objectID": "course-materials/day9.html#end-of-day-practice", - "href": "course-materials/day9.html#end-of-day-practice", - 
"title": "Building a Python Data Science Workflow", - "section": "End-of-day practice", - "text": "End-of-day practice\nEnd of Class! Congratulations!!" + "objectID": "course-materials/lectures/02_helpGPT.html#read-good-code", + "href": "course-materials/lectures/02_helpGPT.html#read-good-code", + "title": "Getting Help", + "section": "Read good code", + "text": "Read good code\n\nLibraries and packages have great examples of code.\nRead the code (not just docs) of the packages you use.\nGithub is a great place to find code." }, { - "objectID": "course-materials/day9.html#additional-resources", - "href": "course-materials/day9.html#additional-resources", - "title": "Building a Python Data Science Workflow", - "section": "Additional Resources", - "text": "Additional Resources" + "objectID": "course-materials/lectures/02_helpGPT.html#know-your-tools", + "href": "course-materials/lectures/02_helpGPT.html#know-your-tools", + "title": "Getting Help", + "section": "Know your tools", + "text": "Know your tools\n\nLearn how to use your IDE (VSCode)\nLearn how to use your package manager (conda)\nLearn how to use your shell (bash)\nLearn how to use your version control system (git)\n\n\nLearn how to test your code\n\nTesting is part of programming.\nTesting is a great way to learn.\nFocus on end-to-end (E2E) tests (rather than unit tests)\n\nUnit tests:\nDoes it work the way you expect it to (operation-centric)?\nEnd-to-end test:\nDoes it do what you want it to do (output-centric)?" 
}, { - "objectID": "course-materials/interactive-sessions/interactive-session-git.html", - "href": "course-materials/interactive-sessions/interactive-session-git.html", - "title": "Sidebar", - "section": "", - "text": "Welcome to git (xkcd)" + "objectID": "course-materials/lectures/02_helpGPT.html#know-whats-good-enough-for-any-given-project", + "href": "course-materials/lectures/02_helpGPT.html#know-whats-good-enough-for-any-given-project", + "title": "Getting Help", + "section": "Know what’s good enough for any given project", + "text": "Know what’s good enough for any given project\n\nYou’re not writing code for a self-driving car or a pacemaker.\n\nDon’t over-engineer your code.\nDon’t over-optimize your code.\nSimple is better than complex." }, { - "objectID": "course-materials/interactive-sessions/interactive-session-git.html#setting-up-git-for-collaborating-with-notebooks", - "href": "course-materials/interactive-sessions/interactive-session-git.html#setting-up-git-for-collaborating-with-notebooks", - "title": "Sidebar", - "section": "1. Setting up git for collaborating with Notebooks", - "text": "1. Setting up git for collaborating with Notebooks\n\nCleaning Up Jupyter Notebook Files Before Committing to Git\nIn data science workflows, particularly when collaborating using Jupyter Notebooks, it’s important to maintain a clean and efficient Git repository. 
This guide will help you set up your environment to automatically remove outputs from .ipynb files before committing them, which improves collaboration and reduces repository size.\n\n\nWhy Clean Up .ipynb Files?\n\nReduced Size: Outputs can bloat file sizes, making repositories larger and slower to clone.\nFewer Conflicts: Output cells can cause merge conflicts when multiple people edit the same file.\nEncouraged Reproducibility: Keeping notebooks free of outputs encourages others to run the notebooks themselves.\n\n\n\nStep-by-Step Setup\n\nStep 1: Check if jq is Installed\n\nOpen Terminal: Access your terminal application.\nCheck for jq: Run the following command to see if jq is installed and check its version:\njq --version\nVerify Version: Ensure the output is jq-1.5 or higher. If jq is installed and the version is at least 1.5, you can proceed to the next steps. If not, see the installation note below.\n\n\n\nStep 2: Configure Git to Use a Global Attributes File\n\nOpen ~/.gitconfig: Use nano to edit this file:\nnano ~/.gitconfig\nAdd the Configuration: Copy and paste the following lines:\n[core]\nattributesfile = ~/.gitattributes_global\n\n[filter \"nbstrip_full\"]\nclean = \"jq --indent 1 \\\n '(.cells[] | select(has(\\\"outputs\\\")) | .outputs) = [] \\\n | (.cells[] | select(has(\\\"execution_count\\\")) | .execution_count) = null \\\n | .metadata = {\\\"language_info\\\": {\\\"name\\\": \\\"python\\\", \\\"pygments_lexer\\\": \\\"ipython3\\\"}} \\\n | .cells[].metadata = {} \\\n '\"\nsmudge = cat\nrequired = true\nSave and Exit: Press CTRL + X, then Y, and Enter to save the file.\n\n\n\nStep 3: Create a Global Git Attributes File\n\nOpen ~/.gitattributes_global: Use nano to edit this file:\nnano ~/.gitattributes_global\nAdd the Following Line:\n*.ipynb filter=nbstrip_full\nSave and Exit: Press CTRL + X, then Y, and Enter.\n\n\n\n\nHow This Works\n\nfilter \"nbstrip_full\": This filter uses the jq command to strip outputs and reset execution counts in 
.ipynb files.\nclean: Removes outputs when files are staged for commit.\nsmudge: Ensures the original content is restored upon checkout.\nrequired: Enforces the use of the filter for the specified files.\n\n\n\nBenefits for Python Environmental Data Science Workflows\n\nEfficiency: Smaller files mean faster repository operations.\nCollaboration: Fewer conflicts facilitate teamwork.\nReproducibility: Encourages consistent execution across environments." + "objectID": "course-materials/lectures/02_helpGPT.html#embrace-refactoring", + "href": "course-materials/lectures/02_helpGPT.html#embrace-refactoring", + "title": "Getting Help", + "section": "Embrace refactoring", + "text": "Embrace refactoring\nRefactoring is the process of changing your code without changing its behavior.\n\nShip of Theseus: If you replace every part of a ship, is it still the same ship?\n\n\nAs you learn more, you will find better ways to do things.\nDon’t be afraid to change your code.\nTests (especially end-to-end tests) help you refactor with confidence.\n“Code smells”… if it smells bad, it probably is bad.\n\nCode Smells\nComments can be a code smell; they can be a sign that your code is not clear enough." 
}, { - "objectID": "course-materials/interactive-sessions/interactive-session-git.html#optional-installing-jq", - "href": "course-materials/interactive-sessions/interactive-session-git.html#optional-installing-jq", - "title": "Sidebar", - "section": "Optional: Installing jq", - "text": "Optional: Installing jq\nIf jq is not installed or needs to be updated, follow these instructions for your operating system.\n\nWindows\n\nDownload jq:\n\nVisit the jq downloads page and download the Windows executable (jq-win64.exe).\n\nAdd to PATH:\n\nMove the jq-win64.exe to a directory included in your system’s PATH or rename it to jq.exe and place it in C:\\Windows\\System32.\n\nVerify Installation:\n\nOpen Command Prompt and run jq --version to ensure it’s correctly installed.\n\n\n\n\nmacOS\n\nUsing Homebrew:\n\nHomebrew is a package manager for macOS that simplifies the installation of software. It’s widely used for installing command-line tools and other utilities. If you don’t have Homebrew installed, you can follow the instructions on the Homebrew website.\nOnce Homebrew is installed, open Terminal and run the following command to install jq:\nbrew install jq\n\nVerify Installation:\n\nRun jq --version to confirm it is installed and at least version 1.5.\n\n\nBy following these steps, you ensure that your Jupyter Notebook files remain clean and efficient within your Git repositories, enhancing collaboration and reproducibility in your workflows.\n\n\n\nAdditional Note\nFor Linux users, you can typically install jq using your package manager, such as apt on Debian-based systems or yum on Red Hat-based systems:\n# Debian-based systems\nsudo apt-get install jq\n\n# Red Hat-based systems\nsudo yum install jq\n\n\nEnd interactive session 2A" + "objectID": "course-materials/lectures/02_helpGPT.html#write-things-down", + "href": "course-materials/lectures/02_helpGPT.html#write-things-down", + "title": "Getting Help", + "section": "Write things down", + "text": "Write things 
down\n\nKeep an ideas notebook\n\nWrite down ideas for projects\nWrite down ideas for code\n\n\n\nWrite comments to yourself and others\n\n\nWrite documentation\n\n\nWrite down questions (use your tools; github issues, etc.)" }, { - "objectID": "course-materials/live-coding/2d_list_comprehensions.html", - "href": "course-materials/live-coding/2d_list_comprehensions.html", - "title": "Live Coding Session 2D", - "section": "", - "text": "In this session, we will be exploring List and Dictionary comprehensions together. Live coding is a great way to learn programming as it allows you to see the process of writing code in real-time, including how to deal with unexpected issues and debug errors." + "objectID": "course-materials/lectures/02_helpGPT.html#go-meet-people", + "href": "course-materials/lectures/02_helpGPT.html#go-meet-people", + "title": "Getting Help", + "section": "Go meet people!", + "text": "Go meet people!\n\nThe Python (and Data Science) community is great!\n\nGo to Python & Data Science meetups.\n\nCentral Coast Python\n\n\n\nGo to python and data science conferences.\n\nPyCon 2024 & 2025 will be in Pittsburgh, PA\nPyData (Conferences all over the world)\n\n\n\nGo to hackathons.\n\nSB Hacks (UCSB)\nMLH (Major League Hacking)\nHackathon.com (Hackathons all over the world)" }, { - "objectID": "course-materials/live-coding/2d_list_comprehensions.html#overview", - "href": "course-materials/live-coding/2d_list_comprehensions.html#overview", - "title": "Live Coding Session 2D", + "objectID": "course-materials/lectures/04-next_steps.html", + "href": "course-materials/lectures/04-next_steps.html", + "title": "How to move from a beginner to a more advanced python user", "section": "", - "text": "In this session, we will be exploring List and Dictionary comprehensions together. Live coding is a great way to learn programming as it allows you to see the process of writing code in real-time, including how to deal with unexpected issues and debug errors." 
- }, - { - "objectID": "course-materials/live-coding/2d_list_comprehensions.html#objectives", - "href": "course-materials/live-coding/2d_list_comprehensions.html#objectives", - "title": "Live Coding Session 2D", - "section": "Objectives", - "text": "Objectives\n\nUnderstand the fundamentals of comprehensions in Python.\nApply comprehensions in practical examples.\nDevelop the ability to troubleshoot and debug in a live setting." + "text": "Taken from Talk Python to Me, Episode #427, with some modifications." }, { - "objectID": "course-materials/live-coding/2d_list_comprehensions.html#overview-1", - "href": "course-materials/live-coding/2d_list_comprehensions.html#overview-1", - "title": "Live Coding Session 2D", - "section": "Overview", - "text": "Overview\nThis session introduces list and dictionary comprehensions, providing a comparison to traditional control flow methods. The goal is to help students understand the advantages of using comprehensions in Python and to practice writing their own.\nThe session is designed to be completed in 45 minutes, including setting up the notebook." + "objectID": "course-materials/lectures/04-next_steps.html#know-your-goals", + "href": "course-materials/lectures/04-next_steps.html#know-your-goals", + "title": "How to move from a beginner to a more advanced python user", + "section": "1. Know your goals", + "text": "1. Know your goals\n\nWhy are you learning python?\nWhy are you learning data science?" }, { - "objectID": "course-materials/live-coding/2d_list_comprehensions.html#setting-up-your-notebook-5-minutes", - "href": "course-materials/live-coding/2d_list_comprehensions.html#setting-up-your-notebook-5-minutes", - "title": "Live Coding Session 2D", - "section": "1. Setting Up Your Notebook (5 minutes)", - "text": "1. Setting Up Your Notebook (5 minutes)\nGoal: Start by having students set up their Jupyter notebook with markdown headers. 
This helps organize the session into distinct sections, making it easier for them to follow along and refer back to their work later.\n\nInstructions:\n\nCreate a new Jupyter notebook or open an existing one for this session.\nAdd markdown cells with the following headers, using ## for each header.\nPlace code cells between the headers where you’ll write and execute your code.\n\n\n\nHeader Texts:\n\nFirst markdown cell:\n## Review: Traditional Control Flow Approaches\nSecond markdown cell:\n## Introduction to List Comprehensions\nThird markdown cell:\n## Introduction to Dictionary Comprehensions\nFourth markdown cell:\n## Using Conditional Logic in Comprehensions\nFifth markdown cell:\n## Summary and Best Practices\nSixth markdown cell:\n## Reflections" + "objectID": "course-materials/lectures/04-next_steps.html#have-a-project-in-mind", + "href": "course-materials/lectures/04-next_steps.html#have-a-project-in-mind", + "title": "How to move from a beginner to a more advanced python user", + "section": "2. Have a project in mind", + "text": "2. Have a project in mind\n\nWhat do you want to do with it?\nUse python to solve a problem you are interested in solving.\nDon’t be afraid to work on personal projects.\n\n\nSome examples of my personal “problem-solving” projects\nBiobib - Python code to make my CV/Biobib from a google sheets/.csv file.\nTriumph - Python notebooks for a 1959 Triumph TR3A EV conversion project.\nStoplight - A simple python webapp for monitoring EDS217 course pace." }, { - "objectID": "course-materials/live-coding/2d_list_comprehensions.html#session-format", - "href": "course-materials/live-coding/2d_list_comprehensions.html#session-format", - "title": "Live Coding Session 2D", - "section": "Session Format", - "text": "Session Format\n\nIntroduction\n\nBrief discussion about the topic and its importance in data science.\n\n\n\nDemonstration\n\nI will demonstrate code examples live. 
Follow along and write the code into your own Jupyter notebook.\n\n\n\nPractice\n\nYou will have the opportunity to try exercises on your own to apply what you’ve learned.\n\n\n\nQ&A\n\nWe will have a Q&A session at the end where you can ask specific questions about the code, concepts, or issues encountered during the session." + "objectID": "course-materials/lectures/04-next_steps.html#dont-limit-your-learning-to-whats-needed-for-your-project", + "href": "course-materials/lectures/04-next_steps.html#dont-limit-your-learning-to-whats-needed-for-your-project", + "title": "How to move from a beginner to a more advanced python user", + "section": "3. Don’t limit your learning to what’s needed for your project", + "text": "3. Don’t limit your learning to what’s needed for your project\n\nLearn more than you need to know…\nMath: 3Blue1Brown\nPython Data Science: PyData\nData Visualization: Edward Tufte, Cole Nussbaumer-Knaflic, David McCandless\nBe curious about what’s possible, not just what’s necessary.\n…but try to use less than you think you need" }, { - "objectID": "course-materials/live-coding/2d_list_comprehensions.html#after-the-session", - "href": "course-materials/live-coding/2d_list_comprehensions.html#after-the-session", - "title": "Live Coding Session 2D", - "section": "After the Session", - "text": "After the Session\n\nReview your notes and try to replicate the exercises on your own.\nExperiment with the code by modifying parameters or adding new features to deepen your understanding.\nCheck out our class comprehensions cheatsheet." + "objectID": "course-materials/lectures/04-next_steps.html#read-good-code", + "href": "course-materials/lectures/04-next_steps.html#read-good-code", + "title": "How to move from a beginner to a more advanced python user", + "section": "4. Read good code", + "text": "4. 
Read good code\n\nLibraries and packages have great examples of code!\nRead the code (not just docs) of the packages you use.\n\nIt’s okay if you can’t understand it all. Often you can understand intent, but not what the code does. How would you have done it? Why did the author select a different approach?\n\nGithub is a great place to find code." }, { - "objectID": "course-materials/live-coding/4b_exploring_dataframes.html#overview", - "href": "course-materials/live-coding/4b_exploring_dataframes.html#overview", - "title": "Live Coding Session 4B", - "section": "Overview", - "text": "Overview\nIn this 45-minute session, we will explore the basics of pandas DataFrames - a fundamental data structure for data manipulation and analysis in Python. We’ll focus on essential operations that form the foundation of working with DataFrames." + "objectID": "course-materials/lectures/04-next_steps.html#know-your-tools", + "href": "course-materials/lectures/04-next_steps.html#know-your-tools", + "title": "How to move from a beginner to a more advanced python user", + "section": "5. Know your tools", + "text": "5. Know your tools\n\nLearn how to use your IDE (VSCode)\nLearn how to use your package manager (conda, mamba)\nLearn how to use your shell (bash, powershell, WSL)\nLearn how to use your version control system (git, Github Desktop)" }, { - "objectID": "course-materials/live-coding/4b_exploring_dataframes.html#objectives", - "href": "course-materials/live-coding/4b_exploring_dataframes.html#objectives", - "title": "Live Coding Session 4B", - "section": "Objectives", - "text": "Objectives\n\nUnderstand the structure and basic properties of pandas DataFrames.\nLearn how to create and load DataFrames.\nApply methods for data selection and filtering.\nPerform basic data manipulation and analysis using DataFrames." 
+ "objectID": "course-materials/lectures/04-next_steps.html#learn-how-to-test-your-code", + "href": "course-materials/lectures/04-next_steps.html#learn-how-to-test-your-code", + "title": "How to move from a beginner to a more advanced python user", + "section": "6. Learn how to test your code", + "text": "6. Learn how to test your code\n\nTesting code is part of writing code, and testing is a great way to learn!\nFocus on end-to-end (E2E) tests (rather than unit tests)\n\nUnit tests:\nDoes it work the way you expect it to (operation-centric)?\nEnd-to-end test:\nDoes it do what you want it to do (output-centric)?\n\n\nTesting for data science\nTesting with PyTest for data science" }, { - "objectID": "course-materials/live-coding/4b_exploring_dataframes.html#getting-started-5-minutes", - "href": "course-materials/live-coding/4b_exploring_dataframes.html#getting-started-5-minutes", - "title": "Live Coding Session 4B", - "section": "Getting Started (5 minutes)", - "text": "Getting Started (5 minutes)\n\nPrepare Your Environment:\n\nOpen JupyterLab and create a new notebook named “pandas_dataframes_intro”.\nDownload the sample dataset from here.\n\nParticipation:\n\nCode along with me during the session.\nAsk questions as we go - if you’re wondering about something, others probably are too!" + "objectID": "course-materials/lectures/04-next_steps.html#know-whats-good-enough-for-any-given-project", + "href": "course-materials/lectures/04-next_steps.html#know-whats-good-enough-for-any-given-project", + "title": "How to move from a beginner to a more advanced python user", + "section": "7. Know what’s good enough for any given project", + "text": "7. Know what’s good enough for any given project\n\nYou’re not writing code for a self-driving car or a pacemaker.\n\nDon’t over-engineer your code.\nDon’t over-optimize your code.\nSimple is better than complex." 
}, { - "objectID": "course-materials/live-coding/4b_exploring_dataframes.html#session-outline", - "href": "course-materials/live-coding/4b_exploring_dataframes.html#session-outline", - "title": "Live Coding Session 4B", - "section": "Session Outline", - "text": "Session Outline\n\n1. Introduction to pandas DataFrames (5 minutes)\n\nWhat are DataFrames?\nImporting pandas and creating a simple DataFrame\n\n\n\n2. Loading and Exploring Data (10 minutes)\n\nReading a CSV file into a DataFrame\nBasic DataFrame attributes and methods (shape, info, describe, head)\n\n\n\n3. Data Selection and Filtering (10 minutes)\n\nSelecting columns and rows\nBoolean indexing\n\n\n\n4. Basic Data Manipulation (10 minutes)\n\nAdding and removing columns\nHandling missing data\n\n\n\n5. Q&A and Wrap-up (5 minutes)\n\nAddress any questions\nRecap key points" + "objectID": "course-materials/lectures/04-next_steps.html#embrace-refactoring", + "href": "course-materials/lectures/04-next_steps.html#embrace-refactoring", + "title": "How to move from a beginner to a more advanced python user", + "section": "8. Embrace refactoring", + "text": "8. Embrace refactoring\nRefactoring is the process of changing your code without changing its behavior.\n\nShip of Theseus: If you replace every part of a ship, is it still the same ship?\n\n\nAs you learn more, you will find better ways to do things.\nDon’t be afraid to change your code.\nTests (especially end-to-end tests) help you refactor with confidence.\n“Code smells”… if it smells bad, it probably is bad.\n\nCode Smells\nComments can be a code smell; they can be a sign that your code is not clear enough." 
}, { - "objectID": "course-materials/live-coding/4b_exploring_dataframes.html#code-examples-well-cover", - "href": "course-materials/live-coding/4b_exploring_dataframes.html#code-examples-well-cover", - "title": "Live Coding Session 4B", - "section": "Code Examples We’ll Cover", - "text": "Code Examples We’ll Cover\nimport pandas as pd\n\n# Creating a DataFrame\ndf = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})\n\n# Loading data from CSV\ndf = pd.read_csv('sample_data.csv')\n\n# Basic exploration\nprint(df.shape)\ndf.info()\nprint(df.describe())\n\n# Selection and filtering\nselected_columns = df[['column1', 'column2']]\nfiltered_rows = df[df['column1'] > 5]\n\n# Data manipulation\ndf['new_column'] = df['column1'] * 2\ndf.dropna(inplace=True)" + "objectID": "course-materials/lectures/04-next_steps.html#write-things-down", + "href": "course-materials/lectures/04-next_steps.html#write-things-down", + "title": "How to move from a beginner to a more advanced python user", + "section": "9. Write things down", + "text": "9. Write things down\n\nKeep an ideas notebook\n\nWrite down ideas for projects\nWrite down ideas for code\n\n\n\nWrite comments to yourself and others\n\n\nWrite documentation\n\nCode Documentation in Python\n\n\n\nWrite down questions (use your tools; github issues, etc…)" }, { - "objectID": "course-materials/live-coding/4b_exploring_dataframes.html#after-the-session", - "href": "course-materials/live-coding/4b_exploring_dataframes.html#after-the-session", - "title": "Live Coding Session 4B", - "section": "After the Session", - "text": "After the Session\n\nReview your notes and try to replicate the exercises on your own.\nExperiment with the code using your own datasets.\nCheck out our class DataFrame cheatsheet for quick reference.\nFor more advanced features, explore the official pandas documentation." 
+ "objectID": "course-materials/lectures/04-next_steps.html#go-meet-people", + "href": "course-materials/lectures/04-next_steps.html#go-meet-people", + "title": "How to move from a beginner to a more advanced python user", + "section": "10. Go meet people!", + "text": "10. Go meet people!\n\nThe Python (and Data Science) community is great!\n\nGo to Python & Data Science meetups.\n\nCentral Coast Python\n\n\n\nGo to python and data science conferences.\n\nPyCon 2024 & 2025 will be in Pittsburgh, PA\nPyData (Conferences all over the world)\n\n\n\nGo to hackathons.\n\nSB Hacks (UCSB)\nMLH (Major League Hacking)\nHackathon.com (Hackathons all over the world)" }, { - "objectID": "index.html#course-description", - "href": "index.html#course-description", - "title": "Python for Environmental Data Science", - "section": "Course Description", - "text": "Course Description\nProgramming skills are critical when working with, understanding, analyzing, and gleaning insights from environmental data. In the intensive EDS 217 course, students will develop fundamental skills in Python programming, data manipulation, and data visualization, specifically tailored for environmental data science applications.\nThe goal of EDS 217 (Python for Environmental Data Science) is to equip incoming MEDS students with the programming methods, skills, notation, and language commonly used in the python data science stack, which will be essential for their python-based data science courses and projects in the program as well as in their data science careers. 
By the end of the course, students should be able to:\n\nManipulate and analyze data using libraries like pandas and NumPy\nVisualize data using Matplotlib and Seaborn\nWrite, interpret, and debug Python scripts\nImplement basic algorithms for data processing\nUtilize logical operations, control flow, and functions in programming\nCollaborate with peers to solve group programming tasks, and communicate the process and results to the rest of the class" + "objectID": "course-materials/lectures/lectures.html", + "href": "course-materials/lectures/lectures.html", + "title": "EDS 217 Lectures", + "section": "", + "text": "This page contains links to lecture materials for EDS 217.\n\nIntroduction to Python Data Science\n\n\nThe Zen of Python\n\n\nDebugging" }, { - "objectID": "index.html#teaching-team", - "href": "index.html#teaching-team", - "title": "Python for Environmental Data Science", - "section": "Teaching Team", - "text": "Teaching Team\n\n\n\n\nInstructor\n\n\n\n\n\n\n\nKelly Caylor\nEmail: caylor@ucsb.edu\nLearn more: Bren profile\n\n\n\n\nTA\n\n\n\n\n\n\n\nAnna Boser\nEmail: anaboser@ucsb.edu\nLearn more: Bren profile" + "objectID": "course-materials/cheatsheets/data_aggregation.html", + "href": "course-materials/cheatsheets/data_aggregation.html", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "To be added" }, { - "objectID": "course-materials/final_project.html", - "href": "course-materials/final_project.html", - "title": "Final Activity", + "objectID": "course-materials/cheatsheets/data_merging.html", + "href": "course-materials/cheatsheets/data_merging.html", + "title": "EDS 217 Cheatsheet", "section": "", - "text": "In this final class activity, you will work in small groups (2-3) to develop a example data science workflow.\n\nImport Data\nExplore Data\nClean Data\nFilter Data\nSort Data\nTransform Data\nGroup Data\nAggregate Data\nVisualize Data" + "text": "To be added" }, { - "objectID": 
"course-materials/final_project.html#diy-python-data-scienceworkflow", - "href": "course-materials/final_project.html#diy-python-data-scienceworkflow", - "title": "Final Activity", + "objectID": "course-materials/cheatsheets/lists.html", + "href": "course-materials/cheatsheets/lists.html", + "title": "EDS 217 Cheatsheet", "section": "", - "text": "In this final class activity, you will work in small groups (2-3) to develop a example data science workflow.\n\nImport Data\nExplore Data\nClean Data\nFilter Data\nSort Data\nTransform Data\nGroup Data\nAggregate Data\nVisualize Data" + "text": "my_list = []\n\n\n\nmy_list = [1, 2, 3, 4, 5]\n\n\n\nmixed_list = [1, \"hello\", 3.14, True]\n\n\n\nnested_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]\n\n\n\n\n\n\nmy_list = [10, 20, 30, 40]\nprint(my_list[0]) # Output: 10\nprint(my_list[2]) # Output: 30\n\n\n\nprint(my_list[-1]) # Output: 40\n\n\n\nsublist = my_list[1:3] # Output: [20, 30]\n\n\n\n\n\n\nmy_list[1] = 25 # my_list becomes [10, 25, 30, 40]\n\n\n\nmy_list.append(50) # my_list becomes [10, 25, 30, 40, 50]\n\n\n\nmy_list.insert(1, 15) # my_list becomes [10, 15, 25, 30, 40, 50]\n\n\n\nmy_list.extend([60, 70]) # my_list becomes [10, 15, 25, 30, 40, 50, 60, 70]\n\n\n\n\n\n\nmy_list.remove(25) # my_list becomes [10, 15, 30, 40, 50, 60, 70]\n\n\n\ndel my_list[0] # my_list becomes [15, 30, 40, 50, 60, 70]\n\n\n\nlast_element = my_list.pop() # my_list becomes [15, 30, 40, 50, 60]\n\n\n\nelement = my_list.pop(2) # my_list becomes [15, 30, 50, 60]\n\n\n\n\n\n\nlength = len(my_list) # Output: 4\n\n\n\nis_in_list = 30 in my_list # Output: True\n\n\n\ncombined_list = my_list + [80, 90] # Output: [15, 30, 50, 60, 80, 90]\n\n\n\nrepeated_list = [1, 2, 3] * 3 # Output: [1, 2, 3, 1, 2, 3, 1, 2, 3]\n\n\n\n\n\n\nfor item in my_list:\n print(item)\n\n\n\nfor index, value in enumerate(my_list):\n print(f\"Index {index} has value {value}\")\n\n\n\n\n\n\nsquares = [x**2 for x in range(5)] # Output: [0, 1, 4, 9, 16]\n\n\n\nevens = [x for x in 
range(10) if x % 2 == 0] # Output: [0, 2, 4, 6, 8]\n\n\n\n\n\n\nmy_list.sort() # Sorts in place\n\n\n\nsorted_list = sorted(my_list) # Returns a sorted copy\n\n\n\nmy_list.reverse() # Reverses in place\n\n\n\ncount = my_list.count(30) # Output: 1\n\n\n\nindex = my_list.index(50) # Output: 2\n\n\n\n\n\n\n# Incorrect\nfor item in my_list:\n if item < 20:\n my_list.remove(item)\n\n# Correct (Using a copy)\nfor item in my_list[:]:\n if item < 20:\n my_list.remove(item)" }, { - "objectID": "course-materials/final_project.html#what-to-do", - "href": "course-materials/final_project.html#what-to-do", - "title": "Final Activity", - "section": "What to do", - "text": "What to do\nTo conduct this exercise, you should find a suitable dataset; it doesn’t need to be environmental data per se - be creative in your search! You should also focus on making a number of exploratory and analysis visualizations using seaborn. You should avoid planning any analysis that absolutely require mapping and focus on using only pandas, numpy, matplotlib, and seaborn libraries.\nYour final product will be a self-contained notebook that is well-documented with markdown and code comments that you will walk through as a presentation to the rest of the class on the final day.\nYour notebook should include each of the nine steps, even if you don’t need to do much in each of them.\n\n\n\n\n\n\nNote\n\n\n\nYou can include visualizations as part of your data exploration (step 2), or anywhere else it is helpful.\n\n\nAdditional figures and graphics are also welcome - you are encouraged to make your notebooks as engaging and visually interesting as possible.\nHere are some links to potential data resources that you can use to develop your analyses:\n\nGeneral places to find fun data\n\nKaggle\nData is Plural\nUS Data.gov\nZenodo\nR for Data Science\n\n\n\nOddly specific datasets\n\nCentral Park Squirrel Survey\nHarry Potter Characters Dataset\nSpotify Tracks\nLego Dataset" + "objectID": 
"course-materials/cheatsheets/lists.html#creating-lists", + "href": "course-materials/cheatsheets/lists.html#creating-lists", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "my_list = []\n\n\n\nmy_list = [1, 2, 3, 4, 5]\n\n\n\nmixed_list = [1, \"hello\", 3.14, True]\n\n\n\nnested_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]" }, { - "objectID": "course-materials/final_project.html#using-google-drive-to-store-your-.csv-file.", - "href": "course-materials/final_project.html#using-google-drive-to-store-your-.csv-file.", - "title": "Final Activity", - "section": "Using Google Drive to store your .csv file.", - "text": "Using Google Drive to store your .csv file.\nOnce you’ve found a .csv file that you want to use, you should:\n\nSave your file to a google drive folder in your UCSB account.\nChange the sharing settings to allow anyone with a link to view your file.\nOpen the sharing dialog and copy the sharing link to your clipboard.\nUse the code below to download your file (you will need to add this code to the top of your notebook in the Import Data section)\n\n\n\n\n\n\n\nWarning\n\n\n\nFor this code to work on the workbench server, you will need to switch your kernel from 3.10.0 to 3.7.13. 
You can switch kernels by clicking on the kernel name in the upper right of your notebook.\n\n\n\n\nCode\nimport pandas as pd\nimport requests\n\ndef extract_file_id(url):\n \"\"\"Extract file id from Google Drive Sharing URL.\"\"\"\n return url.split(\"/\")[-2]\n\ndef df_from_gdrive_csv(url):\n \"\"\" Get the CSV file from a Google Drive Sharing URL.\"\"\"\n file_id = extract_file_id(url)\n URL = \"https://docs.google.com/uc?export=download\"\n session = requests.Session()\n response = session.get(URL, params={\"id\": file_id}, stream=True)\n return pd.read_csv(response.raw)\n\n# Example of how to use:\n# Note: your sharing link will be different, but should look like this:\nsharing_url = \"https://drive.google.com/file/d/1RlilHNG7BtvXT2Pm4OpgNvEjVJJZNaps/view?usp=share_link\"\ndf = df_from_gdrive_csv(sharing_url)\ndf.head()\n\n\n\n\n\n\n\n\n\ndate\nlocation\ntemperature\nsalinity\ndepth\n\n\n\n\n0\n2020-01-01\nPacific\n21.523585\nNaN\n200\n\n\n1\n2020-01-02\nPacific\n14.800079\n34.467264\n100\n\n\n2\n2020-01-03\nPacific\n23.752256\n35.016505\n100\n\n\n3\n2020-01-04\nPacific\n24.702824\n36.416944\n200\n\n\n4\n2020-01-05\nPacific\n10.244824\n35.807487\n1000" + "objectID": "course-materials/cheatsheets/lists.html#accessing-elements", + "href": "course-materials/cheatsheets/lists.html#accessing-elements", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "my_list = [10, 20, 30, 40]\nprint(my_list[0]) # Output: 10\nprint(my_list[2]) # Output: 30\n\n\n\nprint(my_list[-1]) # Output: 40\n\n\n\nsublist = my_list[1:3] # Output: [20, 30]" }, { - "objectID": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html", - "href": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html", - "title": "Interactive Session 1B", + "objectID": "course-materials/cheatsheets/lists.html#modifying-lists", + "href": "course-materials/cheatsheets/lists.html#modifying-lists", + "title": "EDS 217 Cheatsheet", "section": "", - "text": "Now that you’ve seen the REPL in 
iPython, from now on in this class you will code in Jupyter Notebooks. Jupyter is an incredibly awesome and user-friendly integrated development environment (IDE). An IDE provides a place for data scientists to see and work with a bunch of different aspects of their work in a nice, organized interface." + "text": "my_list[1] = 25 # my_list becomes [10, 25, 30, 40]\n\n\n\nmy_list.append(50) # my_list becomes [10, 25, 30, 40, 50]\n\n\n\nmy_list.insert(1, 15) # my_list becomes [10, 15, 25, 30, 40, 50]\n\n\n\nmy_list.extend([60, 70]) # my_list becomes [10, 15, 25, 30, 40, 50, 60, 70]" }, { - "objectID": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#meet-jupyterlab", - "href": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#meet-jupyterlab", - "title": "Interactive Session 1B", - "section": "1. Meet JupyterLab", - "text": "1. Meet JupyterLab\nJupyterLab provides a nice user interface for data science, development, reporting, and collaboration (all of which you’ll learn about throughout the MEDS program) in one place.\n\nFeatures of JupyterLab as an IDE\n\nInteractive Computing: JupyterLab is designed primarily for interactive computing and data analysis. 
It supports live code execution, data visualization, and interactive widgets, which are key features of modern IDEs.\nMulti-language Support: While originally developed for Python, JupyterLab supports many other programming languages through the use of kernels, making it versatile for various programming tasks.\nRich Text Editing: It provides a rich text editor for creating and editing Jupyter Notebooks, which can contain both code and narrative text (Markdown), allowing for documentation and code to coexist.\nCode Execution: JupyterLab allows you to execute code cells and see the output immediately, making it suitable for testing and iterating on code quickly.\nFile Management: It includes a file manager for browsing and managing project files, similar to the file explorers found in traditional IDEs.\nExtensions and Customization: JupyterLab supports numerous extensions that can enhance its capabilities, such as version control integration, terminal access, and enhanced visualizations.\nIntegrated Tools: It has an integrated terminal, variable inspector, and other tools that are typically part of an IDE, providing a comprehensive environment for development.\n\n\n\nDifferences from Traditional IDEs\n\nFocus on Notebooks: Unlike many traditional IDEs that focus on scripting and full-scale software development, JupyterLab emphasizes the use of notebooks for exploratory data analysis and research.\nNon-linear Workflow: JupyterLab allows for a non-linear workflow, where users can execute cells out of order and iteratively modify and test code." 
+ "objectID": "course-materials/cheatsheets/lists.html#removing-elements", + "href": "course-materials/cheatsheets/lists.html#removing-elements", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "my_list.remove(25) # my_list becomes [10, 15, 30, 40, 50, 60, 70]\n\n\n\ndel my_list[0] # my_list becomes [15, 30, 40, 50, 60, 70]\n\n\n\nlast_element = my_list.pop() # my_list becomes [15, 30, 40, 50, 60]\n\n\n\nelement = my_list.pop(2) # my_list becomes [15, 30, 50, 60]" }, { - "objectID": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#jupyterlab-interface", - "href": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#jupyterlab-interface", - "title": "Interactive Session 1B", - "section": "JupyterLab Interface", - "text": "JupyterLab Interface\n\nPrimary panes include the Main Work Area pane, Sidebar, and Menu Bar.\n\n\n\nAs you work, Jupyer Lab will add additional tabs/panes that contain figures and data inspectors, or even other file types. You can rearrange these panes to organize your workspace however you like.\n\nYou can check out the JupyterLab User Guide for tons of information and helpful tips!\nJupyterLab is a powerful interactive development environment (IDE) that allows you to work with Jupyter Notebooks, text editors, terminals, and other components in a single, integrated environment. It’s widely used in data science, scientific computing, and education." 
+ "objectID": "course-materials/cheatsheets/lists.html#list-operations", + "href": "course-materials/cheatsheets/lists.html#list-operations", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "length = len(my_list) # Output: 4\n\n\n\nis_in_list = 30 in my_list # Output: True\n\n\n\ncombined_list = my_list + [80, 90] # Output: [15, 30, 50, 60, 80, 90]\n\n\n\nrepeated_list = [1, 2, 3] * 3 # Output: [1, 2, 3, 1, 2, 3, 1, 2, 3]" }, { - "objectID": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#getting-started-with-jupyter-notebooks-in-jupyterlab", - "href": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#getting-started-with-jupyter-notebooks-in-jupyterlab", - "title": "Interactive Session 1B", - "section": "Getting Started with Jupyter Notebooks in JupyterLab", - "text": "Getting Started with Jupyter Notebooks in JupyterLab\n\nCreating a New Notebook\n\nOpen JupyterLab: Once JupyterLab is running, you’ll see the JupyterLab interface with the file browser on the left.\nCreate a New Notebook:\n\nClick on the + button in the file browser to open a new Launcher tab.\nUnder the “Notebook” section, click “Python 3” to create a new Python notebook.\n\nRename the Notebook:\n\nClick on the notebook title (usually “Untitled”) at the top of the notebook interface.\nEnter a new name for your notebook and click “Rename”.\n\n\n\n\nUnderstanding the Notebook Interface\nThe Jupyter Notebook interface is divided into cells. 
There are two main types of cells:\n\nCode Cells: For writing and executing Python code.\nMarkdown Cells: For writing formatted text using Markdown syntax.\n\n\n\nWriting and Running Code\nLet’s start by writing some simple Python code in a code cell.\n\nAdd a Code Cell:\n\nClick inside the cell and start typing your Python code.\n\n\n\n# Simple Python code\nprint(\"Hello, Jupyter!\")\n\nRun the Code Cell:\n\nClick the “Run” button in the toolbar or press Shift + Enter to execute the code.\nThe output will be displayed directly below the cell.\n\n\n\n\nWriting Markdown\nMarkdown cells allow you to write formatted text. You can use Markdown to create headings, lists, links, and more.\n\nAdd a Markdown Cell:\n\nClick on the “+” button in the toolbar to add a new cell.\nChange the cell type to “Markdown” from the dropdown menu in the toolbar.\n\nWrite Markdown Text:\n\n# My First Markdown Cell\n\nThis is a simple example of a Markdown cell in JupyterLab.\n\n## Features of Markdown\n\n- **Bold Text**: Use `**text**` for **bold**.\n- **Italic Text**: Use `*text*` for *italic*.\n- **Lists**: Create bullet points using `-` or `*`.\n- **Links**: [JupyterLab Documentation](https://jupyterlab.readthedocs.io/)\n\nRender the Markdown:\n\nClick the “Run” button or press Shift + Enter to render the Markdown text.\n\n\n\n\nCombining Code and Markdown\nJupyter Notebooks are powerful because they allow you to combine code and markdown in a single document. 
This is useful for creating interactive tutorials, reports, and data analyses.\n\n\nRendering Images\nJupyter Notebooks can render images directly in the output cells, which is particularly useful for data visualization.\n\nExample: Displaying an Image\n\n\nCode\nfrom IPython.display import Image, display\n\n# Display an image\nimg_path = 'https://jupyterlab.readthedocs.io/en/stable/_images/interface-jupyterlab.png'\ndisplay(Image(url=img_path, width=700))\n\n\n\n\n\n\n\n\nInteractive Features\nJupyter Notebooks support interactive features, such as widgets, which enhance the interactivity of your notebooks.\n\nExample: Using Interactive Widgets\nWidgets allow users to interact with your code and visualize results dynamically.\n\n\nCode\nimport ipywidgets as widgets\n\n# Create a simple slider widget\nslider = widgets.IntSlider(value=50, min=0, max=100, step=1, description='Value:')\ndisplay(slider)\n\n\n\n\n\n\n\n\nSaving and Exporting Notebooks\n\nSave the Notebook:\n\nClick the save icon in the toolbar or press Ctrl + S (Cmd + S on macOS) to save your work.\n\nExport the Notebook:\n\nJupyterLab allows you to export notebooks to various formats, such as PDF or HTML. Go to File > Export Notebook As and choose your desired format." 
+ "objectID": "course-materials/cheatsheets/lists.html#looping-through-lists", + "href": "course-materials/cheatsheets/lists.html#looping-through-lists", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "for item in my_list:\n print(item)\n\n\n\nfor index, value in enumerate(my_list):\n print(f\"Index {index} has value {value}\")" }, { - "objectID": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#tips-for-using-jupyterlab", - "href": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#tips-for-using-jupyterlab", - "title": "Interactive Session 1B", - "section": "Tips for Using JupyterLab", - "text": "Tips for Using JupyterLab\n\nKeyboard Shortcuts: Familiarize yourself with keyboard shortcuts to speed up your workflow. You can view shortcuts by clicking Help > Keyboard Shortcuts. You can also refer to our class Jupyter Keyboard Shortcuts Cheatsheet\nUsing the File Browser: Drag and drop files into the file browser to upload them to your workspace.\nUsing the Variable Inspector: The variable inspector shows variable names, types, values/shapes, and counts (for collections). 
Open the Variable Inspector using Menu: View > Activate Command Palette, then type “variable inspector.” Or use the keyboard shortcut: Ctrl + Shift + I (Windows/Linux) or Cmd + Shift + I (Mac)" + "objectID": "course-materials/cheatsheets/lists.html#list-comprehensions", + "href": "course-materials/cheatsheets/lists.html#list-comprehensions", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "squares = [x**2 for x in range(5)] # Output: [0, 1, 4, 9, 16]\n\n\n\nevens = [x for x in range(10) if x % 2 == 0] # Output: [0, 2, 4, 6, 8]" }, { - "objectID": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#conclusion", - "href": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#conclusion", - "title": "Interactive Session 1B", - "section": "Conclusion", - "text": "Conclusion\nJupyterLab is a versatile tool that makes it easy to combine code, text, and visualizations in a single document. By mastering the basic functionality of Jupyter Notebooks, you can create powerful and interactive documents that enhance your data analysis and scientific computing tasks.\nFeel free to experiment with the code and markdown examples provided in this guide to deepen your understanding of JupyterLab. Happy coding!" 
+ "objectID": "course-materials/cheatsheets/lists.html#list-methods", + "href": "course-materials/cheatsheets/lists.html#list-methods", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "my_list.sort() # Sorts in place\n\n\n\nsorted_list = sorted(my_list) # Returns a sorted copy\n\n\n\nmy_list.reverse() # Reverses in place\n\n\n\ncount = my_list.count(30) # Output: 1\n\n\n\nindex = my_list.index(50) # Output: 2" }, { - "objectID": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#resources", - "href": "course-materials/interactive-sessions/1b_Jupyter_Notebooks.html#resources", - "title": "Interactive Session 1B", - "section": "Resources", - "text": "Resources\nWe will get to know Jupyter Notebooks very well during the rest of this course, but here are even more resources you can use to learn and revisit:\n\nJupyter Notebook Gallery\nThere are many, many examples of textbooks, academic articles, journalism, analyses, and reports written in Jupyter Notebooks. Here is a link to a curated gallery containing many such examples. It’s worth exploring some of these just to get a sense of the diversity of applications and opportunities available using python and jupyter in data science!\n\n\nTutorials and Shortcourses\n\n1. Jupyter Notebook Documentation\n\nWebsite: Jupyter Documentation\nDescription: The official documentation for Jupyter Notebooks provides a comprehensive guide to installing, using, and customizing notebooks. It includes tutorials, tips, and examples to help you get started.\n\n\n\n2. Project Jupyter: Beginner’s Guide\n\nWebsite: Project Jupyter\nDescription: This page offers an interactive “Try Jupyter” experience, allowing you to run Jupyter Notebooks in the browser without installing anything locally. It is a great way to explore the basics of Jupyter in a hands-on manner.\n\n\n\n3. 
YouTube Tutorial Series by Corey Schafer\n\nVideo Playlist: Jupyter Notebooks Tutorial - Corey Schafer\nDescription: This YouTube series provides an in-depth introduction to Jupyter Notebooks. Corey Schafer covers installation, basic usage, and advanced features, making it easy to follow along and practice on your own.\n\n\n\n4. Jupyter Notebooks Beginner Guide - DataCamp\n\nWebsite: DataCamp Jupyter Notebook Tutorial\nDescription: This tutorial on DataCamp’s community blog offers a step-by-step guide to using Jupyter Notebooks for data science. It covers the basics and explores more advanced topics such as widgets and extensions.\n\n\n\n5. Real Python: Jupyter Notebook 101\n\nArticle: Jupyter Notebook 101\nDescription: This Real Python article introduces Jupyter Notebooks, covering installation, basic usage, and tips for using notebooks effectively. It is an excellent resource for Python developers who are new to Jupyter.\n\n\n\n6. Google Colab\n\nWebsite: Google Colab\nDescription: Google Colab is a free platform that lets you run Jupyter Notebooks in the cloud. You can find many tutorials and example notebooks on their site. 
For example, here is a link to a notebook they’ve created that includes many pandas snippets.\n\n\nEnd interactive session 1B" + "objectID": "course-materials/cheatsheets/lists.html#common-list-pitfalls", + "href": "course-materials/cheatsheets/lists.html#common-list-pitfalls", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "# Incorrect\nfor item in my_list:\n if item < 20:\n my_list.remove(item)\n\n# Correct (Using a copy)\nfor item in my_list[:]:\n if item < 20:\n my_list.remove(item)" }, { - "objectID": "course-materials/lectures/05_Session_1A.html#what-is-a-groupby-object", - "href": "course-materials/lectures/05_Session_1A.html#what-is-a-groupby-object", - "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", - "section": "What is a GroupBy Object?", - "text": "What is a GroupBy Object?\n\n\nCreated when you use the groupby() function in pandas\nA plan for splitting data into groups, not the result itself\nLazy evaluation: computations occur only when an aggregation method is called\nContains:\n\nReference to the original DataFrame\nColumns to group by\nInternal dictionary mapping group keys to row indices" + "objectID": "course-materials/cheatsheets/numpy.html", + "href": "course-materials/cheatsheets/numpy.html", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "import numpy as np\n\n\n\n\n\narr = np.array([1, 2, 3, 4, 5])\n\n\n\n# From Series\ns = pd.Series([1, 2, 3, 4, 5])\narr = s.to_numpy()\n\n# From DataFrame\ndf = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})\narr = df.to_numpy()\n\n\n\n\n\n\narr1 = np.array([1, 2, 3])\narr2 = np.array([4, 5, 6])\n\n# Addition\nresult = arr1 + arr2\n\n# Multiplication\nresult = arr1 * arr2\n\n# Division\nresult = arr1 / arr2\n\n\n\n# Square root\nsqrt_arr = np.sqrt(arr)\n\n# Exponential\nexp_arr = np.exp(arr)\n\n# Absolute value\nabs_arr = np.abs(arr)\n\n\n\n\n\n\n# Mean\nmean = np.mean(arr)\n\n# Median\nmedian = np.median(arr)\n\n# Standard deviation\nstd = 
np.std(arr)\n\n\n\n# Minimum\nmin_val = np.min(arr)\n\n# Maximum\nmax_val = np.max(arr)\n\n# Sum\ntotal = np.sum(arr)\n\n\n\n\n\n\narr = np.array([1, 2, 3, 4, 5, 6])\nreshaped = arr.reshape(2, 3)\n\n\n\ntransposed = arr.T\n\n\n\nflattened = arr.flatten()\n\n\n\n\n\n\n# Generate 5 random numbers between 0 and 1\nrandom_uniform = np.random.rand(5)\n\n# Generate 5 random integers between 1 and 10\nrandom_integers = np.random.randint(1, 11, 5)\n\n\n\nnp.random.seed(42) # For reproducibility\n\n\n\n\n\n\n# Check for NaN\nnp.isnan(arr)\n\n# Replace NaN with a value\nnp.nan_to_num(arr, nan=0.0)\n\n\n\n\n\n\n# Get unique values\nunique_values = np.unique(arr)\n\n# Get value counts (similar to pandas value_counts())\nvalues, counts = np.unique(arr, return_counts=True)\n\n\n\n# Similar to pandas' where, but returns an array\nresult = np.where(condition, x, y)\n\n\n\n# Concatenate arrays (similar to pd.concat())\nconcatenated = np.concatenate([arr1, arr2, arr3])\n\n\n\n\n\nPerformance: For large datasets, NumPy operations can be faster than pandas.\nMemory efficiency: NumPy arrays use less memory than pandas objects.\nSpecific mathematical operations: Some mathematical operations are more straightforward in NumPy.\nInterfacing with other libraries: Many scientific Python libraries use NumPy arrays.\n\nRemember, while these NumPy operations are useful, many have direct equivalents in pandas that work on Series and DataFrames. Always consider whether you can perform the operation directly in pandas before converting to NumPy arrays." 
}, { - "objectID": "course-materials/lectures/05_Session_1A.html#structure-of-a-groupby-object", - "href": "course-materials/lectures/05_Session_1A.html#structure-of-a-groupby-object", - "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", - "section": "Structure of a GroupBy Object", - "text": "Structure of a GroupBy Object\n\n\nInternal dictionary structure:\n{\n group_key_1: [row_index_1, row_index_3, ...],\n group_key_2: [row_index_2, row_index_4, ...],\n ...\n}\nThis structure allows for efficient data access and aggregation\nActual data isn’t copied or split until necessary" + "objectID": "course-materials/cheatsheets/numpy.html#importing-numpy", + "href": "course-materials/cheatsheets/numpy.html#importing-numpy", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "import numpy as np" }, { - "objectID": "course-materials/lectures/05_Session_1A.html#groupby-example", - "href": "course-materials/lectures/05_Session_1A.html#groupby-example", - "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", - "section": "GroupBy Example", - "text": "GroupBy Example\n\nimport pandas as pd\n\ndf = pd.DataFrame({\n 'Category': ['A', 'B', 'A', 'B', 'A', 'B'],\n 'Value': [1, 2, 3, 4, 5, 6]\n})\n\ngrouped = df.groupby('Category')\n# No computation yet!\n\nresult = grouped.sum() # Now we compute\nprint(result)\n\n Value\nCategory \nA 9\nB 12" + "objectID": "course-materials/cheatsheets/numpy.html#creating-numpy-arrays", + "href": "course-materials/cheatsheets/numpy.html#creating-numpy-arrays", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "arr = np.array([1, 2, 3, 4, 5])\n\n\n\n# From Series\ns = pd.Series([1, 2, 3, 4, 5])\narr = s.to_numpy()\n\n# From DataFrame\ndf = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})\narr = df.to_numpy()" }, { - "objectID": "course-materials/lectures/05_Session_1A.html#why-do-we-need-.copy", - "href": "course-materials/lectures/05_Session_1A.html#why-do-we-need-.copy", - 
"title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", - "section": "Why Do We Need .copy()?", - "text": "Why Do We Need .copy()?\n\n\nMany pandas operations return views instead of copies\nViews are memory-efficient but can lead to unexpected modifications\n.copy() creates a new, independent object\nUse .copy() when you want to modify data without affecting the original" + "objectID": "course-materials/cheatsheets/numpy.html#basic-array-operations", + "href": "course-materials/cheatsheets/numpy.html#basic-array-operations", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "arr1 = np.array([1, 2, 3])\narr2 = np.array([4, 5, 6])\n\n# Addition\nresult = arr1 + arr2\n\n# Multiplication\nresult = arr1 * arr2\n\n# Division\nresult = arr1 / arr2\n\n\n\n# Square root\nsqrt_arr = np.sqrt(arr)\n\n# Exponential\nexp_arr = np.exp(arr)\n\n# Absolute value\nabs_arr = np.abs(arr)" }, { - "objectID": "course-materials/lectures/05_Session_1A.html#views-vs.-copies-in-pandas", - "href": "course-materials/lectures/05_Session_1A.html#views-vs.-copies-in-pandas", - "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", - "section": "Views vs. Copies in Pandas", - "text": "Views vs. Copies in Pandas\n\n\nFiltering operations usually create views:\n\ndf[df['column'] > value]\ndf.loc[condition]\n\nSome operations create copies by default:\n\ndf.drop(columns=['col'])\ndf.dropna()\ndf.reset_index()\n\nBut it’s not always clear which operations do what!" 
+ "objectID": "course-materials/cheatsheets/numpy.html#statistical-operations", + "href": "course-materials/cheatsheets/numpy.html#statistical-operations", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "# Mean\nmean = np.mean(arr)\n\n# Median\nmedian = np.median(arr)\n\n# Standard deviation\nstd = np.std(arr)\n\n\n\n# Minimum\nmin_val = np.min(arr)\n\n# Maximum\nmax_val = np.max(arr)\n\n# Sum\ntotal = np.sum(arr)" }, { - "objectID": "course-materials/lectures/05_Session_1A.html#when-to-use-.copy", - "href": "course-materials/lectures/05_Session_1A.html#when-to-use-.copy", - "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", - "section": "When to Use .copy()", - "text": "When to Use .copy()\n\n\nWhen assigning a slice of a DataFrame to a new variable\nBefore making changes to a DataFrame you want to keep separate\nIn functions where you don’t want to modify the input data\nWhen chaining operations and you’re unsure about view vs. copy behavior\nTo ensure you have an independent copy, regardless of the operation" + "objectID": "course-materials/cheatsheets/numpy.html#array-manipulation", + "href": "course-materials/cheatsheets/numpy.html#array-manipulation", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "arr = np.array([1, 2, 3, 4, 5, 6])\nreshaped = arr.reshape(2, 3)\n\n\n\ntransposed = arr.T\n\n\n\nflattened = arr.flatten()" }, { - "objectID": "course-materials/lectures/05_Session_1A.html#copy-example", - "href": "course-materials/lectures/05_Session_1A.html#copy-example", - "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", - "section": ".copy() Example", - "text": ".copy() Example\n\n# Filtering creates a view\ndf_view = df[df['Category'] == 'A']\ndf_view['Value'] += 10 # This modifies the original df!\n\n# Using copy() creates an independent DataFrame\ndf_copy = df[df['Category'] == 'A'].copy()\ndf_copy['Value'] += 10 # This doesn't affect the original df\n\nprint(\"Original 
df:\")\nprint(df)\nprint(\"\\nModified copy:\")\nprint(df_copy)\n\nOriginal df:\n Category Value\n0 A 1\n1 B 2\n2 A 3\n3 B 4\n4 A 5\n5 B 6\n\nModified copy:\n Category Value\n0 A 11\n2 A 13\n4 A 15" + "objectID": "course-materials/cheatsheets/numpy.html#random-number-generation", + "href": "course-materials/cheatsheets/numpy.html#random-number-generation", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "# Generate 5 random numbers between 0 and 1\nrandom_uniform = np.random.rand(5)\n\n# Generate 5 random integers between 1 and 10\nrandom_integers = np.random.randint(1, 11, 5)\n\n\n\nnp.random.seed(42) # For reproducibility" }, { - "objectID": "course-materials/lectures/05_Session_1A.html#the-triple-constraint-dilemma", - "href": "course-materials/lectures/05_Session_1A.html#the-triple-constraint-dilemma", - "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", - "section": "The Triple Constraint Dilemma", - "text": "The Triple Constraint Dilemma\n\n\nIn software design, often you can only optimize two out of three:\n\nPerformance\nFlexibility\nEase of Use\n\nThis applies to data science tools like R and Python" + "objectID": "course-materials/cheatsheets/numpy.html#working-with-missing-data", + "href": "course-materials/cheatsheets/numpy.html#working-with-missing-data", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "# Check for NaN\nnp.isnan(arr)\n\n# Replace NaN with a value\nnp.nan_to_num(arr, nan=0.0)" }, { - "objectID": "course-materials/lectures/05_Session_1A.html#r-vs-python-trade-offs", - "href": "course-materials/lectures/05_Session_1A.html#r-vs-python-trade-offs", - "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", - "section": "R vs Python: Trade-offs", - "text": "R vs Python: Trade-offs\n\n\nR\n\n✓✓ Ease of Use\n✓ General Flexibility\n✗ General Performance\n\n\nPython\n\n✓✓ Performance\n✓ General Flexibility\n✗ Ease of Use (for data tasks)" + "objectID": 
"course-materials/cheatsheets/numpy.html#useful-numpy-functions-for-pandas-users", + "href": "course-materials/cheatsheets/numpy.html#useful-numpy-functions-for-pandas-users", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "# Get unique values\nunique_values = np.unique(arr)\n\n# Get value counts (similar to pandas value_counts())\nvalues, counts = np.unique(arr, return_counts=True)\n\n\n\n# Similar to pandas' where, but returns an array\nresult = np.where(condition, x, y)\n\n\n\n# Concatenate arrays (similar to pd.concat())\nconcatenated = np.concatenate([arr1, arr2, arr3])" }, { - "objectID": "course-materials/lectures/05_Session_1A.html#r-strengths-and-limitations", - "href": "course-materials/lectures/05_Session_1A.html#r-strengths-and-limitations", - "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", - "section": "R: Strengths and Limitations", - "text": "R: Strengths and Limitations\n\n\nStrengths:\n\nIntuitive for statistical operations\nConsistent data manipulation with tidyverse\nExcellent for quick analyses and visualizations\n\nLimitations:\n\nCan be slower for very large datasets\nLess efficient memory usage (more frequent copying)\nLimited in general-purpose programming tasks" + "objectID": "course-materials/cheatsheets/numpy.html#when-to-use-numpy-with-pandas", + "href": "course-materials/cheatsheets/numpy.html#when-to-use-numpy-with-pandas", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "Performance: For large datasets, NumPy operations can be faster than pandas.\nMemory efficiency: NumPy arrays use less memory than pandas objects.\nSpecific mathematical operations: Some mathematical operations are more straightforward in NumPy.\nInterfacing with other libraries: Many scientific Python libraries use NumPy arrays.\n\nRemember, while these NumPy operations are useful, many have direct equivalents in pandas that work on Series and DataFrames. 
Always consider whether you can perform the operation directly in pandas before converting to NumPy arrays." }, { - "objectID": "course-materials/lectures/05_Session_1A.html#python-strengths-and-limitations", - "href": "course-materials/lectures/05_Session_1A.html#python-strengths-and-limitations", - "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", - "section": "Python: Strengths and Limitations", - "text": "Python: Strengths and Limitations\n\n\nStrengths:\n\nEfficient for large-scale data processing\nVersatile for various programming tasks\nStrong in machine learning and deep learning\n\nLimitations:\n\nLess intuitive API for data manipulation (pandas)\nSteeper learning curve for data science tasks\nRequires more code for some statistical operations" + "objectID": "course-materials/cheatsheets/setting_up_python.html", + "href": "course-materials/cheatsheets/setting_up_python.html", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "This guide will help you set up Python 3.10 and JupyterLab on your local machine using Miniconda. We’ll also install core data science libraries." 
}, { - "objectID": "course-materials/lectures/05_Session_1A.html#implications-for-data-science", - "href": "course-materials/lectures/05_Session_1A.html#implications-for-data-science", - "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", - "section": "Implications for Data Science", - "text": "Implications for Data Science\n\n\nR excels in statistical computing and quick analyses\nPython shines in large-scale data processing and diverse applications\nChoice depends on specific needs:\n\nProject scale\nPerformance requirements\nTeam expertise\nIntegration with other systems" + "objectID": "course-materials/cheatsheets/setting_up_python.html#step-0-opening-a-terminal", + "href": "course-materials/cheatsheets/setting_up_python.html#step-0-opening-a-terminal", + "title": "EDS 217 Cheatsheet", + "section": "Step 0: Opening a Terminal", + "text": "Step 0: Opening a Terminal\nBefore we begin, you’ll need to know how to open a terminal (command-line interface) on your operating system:\n\nFor Windows:\n\nPress the Windows key + R to open the Run dialog.\nType cmd and press Enter. Alternatively, search for “Command Prompt” in the Start menu.\n\n\n\nFor macOS:\n\nPress Command + Space to open Spotlight Search.\nType “Terminal” and press Enter. Alternatively, go to Applications > Utilities > Terminal.\n\n\n\nFor Linux:\n\nMost Linux distributions use Ctrl + Alt + T as a keyboard shortcut to open the terminal.\nYou can also search for “Terminal” in your distribution’s application menu." 
}, { - "objectID": "course-materials/lectures/05_Session_1A.html#conclusion", - "href": "course-materials/lectures/05_Session_1A.html#conclusion", - "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", - "section": "Conclusion", - "text": "Conclusion\n\nUnderstanding these concepts helps in:\n\nChoosing the right tool for the job\nWriting efficient and correct code\nAppreciating the design decisions in data science tools\n\nBoth R and Python have their places in a data scientist’s toolkit\nConsider using both languages to leverage their respective strengths" + "objectID": "course-materials/cheatsheets/setting_up_python.html#step-1-download-and-install-miniconda", + "href": "course-materials/cheatsheets/setting_up_python.html#step-1-download-and-install-miniconda", + "title": "EDS 217 Cheatsheet", + "section": "Step 1: Download and Install Miniconda", + "text": "Step 1: Download and Install Miniconda\n\nFor Windows:\n\nDownload the Miniconda installer for Windows from the official website.\nRun the installer and follow the prompts.\nDuring installation, make sure to add Miniconda to your PATH environment variable when prompted.\n\n\n\nFor macOS:\n\nDownload the Miniconda installer for macOS from the official website.\nOpen Terminal and navigate to the directory containing the downloaded file.\nRun the following command:\nbash Miniconda3-latest-MacOSX-x86_64.sh\nFollow the prompts and accept the license agreement.\n\n\n\nFor Linux:\n\nDownload the Miniconda installer for Linux from the official website.\nOpen a terminal and navigate to the directory containing the downloaded file.\nRun the following command:\nbash Miniconda3-latest-Linux-x86_64.sh\nFollow the prompts and accept the license agreement." 
+ }, + { + "objectID": "course-materials/cheatsheets/setting_up_python.html#step-2-set-up-python-3.10-and-core-libraries", + "href": "course-materials/cheatsheets/setting_up_python.html#step-2-set-up-python-3.10-and-core-libraries", + "title": "EDS 217 Cheatsheet", + "section": "Step 2: Set up Python 3.10 and Core Libraries", + "text": "Step 2: Set up Python 3.10 and Core Libraries\nOpen a new terminal or command prompt window to ensure the Miniconda installation is recognized.\nRun the following commands:\nconda install python=3.10\nconda install jupyter jupyterlab numpy pandas matplotlib seaborn\nThis will install Python 3.10, JupyterLab, and the core data science libraries in your base environment." }, { - "objectID": "course-materials/lectures/05_Session_1A.html#questions", - "href": "course-materials/lectures/05_Session_1A.html#questions", - "title": "What the Python (WTP)?: GroupBy(), copy(), and The Triple Dilemma", - "section": "Questions?", - "text": "Questions?" + "objectID": "course-materials/cheatsheets/setting_up_python.html#step-3-verify-installation", + "href": "course-materials/cheatsheets/setting_up_python.html#step-3-verify-installation", + "title": "EDS 217 Cheatsheet", + "section": "Step 3: Verify Installation", + "text": "Step 3: Verify Installation\n\nTo verify that Python 3.10 is installed, run:\npython --version\nTo launch JupyterLab, run:\njupyter lab\n\nThis should open JupyterLab in your default web browser. You can now create new notebooks and start coding!" }, { - "objectID": "course-materials/cheatsheets/JupyterLab.html", - "href": "course-materials/cheatsheets/JupyterLab.html", + "objectID": "course-materials/cheatsheets/setting_up_python.html#additional-notes", + "href": "course-materials/cheatsheets/setting_up_python.html#additional-notes", "title": "EDS 217 Cheatsheet", - "section": "", - "text": "Before we can use the Variable Inspector in JupyterLab, we need to install the extension. 
Follow these steps:\n\nStart a new JupyterLab session in your web browser.\nClick on the “+” button in the top left corner to open the Launcher (it might already be opened).\nUnder “Other”, click on “Terminal” to open a new terminal session.\nIn the terminal, type the following command and press Enter:\npip install lckr-jupyterlab-variableinspector\nWait for the installation to complete. You should see a message indicating successful installation.\nOnce the installation is complete, you need to restart JupyterLab for the changes to take effect. To do this:\n\nSave all your open notebooks and files.\nClose all browser tabs with JupyterLab.\nLogin to https://workbench-1.bren.ucsb.edu again.\nRestart a JupyterLab session\n\n\nAfter restarting JupyterLab, the Variable Inspector extension should be available for use.\n\n\n\nNow that you have installed the Variable Inspector extension, here’s how to use it:\nOpen the Variable Inspector: - Menu: View > Activate Command Palette, then type “variable inspector” - Shortcut: Ctrl + Shift + I (Windows/Linux) or Cmd + Shift + I (Mac) - Right-click in an open notebook and select “Open Variable Inspector” (will be at the bottom of the list)\nThe Variable Inspector shows: - Variable names - Types - Values/shapes - Count (for collections)\n\n\n\n\n\n\nLimits to the Variable Inspector\n\n\n\nThe variable inspector is not suitable for use with large dataframes or large arrays. You should use standard commands like df.head(), df.tail(), df.info(), df.describe() to inspect large dataframes.\n\n\n\n\nCode\n# Example variables\nx = 5\ny = \"Hello\"\nz = [1, 2, 3]\n\n# These will appear in the Variable Inspector" + "section": "Additional Notes", + "text": "Additional Notes\n\nTo update Miniconda and installed packages in the future, use:\nconda update --all\nWhile we’re using the base environment for this quick setup, it’s generally a good practice to create separate environments for different projects. 
You can explore this concept later as you become more familiar with conda." }, { - "objectID": "course-materials/cheatsheets/JupyterLab.html#variable-inspection-in-jupyterlab", - "href": "course-materials/cheatsheets/JupyterLab.html#variable-inspection-in-jupyterlab", + "objectID": "course-materials/cheatsheets/workflow_methods.html", + "href": "course-materials/cheatsheets/workflow_methods.html", "title": "EDS 217 Cheatsheet", "section": "", - "text": "Before we can use the Variable Inspector in JupyterLab, we need to install the extension. Follow these steps:\n\nStart a new JupyterLab session in your web browser.\nClick on the “+” button in the top left corner to open the Launcher (it might already be opened).\nUnder “Other”, click on “Terminal” to open a new terminal session.\nIn the terminal, type the following command and press Enter:\npip install lckr-jupyterlab-variableinspector\nWait for the installation to complete. You should see a message indicating successful installation.\nOnce the installation is complete, you need to restart JupyterLab for the changes to take effect. To do this:\n\nSave all your open notebooks and files.\nClose all browser tabs with JupyterLab.\nLogin to https://workbench-1.bren.ucsb.edu again.\nRestart a JupyterLab session\n\n\nAfter restarting JupyterLab, the Variable Inspector extension should be available for use.\n\n\n\nNow that you have installed the Variable Inspector extension, here’s how to use it:\nOpen the Variable Inspector: - Menu: View > Activate Command Palette, then type “variable inspector” - Shortcut: Ctrl + Shift + I (Windows/Linux) or Cmd + Shift + I (Mac) - Right-click in an open notebook and select “Open Variable Inspector” (will be at the bottom of the list)\nThe Variable Inspector shows: - Variable names - Types - Values/shapes - Count (for collections)\n\n\n\n\n\n\nLimits to the Variable Inspector\n\n\n\nThe variable inspector is not suitable for use with large dataframes or large arrays. 
You should use standard commands like df.head(), df.tail(), df.info(), df.describe() to inspect large dataframes.\n\n\n\n\nCode\n# Example variables\nx = 5\ny = \"Hello\"\nz = [1, 2, 3]\n\n# These will appear in the Variable Inspector" + "text": "This table maps commonly used pandas DataFrame methods to the steps in the course-specific data science workflow. Each method is linked to its official pandas documentation for easy reference.\n\n\n\nDataFrame Method —————————-\nImport\nExploration\nCleaning\nFiltering/ Selection\nTransforming\nSorting\nGrouping\nAggregating\nVisualizing\n\n\n\n\nread_csv()\n✓\n\n\n\n\n\n\n\n\n\n\nread_excel()\n✓\n\n\n\n\n\n\n\n\n\n\nhead()\n\n✓\n\n\n\n\n\n\n\n\n\ntail()\n\n✓\n\n\n\n\n\n\n\n\n\ninfo()\n\n✓\n✓\n\n\n\n\n\n\n\n\ndescribe()\n\n✓\n\n\n\n\n\n✓\n\n\n\ndtypes\n\n✓\n✓\n\n\n\n\n\n\n\n\nshape\n\n✓\n\n\n\n\n\n\n\n\n\ncolumns\n\n✓\n\n\n\n\n\n\n\n\n\nisnull()\n\n✓\n✓\n\n\n\n\n\n\n\n\nnotnull()\n\n✓\n✓\n\n\n\n\n\n\n\n\ndropna()\n\n\n✓\n\n✓\n\n\n\n\n\n\nfillna()\n\n\n✓\n\n✓\n\n\n\n\n\n\nreplace()\n\n\n✓\n\n✓\n\n\n\n\n\n\nastype()\n\n\n✓\n\n✓\n\n\n\n\n\n\nrename()\n\n\n✓\n\n✓\n\n\n\n\n\n\ndrop()\n\n\n✓\n✓\n✓\n\n\n\n\n\n\nduplicated()\n\n✓\n✓\n\n\n\n\n\n\n\n\ndrop_duplicates()\n\n\n✓\n\n✓\n\n\n\n\n\n\nvalue_counts()\n\n✓\n\n\n\n\n\n✓\n\n\n\nunique()\n\n✓\n\n\n\n\n\n\n\n\n\nnunique()\n\n✓\n\n\n\n\n\n✓\n\n\n\nsample()\n\n✓\n\n✓\n\n\n\n\n\n\n\ncorr()\n\n✓\n\n\n\n\n\n✓\n✓\n\n\ncov()\n\n✓\n\n\n\n\n\n✓\n\n\n\ngroupby()\n\n\n\n\n\n\n✓\n\n\n\n\nagg()\n\n\n\n\n\n\n✓\n✓\n\n\n\napply()\n\n\n\n\n✓\n\n\n\n\n\n\nmerge()\n\n\n\n\n✓\n\n\n\n\n\n\njoin()\n\n\n\n\n✓\n\n\n\n\n\n\nconcat()\n\n\n\n\n✓\n\n\n\n\n\n\npivot()\n\n\n\n\n✓\n\n\n\n\n\n\nmelt()\n\n\n\n\n✓\n\n\n\n\n\n\nsort_values()\n\n\n\n\n\n✓\n\n\n\n\n\nnlargest()\n\n\n\n✓\n\n✓\n\n\n\n\n\nnsmallest()\n\n\n\n✓\n\n✓\n\n\n\n\n\nquery()\n\n\n\n✓\n\n\n\n\n\n\n\neval()\n\n\n\n\n✓\n\n\n\n\n\n\ncut()\n\n\n\n\n✓\n\n\n\n\n\n\nqcut()\n\n\n\n\n✓\n\n\n\n\n\n\nget_dummies()\n\n\n\n\n✓\n\n\n\n\n\n\niloc[]\n\n\n\n✓\n\n
\n\n\n\n\n\nloc[]\n\n\n\n✓\n\n\n\n\n\n\n\nplot()\n\n✓\n\n\n\n\n\n\n✓\n\n\n\nNote: This table includes some of the most commonly used DataFrame methods, but it’s not exhaustive. Some methods may be applicable to multiple steps depending on the specific use case." }, { - "objectID": "course-materials/cheatsheets/JupyterLab.html#essential-magic-commands", - "href": "course-materials/cheatsheets/JupyterLab.html#essential-magic-commands", + "objectID": "course-materials/cheatsheets/workflow_methods.html#pandas-dataframe-methods-in-data-science-workflows", + "href": "course-materials/cheatsheets/workflow_methods.html#pandas-dataframe-methods-in-data-science-workflows", "title": "EDS 217 Cheatsheet", - "section": "Essential Magic Commands", - "text": "Essential Magic Commands\nMagic commands start with % (line magics) or %% (cell magics). Note that available magic commands may vary depending on your Jupyter environment and installed extensions.\n\nViewing Variables\n\n\nCode\n# List all variables\n%whos\n\n# List just variable names\n%who\n\n\nVariable Type Data/Info\n----------------------------------\nojs_define function <function ojs_define at 0x1a6637e20>\nx int 5\ny str Hello\nz list n=3\nojs_define x y z \n\n\n\n\nRunning Shell Commands\n\n\nCode\n# Run a shell command\n!echo \"Hello from the shell!\"\n\n# Capture output in a variable\nfiles = !ls\nprint(files)\n\n\nHello from the shell!\n['JupyterLab.qmd', 'JupyterLab.quarto_ipynb', 'Pandas_Cheat_Sheet.pdf', 'bar_plots.html', 'bar_plots.qmd', 'chart_customization.qmd', 'comprehensions.qmd', 'control_flows.qmd', 'data_aggregation.html', 'data_aggregation.qmd', 'data_cleaning.html', 'data_cleaning.qmd', 'data_grouping.qmd', 'data_merging.html', 'data_merging.qmd', 'data_selection.qmd', 'dictionaries.html', 'dictionaries.qmd', 'first_steps.html', 'first_steps.ipynb', 'functions.html', 'functions.qmd', 'lists.html', 'lists.qmd', 'matplotlib.html', 'matplotlib.qmd', 'numpy.html', 'numpy.qmd', 'ocean_temperatures.csv', 
'output.csv', 'pandas_dataframes.qmd', 'pandas_series.qmd', 'print.html', 'print.qmd', 'random_numbers.qmd', 'read_csv.qmd', 'seaborn.qmd', 'sets.qmd', 'setting_up_python.html', 'setting_up_python.qmd', 'timeseries.html', 'timeseries.qmd', 'workflow_methods.html', 'workflow_methods.qmd']" + "section": "", + "text": "This table maps commonly used pandas DataFrame methods to the steps in the course-specific data science workflow. Each method is linked to its official pandas documentation for easy reference.\n\n\n\nDataFrame Method —————————-\nImport\nExploration\nCleaning\nFiltering/ Selection\nTransforming\nSorting\nGrouping\nAggregating\nVisualizing\n\n\n\n\nread_csv()\n✓\n\n\n\n\n\n\n\n\n\n\nread_excel()\n✓\n\n\n\n\n\n\n\n\n\n\nhead()\n\n✓\n\n\n\n\n\n\n\n\n\ntail()\n\n✓\n\n\n\n\n\n\n\n\n\ninfo()\n\n✓\n✓\n\n\n\n\n\n\n\n\ndescribe()\n\n✓\n\n\n\n\n\n✓\n\n\n\ndtypes\n\n✓\n✓\n\n\n\n\n\n\n\n\nshape\n\n✓\n\n\n\n\n\n\n\n\n\ncolumns\n\n✓\n\n\n\n\n\n\n\n\n\nisnull()\n\n✓\n✓\n\n\n\n\n\n\n\n\nnotnull()\n\n✓\n✓\n\n\n\n\n\n\n\n\ndropna()\n\n\n✓\n\n✓\n\n\n\n\n\n\nfillna()\n\n\n✓\n\n✓\n\n\n\n\n\n\nreplace()\n\n\n✓\n\n✓\n\n\n\n\n\n\nastype()\n\n\n✓\n\n✓\n\n\n\n\n\n\nrename()\n\n\n✓\n\n✓\n\n\n\n\n\n\ndrop()\n\n\n✓\n✓\n✓\n\n\n\n\n\n\nduplicated()\n\n✓\n✓\n\n\n\n\n\n\n\n\ndrop_duplicates()\n\n\n✓\n\n✓\n\n\n\n\n\n\nvalue_counts()\n\n✓\n\n\n\n\n\n✓\n\n\n\nunique()\n\n✓\n\n\n\n\n\n\n\n\n\nnunique()\n\n✓\n\n\n\n\n\n✓\n\n\n\nsample()\n\n✓\n\n✓\n\n\n\n\n\n\n\ncorr()\n\n✓\n\n\n\n\n\n✓\n✓\n\n\ncov()\n\n✓\n\n\n\n\n\n✓\n\n\n\ngroupby()\n\n\n\n\n\n\n✓\n\n\n\n\nagg()\n\n\n\n\n\n\n✓\n✓\n\n\n\napply()\n\n\n\n\n✓\n\n\n\n\n\n\nmerge()\n\n\n\n\n✓\n\n\n\n\n\n\njoin()\n\n\n\n\n✓\n\n\n\n\n\n\nconcat()\n\n\n\n\n✓\n\n\n\n\n\n\npivot()\n\n\n\n\n✓\n\n\n\n\n\n\nmelt()\n\n\n\n\n✓\n\n\n\n\n\n\nsort_values()\n\n\n\n\n\n✓\n\n\n\n\n\nnlargest()\n\n\n\n✓\n\n✓\n\n\n\n\n\nnsmallest()\n\n\n\n✓\n\n✓\n\n\n\n\n\nquery()\n\n\n\n✓\n\n\n\n\n\n\n\neval()\n\n\n\n\n✓\n\n\n\n\n\n\ncut()\n\n\n\n\n✓\n\n\n\n\n\n\nqcut()\n\n\n\n\n✓
\n\n\n\n\n\n\nget_dummies()\n\n\n\n\n✓\n\n\n\n\n\n\niloc[]\n\n\n\n✓\n\n\n\n\n\n\n\nloc[]\n\n\n\n✓\n\n\n\n\n\n\n\nplot()\n\n✓\n\n\n\n\n\n\n✓\n\n\n\nNote: This table includes some of the most commonly used DataFrame methods, but it’s not exhaustive. Some methods may be applicable to multiple steps depending on the specific use case." }, { - "objectID": "course-materials/cheatsheets/JupyterLab.html#useful-keyboard-shortcuts", - "href": "course-materials/cheatsheets/JupyterLab.html#useful-keyboard-shortcuts", + "objectID": "course-materials/cheatsheets/workflow_methods.html#key-takeaways", + "href": "course-materials/cheatsheets/workflow_methods.html#key-takeaways", "title": "EDS 217 Cheatsheet", - "section": "Useful Keyboard Shortcuts", - "text": "Useful Keyboard Shortcuts\n\n\n\n\n\n\n\n\nAction\nWindows/Linux\nMac\n\n\n\n\nRun cell\nShift + Enter\nShift + Enter\n\n\nRun cell and insert below\nAlt + Enter\nOption + Enter\n\n\nRun cell and select below\nCtrl + Enter\nCmd + Enter\n\n\nEnter command mode\nEsc\nEsc\n\n\nEnter edit mode\nEnter\nEnter\n\n\nSave notebook\nCtrl + S\nCmd + S\n\n\nInsert cell above\nA (in command mode)\nA (in command mode)\n\n\nInsert cell below\nB (in command mode)\nB (in command mode)\n\n\nCut cell\nX (in command mode)\nX (in command mode)\n\n\nCopy cell\nC (in command mode)\nC (in command mode)\n\n\nPaste cell\nV (in command mode)\nV (in command mode)\n\n\nUndo cell action\nZ (in command mode)\nZ (in command mode)\n\n\nChange to code cell\nY (in command mode)\nY (in command mode)\n\n\nChange to markdown cell\nM (in command mode)\nM (in command mode)\n\n\nSplit cell at cursor\nCtrl + Shift + -\nCmd + Shift + -\n\n\nMerge selected cells\nShift + M (in command mode)\nShift + M (in command mode)\n\n\nToggle line numbers\nShift + L (in command mode)\nShift + L (in command mode)\n\n\nToggle output\nO (in command mode)\nO (in command mode)" + "section": "Key Takeaways", + "text": "Key Takeaways\n\nImport primarily involves reading data from 
various sources.\nExploration methods help understand the structure and content of the data.\nCleaning methods focus on handling missing data, duplicates, and data type issues.\nFiltering/Selection methods allow you to subset your data based on various conditions.\nTransforming methods cover a wide range of data manipulation tasks.\nSorting methods help arrange data in a specific order.\nGrouping is often a precursor to aggregation operations.\nAggregating methods compute summary statistics on data.\nVisualizing methods help create graphical representations of the data.\n\nRemember that the applicability of methods can vary depending on the specific project and dataset. This table serves as a general guide to help you navigate the pandas DataFrame methods in the context of your course’s data science workflow. The links to the official documentation provide more detailed information about each method’s usage and parameters." }, { - "objectID": "course-materials/cheatsheets/JupyterLab.html#tips-for-beginners", - "href": "course-materials/cheatsheets/JupyterLab.html#tips-for-beginners", - "title": "EDS 217 Cheatsheet", - "section": "Tips for Beginners", - "text": "Tips for Beginners\n\nUse Tab for code completion\nAdd ? after a function name for more detailed help (e.g., print?)\nUse dir() to see available attributes/methods (e.g., dir(str))\nUse the help() command to get information about functions and objects." 
+ "objectID": "course-materials/day2.html#class-materials", + "href": "course-materials/day2.html#class-materials", + "title": "Python Data Collections", + "section": "Class materials", + "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 2 / morning\n🐍 Lists\n🐍 Dictionaries\n\n\nday 2 / afternoon\n🙌 Working with Lists, Dictionaries, and Sets\n📝 List and Dictionary Comprehensions" }, { - "objectID": "course-materials/cheatsheets/JupyterLab.html#resources-for-further-learning", - "href": "course-materials/cheatsheets/JupyterLab.html#resources-for-further-learning", - "title": "EDS 217 Cheatsheet", - "section": "Resources for Further Learning", - "text": "Resources for Further Learning\n\nJupyterLab Documentation\nIPython Documentation (for magic commands)\nJupyter Notebook Cheatsheet\nDataCamp JupyterLab Tutorial" + "objectID": "course-materials/day2.html#end-of-day-practice", + "href": "course-materials/day2.html#end-of-day-practice", + "title": "Python Data Collections", + "section": "End-of-day practice", + "text": "End-of-day practice\nComplete the following tasks / activities before heading home for the day!\n\n Day 2 Practice: Python Data Structures Practice" }, { - "objectID": "course-materials/live-coding/dictionaries.html", - "href": "course-materials/live-coding/dictionaries.html", - "title": "[Live Coding] Session 2B", - "section": "", - "text": "Introduction to Dictionaries (5 minutes)\nCreating and Accessing Dictionaries (10 minutes)\nManipulating Dictionaries (10 minutes)\nIterating Over Dictionaries (5 minutes)\nStoring Structured Data Using Dictionaries (10 minutes)\nPractical Application in Data Science (5 minutes)\n\n\n\n\n\n\nObjective: Introduce what dictionaries are and their importance in Python.\nKey Points:\n\nDefinition: Dictionaries are collections of key-value pairs.\nUnordered and indexed by keys, making data access fast and efficient.\n\nLive Code Example:\n\nexample_dict = {'name': 'Earth', 'moons': 
1}\nprint(\"Example dictionary:\", example_dict)\n\nExample dictionary: {'name': 'Earth', 'moons': 1}\n\n\n\n\n\n\n\nObjective: Show how to create dictionaries using different methods and how to access elements.\nKey Points:\n\nCreating dictionaries using curly braces {} and the dict() constructor.\nAccessing values using keys, demonstrating safe access with .get().\n\nLive Code Example:\n\n# Creating a dictionary using dict()\nanother_dict = dict(name='Mars', moons=2)\nprint(\"Another dictionary (dict()):\", another_dict)\n\nanother_dict2 = {'name': 'Mars',\n 'moons': 2\n }\n\nprint(\"Another dictionary ({}):\", another_dict2)\nprint(\"Are they the same?\", another_dict==another_dict2)\n\n# Accessing elements\nprint(\"Temperature using get (no default):\", example_dict.get('temp'))\nprint(\"Temperature using get (with default):\", example_dict.get('temp', 'No temperature data'))\n\nAnother dictionary (dict()): {'name': 'Mars', 'moons': 2}\nAnother dictionary ({}): {'name': 'Mars', 'moons': 2}\nAre they the same? 
True\nTemperature using get (no default): None\nTemperature using get (with default): No temperature data\n\n\n\n\n\n\n\nObjective: Teach how to add, update, delete dictionary items.\nKey Points:\n\nAdding and updating by assigning values to keys.\nRemoving items using del and pop().\n\nLive Code Example:\n\n# Adding a new key-value pair\nanother_dict['atmosphere'] = 'thin'\nprint(\"Updated with atmosphere:\", another_dict)\n\n# Removing an entry using del\ndel another_dict['atmosphere']\nprint(\"After deletion:\", another_dict)\n\n# Removing an entry using pop\nmoons = another_dict.pop('moons', 'No moons key found')\nprint(\"Removed moons:\", moons)\nprint(\"After popping moons:\", another_dict)\n\nUpdated with atmosphere: {'name': 'Mars', 'moons': 2, 'atmosphere': 'thin'}\nAfter deletion: {'name': 'Mars', 'moons': 2}\nRemoved moons: 2\nAfter popping moons: {'name': 'Mars'}\n\n\n\n\n\n\n\nObjective: Explain how to iterate over dictionary keys, values, and key-value pairs.\nKey Points:\n\nUsing .keys(), .values(), and .items() for different iteration needs.\n\nLive Code Example:\n\n# Creating a new dictionary for iteration examples\niteration_dict = {'planet': 'Earth', 'moons': 1, 'orbit': 'Sun'}\n\n# Iterating over keys\nprint(\"Keys:\")\nfor key in iteration_dict.keys():\n print(f\"Key: {key}\")\n\n# Iterating over values\nprint(\"\\nValues:\")\nfor value in iteration_dict.values():\n print(f\"Value: {value}\")\n\n# Iterating over items\nprint(\"\\nKey-Value Pairs:\")\nfor key, value in iteration_dict.items():\n print(f\"{key}: {value}\")\n\nKeys:\nKey: planet\nKey: moons\nKey: orbit\n\nValues:\nValue: Earth\nValue: 1\nValue: Sun\n\nKey-Value Pairs:\nplanet: Earth\nmoons: 1\norbit: Sun\n\n\nAdditional Notes:\n\nThe dict.keys(), dict.values(), and dict.items() methods are used to return view objects that provide a dynamic view on the dictionary’s keys, values, and key-value pairs respectively.\nThese views are iterable and reflect changes to the dictionary, making 
them highly useful for looping and other operations that involve dictionary elements.\nWhat Each Function Returns\n\ndict.keys():\n\n\nReturns a view object displaying all the keys in the dictionary (default)\nUseful for iterating over keys or checking if certain keys exist within the dictionary.\n\n\ndict.values():\n\n\nReturns a view object that contains all the values in the dictionary.\nThis is helpful for operations that need to access every value, such as aggregations or conditions applied to dictionary values.\n\n\ndict.items():\n\n\nReturns a view object with tuples containing (key, value) pairs.\nExtremely useful for looping through both keys and values simultaneously, allowing operations that depend on both elements.\n\nThese methods are particularly useful in data analysis, data cleaning, or any task where data stored in dictionaries needs systematic processing.\nTo learn more about how these iterables can be utilized in Python, you can visit the official Python documentation on iterables and iterators: Python Iterables and Iterators Documentation\n\n\n\n\n\n\nObjective: Show how dictionaries can handle complex, structured data.\nKey Points:\n\nNested dictionaries and lists to create multi-dimensional data structures.\n\nLive Code Example:\n\n# Nested dictionary for environmental data\nenvironmental_data = {\n 'Location A': {'temperature': 19, 'conditions': ['sunny', 'dry']},\n 'Location B': {'temperature': 22, 'conditions': ['rainy', 'humid']}\n}\nprint(\"Environmental data for Location A:\", environmental_data['Location A']['conditions'])\n\nEnvironmental data for Location A: ['sunny', 'dry']\n\n\n\n\n\n\n\nObjective: Demonstrate the use of dictionaries in data science for data aggregation.\nKey Points:\n\nUsing dictionaries to count occurrences and summarize data.\n\nLive Code Example:\n\nweather_log = ['sunny', 'rainy', 'sunny', 'cloudy', 'sunny', 'rainy']\nweather_count = {}\nfor condition in weather_log:\n weather_count[condition] = 
weather_count.get(condition, 0) + 1\nprint(\"Weather condition counts:\", weather_count)\n\nWeather condition counts: {'sunny': 3, 'rainy': 2, 'cloudy': 1}\n\n\n\n\n\n\n\n\nRecap: Highlight the flexibility and power of dictionaries in Python programming, especially for data manipulation and structured data operations." + "objectID": "course-materials/day2.html#additional-resources", + "href": "course-materials/day2.html#additional-resources", + "title": "Python Data Collections", + "section": "Additional Resources", + "text": "Additional Resources" }, { - "objectID": "course-materials/live-coding/dictionaries.html#session-outline", - "href": "course-materials/live-coding/dictionaries.html#session-outline", - "title": "[Live Coding] Session 2B", - "section": "", - "text": "Introduction to Dictionaries (5 minutes)\nCreating and Accessing Dictionaries (10 minutes)\nManipulating Dictionaries (10 minutes)\nIterating Over Dictionaries (5 minutes)\nStoring Structured Data Using Dictionaries (10 minutes)\nPractical Application in Data Science (5 minutes)\n\n\n\n\n\n\nObjective: Introduce what dictionaries are and their importance in Python.\nKey Points:\n\nDefinition: Dictionaries are collections of key-value pairs.\nUnordered and indexed by keys, making data access fast and efficient.\n\nLive Code Example:\n\nexample_dict = {'name': 'Earth', 'moons': 1}\nprint(\"Example dictionary:\", example_dict)\n\nExample dictionary: {'name': 'Earth', 'moons': 1}\n\n\n\n\n\n\n\nObjective: Show how to create dictionaries using different methods and how to access elements.\nKey Points:\n\nCreating dictionaries using curly braces {} and the dict() constructor.\nAccessing values using keys, demonstrating safe access with .get().\n\nLive Code Example:\n\n# Creating a dictionary using dict()\nanother_dict = dict(name='Mars', moons=2)\nprint(\"Another dictionary (dict()):\", another_dict)\n\nanother_dict2 = {'name': 'Mars',\n 'moons': 2\n }\n\nprint(\"Another dictionary ({}):\", 
another_dict2)\nprint(\"Are they the same?\", another_dict==another_dict2)\n\n# Accessing elements\nprint(\"Temperature using get (no default):\", example_dict.get('temp'))\nprint(\"Temperature using get (with default):\", example_dict.get('temp', 'No temperature data'))\n\nAnother dictionary (dict()): {'name': 'Mars', 'moons': 2}\nAnother dictionary ({}): {'name': 'Mars', 'moons': 2}\nAre they the same? True\nTemperature using get (no default): None\nTemperature using get (with default): No temperature data\n\n\n\n\n\n\n\nObjective: Teach how to add, update, delete dictionary items.\nKey Points:\n\nAdding and updating by assigning values to keys.\nRemoving items using del and pop().\n\nLive Code Example:\n\n# Adding a new key-value pair\nanother_dict['atmosphere'] = 'thin'\nprint(\"Updated with atmosphere:\", another_dict)\n\n# Removing an entry using del\ndel another_dict['atmosphere']\nprint(\"After deletion:\", another_dict)\n\n# Removing an entry using pop\nmoons = another_dict.pop('moons', 'No moons key found')\nprint(\"Removed moons:\", moons)\nprint(\"After popping moons:\", another_dict)\n\nUpdated with atmosphere: {'name': 'Mars', 'moons': 2, 'atmosphere': 'thin'}\nAfter deletion: {'name': 'Mars', 'moons': 2}\nRemoved moons: 2\nAfter popping moons: {'name': 'Mars'}\n\n\n\n\n\n\n\nObjective: Explain how to iterate over dictionary keys, values, and key-value pairs.\nKey Points:\n\nUsing .keys(), .values(), and .items() for different iteration needs.\n\nLive Code Example:\n\n# Creating a new dictionary for iteration examples\niteration_dict = {'planet': 'Earth', 'moons': 1, 'orbit': 'Sun'}\n\n# Iterating over keys\nprint(\"Keys:\")\nfor key in iteration_dict.keys():\n print(f\"Key: {key}\")\n\n# Iterating over values\nprint(\"\\nValues:\")\nfor value in iteration_dict.values():\n print(f\"Value: {value}\")\n\n# Iterating over items\nprint(\"\\nKey-Value Pairs:\")\nfor key, value in iteration_dict.items():\n print(f\"{key}: {value}\")\n\nKeys:\nKey: 
planet\nKey: moons\nKey: orbit\n\nValues:\nValue: Earth\nValue: 1\nValue: Sun\n\nKey-Value Pairs:\nplanet: Earth\nmoons: 1\norbit: Sun\n\n\nAdditional Notes:\n\nThe dict.keys(), dict.values(), and dict.items() methods are used to return view objects that provide a dynamic view on the dictionary’s keys, values, and key-value pairs respectively.\nThese views are iterable and reflect changes to the dictionary, making them highly useful for looping and other operations that involve dictionary elements.\nWhat Each Function Returns\n\ndict.keys():\n\n\nReturns a view object displaying all the keys in the dictionary (default)\nUseful for iterating over keys or checking if certain keys exist within the dictionary.\n\n\ndict.values():\n\n\nReturns a view object that contains all the values in the dictionary.\nThis is helpful for operations that need to access every value, such as aggregations or conditions applied to dictionary values.\n\n\ndict.items():\n\n\nReturns a view object with tuples containing (key, value) pairs.\nExtremely useful for looping through both keys and values simultaneously, allowing operations that depend on both elements.\n\nThese methods are particularly useful in data analysis, data cleaning, or any task where data stored in dictionaries needs systematic processing.\nTo learn more about how these iterables can be utilized in Python, you can visit the official Python documentation on iterables and iterators: Python Iterables and Iterators Documentation\n\n\n\n\n\n\nObjective: Show how dictionaries can handle complex, structured data.\nKey Points:\n\nNested dictionaries and lists to create multi-dimensional data structures.\n\nLive Code Example:\n\n# Nested dictionary for environmental data\nenvironmental_data = {\n 'Location A': {'temperature': 19, 'conditions': ['sunny', 'dry']},\n 'Location B': {'temperature': 22, 'conditions': ['rainy', 'humid']}\n}\nprint(\"Environmental data for Location A:\", environmental_data['Location 
A']['conditions'])\n\nEnvironmental data for Location A: ['sunny', 'dry']\n\n\n\n\n\n\n\nObjective: Demonstrate the use of dictionaries in data science for data aggregation.\nKey Points:\n\nUsing dictionaries to count occurrences and summarize data.\n\nLive Code Example:\n\nweather_log = ['sunny', 'rainy', 'sunny', 'cloudy', 'sunny', 'rainy']\nweather_count = {}\nfor condition in weather_log:\n weather_count[condition] = weather_count.get(condition, 0) + 1\nprint(\"Weather condition counts:\", weather_count)\n\nWeather condition counts: {'sunny': 3, 'rainy': 2, 'cloudy': 1}\n\n\n\n\n\n\n\n\nRecap: Highlight the flexibility and power of dictionaries in Python programming, especially for data manipulation and structured data operations." + "objectID": "course-materials/day4.html#class-materials", + "href": "course-materials/day4.html#class-materials", + "title": "Working with DataFrames in Pandas", + "section": "Class materials", + "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 4 / morning\n🐼 Intro to DataFrames\n🙌 Coding Colab: Working with DataFrames\n\n\nday 4 / afternoon\n🐼 DataFrame Workflows\n📝 Data Import/Export" }, { - "objectID": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html", - "href": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html", - "title": "Day 2: 🙌 Coding Colab", - "section": "", - "text": "Before we begin, here are quick links to our course cheatsheets. These may be helpful during the exercise:\n\nPython Basics Cheatsheet\nList Cheatsheet\nDictionaries Cheatsheet\nSets Cheatsheet\n\nFeel free to refer to these cheatsheets throughout the exercise if you need a quick reminder about syntax or functionality." 
+ "objectID": "course-materials/day4.html#end-of-day-practice", + "href": "course-materials/day4.html#end-of-day-practice", + "title": "Working with DataFrames in Pandas", + "section": "End-of-day practice", + "text": "End-of-day practice\nComplete the following tasks / activities before heading home for the day!\n\n Day 4 Practice: Reading, Visualizing, and Exporting Data in Pandas" }, { - "objectID": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#quick-references", - "href": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#quick-references", - "title": "Day 2: 🙌 Coding Colab", - "section": "", - "text": "Before we begin, here are quick links to our course cheatsheets. These may be helpful during the exercise:\n\nPython Basics Cheatsheet\nList Cheatsheet\nDictionaries Cheatsheet\nSets Cheatsheet\n\nFeel free to refer to these cheatsheets throughout the exercise if you need a quick reminder about syntax or functionality." + "objectID": "course-materials/day4.html#additional-resources", + "href": "course-materials/day4.html#additional-resources", + "title": "Working with DataFrames in Pandas", + "section": "Additional Resources", + "text": "Additional Resources" }, { - "objectID": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#introduction-to-paired-programming-5-minutes", - "href": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#introduction-to-paired-programming-5-minutes", - "title": "Day 2: 🙌 Coding Colab", - "section": "Introduction to Paired Programming (5 minutes)", - "text": "Introduction to Paired Programming (5 minutes)\nWelcome to today’s Coding Colab! 
In this session, you’ll be working in pairs to explore and reinforce your understanding of lists and dictionaries, while also discovering the unique features of sets.\n\nBenefits of Paired Programming\n\nKnowledge sharing: Learn from each other’s experiences and approaches.\nImproved code quality: Catch errors earlier with two sets of eyes.\nEnhanced problem-solving: Discuss ideas for more creative solutions.\nSkill development: Improve communication and teamwork skills.\n\n\n\nHow to Make the Most of Paired Programming\n\nAssign roles: One person is the “driver” (typing), the other is the “navigator” (reviewing).\nSwitch roles regularly: Swap every 10-15 minutes to stay engaged.\nCommunicate clearly: Explain your thought process and ask questions.\nBe open to ideas: Listen to your partner’s suggestions.\nStay focused: Keep the conversation relevant to the task." + "objectID": "course-materials/day6.html#class-materials", + "href": "course-materials/day6.html#class-materials", + "title": "Data Handling and Visualization, Day 1", + "section": "Class materials", + "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 6 / morning\n🐼 Grouping, joining, sorting, and applying\n🙌 Coding Colab: Data Manipulation\n\n\nday 6 / afternoon\n🐼 Working with dates\nEnd-of-day practice" }, { - "objectID": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#exercise-overview", - "href": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#exercise-overview", - "title": "Day 2: 🙌 Coding Colab", - "section": "Exercise Overview", - "text": "Exercise Overview\nThis Coding Colab will reinforce your understanding of lists and dictionaries while introducing you to sets. You’ll work through a series of tasks, discussing and implementing solutions together." 
+ "objectID": "course-materials/day6.html#end-of-day-practice", + "href": "course-materials/day6.html#end-of-day-practice", + "title": "Data Handling and Visualization, Day 1", + "section": "End-of-day practice", + "text": "End-of-day practice\nComplete the following tasks / activities before heading home for the day!\n\n Day 6 Practice: 🕺 Eurovision Data Science 💃" }, { - "objectID": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#part-1-lists-and-dictionaries-review-15-minutes", - "href": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#part-1-lists-and-dictionaries-review-15-minutes", - "title": "Day 2: 🙌 Coding Colab", - "section": "Part 1: Lists and Dictionaries Review (15 minutes)", - "text": "Part 1: Lists and Dictionaries Review (15 minutes)\n\nTask 1: List Operations\nCreate a list of your favorite fruits and perform the following operations:\n\nCreate a list called fruits with at least 3 fruit names.\nAdd a new fruit to the end of the list.\nRemove the second fruit from the list.\nPrint the final list.\n\n\n\nTask 2: Dictionary Operations\nCreate a dictionary representing a simple inventory system:\n\nCreate a dictionary called inventory with at least 3 items and their quantities.\nAdd a new item to the inventory.\nUpdate the quantity of an existing item.\nPrint the final inventory." + "objectID": "course-materials/day6.html#additional-resources", + "href": "course-materials/day6.html#additional-resources", + "title": "Data Handling and Visualization, Day 1", + "section": "Additional Resources", + "text": "Additional Resources" }, { - "objectID": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#part-2-introducing-sets-15-minutes", - "href": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#part-2-introducing-sets-15-minutes", - "title": "Day 2: 🙌 Coding Colab", - "section": "Part 2: Introducing Sets (15 minutes)", - "text": "Part 2: Introducing Sets (15 minutes)\nHere’s a Sets Cheatsheet. 
Sets are a lot like lists, so take a look at the cheatsheet to see how they are created and manipulated!\n\nTask 3: Creating and Manipulating Sets\n\nCreate two sets: evens with even numbers from 2 to 10, and odds with odd numbers from 1 to 9.\nPrint both sets.\nFind and print the union of the two sets.\nFind and print the intersection of the two sets.\nAdd a new element to the evens set.\n\n\n\nTask 4: Combining Set Operations and List Operations\nUsing a set is a great way to remove duplicates in a list.\n\nCreate a list with some duplicates: numbers = [1, 2, 2, 3, 3, 3, 4, 4, 5]\nUse a set to remove duplicates.\nCreate a new list from the set.\nPrint the new list without duplicates" + "objectID": "course-materials/day8.html#class-materials", + "href": "course-materials/day8.html#class-materials", + "title": "Building a Python Data Science Workflow", + "section": "Class materials", + "text": "Class materials\n\n\n\n\n\n\n\n\n Session\n Session 1\n Session 2\n\n\n\n\nday 8 / morning\nWorking on Final Data Science Project (all day)\n\n\n\nday 8 / afternoon" }, { - "objectID": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#conclusion-and-discussion-10-minutes", - "href": "course-materials/coding-colabs/2c_lists_dictionaries_sets.html#conclusion-and-discussion-10-minutes", - "title": "Day 2: 🙌 Coding Colab", - "section": "Conclusion and Discussion (10 minutes)", - "text": "Conclusion and Discussion (10 minutes)\nAs a pair, discuss the following questions:\n\nWhat are the main differences between lists, dictionaries, and sets?\nIn what situations would you prefer to use a set over a list or dictionary?\nHow did working in pairs help you understand these concepts better?" 
+ "objectID": "course-materials/day8.html#syncing-your-classwork-to-github", + "href": "course-materials/day8.html#syncing-your-classwork-to-github", + "title": "Building a Python Data Science Workflow", + "section": "Syncing your classwork to Github", + "text": "Syncing your classwork to Github\nHere are some directions for syncing your classwork with a GitHub repository" }, { - "objectID": "course-materials/lectures/05_Drop_or_Impute.html#introduction", - "href": "course-materials/lectures/05_Drop_or_Impute.html#introduction", - "title": "EDS 217 - Lecture", - "section": "Introduction", - "text": "Introduction\n\nData cleaning is crucial in data analysis\nMissing data is a common challenge\nTwo main approaches:\n\nDropping missing data\nImputation\n\nUnderstanding the nature of missingness is key" + "objectID": "course-materials/day8.html#end-of-day-practice", + "href": "course-materials/day8.html#end-of-day-practice", + "title": "Building a Python Data Science Workflow", + "section": "End-of-day practice", + "text": "End-of-day practice\nThere are no additional end-of-day tasks / activities today!" 
}, { - "objectID": "course-materials/lectures/05_Drop_or_Impute.html#types-of-missing-data", - "href": "course-materials/lectures/05_Drop_or_Impute.html#types-of-missing-data", - "title": "EDS 217 - Lecture", - "section": "Types of Missing Data", - "text": "Types of Missing Data\n\n\nMissing Completely at Random (MCAR)\nMissing at Random (MAR)\nMissing Not at Random (MNAR)" + "objectID": "course-materials/day8.html#additional-resources", + "href": "course-materials/day8.html#additional-resources", + "title": "Building a Python Data Science Workflow", + "section": "Additional Resources", + "text": "Additional Resources" }, { - "objectID": "course-materials/lectures/05_Drop_or_Impute.html#missing-completely-at-random-mcar", - "href": "course-materials/lectures/05_Drop_or_Impute.html#missing-completely-at-random-mcar", - "title": "EDS 217 - Lecture", - "section": "Missing Completely at Random (MCAR)", - "text": "Missing Completely at Random (MCAR)\n\nNo relationship between missingness and any values\nExample: Survey responses lost due to a computer glitch\nLeast problematic type of missing data\nDropping MCAR data is generally safe but reduces sample size" + "objectID": "course-materials/interactive-sessions/1a_iPython_JupyterLab.html", + "href": "course-materials/interactive-sessions/1a_iPython_JupyterLab.html", + "title": "Interactive Session 1A", + "section": "", + "text": "This is a short exercise to introduce you to the IPython REPL within a JupyterLab session hosted on a Posit Workbench server. 
This exercise will help you become familiar with the interactive environment we will use throughout the class (and throughout your time in the MEDS program) as well as an introduction to some basic Python operations.\n\n\nExercise: Introduction to IPython REPL in JupyterLab\nObjective: Learn how to use the IPython REPL in JupyterLab for basic Python programming and explore some interactive features.\n\n\n\n\n\n\nNote\n\n\n\nThe Read-Eval-Print Loop (REPL) is an interactive programming environment that allows users to execute Python code line-by-line, providing immediate feedback and facilitating rapid testing, debugging, and learning.\n\n\n\n\nStep 1: Access JupyterLab\n\nLog in to the Posit Workbench Server\n\nOpen a web browser and go to workbench-1.bren.ucsb.edu.\nEnter your login credentials to access the server.\n\nSelect JupyterLab\n\nOnce logged in, click on the “New Session” button, and select “JupyterLab” from the list of options\n\n\nStart JupyterLab Session\n\nClick the “Start Session” button in the lower right of the modal window. You don’t need to edit either the Session Name or Cluster.\n\nWait for the Session to Launch\n\nYour browser will auto join the session as soon as the server starts it up.\n\n\n\n\n\nStep 2: Open the IPython REPL\nWhen the session launches you should see an interface that looks like this:\n\n\nStart a Terminal\n\nSelect “Terminal” from the list of available options in the Launcher pane. 
This will open a new terminal tab.\n\nLaunch IPython\n\nIn the terminal, type ipython and press Enter to start the IPython REPL.\n\n\n\n\n\nStep 3: Basic IPython Commands\nIn the IPython REPL, try the following commands to get familiar with the environment:\n\nBasic Arithmetic Operations\n\nCalculate the sum of two numbers:\n3 + 5\nMultiply two numbers:\n4 * 7\nDivide two numbers:\n10 / 2\n\nVariable Assignment\n\nAssign a value to a variable and use it in a calculation:\nx = 10\ny = 5\nresult = x * y\nresult\n\nBuilt-in Functions\n\nUse a built-in function to find the maximum of a list of numbers:\nnumbers = [3, 9, 1, 6, 2]\nmax(numbers)\n\nInteractive Help\n\nUse the help() function to get more information about a built-in function:\nhelp(print)\nUse the ? to get a quick description of an object or function:\nlen?\n\n\n\n\n\nStep 4: Explore IPython Features\n\nTab Completion\n\nStart typing a command or variable name and press Tab to auto-complete or view suggestions:\nnum # Press Tab here\n\nMagic Commands\n\nUse the %timeit magic command to time the execution of a statement:\n%timeit sum(range(1000))\n\nHistory\n\nView the command history using the %history magic command:\n%history\n\nClear the Console\n\nClear the current console session with:\n%clear\n\n\n\n\n\nStep 5: Exit the IPython REPL\n\nTo exit the IPython REPL, type exit() or press Ctrl+D.\n\n\n\n\nWrap-Up\nCongratulations! You have completed the introduction to the IPython REPL in JupyterLab. 
You learned how to perform basic operations, use interactive help, explore magic commands, and utilize IPython features.\nFeel free to explore more IPython functionalities or ask questions if you need further assistance.\n\n\nEnd interactive session 1A" }, { - "objectID": "course-materials/lectures/05_Drop_or_Impute.html#mcar-example-assigning-nan-randomly", - "href": "course-materials/lectures/05_Drop_or_Impute.html#mcar-example-assigning-nan-randomly", - "title": "EDS 217 - Lecture", - "section": "MCAR Example (Assigning nan randomly)", - "text": "MCAR Example (Assigning nan randomly)\n\nimport pandas as pd\nimport numpy as np\n\n# Create sample data with MCAR\nnp.random.seed(42)\ndf = pd.DataFrame({'A': np.random.rand(100), 'B': np.random.rand(100)})\ndf.loc[np.random.choice(df.index, 10, replace=False), 'A'] = np.nan\nprint(df.isnull().sum())\n\nA 10\nB 0\ndtype: int64" + "objectID": "course-materials/lectures/data_types.html#types-of-data-in-python", + "href": "course-materials/lectures/data_types.html#types-of-data-in-python", + "title": "Basic Data Types in Python", + "section": "Types of Data in Python", + "text": "Types of Data in Python\nPython categorizes data into two main types:\n\nValues: Singular items like numbers or strings.\nCollections: Groupings of values, like lists or dictionaries.\n\nMutable vs Immutable\n\nMutable: Objects whose content can be changed after creation.\nImmutable: Objects that cannot be altered after they are created." 
}, { - "objectID": "course-materials/lectures/05_Drop_or_Impute.html#missing-at-random-mar", - "href": "course-materials/lectures/05_Drop_or_Impute.html#missing-at-random-mar", - "title": "EDS 217 - Lecture", - "section": "Missing at Random (MAR)", - "text": "Missing at Random (MAR)\n\nMissingness is related to other observed variables\nExample: Older participants more likely to skip income questions\nMore common in real-world datasets\nDropping MAR data can introduce bias" + "objectID": "course-materials/lectures/data_types.html#overview-of-main-data-types", + "href": "course-materials/lectures/data_types.html#overview-of-main-data-types", + "title": "Basic Data Types in Python", + "section": "Overview of Main Data Types", + "text": "Overview of Main Data Types\n\n\n\n\n\n\n\n\nCategory\nMutable\nImmutable\n\n\n\n\nValues\n-\nint, float, complex, str\n\n\nCollections\nlist, dict, set, bytearray\ntuple, frozenset" }, { - "objectID": "course-materials/lectures/05_Drop_or_Impute.html#mar-example-assigning-nan-randomly-filtered-on-column-value", - "href": "course-materials/lectures/05_Drop_or_Impute.html#mar-example-assigning-nan-randomly-filtered-on-column-value", - "title": "EDS 217 - Lecture", - "section": "MAR Example (Assigning nan randomly, filtered on column value)", - "text": "MAR Example (Assigning nan randomly, filtered on column value)\n\n# Create sample data with MAR\nnp.random.seed(42)\ndf = pd.DataFrame({\n 'Age': np.random.randint(18, 80, 100),\n 'Income': np.random.randint(20000, 100000, 100)\n})\ndf.loc[df['Age'] > 60, 'Income'] = np.where(\n np.random.rand(len(df[df['Age'] > 60])) < 0.3, \n np.nan, \n df.loc[df['Age'] > 60, 'Income']\n)\nprint(df[df['Age'] > 60]['Income'].isnull().sum() / len(df[df['Age'] > 60]))\n\n0.2972972972972973" + "objectID": "course-materials/lectures/data_types.html#numeric-types", + "href": "course-materials/lectures/data_types.html#numeric-types", + "title": "Basic Data Types in Python", + "section": "Numeric Types", + 
"text": "Numeric Types\nIntegers (int)\n\nUse: Counting, indexing, and more.\nConstruction: x = 5\nImmutable: Cannot change the value of x without creating a new int." }, { - "objectID": "course-materials/lectures/05_Drop_or_Impute.html#missing-not-at-random-mnar", - "href": "course-materials/lectures/05_Drop_or_Impute.html#missing-not-at-random-mnar", - "title": "EDS 217 - Lecture", - "section": "Missing Not at Random (MNAR)", - "text": "Missing Not at Random (MNAR)\n\nMissingness is related to the missing values themselves\nExample: People with high incomes more likely to skip income questions\nMost problematic type of missing data\nNeither dropping nor simple imputation may be appropriate" + "objectID": "course-materials/lectures/data_types.html#numeric-types-continued", + "href": "course-materials/lectures/data_types.html#numeric-types-continued", + "title": "Basic Data Types in Python", + "section": "Numeric Types (continued)", + "text": "Numeric Types (continued)\nFloating-Point Numbers (float)\n\nUse: Representing real numbers for measurements, fractions, etc.\nConstruction: y = 3.14\nImmutable: Like integers, any change creates a new float." 
}, { - "objectID": "course-materials/lectures/05_Drop_or_Impute.html#dropping-missing-data", - "href": "course-materials/lectures/05_Drop_or_Impute.html#dropping-missing-data", - "title": "EDS 217 - Lecture", - "section": "Dropping Missing Data", - "text": "Dropping Missing Data\nPros:\n\n\nSimple and quick\nMaintains the distribution of complete cases\nAppropriate for MCAR data\n\n\nCons:\n\n\nReduces sample size\nCan introduce bias for MAR or MNAR data\nMay lose important information" + "objectID": "course-materials/lectures/data_types.html#text-type", + "href": "course-materials/lectures/data_types.html#text-type", + "title": "Basic Data Types in Python", + "section": "Text Type", + "text": "Text Type\nStrings (str)\n\nUse: Handling textual data.\nConstruction: s = \"Data Science\"\nImmutable: Modifying s requires creating a new string." }, { - "objectID": "course-materials/lectures/05_Drop_or_Impute.html#drop-example", - "href": "course-materials/lectures/05_Drop_or_Impute.html#drop-example", - "title": "EDS 217 - Lecture", - "section": "Drop Example", - "text": "Drop Example\n\n# Dropping missing data\ndf_dropped = df.dropna()\nprint(f\"Original shape: {df.shape}, After dropping: {df_dropped.shape}\")\n\nOriginal shape: (100, 2), After dropping: (89, 2)" + "objectID": "course-materials/lectures/data_types.html#sequence-types", + "href": "course-materials/lectures/data_types.html#sequence-types", + "title": "Basic Data Types in Python", + "section": "Sequence Types", + "text": "Sequence Types\nLists (list)\n\nUse: Storing an ordered collection of items.\nConstruction: my_list = [1, 2, 3]\nMutable: Items can be added, removed, or changed." 
}, { - "objectID": "course-materials/lectures/05_Drop_or_Impute.html#imputation", - "href": "course-materials/lectures/05_Drop_or_Impute.html#imputation", - "title": "EDS 217 - Lecture", - "section": "Imputation", - "text": "Imputation\nPros:\n\n\nPreserves sample size\nCan reduce bias for MAR data\nAllows use of all available information\n\n\nCons:\n\n\nCan introduce bias if done incorrectly\nMay underestimate variability\nCan be computationally intensive for complex methods" + "objectID": "course-materials/lectures/data_types.html#sequence-types-continued", + "href": "course-materials/lectures/data_types.html#sequence-types-continued", + "title": "Basic Data Types in Python", + "section": "Sequence Types (continued)", + "text": "Sequence Types (continued)\nTuples (tuple)\n\nUse: Immutable lists. Often used where a fixed, unchangeable sequence is needed.\nConstruction: my_tuple = (1, 2, 3)\nImmutable: Cannot alter the contents once created." }, { - "objectID": "course-materials/lectures/05_Drop_or_Impute.html#imputation-example", - "href": "course-materials/lectures/05_Drop_or_Impute.html#imputation-example", - "title": "EDS 217 - Lecture", - "section": "Imputation Example", - "text": "Imputation Example\n\n# Simple mean imputation\ndf_imputed = df.fillna(df.mean())\nprint(f\"Original missing: {df['Income'].isnull().sum()}, After imputation: {df_imputed['Income'].isnull().sum()}\")\n\nOriginal missing: 11, After imputation: 0" + "objectID": "course-materials/lectures/data_types.html#set-types", + "href": "course-materials/lectures/data_types.html#set-types", + "title": "Basic Data Types in Python", + "section": "Set Types", + "text": "Set Types\nSets (set)\n\nUse: Unique collection of items, great for membership testing, removing duplicates.\nConstruction: my_set = {1, 2, 3}\nMutable: Can add or remove items." 
}, { - "objectID": "course-materials/lectures/05_Drop_or_Impute.html#imputation-methods", - "href": "course-materials/lectures/05_Drop_or_Impute.html#imputation-methods", - "title": "EDS 217 - Lecture", - "section": "Imputation Methods", - "text": "Imputation Methods\n\nSimple imputation:\n\nMean, median, mode\nLast observation carried forward (LOCF)\n\nAdvanced imputation:\n\nMultiple Imputation\nK-Nearest Neighbors (KNN)\nRegression imputation" + "objectID": "course-materials/lectures/data_types.html#set-types-continued", + "href": "course-materials/lectures/data_types.html#set-types-continued", + "title": "Basic Data Types in Python", + "section": "Set Types (continued)", + "text": "Set Types (continued)\nFrozen Sets (frozenset)\n\nUse: Immutable version of sets.\nConstruction: my_frozenset = frozenset([1, 2, 3])\nImmutable: Safe for use as dictionary keys." }, { - "objectID": "course-materials/lectures/05_Drop_or_Impute.html#best-practices", - "href": "course-materials/lectures/05_Drop_or_Impute.html#best-practices", - "title": "EDS 217 - Lecture", - "section": "Best Practices", - "text": "Best Practices\n\n\nUnderstand your data and the missingness mechanism\nVisualize patterns of missingness\nConsider the impact on your analysis\nUse appropriate methods based on the type of missingness\nConduct sensitivity analyses\nDocument your approach and assumptions" + "objectID": "course-materials/lectures/data_types.html#mapping-types", + "href": "course-materials/lectures/data_types.html#mapping-types", + "title": "Basic Data Types in Python", + "section": "Mapping Types", + "text": "Mapping Types\nDictionaries (dict)\n\nUse: Key-value pairs for fast lookup and data management.\nConstruction: my_dict = {'key': 'value'}\nMutable: Add, remove, or change associations." 
}, { - "objectID": "course-materials/lectures/05_Drop_or_Impute.html#conclusion", - "href": "course-materials/lectures/05_Drop_or_Impute.html#conclusion", - "title": "EDS 217 - Lecture", + "objectID": "course-materials/lectures/data_types.html#conclusion", + "href": "course-materials/lectures/data_types.html#conclusion", + "title": "Basic Data Types in Python", "section": "Conclusion", - "text": "Conclusion\n\nUnderstanding the nature of missingness is crucial\nBoth dropping and imputation have pros and cons\nChoose the appropriate method based on:\n\nType of missingness (MCAR, MAR, MNAR)\nSample size\nAnalysis goals\n\nAlways document your approach and conduct sensitivity analyses" + "text": "Conclusion\nUnderstanding these basic types is crucial for data handling and manipulation in Python, especially in data science where the type of data dictates the analysis technique. As we move into more advanced Python we will get to know more complex data types.\nFor more information, you can always refer to the Python official documentation." }, { - "objectID": "course-materials/lectures/05_Drop_or_Impute.html#questions", - "href": "course-materials/lectures/05_Drop_or_Impute.html#questions", - "title": "EDS 217 - Lecture", - "section": "Questions?", - "text": "Questions?\nThank you for your attention!" + "objectID": "course-materials/live-coding/3a_control_flows.html#overview", + "href": "course-materials/live-coding/3a_control_flows.html#overview", + "title": "Live Coding Session 3A", + "section": "Overview", + "text": "Overview\nIn this session, we will be exploring Control Flows - if-elif, for, while and other ways of altering the flow of code execution. Live coding is a great way to learn programming as it allows you to see the process of writing code in real-time, including how to deal with unexpected issues and debug errors." 
}, { - "objectID": "course-materials/coding-colabs/6b_advanced_data_manipulation.html", - "href": "course-materials/coding-colabs/6b_advanced_data_manipulation.html", - "title": "Day 6: 🙌 Coding Colab", - "section": "", - "text": "In this coding colab, you’ll analyze global temperature anomalies and CO2 concentration data. You’ll practice data manipulation, joining datasets, time series analysis, and visualization techniques." + "objectID": "course-materials/live-coding/3a_control_flows.html#objectives", + "href": "course-materials/live-coding/3a_control_flows.html#objectives", + "title": "Live Coding Session 3A", + "section": "Objectives", + "text": "Objectives\n\nUnderstand the fundamentals of flow control in Python.\nApply if-elif-else constructions in practical examples.\nUse for and while loops to iterate through collections.\nDevelop the ability to troubleshoot and debug in a live setting." }, { - "objectID": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#introduction", - "href": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#introduction", - "title": "Day 6: 🙌 Coding Colab", - "section": "", - "text": "In this coding colab, you’ll analyze global temperature anomalies and CO2 concentration data. You’ll practice data manipulation, joining datasets, time series analysis, and visualization techniques." 
+ "objectID": "course-materials/live-coding/3a_control_flows.html#getting-started", + "href": "course-materials/live-coding/3a_control_flows.html#getting-started", + "title": "Live Coding Session 3A", + "section": "Getting Started", + "text": "Getting Started\nBefore we begin our interactive session, please follow these steps to set up your Jupyter Notebook:\n\nOpen JupyterLab and create a new notebook:\n\nClick on the + button in the top left corner\nSelect Python 3.10.0 from the Notebook options\n\nRename your notebook:\n\nRight-click on the Untitled.ipynb tab\nSelect “Rename”\nName your notebook with the format: Session_XY_Topic.ipynb (Replace X with the day number and Y with the session number)\n\nAdd a title cell:\n\nIn the first cell of your notebook, change the cell type to “Markdown”\nAdd the following content (replace the placeholders with the actual information):\n\n\n# Day X: Session Y - [Session Topic]\n\n[Link to session webpage]\n\nDate: [Current Date]\n\nSet up your notebook:\nPlease set up your Jupyter Notebook with the following structure. We’ll fill in the content together during our session.\n\n ### Introduction to Control Flows\n\n <insert code cell below> \n\n ### Conditionals\n\n #### Basic If Statement\n\n <insert code cell below> \n \n #### Adding Else\n\n <insert code cell below> \n \n #### Using Elif\n\n <insert code cell below> \n \n ### Loops\n\n #### For Loops\n\n <insert code cell below> \n \n #### While Loops\n\n <insert code cell below> \n \n ### Applying Control Flows in Data Science\n\n <insert code cell below> \n \n ### Conclusion\n\n <insert code cell below> \n \n\n\n\n\n\n\nCaution\n\n\n\nDon’t forget to save your work frequently by clicking the save icon or using the keyboard shortcut (Ctrl+S or Cmd+S).\n\n\nRemember, we’ll be coding together, so don’t worry about filling in the content now. 
Just set up the structure, and we’ll dive into the details during our session!\n\nParticipation:\n\nTry to code along with me during the session.\nFeel free to ask questions at any time. Remember, if you have a question, others probably do too!\n\nResources:\n\nI will be sharing snippets of code and notes. Make sure to take your own notes and save snippets in your notebook for future reference.\nCheck out our class control flows cheatsheet." }, { - "objectID": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#learning-objectives", - "href": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#learning-objectives", - "title": "Day 6: 🙌 Coding Colab", - "section": "Learning Objectives", - "text": "Learning Objectives\nBy the end of this colab, you will be able to:\n\nLoad and preprocess time series data\nJoin datasets based on datetime indices\nPerform basic time series analysis and resampling\nApply data manipulation techniques to extract insights from environmental datasets" + "objectID": "course-materials/live-coding/3a_control_flows.html#session-format", + "href": "course-materials/live-coding/3a_control_flows.html#session-format", + "title": "Live Coding Session 3A", + "section": "Session Format", + "text": "Session Format\n\nIntroduction\n\nA brief discussion about the topic in Python programming and its importance in data science.\n\n\n\nDemonstration\n\nI will demonstrate code examples live. Follow along and write the code into your own Jupyter notebook.\n\n\n\nPractice\n\nYou will have the opportunity to try exercises on your own to apply what you’ve learned.\n\n\n\nQ&A\n\nWe will have a Q&A session at the end where you can ask specific questions about the code, concepts, or issues encountered during the session." 
}, { - "objectID": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#setup", - "href": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#setup", - "title": "Day 6: 🙌 Coding Colab", - "section": "Setup", - "text": "Setup\nLet’s start by importing necessary libraries and loading our datasets:\n\n\nCode\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\n\n# Load the temperature anomaly dataset\ntemp_url = \"https://bit.ly/monthly_temp\"\ntemp_df = pd.read_csv(temp_url, parse_dates=['Date'])\n\n# Load the CO2 concentration dataset\nco2_url = \"https://bit.ly/monthly_CO2\"\nco2_df = pd.read_csv(co2_url, parse_dates=['Date'])\n\nprint(\"Temperature data:\")\nprint(temp_df.head())\nprint(\"\\nCO2 data:\")\nprint(co2_df.head())\n\n\nTemperature data:\n Date MonthlyAnomaly\n0 1880-01-01 -0.20\n1 1880-02-01 -0.25\n2 1880-03-01 -0.09\n3 1880-04-01 -0.16\n4 1880-05-01 -0.09\n\nCO2 data:\n Date CO2Concentration\n0 1958-04-01 317.45\n1 1958-05-01 317.51\n2 1958-06-01 317.27\n3 1958-07-01 315.87\n4 1958-08-01 314.93" + "objectID": "course-materials/live-coding/3a_control_flows.html#after-the-session", + "href": "course-materials/live-coding/3a_control_flows.html#after-the-session", + "title": "Live Coding Session 3A", + "section": "After the Session", + "text": "After the Session\n\nReview your notes and try to replicate the exercises on your own.\nExperiment with the code by modifying parameters or adding new features to deepen your understanding.\nCheck out our class flow control cheatsheet." 
}, { - "objectID": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#task-1-data-preparation", - "href": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#task-1-data-preparation", - "title": "Day 6: 🙌 Coding Colab", - "section": "Task 1: Data Preparation", - "text": "Task 1: Data Preparation\n\nSet the ‘Date’ column as the index for both dataframes.\nEnsure that there are no missing values in either dataset." + "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html", + "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html", + "title": "Interactive Session 2C", + "section": "", + "text": "Objective:\nThis session aims to help you understand how to interpret error messages in Python. By generating errors in a controlled environment, you’ll learn how to read error reports, identify the source of the problem, and correct your code. This is an essential skill for debugging and improving your Python programming abilities.\nEstimated Time: 45-60 minutes" }, { - "objectID": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#task-2-joining-datasets", - "href": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#task-2-joining-datasets", - "title": "Day 6: 🙌 Coding Colab", - "section": "Task 2: Joining Datasets", - "text": "Task 2: Joining Datasets\n\nMerge the temperature and CO2 datasets based on their date index.\nHandle any missing values that may have been introduced by the merge.\nCreate some plots showing temperature anomalies and CO2 concentrations over time using pandas built-in plotting functions." 
+ "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-1-introduction-to-python-errors", + "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-1-introduction-to-python-errors", + "title": "Interactive Session 2C", + "section": "Part 1: Introduction to Python Errors", + "text": "Part 1: Introduction to Python Errors\n\n1.1 Generating a Syntax Error\nIn Python, a syntax error occurs when the code you write doesn’t conform to the rules of the language.\n\nStep 1: Run the following code in a Jupyter notebook cell to generate a syntax error.\nprint(\"Hello World\nStep 2: Observe the error message. It should look something like this:\nFile \"<ipython-input-1>\", line 1\n print(\"Hello World\n ^\nSyntaxError: EOL while scanning string literal\nStep 3: Explanation: The error message indicates that the End Of Line (EOL) was reached while the string literal was still open. A string literal is what is created inside the open \" and close \". The caret (^) points to where Python expected the closing quote.\nStep 4: Fix the Error: Correct the code by adding the missing closing quotation mark.\nprint(\"Hello World\")" }, { - "objectID": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#task-3-time-series-analysis", - "href": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#task-3-time-series-analysis", - "title": "Day 6: 🙌 Coding Colab", - "section": "Task 3: Time Series Analysis", - "text": "Task 3: Time Series Analysis\n\nResample the data to annual averages.\nCalculate the year-over-year change in temperature anomalies and CO2 concentrations.\nCreate a scatter plot (use the plt.scatter() function) of annual temperature anomalies vs CO2 concentrations." 
+ "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-2-name-errors-with-variables", + "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-2-name-errors-with-variables", + "title": "Interactive Session 2C", + "section": "Part 2: Name Errors with Variables", + "text": "Part 2: Name Errors with Variables\n\n2.1 Using an Undefined Variable\nA NameError occurs when you try to use a variable that hasn’t been defined.\n\nStep 1: Run the following code to generate a NameError.\nprint(variable)\nStep 2: Observe the error message.\nNameError: name 'variable' is not defined\nStep 3: Explanation: Python is telling you that the variable variable has not been defined. This means you are trying to use a variable that Python doesn’t recognize.\nStep 4: Fix the Error: Define the variable before using it.\nvariable = \"I'm now defined!\"\nprint(variable)\n\n\n\n\n\n\n\nCommon NameError patterns in Python\n\n\n\nA NameError often occurs when Python can’t find a variable or function you’re trying to use. This is usually because of:\n\nTypos in Function or Variable Names:\n\nIf you mistype a function or variable name, Python will raise a NameError because it doesn’t recognize the name.\nExample:\nprnt(\"Hello, World!\") # NameError: name 'prnt' is not defined\n\nFix: Correct the typo to print(\"Hello, World!\").\n\n\nUsing Literals as Variables:\n\nA NameError can also happen if you accidentally try to use a string or number as if it were a variable.\nExample:\n\"Hello\" = 5 # NameError: can't assign to literal\n\nFix: Make sure you’re using valid variable names and not trying to assign values to literals.\n\n\n\nRemember: Always double-check your spelling and ensure that you’re using variable names correctly!" 
}, { - "objectID": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#task-4-seasonal-analysis", - "href": "course-materials/coding-colabs/6b_advanced_data_manipulation.html#task-4-seasonal-analysis", - "title": "Day 6: 🙌 Coding Colab", - "section": "Task 4: Seasonal Analysis", - "text": "Task 4: Seasonal Analysis\n\nCreate a function to extract the season from a given date (hint: use the date.month attribute and if-elif-else to assign the season in your function).\nUse the function to create a new column called Season\nCalculate the average temperature anomaly and CO2 concentration for each season.\nCreate a box plot (use sns.boxplot) showing the distribution of temperature anomalies for each season." + "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-3-type-errors-with-functions", + "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-3-type-errors-with-functions", + "title": "Interactive Session 2C", + "section": "Part 3: Type Errors with Functions", + "text": "Part 3: Type Errors with Functions\n\n3.1 Passing Incorrect Data Types\nA TypeError occurs when an operation or function is applied to an object of an inappropriate type.\n\nStep 1: Run the following code to generate a TypeError.\nnumber = 5\nprint(number + \"10\")\nStep 2: Observe the error message.\nTypeError: unsupported operand type(s) for +: 'int' and 'str'\nStep 3: Explanation: The error indicates that you are trying to add an integer (int) and a string (str), which is not allowed in Python.\nStep 4: Fix the Error: Convert the string \"10\" to an integer or the integer number to a string.\nprint(number + 10) # Correct approach 1\n\n# or\n\nprint(str(number) + \"10\") # Correct approach 2" }, { - "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#getting-started", - "href": "course-materials/interactive-sessions/2b_dictionaries.html#getting-started", - "title": "Interactive Session 2B", - "section": 
"Getting Started", - "text": "Getting Started\nBefore we begin our interactive session, please follow these steps to set up your Jupyter Notebook:\n\nOpen JupyterLab and create a new notebook:\n\nClick on the + button in the top left corner\nSelect Python 3.10.0 from the Notebook options\n\nRename your notebook:\n\nRight-click on the Untitled.ipynb tab\nSelect “Rename”\nName your notebook with the format: Session_2B_Dictionaries.ipynb\n\nAdd a title cell:\n\nIn the first cell of your notebook, change the cell type to “Markdown”\nAdd the following content (replace the placeholders with the actual information):\n\n\n# Day 2: Session B - Dictionaries\n\n[Link to session webpage](https://eds-217-essential-python.github.io/course-materials/interactive-sessions/2b_dictionaries.html)\n\nDate: 09/04/2024\n\nAdd a code cell:\n\nBelow the title cell, add a new cell\nEnsure it’s set as a “Code” cell\nThis will be where you start writing your Python code for the session\n\nThroughout the session:\n\nTake notes in Markdown cells\nCopy or write code in Code cells\nRun cells to test your code\nAsk questions if you need clarification\n\n\n\n\n\n\n\n\nCaution\n\n\n\nRemember to save your work frequently by clicking the save icon or using the keyboard shortcut (Ctrl+S or Cmd+S).\n\n\nLet’s begin our interactive session!" 
+ "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-4-index-errors-with-lists", + "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-4-index-errors-with-lists", + "title": "Interactive Session 2C", + "section": "Part 4: Index Errors with Lists", + "text": "Part 4: Index Errors with Lists\n\n4.1 Accessing an Invalid Index\nAn IndexError occurs when you try to access an index that is out of the range of a list.\n\nStep 1: Run the following code to generate an IndexError.\nmy_list = [1, 2, 3]\nprint(my_list[5])\nStep 2: Observe the error message.\nIndexError: list index out of range\nStep 3: Explanation: Python is telling you that the index 5 is out of range for the list my_list, which only has indices 0, 1, 2.\nStep 4: Fix the Error: Access a valid index or use dynamic methods to avoid hardcoding indices.\nprint(my_list[2]) # Last valid index\n\n# or\n\nprint(my_list[-1]) # Access the last element using negative indexing" }, { - "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#part-1-basic-concepts-with-species-lookup-table", - "href": "course-materials/interactive-sessions/2b_dictionaries.html#part-1-basic-concepts-with-species-lookup-table", - "title": "Interactive Session 2B", - "section": "Part 1: Basic Concepts with Species Lookup Table", - "text": "Part 1: Basic Concepts with Species Lookup Table\n\nIntroduction to Dictionaries\nDictionaries in Python are collections of key-value pairs that allow for efficient data storage and retrieval. Each key maps to a specific value, making dictionaries ideal for representing real-world data in a structured format.\nProbably the easiest mental model for thinking about structured data is a spreadsheet. You are all familiar with Excel spreadsheets, with their numbered rows and lettered columns. 
In the spreadsheet, data is often “structured” so that each row is an entry, and each column is perhaps a variable recorded for that entry.\n\n\n\nstructured-data" + "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-5-attribute-errors", + "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-5-attribute-errors", + "title": "Interactive Session 2C", + "section": "Part 5: Attribute Errors", + "text": "Part 5: Attribute Errors\n\n5.1 Using Attributes Incorrectly\nAn AttributeError occurs when you try to access an attribute or method that doesn’t exist on the object.\n\nStep 1: Run the following code to generate an AttributeError.\nmy_string = \"Hello\"\nmy_string.append(\" World\")\nStep 2: Observe the error message.\nAttributeError: 'str' object has no attribute 'append'\nStep 3: Explanation: Python is telling you that the str object (a string) does not have an append method, which is a method for lists.\nStep 4: Fix the Error: Use string concatenation instead of append.\nmy_string = \"Hello\"\nmy_string = my_string + \" World\"\nprint(my_string)" }, { - "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#instructions", - "href": "course-materials/interactive-sessions/2b_dictionaries.html#instructions", - "title": "Interactive Session 2B", - "section": "Instructions", - "text": "Instructions\nWe will work through this material together, writing a new notebook as we go.\n\n\n\n\n\n\nNote\n\n\n\n🐍     This symbol designates an important note about Python structure, syntax, or another quirk.\n\n\n\n✏️   This symbol designates code you should add to your notebook and run." 
+ "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-6-tracing-errors-through-a-function-call-stack", + "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-6-tracing-errors-through-a-function-call-stack", + "title": "Interactive Session 2C", + "section": "Part 6: Tracing Errors Through a Function Call Stack", + "text": "Part 6: Tracing Errors Through a Function Call Stack\n\n6.1 Understanding a Complicated Error Stack Trace\nErrors can sometimes appear deep within a function call, triggered by code that was written earlier in your script. When this happens, understanding the stack trace (the sequence of function calls leading to the error) is crucial for identifying the root cause. In this part of the exercise, you’ll explore an example where an error in a plotting function arises from an earlier mistake in your code.\n\nStep 1: Run the following code, which attempts to plot a simple line graph using Matplotlib.\n\nimport matplotlib.pyplot as plt\n\ndef generate_plot(data):\n plt.plot(data)\n plt.show()\n\n# Step 2: Introduce an error\nmy_data = [1, 2, \"three\", 4, 5] # Mixing strings and integers in the list\n\n# Step 3: Call the function to generate the plot\ngenerate_plot(my_data)\n\nStep 2: Observe the error message.\n\nFile \"<ipython-input-1>\", line 5, in generate_plot\n plt.plot(data)\n...\nFile \"/path/to/matplotlib/lines.py\", line XYZ, in _xy_from_xy\n raise ValueError(\"some explanation about incompatible types\")\nValueError: could not convert string to float: 'three'\n\nStep 3: Explanation: This error occurs because the plot function in Matplotlib expects numerical data to plot. 
The error message points to a deeper issue in the lines.py file inside the Matplotlib library, but the actual problem originates from your my_data list, which includes a string (“three”) instead of a numeric value.\nStep 4: Trace the Error:\n\nThe error originates in the plt.plot(data) function call.\nMatplotlib’s internal functions (_xy_from_xy in this case) try to process the data but encounter an issue when they can’t convert the string “three” into a float.\n\nStep 5: Fix the Error: Correct the data by ensuring all elements are numeric.\nmy_data = [1, 2, 3, 4, 5] # Correcting the list to contain only integers\ngenerate_plot(my_data) # Now this will work without an error" }, { - "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#dictionaries", - "href": "course-materials/interactive-sessions/2b_dictionaries.html#dictionaries", - "title": "Interactive Session 2B", - "section": "Dictionaries", - "text": "Dictionaries\n\nTLDR: Dictionaries are a very common collection type that allows data to be organized using a key:value framework. Because of the similarity between key:value pairs and many data structures (e.g. “lookup tables”), you will see Dictionaries quite a bit when working in python\n\nThe first collection we will look at today is the dictionary, or dict. This is one of the most powerful data structures in python. It is a mutable, unordered collection, which means that it can be altered, but elements within the structure cannot be referenced by their position and they cannot be sorted.\nYou can create a dictionary using the {}, providing both a key and a value, which are separated by a :." 
+ "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-7-tracing-errors-in-jupyter-notebooks", + "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#part-7-tracing-errors-in-jupyter-notebooks", + "title": "Interactive Session 2C", + "section": "Part 7: Tracing Errors in Jupyter Notebooks", + "text": "Part 7: Tracing Errors in Jupyter Notebooks\nWhen you run code in a Jupyter Notebook, the Python interpreter refers to the code in the notebook cells as it generates a stack trace when an error occurs. Here’s how Jupyter Notebooks handle this:\n\nHow Jupyter Notebooks Generate Stack Traces\n\nCell Execution:\n\nEach time you run a cell in a Jupyter Notebook, the code in that cell is executed by the Python interpreter. The code from each cell is treated as part of a sequential script, but each cell is an individual execution block.\n\nInput Label:\n\nJupyter assigns each cell an input label, such as In [1]:, In [2]:, etc. This label is used to identify the specific cell where the code was executed.\n\nStack Trace Generation:\n\nWhen an error occurs, Python generates a stack trace that shows the sequence of function calls leading to the error. 
In a Jupyter Notebook, this stack trace includes references to the notebook cells that were executed.\nThe stack trace will point to the line number within the cell and the input label, such as In [2], indicating where in your notebook the error originated.\n\nExample Stack Trace in Jupyter:\n\nSuppose you have the following code in a cell labeled In [2]:\ndef divide(x, y):\n return x / y\n\ndivide(10, 0)\nRunning this code will generate a ZeroDivisionError, and the stack trace might look like this:\n---------------------------------------------------------------------------\nZeroDivisionError Traceback (most recent call last)\n<ipython-input-2-d7d8f8a6c1c1> in <module>\n 2 return x / y\n 3 \n----> 4 divide(10, 0)\n 5 \n\n<ipython-input-2-d7d8f8a6c1c1> in divide(x, y)\n 1 def divide(x, y):\n----> 2 return x / y\n 3 \n 4 divide(10, 0)\nExplanation:\n\nThe Traceback (most recent call last) shows the series of calls leading to the error.\nThe <ipython-input-2-d7d8f8a6c1c1> refers to the code in cell In [2].\nThe stack trace pinpoints the exact line where the error occurred within that cell.\n\n\nMultiple Cell References:\n\nIf your code calls functions defined in different cells, the stack trace will show references to multiple cells. For example, if a function is defined in one cell and then called in another, the stack trace will include both cells in the sequence of calls.\n\nLimitations:\n\nThe stack trace in Jupyter Notebooks is specific to the cells that have been executed. If you modify a cell and re-run it, the new code is associated with that cell’s input label, and previous stack traces will not reflect those changes.\n\n\n\n\nSummary:\nIn Jupyter Notebooks, stack traces refer to the specific cells (In [X]) where the code was executed. The stack trace will show you the input label of the cell and the line number where the error occurred, helping you to quickly locate and fix issues in your notebook. 
Understanding how Jupyter references your code in stack traces is crucial for effective debugging.\n\n\nGeneral Summary of Stack Traces\n\nWhat to Look For: In complex stack traces, start by looking at the error message itself, which often appears at the bottom of the stack. Work your way backward through the stack to identify where in your code the problem originated.\nTracing Function Calls: Understand how data flows through your functions. An error in a deeply nested function may often be triggered by an incorrect input or state set earlier in the code." }, { - "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#creating-manipulating-dictionaries", - "href": "course-materials/interactive-sessions/2b_dictionaries.html#creating-manipulating-dictionaries", - "title": "Interactive Session 2B", - "section": "Creating & Manipulating Dictionaries", - "text": "Creating & Manipulating Dictionaries\nWe’ll start by creating a dictionary to store the common name of various species found in California’s coastal tidepools.\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Define a dictionary with species data containing latin names and corresponding common names.\nspecies_dict = {\n \"P ochraceus\": \"Ochre sea star\",\n \"M californianus\": \"California mussel\",\n \"H rufescens\": \"Red abalone\"\n}\n\n\n\n\n\n\n\n\nNote\n\n\n\n🐍 Note. The use of whitespace and indentation is important in python. In the example above, the dictionary entries are indented relative to the brackets { and }. In addition, there is no space between the 'key', the ':', and the 'value' for each entry. Finally, notice that there is a comma (,) following each dictionary entry. 
This pattern is the same as all of the other collection data types we've seen so far, including list, set, and tuple.\n\n\n\nAccessing elements in a dictionary\nAccessing an element in a dictionary is easy if you know what you are looking for.\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\nspecies_dict['M californianus']\n\n\n'California mussel'\n\n\n\n\nAdding a New Species\nBecause dictionaries are mutable, it is easy to add additional entries and doing so is straightforward. You specify the key and the value it maps to.\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Adding a new entry for Leather star\nspecies_dict[\"D imbricata\"] = \"Leather star\"\n\n\n\n\nAccessing and Modifying Data\nAccessing data in a dictionary can be done directly by the key, and modifications are just as direct.\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Accessing a species by its latin name\nprint(\"Common name for P ochraceus:\", species_dict[\"P ochraceus\"])\n\n\nCommon name for P ochraceus: Ochre sea star\n\n\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Updating the common name for P ochraceus\nspecies_dict[\"P ochraceus\"] = \"Purple Starfish\"\nprint(\"Updated data for Pisaster ochraceus:\", species_dict[\"P ochraceus\"])\n\n\nUpdated data for Pisaster ochraceus: Purple Starfish\n\n\n\n\nRemoving a Dictionary Element\n\n✏️ Try it. 
Add the cell below to your notebook and run it.\n\n\n\nCode\n# Removing \"P ochraceus\"\ndel species_dict[\"P ochraceus\"]\nprint(f\"Deleted data for Pisaster ochraceus, new dictionary: {species_dict}\")\n\n\nDeleted data for Pisaster ochraceus, new dictionary: {'M californianus': 'California mussel', 'H rufescens': 'Red abalone', 'D imbricata': 'Leather star'}" + "objectID": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#error-summary", + "href": "course-materials/interactive-sessions/2c_exceptions_and_errors.html#error-summary", + "title": "Interactive Session 2C", + "section": "Error Summary", + "text": "Error Summary\n\nAlways read the error message carefully; it usually points directly to the problem.\nSyntaxErrors - and, to a lesser extent, NameErrors are often due to small mistakes like typos, missing parentheses, or missing quotes.\nTypeErrors often occur when trying to perform operations on incompatible data types.\nAttributeErrors occur when you are trying to use a method that doesn’t exist for an object. These can also show up due to typos in your code that make the interpreter think you are trying to call a method.\nWhile every error type has a specific meaning, always check your code for typos when trying to debug an error. Many typos do not prevent the interpreter from running your code and the eventual error caused by a typo might be hard to interpret!\n\n\nBy the end of this session, you should feel more comfortable identifying and fixing common Python errors. 
This skill is critical for debugging and developing more complex programs in the future.\n\n\nEnd interactive session 2C" }, { - "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#accessing-dictionary-keys-and-values", - "href": "course-materials/interactive-sessions/2b_dictionaries.html#accessing-dictionary-keys-and-values", - "title": "Interactive Session 2B", - "section": "Accessing dictionary keys and values", - "text": "Accessing dictionary keys and values\nEvery dictionary has builtin methods to retrieve its keys and values. These functions are called, appropriately, keys() and values()\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Accessing the dictionary keys:\nlatin_names = species_dict.keys()\nprint(f\"Dictionary keys (latin names): {latin_names}\")\n\n\nDictionary keys (latin names): dict_keys(['M californianus', 'H rufescens', 'D imbricata'])\n\n\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Accessing the dictionary values\ncommon_names = species_dict.values()\nprint(f\"Dictionary values (common names): {common_names}\")\n\n\nDictionary values (common names): dict_values(['California mussel', 'Red abalone', 'Leather star'])\n\n\n\n\n\n\n\n\nNote\n\n\n\n🐍 Note. The keys() and values() functions return a dict_key object and dict_values object, respectively. Each of these objects contains a list of either the keys or values. You can force the result of the keys() or values() function into a list by wrapping either one in a list() command." + "objectID": "course-materials/live-coding/5a_selecting_and_filtering.html#overview", + "href": "course-materials/live-coding/5a_selecting_and_filtering.html#overview", + "title": "Live Coding Session 5A", + "section": "Overview", + "text": "Overview\nIn this session, we will be exploring how to select and filter data from DataFrames." 
}, { - "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#looping-through-dictionaries", - "href": "course-materials/interactive-sessions/2b_dictionaries.html#looping-through-dictionaries", - "title": "Interactive Session 2B", - "section": "Looping through Dictionaries ", - "text": "Looping through Dictionaries \nPython has an efficient way to loop through all the keys and values of a dictionary at the same time. The items() method returns a tuple containing a (key, value) for each element in a dictionary. In practice this means that we can loop through a dictionary in the following way:\n\n\nCode\nmy_dict = {'name': 'Homer Simpson',\n 'occupation': 'nuclear engineer',\n 'address': '742 Evergreen Terrace',\n 'city': 'Springfield',\n 'state': ' ? '\n }\n\nfor key, value in my_dict.items():\n print(f\"{key.capitalize()}: {value}.\")\n\n\nName: Homer Simpson.\nOccupation: nuclear engineer.\nAddress: 742 Evergreen Terrace.\nCity: Springfield.\nState: ? .\n\n\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\nAdd a new code cell and code to loop through the species_dict dictionary and print out a sentence providing the common name of each species (e.g. “The common name of M californianus” is…“)." + "objectID": "course-materials/live-coding/5a_selecting_and_filtering.html#objectives", + "href": "course-materials/live-coding/5a_selecting_and_filtering.html#objectives", + "title": "Live Coding Session 5A", + "section": "Objectives", + "text": "Objectives\n\nApply various indexing methods to select rows and columns in dataframes.\nUse boolean logic to filter data based on values\nDevelop the ability to troubleshoot and debug in a live setting." 
}, { - "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#accessing-un-assigned-elements-in-dictionaries", - "href": "course-materials/interactive-sessions/2b_dictionaries.html#accessing-un-assigned-elements-in-dictionaries", - "title": "Interactive Session 2B", - "section": "Accessing un-assigned elements in Dictionaries", - "text": "Accessing un-assigned elements in Dictionaries\nAttempting to retrieve an element of a dictionary that doesn’t exist is the same as requesting an index of a list that doesn’t exist - Python will raise an Exception. For example, if you attempt to retrieve the definition of a field that hasn’t been defined, then you get an error:\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\nspecies_dict[\"E dofleini\"]\nYou should get a KeyError exception:\nKeyError: ‘E dofleini’\nTo avoid getting an error when requesting an element from a dict, you can use the get() function. The get() function will return None if the element doesn’t exist:\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\nspecies_description = species_dict.get(\"E dofleini\")\nprint(\"Accessing non-existent latin name, E dofleini:\\n\", species_description)\n\n\nAccessing non-existent latin name, E dofleini:\n None\n\n\nYou can also provide a second argument that get() will return if the item isn’t found:\n\n✏️ Try it. 
Add the cell below to your notebook and run it.\n\n\n\nCode\nspecies_description = species_dict.get(\"E dofleini\", \"Species not found in dictionary\")\nprint(\"Accessing non-existent latin name, E dofleini:\\n\", species_description)\n\n\nAccessing non-existent latin name, E dofleini:\n Species not found in dictionary" + "objectID": "course-materials/live-coding/5a_selecting_and_filtering.html#getting-started", + "href": "course-materials/live-coding/5a_selecting_and_filtering.html#getting-started", + "title": "Live Coding Session 5A", + "section": "Getting Started", + "text": "Getting Started\nWe will be using the data stored in the csv at this url:\n\nurl = 'https://bit.ly/eds217-studentdata'\n\nTo get the most out of this session, please follow these guidelines:\n\nPrepare Your Environment:\n\nMake sure JupyterLab is up and running on your machine.\nOpen a new Jupyter notebook where you can write your own code as we go along.\nMake sure to name the notebook something informative so you can refer back to it later.\n\nSetup Your Notebook:\nBefore we begin the live coding session, please set up your Jupyter notebook with the following structure. This will help you organize your notes and code as we progress through the lesson.\n\nCreate a new Jupyter notebook.\nIn the first cell, create a markdown cell with the title of our session:\n\n# Basic Pandas Selection and Filtering\n\nBelow that, create markdown cells for each of the following topics we’ll cover. Leave empty code cells between each markdown cell where you’ll write your code during the session:\n\n# Introduction to Pandas Selection and Filtering\n\n## 1. Setup\n\n[Empty Code Cell]\n\n## 2. Basic Selection\n\n[Empty Code Cell]\n\n## 3. Filtering Based on Column Values\n\n### 3a. Single Condition Filtering\n\n[Empty Code Cell]\n\n### 3b. Multiple Conditions with Logical Operators\n\n[Empty Code Cell]\n\n### 3c. Using the filter command\n\n[Empty Code Cell]\n\n## 4. 
Combining Selection and Filtering\n\n[Empty Code Cell]\n\n## 5. Using .isin() for Multiple Values\n\n[Empty Code Cell]\n\n## 6. Filtering with String Methods\n\n[Empty Code Cell]\n\n## 7. Advanced Selection: .loc vs .iloc\n\n[Empty Code Cell]\n\n## Conclusion\nAs we progress through the live coding session, you’ll fill in the code cells with the examples we work on together.\nFeel free to add additional markdown cells for your own notes or observations throughout the session.\n\nBy setting up your notebook this way, you’ll have a clear structure to follow along with the lesson and easily reference specific topics later for review. Remember, you can always add more cells or modify the structure as needed during the session!\n\nParticipation:\n\nTry to code along with me during the session.\nFeel free to ask questions at any time. Remember, if you have a question, others probably do too!\n\nResources:\n\nI will be sharing snippets of code and notes. Make sure to take your own notes and save snippets in your notebook for future reference.\nCheck out our class data selection and filtering cheatsheet." }, { - "objectID": "course-materials/interactive-sessions/2b_dictionaries.html#summary-and-additional-resources", - "href": "course-materials/interactive-sessions/2b_dictionaries.html#summary-and-additional-resources", - "title": "Interactive Session 2B", - "section": "Summary and Additional Resources", - "text": "Summary and Additional Resources\nWe’ve explored the creation, modification, and application of dictionaries in Python, highlighting their utility in storing structured data. 
As you progress in Python, you’ll find dictionaries indispensable across various applications, from data analysis to machine learning.\nFor further study, consult the following resources: - Python’s Official Documentation on Dictionaries - Our class Dictionary Cheatsheet\n\n\nEnd interactive session 2B" + "objectID": "course-materials/cheatsheets/functions.html", + "href": "course-materials/cheatsheets/functions.html", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "In Python, a function is defined using the def keyword, followed by the function name and parentheses () that may include parameters.\n\n\nCode\ndef function_name(parameters):\n # Function body\n return result\n\n\n\n\n\n\n\nCode\ndef celsius_to_fahrenheit(celsius):\n \"\"\"Convert Celsius to Fahrenheit.\"\"\"\n fahrenheit = (celsius * 9/5) + 32\n return fahrenheit\n\n\n\n\n\nCall a function by using its name followed by parentheses, and pass arguments if the function requires them.\n\n\nCode\ntemperature_celsius = 25\ntemperature_fahrenheit = celsius_to_fahrenheit(temperature_celsius)\nprint(temperature_fahrenheit) # Output: 77.0\n\n\n77.0\n\n\n\n\n\n\n\n\n\n\nCode\ndef kilometers_to_miles(kilometers):\n \"\"\"Convert kilometers to miles.\"\"\"\n miles = kilometers * 0.621371\n return miles\n\n# Usage\ndistance_km = 10\ndistance_miles = kilometers_to_miles(distance_km)\nprint(distance_miles) # Output: 6.21371\n\n\n6.21371\n\n\n\n\n\ndef mps_to_kmph(mps):\n \"\"\"Convert meters per second to kilometers per hour.\"\"\"\n kmph = mps * 3.6\n return kmph\n\n# Usage\nspeed_mps = 5\nspeed_kmph = mps_to_kmph(speed_mps)\nprint(speed_kmph) # Output: 18.0\n\n\n\n\n\n\nYou can return multiple values from a function by using a tuple.\nimport statistics\n\ndef calculate_mean_std(data):\n \"\"\"Calculate mean and standard deviation of a dataset.\"\"\"\n mean = statistics.mean(data)\n std_dev = statistics.stdev(data)\n return mean, std_dev\n\n# Usage\ndata = [12, 15, 20, 22, 25]\nmean, std_dev = 
calculate_mean_std(data)\nprint(f\"Mean: {mean}, Standard Deviation: {std_dev}\")\n\n\n\n\nYou can set default values for parameters, making them optional when calling the function.\n\n\ndef convert_temperature(temp, from_unit='C', to_unit='F'):\n \"\"\"Convert temperature between Celsius and Fahrenheit.\"\"\"\n if from_unit == 'C' and to_unit == 'F':\n return (temp * 9/5) + 32\n elif from_unit == 'F' and to_unit == 'C':\n return (temp - 32) * 5/9\n else:\n return temp # No conversion needed\n\n# Usage\ntemp_in_fahrenheit = convert_temperature(25) # Defaults to C to F\ntemp_in_celsius = convert_temperature(77, from_unit='F', to_unit='C')\nprint(temp_in_fahrenheit) # Output: 77.0\nprint(temp_in_celsius) # Output: 25.0\n\n\n\n\nYou can call a function using keyword arguments to make it clearer which arguments are being set, especially useful when many parameters are involved.\n# Call using keyword arguments\ntemp = convert_temperature(temp=25, from_unit='C', to_unit='F')\n\n\n\nA higher-order function is a function that can take other functions as arguments or return them as results.\n\n\ndef apply_conversion(conversion_func, data):\n \"\"\"Apply a conversion function to a list of data.\"\"\"\n return [conversion_func(value) for value in data]\n\n# Convert a list of temperatures from Celsius to Fahrenheit\ntemperatures_celsius = [0, 20, 30, 40]\ntemperatures_fahrenheit = apply_conversion(celsius_to_fahrenheit, temperatures_celsius)\nprint(temperatures_fahrenheit) # Output: [32.0, 68.0, 86.0, 104.0]\n\n\n\n\n\n\nDegree days are a measure of heat accumulation used to predict plant and animal development rates.\ndef calculate_degree_days(daily_temps, base_temp=10):\n \"\"\"Calculate degree days for a series of daily temperatures.\"\"\"\n degree_days = 0\n for temp in daily_temps:\n if temp > base_temp:\n degree_days += temp - base_temp\n return degree_days\n\n# Usage\ndaily_temps = [12, 15, 10, 18, 20, 7]\ndegree_days = 
calculate_degree_days(daily_temps)\nprint(degree_days) # Output: 25\n\n\n\n\nFunctions encapsulate reusable code logic and can simplify complex operations.\nParameters allow for input variability, while return values provide output.\nUse default parameters and keyword arguments to enhance flexibility and readability.\nHigher-order functions enable more abstract and powerful code structures." }, { - "objectID": "course-materials/coding-colabs/7c_visualizations.html", - "href": "course-materials/coding-colabs/7c_visualizations.html", - "title": "Day 7: 🙌 Coding Colab", + "objectID": "course-materials/cheatsheets/functions.html#basics-of-functions", + "href": "course-materials/cheatsheets/functions.html#basics-of-functions", + "title": "EDS 217 Cheatsheet", "section": "", - "text": "In this collaborative coding exercise, you’ll work with a partner to explore a dataset using the seaborn library. You’ll focus on a workflow that includes:\n\nExploring distributions with histograms\nExamining correlations among variables\nInvestigating relationships more closely with regression plots and joint distribution plots\n\nWe’ll be using the Palmer Penguins dataset, which contains information about different penguin species, their physical characteristics, and the islands they inhabit." 
+ "text": "In Python, a function is defined using the def keyword, followed by the function name and parentheses () that may include parameters.\n\n\nCode\ndef function_name(parameters):\n # Function body\n return result\n\n\n\n\n\n\n\nCode\ndef celsius_to_fahrenheit(celsius):\n \"\"\"Convert Celsius to Fahrenheit.\"\"\"\n fahrenheit = (celsius * 9/5) + 32\n return fahrenheit\n\n\n\n\n\nCall a function by using its name followed by parentheses, and pass arguments if the function requires them.\n\n\nCode\ntemperature_celsius = 25\ntemperature_fahrenheit = celsius_to_fahrenheit(temperature_celsius)\nprint(temperature_fahrenheit) # Output: 77.0\n\n\n77.0" }, { - "objectID": "course-materials/coding-colabs/7c_visualizations.html#introduction", - "href": "course-materials/coding-colabs/7c_visualizations.html#introduction", - "title": "Day 7: 🙌 Coding Colab", + "objectID": "course-materials/cheatsheets/functions.html#common-unit-conversions", + "href": "course-materials/cheatsheets/functions.html#common-unit-conversions", + "title": "EDS 217 Cheatsheet", "section": "", - "text": "In this collaborative coding exercise, you’ll work with a partner to explore a dataset using the seaborn library. You’ll focus on a workflow that includes:\n\nExploring distributions with histograms\nExamining correlations among variables\nInvestigating relationships more closely with regression plots and joint distribution plots\n\nWe’ll be using the Palmer Penguins dataset, which contains information about different penguin species, their physical characteristics, and the islands they inhabit." 
+ "text": "Code\ndef kilometers_to_miles(kilometers):\n \"\"\"Convert kilometers to miles.\"\"\"\n miles = kilometers * 0.621371\n return miles\n\n# Usage\ndistance_km = 10\ndistance_miles = kilometers_to_miles(distance_km)\nprint(distance_miles) # Output: 6.21371\n\n\n6.21371\n\n\n\n\n\ndef mps_to_kmph(mps):\n \"\"\"Convert meters per second to kilometers per hour.\"\"\"\n kmph = mps * 3.6\n return kmph\n\n# Usage\nspeed_mps = 5\nspeed_kmph = mps_to_kmph(speed_mps)\nprint(speed_kmph) # Output: 18.0" }, { - "objectID": "course-materials/coding-colabs/7c_visualizations.html#setup", - "href": "course-materials/coding-colabs/7c_visualizations.html#setup", - "title": "Day 7: 🙌 Coding Colab", - "section": "Setup", - "text": "Setup\nFirst, let’s import the necessary libraries and load our dataset.\n\n\nCode\nimport pandas as pd\nimport seaborn as sns\nimport matplotlib.pyplot as plt\nimport numpy as np\n\n# Set the style for better-looking plots\nsns.set_style(\"whitegrid\")\n\n# Load the Palmer Penguins dataset\npenguins = sns.load_dataset(\"penguins\")\n\n# Display the first few rows and basic information about the dataset\nprint(penguins.head())\nprint(penguins.info())\n\n\n species island bill_length_mm bill_depth_mm flipper_length_mm \\\n0 Adelie Torgersen 39.1 18.7 181.0 \n1 Adelie Torgersen 39.5 17.4 186.0 \n2 Adelie Torgersen 40.3 18.0 195.0 \n3 Adelie Torgersen NaN NaN NaN \n4 Adelie Torgersen 36.7 19.3 193.0 \n\n body_mass_g sex \n0 3750.0 Male \n1 3800.0 Female \n2 3250.0 Female \n3 NaN NaN \n4 3450.0 Female \n<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 344 entries, 0 to 343\nData columns (total 7 columns):\n # Column Non-Null Count Dtype \n--- ------ -------------- ----- \n 0 species 344 non-null object \n 1 island 344 non-null object \n 2 bill_length_mm 342 non-null float64\n 3 bill_depth_mm 342 non-null float64\n 4 flipper_length_mm 342 non-null float64\n 5 body_mass_g 342 non-null float64\n 6 sex 333 non-null object \ndtypes: float64(4), 
object(3)\nmemory usage: 18.9+ KB\nNone" + "objectID": "course-materials/cheatsheets/functions.html#handling-multiple-return-values", + "href": "course-materials/cheatsheets/functions.html#handling-multiple-return-values", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "You can return multiple values from a function by using a tuple.\nimport statistics\n\ndef calculate_mean_std(data):\n \"\"\"Calculate mean and standard deviation of a dataset.\"\"\"\n mean = statistics.mean(data)\n std_dev = statistics.stdev(data)\n return mean, std_dev\n\n# Usage\ndata = [12, 15, 20, 22, 25]\nmean, std_dev = calculate_mean_std(data)\nprint(f\"Mean: {mean}, Standard Deviation: {std_dev}\")" }, { - "objectID": "course-materials/coding-colabs/7c_visualizations.html#task-1-exploring-distributions-with-histograms", - "href": "course-materials/coding-colabs/7c_visualizations.html#task-1-exploring-distributions-with-histograms", - "title": "Day 7: 🙌 Coding Colab", - "section": "Task 1: Exploring Distributions with Histograms", - "text": "Task 1: Exploring Distributions with Histograms\nLet’s start by exploring the distributions of various numerical variables in our dataset using histograms.\n\nCreate histograms for ‘bill_length_mm’, ‘bill_depth_mm’, ‘flipper_length_mm’, and ‘body_mass_g’.\nExperiment with different numbers of bins to see how it affects the visualization.\nTry using sns.histplot() with the ‘kde’ parameter set to True to overlay a kernel density estimate." 
+ "objectID": "course-materials/cheatsheets/functions.html#default-parameters", + "href": "course-materials/cheatsheets/functions.html#default-parameters", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "You can set default values for parameters, making them optional when calling the function.\n\n\ndef convert_temperature(temp, from_unit='C', to_unit='F'):\n \"\"\"Convert temperature between Celsius and Fahrenheit.\"\"\"\n if from_unit == 'C' and to_unit == 'F':\n return (temp * 9/5) + 32\n elif from_unit == 'F' and to_unit == 'C':\n return (temp - 32) * 5/9\n else:\n return temp # No conversion needed\n\n# Usage\ntemp_in_fahrenheit = convert_temperature(25) # Defaults to C to F\ntemp_in_celsius = convert_temperature(77, from_unit='F', to_unit='C')\nprint(temp_in_fahrenheit) # Output: 77.0\nprint(temp_in_celsius) # Output: 25.0" }, { - "objectID": "course-materials/coding-colabs/7c_visualizations.html#task-2-examining-correlations", - "href": "course-materials/coding-colabs/7c_visualizations.html#task-2-examining-correlations", - "title": "Day 7: 🙌 Coding Colab", - "section": "Task 2: Examining Correlations", - "text": "Task 2: Examining Correlations\nNow, let’s look at the correlations between the numerical variables in our dataset using Seaborn’s built-in correlation plot.\n\nUse sns.pairplot() to create a grid of scatter plots for all numeric variables.\nModify the pairplot to show the species information using different colors.\nInterpret the pairplot: which variables seem to be most strongly correlated? Do you notice any patterns related to species?" 
+ "objectID": "course-materials/cheatsheets/functions.html#using-keyword-arguments", + "href": "course-materials/cheatsheets/functions.html#using-keyword-arguments", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "You can call a function using keyword arguments to make it clearer which arguments are being set, especially useful when many parameters are involved.\n# Call using keyword arguments\ntemp = convert_temperature(temp=25, from_unit='C', to_unit='F')" }, { - "objectID": "course-materials/coding-colabs/7c_visualizations.html#task-3-investigating-relationships-with-regression-plots", - "href": "course-materials/coding-colabs/7c_visualizations.html#task-3-investigating-relationships-with-regression-plots", - "title": "Day 7: 🙌 Coding Colab", - "section": "Task 3: Investigating Relationships with Regression Plots", - "text": "Task 3: Investigating Relationships with Regression Plots\nLet’s dig deeper into the relationships between variables using regression plots.\n\nCreate a regression plot (sns.regplot) showing the relationship between ‘flipper_length_mm’ and ‘body_mass_g’.\nCreate another regplot showing the relationship between ‘bill_length_mm’ and ‘bill_depth_mm’.\nTry adding the ‘species’ information to one of these plots using different colors. Hint: You might want to use sns.lmplot for this." 
+ "objectID": "course-materials/cheatsheets/functions.html#higher-order-functions", + "href": "course-materials/cheatsheets/functions.html#higher-order-functions", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "A higher-order function is a function that can take other functions as arguments or return them as results.\n\n\ndef apply_conversion(conversion_func, data):\n \"\"\"Apply a conversion function to a list of data.\"\"\"\n return [conversion_func(value) for value in data]\n\n# Convert a list of temperatures from Celsius to Fahrenheit\ntemperatures_celsius = [0, 20, 30, 40]\ntemperatures_fahrenheit = apply_conversion(celsius_to_fahrenheit, temperatures_celsius)\nprint(temperatures_fahrenheit) # Output: [32.0, 68.0, 86.0, 104.0]" }, { - "objectID": "course-materials/coding-colabs/7c_visualizations.html#task-4-joint-distribution-plots", - "href": "course-materials/coding-colabs/7c_visualizations.html#task-4-joint-distribution-plots", - "title": "Day 7: 🙌 Coding Colab", - "section": "Task 4: Joint Distribution Plots", - "text": "Task 4: Joint Distribution Plots\nFinally, let’s use joint distribution plots to examine both the relationship between two variables and their individual distributions.\n\nCreate a joint plot for ‘flipper_length_mm’ and ‘body_mass_g’.\nExperiment with different kind parameters in the joint plot (e.g., ‘scatter’, ‘kde’, ‘hex’).\nCreate another joint plot, this time for ‘bill_length_mm’ and ‘bill_depth_mm’, colored by species." 
+ "objectID": "course-materials/cheatsheets/functions.html#practical-example-climate-data-analysis", + "href": "course-materials/cheatsheets/functions.html#practical-example-climate-data-analysis", + "title": "EDS 217 Cheatsheet", + "section": "", + "text": "Degree days are a measure of heat accumulation used to predict plant and animal development rates.\ndef calculate_degree_days(daily_temps, base_temp=10):\n \"\"\"Calculate degree days for a series of daily temperatures.\"\"\"\n degree_days = 0\n for temp in daily_temps:\n if temp > base_temp:\n degree_days += temp - base_temp\n return degree_days\n\n# Usage\ndaily_temps = [12, 15, 10, 18, 20, 7]\ndegree_days = calculate_degree_days(daily_temps)\nprint(degree_days) # Output: 35\n\n\n\n\nFunctions encapsulate reusable code logic and can simplify complex operations.\nParameters allow for input variability, while return values provide output.\nUse default parameters and keyword arguments to enhance flexibility and readability.\nHigher-order functions enable more abstract and powerful code structures." }, { - "objectID": "course-materials/coding-colabs/7c_visualizations.html#bonus-challenge", - "href": "course-materials/coding-colabs/7c_visualizations.html#bonus-challenge", - "title": "Day 7: 🙌 Coding Colab", - "section": "Bonus Challenge", - "text": "Bonus Challenge\nIf you finish early, try this bonus challenge:\nCreate a correlation matrix heatmap using Seaborn’s sns.heatmap() function. This will provide a different view of the correlations between variables compared to the pairplot.\n\nCreate a correlation matrix using the numerical columns in the dataset.\n\n\n\n\n\n\n\nCreating correlation matricies in pandas\n\n\n\nPandas dataframes include two built-in methods that can be combined to quickly create a correlation matrix between all the numerical data in a dataframe.\n\n.select_dtypes() is a method that selects only the columns of a dataframe that match a type of data. 
Running the .select_dtypes(include=np.number) method on a dataframe will return a new dataframe that contains only the columns that have a numeric datatype.\n.corr() is a method that creates a correlation matrix between every column in a dataframe. For it to work, you need to make sure you only have numeric data in your dataframe, so chaining this method after the .select_dtypes() method will get you a complete correlation matrix in a single line of code!\n\n\n\n\nVisualize this correlation matrix using sns.heatmap().\nCustomize the heatmap by adding annotations and adjusting the colormap.\nCompare the insights from this heatmap with those from the pairplot. What additional information does each visualization provide?" + "objectID": "course-materials/live-coding/4d_data_import_export.html#overview", + "href": "course-materials/live-coding/4d_data_import_export.html#overview", + "title": "Live Coding Session 4D", + "section": "Overview", + "text": "Overview\nIn this session, we will be exploring data import using the read_csv() function in pandas. Live coding is a great way to learn programming as it allows you to see the process of writing code in real-time, including how to deal with unexpected issues and debug errors." }, { - "objectID": "course-materials/coding-colabs/7c_visualizations.html#conclusion", - "href": "course-materials/coding-colabs/7c_visualizations.html#conclusion", - "title": "Day 7: 🙌 Coding Colab", - "section": "Conclusion", - "text": "Conclusion\nYou’ve practiced using seaborn to explore a dataset through various visualization techniques. Often these visualizations can be very helpful at the start of a data exploration activity as they are fundamental to exploratory data analysis in Python. 
As such, they will be valuable as you continue to work with more complex datasets.\n\nEnd Coding Colab Session (Day 7)" + "objectID": "course-materials/live-coding/4d_data_import_export.html#objectives", + "href": "course-materials/live-coding/4d_data_import_export.html#objectives", + "title": "Live Coding Session 4D", + "section": "Objectives", + "text": "Objectives\n\nUnderstand the fundamentals of data import and export in Python.\nUse read_csv() options to handle different .csv file structures.\nLearn how to parse dates and handle missing data during import.\nLearn how to filter columns and handle large files.\n\nDevelop the ability to troubleshoot and debug in a live setting." }, { - "objectID": "course-materials/lectures/seaborn.html#philosophy-of-seaborn", - "href": "course-materials/lectures/seaborn.html#philosophy-of-seaborn", - "title": "Introduction to Seaborn", - "section": "Philosophy of Seaborn", - "text": "Philosophy of Seaborn\nSeaborn aims to make visualization a central part of exploring and understanding data.\nIts dataset-oriented plotting functions operate on dataframes and arrays containing whole datasets.\nIt tries to automatically perform semantic mapping and statistical aggregation to produce informative plots." + "objectID": "course-materials/live-coding/4d_data_import_export.html#getting-started", + "href": "course-materials/live-coding/4d_data_import_export.html#getting-started", + "title": "Live Coding Session 4D", + "section": "Getting Started", + "text": "Getting Started\nTo get the most out of this session, please follow these guidelines:\nPrepare Your Environment: - Log into our server and start JupyterLab. - Open a new Jupyter notebook where you can write your own code as we go along. 
- Make sure to name the notebook something informative so you can refer back to it later.\n\nStep 1: Create a New Notebook\n\nOpen Jupyter Lab or Jupyter Notebook.\nCreate a new Python notebook.\nRename your notebook to pd_read_csv.ipynb.\n\n\n\nStep 2: Import Required Libraries\nIn the first cell of your notebook, import the necessary libraries:\n\nimport pandas as pd\nimport numpy as np\n\n\n\nStep 3: Set Up Data URLs\nTo ensure we’re all working with the same data, copy and paste the following URLs into a new code cell and run the cell (SHIFT-ENTER):\n\n# URLs for different CSV files we'll be using\nurl_basic = 'https://bit.ly/eds217-basic'\nurl_missing = 'https://bit.ly/eds217-missing'\nurl_dates = 'https://bit.ly/eds217-dates'\nurl_no_header = 'https://bit.ly/eds217-noheader'\nurl_tsv = 'https://bit.ly/eds217-tabs'\nurl_large = 'https://bit.ly/eds217-large'\n\n\n\nStep 4: Prepare Markdown Cells for Notes\nCreate several markdown cells throughout your notebook to take notes during the session. Here are some suggested headers:\n\nBasic Usage and Column Selection\nHandling Missing Data\nParsing Dates\nWorking with Files Without Headers\nWorking with Tab-Separated Values (TSV) Files\nHandling Large Files: Reading a Subset of Data\n\n\n\nStep 5: Create Code Cells for Each Topic\nUnder each markdown header, create empty code cells where you’ll write and execute code during the live session.\n\n\nStep 6: Final Preparations\n\nEnsure you have a stable internet connection to access the CSV files.\nHave the Pandas documentation page open in a separate tab for quick reference: https://pandas.pydata.org/docs/\n\n\n\nReady to Go!\nYou’re now set up and ready to follow along with the live coding session. Remember to actively code along and take notes in your markdown cells. Don’t hesitate to ask questions during the session!\nHappy coding!" 
}, { - "objectID": "course-materials/lectures/seaborn.html#main-ideas-in-seaborn", - "href": "course-materials/lectures/seaborn.html#main-ideas-in-seaborn", - "title": "Introduction to Seaborn", - "section": "Main Ideas in Seaborn", - "text": "Main Ideas in Seaborn\n\nIntegration with Pandas: Works well with Pandas data structures.\nBuilt-in Themes: Provides built-in themes for styling matplotlib graphics.\nColor Palettes: Offers a variety of color palettes to reveal patterns in the data.\nStatistical Estimation: Seaborn includes functions to fit and visualize linear regression models." + "objectID": "course-materials/live-coding/4d_data_import_export.html#session-format", + "href": "course-materials/live-coding/4d_data_import_export.html#session-format", + "title": "Live Coding Session 4D", + "section": "Session Format", + "text": "Session Format\n\nIntroduction\n\nBrief discussion about the topic and its importance in data science.\n\n\n\nDemonstration\n\nI will demonstrate code examples live. Follow along and write the code into your own Jupyter notebook.\n\n\n\nPractice\n\nYou will have the opportunity to try exercises on your own to apply what you’ve learned.\n\n\n\nQ&A\n\nWe will have a Q&A session at the end where you can ask specific questions about the code, concepts, or issues encountered during the session." }, { - "objectID": "course-materials/lectures/seaborn.html#major-features-of-seaborn", - "href": "course-materials/lectures/seaborn.html#major-features-of-seaborn", - "title": "Introduction to Seaborn", - "section": "Major Features of Seaborn", - "text": "Major Features of Seaborn\nSeaborn simplifies many aspects of creating complex visualizations in Python. Some of its major features include:\n\nFacetGrids and PairGrids: For plotting conditional relationships.\nFactorplot: For categorical variables.\nJointplot: For joint distributions.\nTime Series functionality: Through functions like tsplot." 
+ "objectID": "course-materials/live-coding/4d_data_import_export.html#after-the-session", + "href": "course-materials/live-coding/4d_data_import_export.html#after-the-session", + "title": "Live Coding Session 4D", + "section": "After the Session", + "text": "After the Session\n\nReview your notes and try to replicate the exercises on your own.\nExperiment with the code by modifying parameters or adding new features to deepen your understanding.\nCheck out our class read_csv() cheatsheet." }, { - "objectID": "course-materials/lectures/seaborn.html#using-seaborn", - "href": "course-materials/lectures/seaborn.html#using-seaborn", - "title": "Introduction to Seaborn", - "section": "Using seaborn", - "text": "Using seaborn\nimport seaborn as sns\nWhy sns?" + "objectID": "course-materials/coding-colabs/4b_pandas_dataframes.html#introduction", + "href": "course-materials/coding-colabs/4b_pandas_dataframes.html#introduction", + "title": "Coding Colab", + "section": "Introduction", + "text": "Introduction\nIn this collaborative coding exercise, you’ll work with a partner to practice importing, cleaning, exploring, and analyzing DataFrames using pandas. You’ll be working with a dataset containing yearly visitor information about national parks in the United States.\nHelpful class CheatSheets:\nPandas DataFrames\nPandas PDF Cheat Sheet\nDataFrame Workflows" }, { - "objectID": "course-materials/lectures/seaborn.html#theme-options", - "href": "course-materials/lectures/seaborn.html#theme-options", - "title": "Introduction to Seaborn", - "section": "Theme Options", - "text": "Theme Options\n# Set the theme to whitegrid\nsns.set_theme(style=\"whitegrid\")\n\ndarkgrid: The default theme. Background is a dark gray grid (not to be confused with a solid gray).\nwhitegrid: Similar to darkgrid but with a lighter background. This theme is particularly useful for plots with dense data points." 
+ "objectID": "course-materials/coding-colabs/4b_pandas_dataframes.html#setup", + "href": "course-materials/coding-colabs/4b_pandas_dataframes.html#setup", + "title": "Coding Colab", + "section": "Setup", + "text": "Setup\nFirst, let’s import the necessary libraries and load our dataset.\n\n\nCode\nimport pandas as pd\nimport numpy as np\n\n# Load the dataset\nurl = \"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-09-17/national_parks.csv\"\nparks_df = pd.read_csv(url)" }, { - "objectID": "course-materials/lectures/seaborn.html#themes-continued", - "href": "course-materials/lectures/seaborn.html#themes-continued", - "title": "Introduction to Seaborn", - "section": "Themes (continued)", - "text": "Themes (continued)\n\ndark: This theme provides a dark background without any grid lines. It’s suitable for presentations or where visuals are prioritized.\nwhite: Offers a clean, white background without grid lines. This is well in situations where the data and annotations need to stand out without any additional distraction.\nticks: This theme is similar to the white theme but adds ticks on the axes, which enhances the precision of interpreting the data." + "objectID": "course-materials/coding-colabs/4b_pandas_dataframes.html#task-1-data-exploration-and-cleaning", + "href": "course-materials/coding-colabs/4b_pandas_dataframes.html#task-1-data-exploration-and-cleaning", + "title": "Coding Colab", + "section": "Task 1: Data Exploration and Cleaning", + "text": "Task 1: Data Exploration and Cleaning\nWith your partner, explore the DataFrame and perform some initial cleaning. 
Create cells in your notebook that provide the following information\n\n\n\n\n\n\nTip\n\n\n\nUse print() statements and/or f-strings to create your output in a way that makes it easy to understand your results.\n\n\n\nHow many rows and columns does the DataFrame have?\nWhat are the column names?\nWhat data types are used in each column?\nAre there any missing values in the DataFrame?\nRemove the rows where year is Total (these are summary rows we don’t need for our analysis).\nConvert the year column to numeric type." }, { - "objectID": "course-materials/lectures/seaborn.html#getting-ready-to-seaborn", - "href": "course-materials/lectures/seaborn.html#getting-ready-to-seaborn", - "title": "Introduction to Seaborn", - "section": "Getting ready to Seaborn", - "text": "Getting ready to Seaborn\nImport the library and set a style\n\nimport seaborn as sns # (but now you know it should have been ssn 🤓)\nsns.set(style=\"darkgrid\") # This is the default, so skip it if wanted" + "objectID": "course-materials/coding-colabs/4b_pandas_dataframes.html#task-2-basic-filtering-and-analysis", + "href": "course-materials/coding-colabs/4b_pandas_dataframes.html#task-2-basic-filtering-and-analysis", + "title": "Coding Colab", + "section": "Task 2: Basic Filtering and Analysis", + "text": "Task 2: Basic Filtering and Analysis\nNow, let’s practice some basic filtering and analysis operations:\n\nCreate a new DataFrame containing only data for the years 2000-2015 and only data for National Parks (unit_type is National Park)\nFind the total number of visitors across all National Parks for each year from 2000-2015.\nCalculate the average yearly visitors for each National Park during the 2000-2015 period.\nIdentify the top 5 most visited National Parks (based on total visitors) during the 2000-2015 period." 
}, { - "objectID": "course-materials/lectures/seaborn.html#conclusion", - "href": "course-materials/lectures/seaborn.html#conclusion", - "title": "Introduction to Seaborn", - "section": "Conclusion", - "text": "Conclusion\nSeaborn is a versatile and powerful tool for statistical data visualization in Python. Whether you need to visualize the distribution of a dataset, the relationship between multiple variables, or the dependencies between categorical data, Seaborn has a plot type to make your analysis more intuitive and insightful." + "objectID": "course-materials/coding-colabs/4b_pandas_dataframes.html#task-3-thinking-in-dataframes", + "href": "course-materials/coding-colabs/4b_pandas_dataframes.html#task-3-thinking-in-dataframes", + "title": "Coding Colab", + "section": "Task 3: Thinking in DataFrames", + "text": "Task 3: Thinking in DataFrames\n\nIn 2016, a blog post from 538.com explored these data. Take a look at the graphics in the post that use our data and discuss with your partner what steps and functions you think would be necessary to filter, group, and aggregate your dataframe in order to make any of the plots. See if you can make “rough drafts” of any of them using the simple DataFrame.plot() function." }, { - "objectID": "course-materials/coding-colabs/3d_pandas_series.html", - "href": "course-materials/coding-colabs/3d_pandas_series.html", - "title": "Day 3: 🙌 Coding Colab", - "section": "", - "text": "In this collaborative coding exercise, we’ll explore Pandas Series, a fundamental data structure in the Pandas library. You’ll work together to create, manipulate, and analyze Series objects." + "objectID": "course-materials/coding-colabs/4b_pandas_dataframes.html#conclusion", + "href": "course-materials/coding-colabs/4b_pandas_dataframes.html#conclusion", + "title": "Coding Colab", + "section": "Conclusion", + "text": "Conclusion\nGreat job working through these exercises! 
You’ve practiced importing data, cleaning a dataset, exploring DataFrames, and performing various filtering and analysis operations using pandas. These skills are fundamental to data analysis in Python and will be valuable as you continue to work with more complex datasets." }, { - "objectID": "course-materials/coding-colabs/3d_pandas_series.html#introduction", - "href": "course-materials/coding-colabs/3d_pandas_series.html#introduction", - "title": "Day 3: 🙌 Coding Colab", + "objectID": "course-materials/interactive-sessions/2a_getting_help.html", + "href": "course-materials/interactive-sessions/2a_getting_help.html", + "title": "Interactive Session 2A", "section": "", - "text": "In this collaborative coding exercise, we’ll explore Pandas Series, a fundamental data structure in the Pandas library. You’ll work together to create, manipulate, and analyze Series objects." + "text": "Objective: Learn how to get help, work with variables, and explore methods available for different Python objects in Jupyter Notebooks.\nEstimated Time: 45-60 minutes" }, { - "objectID": "course-materials/coding-colabs/3d_pandas_series.html#resources", - "href": "course-materials/coding-colabs/3d_pandas_series.html#resources", - "title": "Day 3: 🙌 Coding Colab", - "section": "Resources", - "text": "Resources\nHere’s our course cheatsheet on pandas Series:\n\nPandas Series Cheatsheet\n\nFeel free to refer to this cheatsheet throughout the exercise if you need a quick reminder about syntax or functionality." + "objectID": "course-materials/interactive-sessions/2a_getting_help.html#getting-help-in-python", + "href": "course-materials/interactive-sessions/2a_getting_help.html#getting-help-in-python", + "title": "Interactive Session 2A", + "section": "1. Getting Help in Python", + "text": "1. 
Getting Help in Python\n\nUsing help()\n\nIn a Jupyter Notebook cell, type:\n\n\nCode\nhelp(len)\n\n\nHelp on built-in function len in module builtins:\n\nlen(obj, /)\n Return the number of items in a container.\n\n\n\nRun the cell to see detailed information about the len() function.\n\n\n\nTrying help() Yourself\n\nUse the help() function on other built-in functions like print or sum.\n\n\n\nUsing ? and ??\n\nType:\nRun the cell to see quick documentation.\nTry:\nThis gives more detailed information, including source code (if available)." }, { - "objectID": "course-materials/coding-colabs/3d_pandas_series.html#setup", - "href": "course-materials/coding-colabs/3d_pandas_series.html#setup", - "title": "Day 3: 🙌 Coding Colab", - "section": "Setup", - "text": "Setup\nFirst, let’s import the necessary libraries and create a sample Series.\n\nimport pandas as pd\nimport numpy as np\n\n# Create a sample Series\nfruits = pd.Series(['apple', 'banana', 'cherry', 'date', 'elderberry'], name='Fruits')\nprint(fruits)\n\n0 apple\n1 banana\n2 cherry\n3 date\n4 elderberry\nName: Fruits, dtype: object" + "objectID": "course-materials/interactive-sessions/2a_getting_help.html#working-with-variables", + "href": "course-materials/interactive-sessions/2a_getting_help.html#working-with-variables", + "title": "Interactive Session 2A", + "section": "2. Working with Variables", + "text": "2. Working with Variables\n\nCreating Variables\n\nIn a new cell, create a few variables:\nUse type() to check the data type of each variable:\n\n\nCode\ntype(a)\ntype(b)\ntype(c)\n\n\nstr\n\n\n\n\n\nExploring Variables\n\nExperiment with creating your own variables and checking their types.\nChange the values and data types and see what happens."
}, { - "objectID": "course-materials/coding-colabs/3d_pandas_series.html#exercise-1-creating-a-series", - "href": "course-materials/coding-colabs/3d_pandas_series.html#exercise-1-creating-a-series", - "title": "Day 3: 🙌 Coding Colab", - "section": "Exercise 1: Creating a Series", - "text": "Exercise 1: Creating a Series\nWork together to create a Series representing the prices of the fruits in our fruits Series.\n\n# Your code here\n# Create a Series called 'prices' with the same index as 'fruits'\n# Use these prices: apple: $0.5, banana: $0.3, cherry: $1.0, date: $1.5, elderberry: $2.0" + "objectID": "course-materials/interactive-sessions/2a_getting_help.html#exploring-methods-available-for-objects", + "href": "course-materials/interactive-sessions/2a_getting_help.html#exploring-methods-available-for-objects", + "title": "Interactive Session 2A", + "section": "3. Exploring Methods Available for Objects", + "text": "3. Exploring Methods Available for Objects\n\nUsing dir()\n\nUse dir() to explore available methods for objects:\n\n\nCode\ndir(a)\ndir(b)\ndir(c)\n\n\n['__add__',\n '__class__',\n '__contains__',\n '__delattr__',\n '__dir__',\n '__doc__',\n '__eq__',\n '__format__',\n '__ge__',\n '__getattribute__',\n '__getitem__',\n '__getnewargs__',\n '__getstate__',\n '__gt__',\n '__hash__',\n '__init__',\n '__init_subclass__',\n '__iter__',\n '__le__',\n '__len__',\n '__lt__',\n '__mod__',\n '__mul__',\n '__ne__',\n '__new__',\n '__reduce__',\n '__reduce_ex__',\n '__repr__',\n '__rmod__',\n '__rmul__',\n '__setattr__',\n '__sizeof__',\n '__str__',\n '__subclasshook__',\n 'capitalize',\n 'casefold',\n 'center',\n 'count',\n 'encode',\n 'endswith',\n 'expandtabs',\n 'find',\n 'format',\n 'format_map',\n 'index',\n 'isalnum',\n 'isalpha',\n 'isascii',\n 'isdecimal',\n 'isdigit',\n 'isidentifier',\n 'islower',\n 'isnumeric',\n 'isprintable',\n 'isspace',\n 'istitle',\n 'isupper',\n 'join',\n 'ljust',\n 'lower',\n 'lstrip',\n 'maketrans',\n 'partition',\n 
'removeprefix',\n 'removesuffix',\n 'replace',\n 'rfind',\n 'rindex',\n 'rjust',\n 'rpartition',\n 'rsplit',\n 'rstrip',\n 'split',\n 'splitlines',\n 'startswith',\n 'strip',\n 'swapcase',\n 'title',\n 'translate',\n 'upper',\n 'zfill']\n\n\n\n\n\nUsing help() with Methods\n\nPick a method from the list returned by dir() and use help() to learn more about it:\n\n\nCode\nhelp(c.upper)\n\n\nHelp on built-in function upper:\n\nupper() method of builtins.str instance\n Return a copy of the string converted to uppercase.\n\n\n\n\n\n\nExploring Methods\n\nTry calling a method on your variables:\n\n\n'HELLO, WORLD!'\n\n\n\n\nEnd interactive session 2A" }, { - "objectID": "course-materials/coding-colabs/3d_pandas_series.html#exercise-2-series-operations", - "href": "course-materials/coding-colabs/3d_pandas_series.html#exercise-2-series-operations", - "title": "Day 3: 🙌 Coding Colab", - "section": "Exercise 2: Series Operations", - "text": "Exercise 2: Series Operations\nCollaborate to perform the following operations:\n\nCalculate the total price of all fruits.\nFind the most expensive fruit.\nApply a 10% discount to all fruits priced over $1.0.\n\n\n# Your code here\n# 1. Calculate the total price of all fruits\n# 2. Find the most expensive fruit\n# 3. Apply a 10% discount to all fruits priced over $1.0" + "objectID": "course-materials/coding-colabs/5c_cleaning_data.html", + "href": "course-materials/coding-colabs/5c_cleaning_data.html", + "title": "Day 5: 🙌 Coding Colab", + "section": "", + "text": "In this collaborative coding exercise, you will work together and apply your new data cleaning skills to a simple dataframe that has a surprising number of problems."
}, { - "objectID": "course-materials/coding-colabs/3d_pandas_series.html#exercise-3-series-analysis", - "href": "course-materials/coding-colabs/3d_pandas_series.html#exercise-3-series-analysis", - "title": "Day 3: 🙌 Coding Colab", - "section": "Exercise 3: Series Analysis", - "text": "Exercise 3: Series Analysis\nWork as a team to answer the following questions:\n\nWhat is the average price of the fruits?\nHow many fruits cost less than $1.0?\nWhat is the price range (difference between max and min prices)?\n\n\n# Your code here\n# 1. Calculate the average price of the fruits\n# 2. Count how many fruits cost less than $1.0\n# 3. Calculate the price range (difference between max and min prices)" + "objectID": "course-materials/coding-colabs/5c_cleaning_data.html#introduction", + "href": "course-materials/coding-colabs/5c_cleaning_data.html#introduction", + "title": "Day 5: 🙌 Coding Colab", + "section": "", + "text": "In this collaborative coding exercise, you will work together and apply your new data cleaning skills to a simple dataframe that has a surprising number of problems." }, { - "objectID": "course-materials/coding-colabs/3d_pandas_series.html#exercise-4-series-manipulation", - "href": "course-materials/coding-colabs/3d_pandas_series.html#exercise-4-series-manipulation", - "title": "Day 3: 🙌 Coding Colab", - "section": "Exercise 4: Series Manipulation", - "text": "Exercise 4: Series Manipulation\nCollaborate to perform these manipulations on the fruits and prices Series:\n\nAdd a new fruit ‘fig’ with a price of $1.2 to both Series using pd.concat\nRemove ‘banana’ from both Series.\nSort both Series by fruit name (alphabetically).\n\n\n# Your code here\n# 1. Add 'fig' to both Series (price: $1.2)\n# 2. Remove 'banana' from both Series\n# 3. 
Sort both Series alphabetically by fruit name" + "objectID": "course-materials/coding-colabs/5c_cleaning_data.html#resources", + "href": "course-materials/coding-colabs/5c_cleaning_data.html#resources", + "title": "Day 5: 🙌 Coding Colab", + "section": "Resources", + "text": "Resources\nHere’s our course cheatsheet on cleaning data:\n\nPandas Cleaning Cheatsheet\n\nFeel free to refer to this cheatsheet throughout the exercise if you need a quick reminder about syntax or functionality." }, { - "objectID": "course-materials/coding-colabs/3d_pandas_series.html#conclusion", - "href": "course-materials/coding-colabs/3d_pandas_series.html#conclusion", - "title": "Day 3: 🙌 Coding Colab", - "section": "Conclusion", - "text": "Conclusion\nIn this collaborative exercise, you’ve practiced creating, manipulating, and analyzing Pandas Series. You’ve learned how to perform basic operations, apply conditions, and modify Series objects. These skills will be valuable as you work with more complex datasets in the future." 
+ "objectID": "course-materials/coding-colabs/5c_cleaning_data.html#setup", + "href": "course-materials/coding-colabs/5c_cleaning_data.html#setup", + "title": "Day 5: 🙌 Coding Colab", + "section": "Setup", + "text": "Setup\nFirst, let’s import the necessary libraries and load an example messy dataframe.\n\nimport pandas as pd\nimport numpy as np\n\nurl = 'https://bit.ly/messy_csv'\nmessy_df = pd.read_csv(url)" }, { - "objectID": "course-materials/coding-colabs/3d_pandas_series.html#discussion-questions", - "href": "course-materials/coding-colabs/3d_pandas_series.html#discussion-questions", - "title": "Day 3: 🙌 Coding Colab", - "section": "Discussion Questions", - "text": "Discussion Questions\n\nWhat advantages does using a Pandas Series offer compared to using a Python list or dictionary?\nCan you think of a real-world scenario where you might use a Pandas Series instead of a DataFrame?\nWhat challenges did you face while working with Series in this exercise, and how did you overcome them?\n\nDiscuss these questions with your team and share your insights.\n\nEnd Coding Colab Session (Day 4)" + "objectID": "course-materials/coding-colabs/5c_cleaning_data.html#practical-exercise-cleaning-a-messy-environmental-dataset", + "href": "course-materials/coding-colabs/5c_cleaning_data.html#practical-exercise-cleaning-a-messy-environmental-dataset", + "title": "Day 5: 🙌 Coding Colab", + "section": "Practical Exercise: Cleaning a Messy Environmental Dataset", + "text": "Practical Exercise: Cleaning a Messy Environmental Dataset\nLet’s apply what we’ve learned so far to clean the messy environmental dataset.\nYour task is to clean this dataframe by\n\nRemoving duplicates\nHandling missing values (either fill or dropna to remove rows with missing data)\nEnsuring consistent data types (dates, strings)\nFormatting the ‘site’ column for consistency\nMaking sure all column names are lower case, without whitespace.\n\nTry to implement these steps using the techniques we’ve 
learned.\n\nEnd Coding Colab Session (Day 5)" }, { - "objectID": "course-materials/cheatsheets/comprehensions.html", - "href": "course-materials/cheatsheets/comprehensions.html", + "objectID": "course-materials/cheatsheets/sets.html", + "href": "course-materials/cheatsheets/sets.html", "title": "EDS 217 Cheatsheet", "section": "", - "text": "This cheatsheet provides a quick reference for using comprehensions in Python, including list comprehensions, dictionary comprehensions, and how to incorporate conditional logic. Use this as a guide during your master’s program to write more concise and readable code." - }, - { - "objectID": "course-materials/cheatsheets/comprehensions.html#list-comprehensions", - "href": "course-materials/cheatsheets/comprehensions.html#list-comprehensions", - "title": "EDS 217 Cheatsheet", - "section": "List Comprehensions", - "text": "List Comprehensions\n\nBasic Syntax\nA list comprehension provides a concise way to create lists. The basic syntax is:\n\n\nCode\n# [expression for item in iterable]\nsquares = [i ** 2 for i in range(1, 6)]\nprint(squares)\n\n\n[1, 4, 9, 16, 25]\n\n\n\n\nWith Conditional Logic\nYou can add a condition to include only certain items in the new list:\n\n\nCode\n# [expression for item in iterable if condition]\neven_squares = [i ** 2 for i in range(1, 6) if i % 2 == 0]\nprint(even_squares)\n\n\n[4, 16]\n\n\n\n\nNested List Comprehensions\nList comprehensions can be nested to handle more complex data structures:\n\n\nCode\n# [(expression1, expression2) for item1 in iterable1 for item2 in iterable2]\npairs = [(i, j) for i in range(1, 4) for j in range(1, 3)]\nprint(pairs)\n\n\n[(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)]\n\n\n\n\nEvaluating Functions in a List Comprehension\nYou can use list comprehensions to apply a function to each item in an iterable:\n\n\nCode\n# Function to evaluate\ndef square(x):\n return x ** 2\n\n# List comprehension applying the function\nsquares = [square(i) for i in range(1, 
6)]\nprint(squares)\n\n\n[1, 4, 9, 16, 25]" + "text": "Code\n# Empty set\nempty_set = set()\nprint(f\"Empty set: {empty_set}\")\n\n# Set from a list\nset_from_list = set([1, 2, 3, 4, 5])\nprint(f\"Set from list: {set_from_list}\")\n\n# Set literal\nset_literal = {1, 2, 3, 4, 5}\nprint(f\"Set literal: {set_literal}\")\n\n\nEmpty set: set()\nSet from list: {1, 2, 3, 4, 5}\nSet literal: {1, 2, 3, 4, 5}" }, { - "objectID": "course-materials/cheatsheets/comprehensions.html#dictionary-comprehensions", - "href": "course-materials/cheatsheets/comprehensions.html#dictionary-comprehensions", + "objectID": "course-materials/cheatsheets/sets.html#creating-sets", + "href": "course-materials/cheatsheets/sets.html#creating-sets", "title": "EDS 217 Cheatsheet", - "section": "Dictionary Comprehensions", - "text": "Dictionary Comprehensions\n\nBasic Syntax\nDictionary comprehensions provide a concise way to create dictionaries. The basic syntax is:\n\n\nCode\n# {key_expression: value_expression for item in iterable}\n# Example: Mapping fruit names to their lengths\nfruits = ['apple', 'banana', 'cherry']\nfruit_lengths = {fruit: len(fruit) for fruit in fruits}\nprint(fruit_lengths)\n\n\n{'apple': 5, 'banana': 6, 'cherry': 6}\n\n\n\n\nWithout zip\nYou can create a dictionary without using zip by leveraging the index:\n\n\nCode\n# {key_expression: value_expression for index in range(len(list))}\n# Example: Mapping employee IDs to names\nemployee_ids = [101, 102, 103]\nemployee_names = ['Alice', 'Bob', 'Charlie']\nid_to_name = {employee_ids[i]: employee_names[i] for i in range(len(employee_ids))}\nprint(id_to_name)\n\n\n{101: 'Alice', 102: 'Bob', 103: 'Charlie'}\n\n\n\n\nWith Conditional Logic\nYou can include conditions to filter out key-value pairs:\n\n\nCode\n# {key_expression: value_expression for item in iterable if condition}\n# Example: Filtering students who passed\nstudents = ['Alice', 'Bob', 'Charlie']\nscores = [85, 62, 90]\npassing_students = {students[i]: scores[i] for i in 
range(len(students)) if scores[i] >= 70}\nprint(passing_students)\n\n\n{'Alice': 85, 'Charlie': 90}\n\n\n\n\nEvaluating Functions in a Dictionary Comprehension\nYou can use dictionary comprehensions to apply a function to values in an iterable:\n\n\nCode\n# Function to evaluate\ndef capitalize_name(name):\n return name.upper()\n\n# Example: Mapping student names to capitalized names\nstudents = ['alice', 'bob', 'charlie']\ncapitalized_names = {name: capitalize_name(name) for name in students}\nprint(capitalized_names)\n\n\n{'alice': 'ALICE', 'bob': 'BOB', 'charlie': 'CHARLIE'}" + "section": "", + "text": "Code\n# Empty set\nempty_set = set()\nprint(f\"Empty set: {empty_set}\")\n\n# Set from a list\nset_from_list = set([1, 2, 3, 4, 5])\nprint(f\"Set from list: {set_from_list}\")\n\n# Set literal\nset_literal = {1, 2, 3, 4, 5}\nprint(f\"Set literal: {set_literal}\")\n\n\nEmpty set: set()\nSet from list: {1, 2, 3, 4, 5}\nSet literal: {1, 2, 3, 4, 5}" }, { - "objectID": "course-materials/cheatsheets/comprehensions.html#best-practices-for-using-comprehensions", - "href": "course-materials/cheatsheets/comprehensions.html#best-practices-for-using-comprehensions", + "objectID": "course-materials/cheatsheets/sets.html#basic-operations", + "href": "course-materials/cheatsheets/sets.html#basic-operations", "title": "EDS 217 Cheatsheet", - "section": "Best Practices for Using Comprehensions", - "text": "Best Practices for Using Comprehensions\n\nKeep It Simple: Use comprehensions for simple transformations and filtering. For complex logic, consider using traditional loops for better readability.\nNested Comprehensions: While powerful, nested comprehensions can be hard to read. Use them sparingly and consider breaking down the logic into multiple steps if needed.\nReadability: Always prioritize code readability. If a comprehension is difficult to understand, it might be better to use a loop." 
+ "section": "Basic Operations", + "text": "Basic Operations\n\n\nCode\ns = {1, 2, 3, 4, 5}\nprint(f\"Initial set: {s}\")\n\n# Add an element\ns.add(6)\nprint(f\"After adding 6: {s}\")\n\n# Remove an element\ns.remove(3) # Raises KeyError if not found\nprint(f\"After removing 3: {s}\")\n\ns.discard(10) # Doesn't raise error if not found\nprint(f\"After discarding 10 (not in set): {s}\")\n\n# Pop a random element\npopped = s.pop()\nprint(f\"Popped element: {popped}\")\nprint(f\"Set after pop: {s}\")\n\n# Check membership\nprint(f\"Is 2 in the set? {2 in s}\")\n\n# Clear the set\ns.clear()\nprint(f\"Set after clear: {s}\")\n\n\nInitial set: {1, 2, 3, 4, 5}\nAfter adding 6: {1, 2, 3, 4, 5, 6}\nAfter removing 3: {1, 2, 4, 5, 6}\nAfter discarding 10 (not in set): {1, 2, 4, 5, 6}\nPopped element: 1\nSet after pop: {2, 4, 5, 6}\nIs 2 in the set? True\nSet after clear: set()" }, { - "objectID": "course-materials/cheatsheets/comprehensions.html#additional-resources", - "href": "course-materials/cheatsheets/comprehensions.html#additional-resources", + "objectID": "course-materials/cheatsheets/sets.html#set-methods", + "href": "course-materials/cheatsheets/sets.html#set-methods", "title": "EDS 217 Cheatsheet", - "section": "Additional Resources", - "text": "Additional Resources\n\nOfficial Python Documentation: List Comprehensions\nPython Dictionary Comprehensions: Dictionary Comprehensions\nPEP 202: PEP 202 - List Comprehensions" + "section": "Set Methods", + "text": "Set Methods\n\n\nCode\na = {1, 2, 3}\nb = {3, 4, 5}\nprint(f\"Set a: {a}\")\nprint(f\"Set b: {b}\")\n\n\nSet a: {1, 2, 3}\nSet b: {3, 4, 5}\n\n\n\nUnion\n\n\nCode\nunion_set = a.union(b)\nprint(f\"Union of a and b: {union_set}\")\n\n\nUnion of a and b: {1, 2, 3, 4, 5}\n\n\n\n\nIntersection\n\n\nCode\nintersection_set = a.intersection(b)\nprint(f\"Intersection of a and b: {intersection_set}\")\n\n\nIntersection of a and b: {3}\n\n\n\n\nDifference\n\n\nCode\ndifference_set = a.difference(b)\nprint(f\"Difference 
of a and b: {difference_set}\")\n\n\nDifference of a and b: {1, 2}\n\n\n\n\nSymmetric difference\n\n\nCode\nsymmetric_difference_set = a.symmetric_difference(b)\nprint(f\"Symmetric difference of a and b: {symmetric_difference_set}\")\n\n\nSymmetric difference of a and b: {1, 2, 4, 5}\n\n\n\n\nSubset and superset\n\n\nCode\nis_subset = a.issubset(b)\nis_superset = a.issuperset(b)\nprint(f\"Is a a subset of b? {is_subset}\")\nprint(f\"Is a a superset of b? {is_superset}\")\n\n\nIs a a subset of b? False\nIs a a superset of b? False" }, { "objectID": "course-materials/cheatsheets/control_flows.html", @@ -3287,7 +3385,7 @@ "href": "course-materials/eod-practice/eod-day2.html#part-3-fun-with-random-selections", "title": "Day 2: Tasks & Activities", "section": "Part 3: Fun with Random Selections", - "text": "Part 3: Fun with Random Selections\nLet’s add a fun element to our exercise using the random module. Before we dive into the main task, let’s look at how we can use the random library to select random items from a dictionary.\n\nExample: Random Selection from a Dictionary\nHere’s a simple example of how to select random items from a dictionary:\n\n\nCode\nimport random\n\n# Sample dictionary\nfruit_colors = {\n \"apple\": \"red\",\n \"banana\": \"yellow\",\n \"grape\": \"purple\",\n \"kiwi\": \"brown\",\n \"orange\": \"orange\"\n}\n\n# Select a single random key-value pair\nrandom_fruit, random_color = random.choice(list(fruit_colors.items()))\nprint(f\"Randomly selected fruit: {random_fruit}\")\nprint(f\"Its color: {random_color}\")\n\n# To get just a random key:\nrandom_fruit = random.choice(list(fruit_colors.keys()))\nprint(f\"Another randomly selected fruit: {random_fruit}\")\n\n# To select multiple random items:\nnum_selections = 3\nrandom_fruits = random.sample(list(fruit_colors.keys()), num_selections)\nprint(f\"Randomly selected {num_selections} fruits: {random_fruits}\")\n\n\nRandomly selected fruit: apple\nIts color: red\nAnother randomly selected fruit: 
banana\nRandomly selected 3 fruits: ['kiwi', 'banana', 'orange']\n\n\nThis example demonstrates how to:\n\nConvert a dictionary to a list of key-value pairs or keys\nUse random.choice() to select a single random item from a list\nUse random.sample() to select multiple unique random items from a list\n\nNote: random.choice() selects a single item, while random.sample() can select multiple unique items. For our snack-sharing task below, random.sample() might be more useful!\n\n\nTask 5: Random Snack Sharing\nCreate a function that randomly selects a classmate to share their snack with another random classmate. Print out the results as “Name will share [snack] with Name”.\n#| echo: true\ndef assign_random_snacks(classmate_info):\n # Your code here\n print(f\"{sharer} will share {snack} with {receiver}\")\n\n# Test your function\nassign_random_snacks(classmate_info)" + "text": "Part 3: Fun with Random Selections\nLet’s add a fun element to our exercise using the random module. Before we dive into the main task, let’s look at how we can use the random library to select random items from a dictionary.\n\nExample: Random Selection from a Dictionary\nHere’s a simple example of how to select random items from a dictionary:\n\n\nCode\nimport random\n\n# Sample dictionary\nfruit_colors = {\n \"apple\": \"red\",\n \"banana\": \"yellow\",\n \"grape\": \"purple\",\n \"kiwi\": \"brown\",\n \"orange\": \"orange\"\n}\n\n# Select a single random key-value pair\nrandom_fruit, random_color = random.choice(list(fruit_colors.items()))\nprint(f\"Randomly selected fruit: {random_fruit}\")\nprint(f\"Its color: {random_color}\")\n\n# To get just a random key:\nrandom_fruit = random.choice(list(fruit_colors.keys()))\nprint(f\"Another randomly selected fruit: {random_fruit}\")\n\n# To select multiple random items:\nnum_selections = 3\nrandom_fruits = random.sample(list(fruit_colors.keys()), num_selections)\nprint(f\"Randomly selected {num_selections} fruits: {random_fruits}\")\n\n\nRandomly 
selected fruit: grape\nIts color: purple\nAnother randomly selected fruit: apple\nRandomly selected 3 fruits: ['kiwi', 'grape', 'apple']\n\n\nThis example demonstrates how to:\n\nConvert a dictionary to a list of key-value pairs or keys\nUse random.choice() to select a single random item from a list\nUse random.sample() to select multiple unique random items from a list\n\nNote: random.choice() selects a single item, while random.sample() can select multiple unique items. For our snack-sharing task below, random.sample() might be more useful!\n\n\nTask 5: Random Snack Sharing\nCreate a function that randomly selects a classmate to share their snack with another random classmate. Print out the results as “Name will share [snack] with Name”.\n#| echo: true\ndef assign_random_snacks(classmate_info):\n # Your code here\n print(f\"{sharer} will share {snack} with {receiver}\")\n\n# Test your function\nassign_random_snacks(classmate_info)" }, { "objectID": "course-materials/eod-practice/eod-day2.html#conclusion", @@ -3980,7 +4078,7 @@ "href": "course-materials/interactive-sessions/6a_grouping_joining_sorting_1_old.html#multi-level-grouping", "title": "Interactive Session 6A", "section": "Multi-level Grouping", - "text": "Multi-level Grouping\nWe can group by multiple columns to create a hierarchical index.\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Add a 'year' column\ndf['year'] = df['date'].dt.year\n\n# Group by location and year\nmulti_grouped = df.groupby(['location', 'year'])\n\nprint(multi_grouped.mean())\n\n\n date temperature humidity\nlocation year \nA 2023 2023-01-03 20.0 50.333333\nB 2023 2023-01-04 23.0 46.666667\n\n\n\nUnderstanding Groupby Objects\nAfter using the groupby() function, it’s important to understand what kind of object we’re working with and how it differs from a regular DataFrame. 
Let’s explore this in more detail.\n\nThe Groupby Object\nWhen you apply the groupby() function to a DataFrame, the result is a DataFrameGroupBy object. This object is not a DataFrame itself, but rather a special object that contains information about the groups.\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Create a groupby object\ngrouped = df.groupby('location')\n\n# Check the type of the grouped object\nprint(type(grouped))\n\n# Try to view the grouped object\nprint(grouped)\n\n\n<class 'pandas.core.groupby.generic.DataFrameGroupBy'>\n<pandas.core.groupby.generic.DataFrameGroupBy object at 0x10d825310>\n\n\nAs you can see, simply printing the groupby object doesn’t show us the data. Instead, it gives us information about the groupby operation.\n\n\n\nAccessing Group Data\nTo actually see the data in each group, we need to iterate over the groups or use aggregation functions.\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Iterate over groups\nfor name, group in grouped:\n print(f\"Group: {name}\")\n print(group)\n print()\n\n# Using an aggregation function\nprint(grouped.mean())\n\n\nGroup: A\n location date temperature humidity year\n0 A 2023-01-01 20 50 2023\n2 A 2023-01-03 19 52 2023\n4 A 2023-01-05 21 49 2023\n\nGroup: B\n location date temperature humidity year\n1 B 2023-01-02 22 48 2023\n3 B 2023-01-04 24 45 2023\n5 B 2023-01-06 23 47 2023\n\n date temperature humidity year\nlocation \nA 2023-01-03 20.0 50.333333 2023.0\nB 2023-01-04 23.0 46.666667 2023.0\n\n\n\n\nKey Differences from DataFrames\n\nStructure: A groupby object is not a table-like structure like a DataFrame. 
It’s more like a container of groups.\nDirect Access: You can’t access columns or rows of a groupby object directly like you can with a DataFrame.\nLazy Evaluation: Groupby operations are lazy - they don’t actually compute anything until you call an aggregation function or iterate over the groups.\nAggregation Required: To get meaningful results from a groupby object, you typically need to apply some kind of aggregation function (like mean(), sum(), count(), etc.).\n\n\n\nConverting Groupby Results to DataFrame\nAfter applying an aggregation function to a groupby object, the result is typically a DataFrame:\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Result of aggregation is a DataFrame\nresult = grouped.mean()\nprint(type(result))\n\n# We can now use DataFrame methods on this result\nprint(result.reset_index())\n\n\n<class 'pandas.core.frame.DataFrame'>\n location date temperature humidity year\n0 A 2023-01-03 20.0 50.333333 2023.0\n1 B 2023-01-04 23.0 46.666667 2023.0\n\n\n\nPractice\nTry grouping the data by both ‘location’ and ‘year’, then calculate the maximum temperature for each group. What type of object do you get? How can you reset the index to make it a regular DataFrame?\n\n✏️ Try it. Add the cell below to your notebook and then provide your code.\n\n\n\nCode\n# Your code here\n\n\n\n\nKey Groupby Points\n\nA groupby object is not a DataFrame, but a special object containing group information.\nTo view data in a groupby object, you need to iterate over it or apply aggregation functions.\nGroupby operations are lazy and require aggregation to produce results.\nThe result of aggregating a groupby object is typically a DataFrame or Series, which you can then manipulate further." + "text": "Multi-level Grouping\nWe can group by multiple columns to create a hierarchical index.\n\n✏️ Try it. 
Add the cell below to your notebook and run it.\n\n\n\nCode\n# Add a 'year' column\ndf['year'] = df['date'].dt.year\n\n# Group by location and year\nmulti_grouped = df.groupby(['location', 'year'])\n\nprint(multi_grouped.mean())\n\n\n date temperature humidity\nlocation year \nA 2023 2023-01-03 20.0 50.333333\nB 2023 2023-01-04 23.0 46.666667\n\n\n\nUnderstanding Groupby Objects\nAfter using the groupby() function, it’s important to understand what kind of object we’re working with and how it differs from a regular DataFrame. Let’s explore this in more detail.\n\nThe Groupby Object\nWhen you apply the groupby() function to a DataFrame, the result is a DataFrameGroupBy object. This object is not a DataFrame itself, but rather a special object that contains information about the groups.\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Create a groupby object\ngrouped = df.groupby('location')\n\n# Check the type of the grouped object\nprint(type(grouped))\n\n# Try to view the grouped object\nprint(grouped)\n\n\n<class 'pandas.core.groupby.generic.DataFrameGroupBy'>\n<pandas.core.groupby.generic.DataFrameGroupBy object at 0x111b18ad0>\n\n\nAs you can see, simply printing the groupby object doesn’t show us the data. Instead, it gives us information about the groupby operation.\n\n\n\nAccessing Group Data\nTo actually see the data in each group, we need to iterate over the groups or use aggregation functions.\n\n✏️ Try it. 
Add the cell below to your notebook and run it.\n\n\n\nCode\n# Iterate over groups\nfor name, group in grouped:\n print(f\"Group: {name}\")\n print(group)\n print()\n\n# Using an aggregation function\nprint(grouped.mean())\n\n\nGroup: A\n location date temperature humidity year\n0 A 2023-01-01 20 50 2023\n2 A 2023-01-03 19 52 2023\n4 A 2023-01-05 21 49 2023\n\nGroup: B\n location date temperature humidity year\n1 B 2023-01-02 22 48 2023\n3 B 2023-01-04 24 45 2023\n5 B 2023-01-06 23 47 2023\n\n date temperature humidity year\nlocation \nA 2023-01-03 20.0 50.333333 2023.0\nB 2023-01-04 23.0 46.666667 2023.0\n\n\n\n\nKey Differences from DataFrames\n\nStructure: A groupby object is not a table-like structure like a DataFrame. It’s more like a container of groups.\nDirect Access: You can’t access columns or rows of a groupby object directly like you can with a DataFrame.\nLazy Evaluation: Groupby operations are lazy - they don’t actually compute anything until you call an aggregation function or iterate over the groups.\nAggregation Required: To get meaningful results from a groupby object, you typically need to apply some kind of aggregation function (like mean(), sum(), count(), etc.).\n\n\n\nConverting Groupby Results to DataFrame\nAfter applying an aggregation function to a groupby object, the result is typically a DataFrame:\n\n✏️ Try it. Add the cell below to your notebook and run it.\n\n\n\nCode\n# Result of aggregation is a DataFrame\nresult = grouped.mean()\nprint(type(result))\n\n# We can now use DataFrame methods on this result\nprint(result.reset_index())\n\n\n<class 'pandas.core.frame.DataFrame'>\n location date temperature humidity year\n0 A 2023-01-03 20.0 50.333333 2023.0\n1 B 2023-01-04 23.0 46.666667 2023.0\n\n\n\nPractice\nTry grouping the data by both ‘location’ and ‘year’, then calculate the maximum temperature for each group. What type of object do you get? How can you reset the index to make it a regular DataFrame?\n\n✏️ Try it. 
Add the cell below to your notebook and then provide your code.\n\n\n\nCode\n# Your code here\n\n\n\n\nKey Groupby Points\n\nA groupby object is not a DataFrame, but a special object containing group information.\nTo view data in a groupby object, you need to iterate over it or apply aggregation functions.\nGroupby operations are lazy and require aggregation to produce results.\nThe result of aggregating a groupby object is typically a DataFrame or Series, which you can then manipulate further." }, { "objectID": "course-materials/interactive-sessions/6a_grouping_joining_sorting_1_old.html#reshaping-dataframes-with-pivot-tables", diff --git a/index.qmd b/index.qmd index 09947e3..472d273 100644 --- a/index.qmd +++ b/index.qmd @@ -32,6 +32,10 @@ The goal of EDS 217 (Python for Environmental Data Science) is to equip incoming - Collaborate with peers to solve group programming tasks, and communicate the process and results to the rest of the class +## Syncing your classwork to GitHub + +[Here](/course-materials/interactive-sessions/8a_github.qmd) are some directions for syncing your classwork with a GitHub repository. + ## Teaching Team