---
jupyter:
  jupytext:
    formats: ipynb,md
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
      jupytext_version: 1.12.0
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
---

Linear Regression and Temperature

In this notebook, we'll look at using linear regression to study changes in temperature.

Setup

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

%config InlineBackend.figure_format ='retina'

Getting our data

We'll be getting data from the North America Land Data Assimilation System (NLDAS), which provides daily average temperatures for the United States from 1979 to 2011.

For the next step, you will need to choose some settings in the data request form. These are:

GroupBy: Month Day, Year
Your State
Export Results (check box)
Show Zero Values (check box)

Download the data for your home state (or a state of your choosing) and upload it to your work directory on M2.

Loading our data

df = pd.read_csv('North America Land Data Assimilation System (NLDAS) Daily Air Temperatures and Heat Index (1979-2011).txt',delimiter='\t',skipfooter=14,engine='python')

df

Clean the data

Drop any rows that have the value "Total" in the Notes column, then drop the Notes column
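One possible way to approach this step is sketched below on a tiny made-up table rather than the real download. The `demo` DataFrame and its values are assumptions for illustration only; the column names mirror those used elsewhere in this notebook.

```python
import pandas as pd

# Tiny made-up stand-in for the downloaded table (values are illustrative only)
demo = pd.DataFrame({
    "Notes": [None, None, "Total"],
    "Month Day, Year": ["Jan 01, 1979", "Jan 02, 1979", None],
    "Avg Daily Max Air Temperature (F)": [40.1, 38.5, 39.3],
})

# Keep only the rows whose Notes value is not "Total", then drop the Notes column
demo_clean = demo[demo["Notes"] != "Total"].drop(columns="Notes")
print(demo_clean)
```

The boolean comparison keeps rows where Notes is empty, because a missing value is never equal to "Total".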

Make a column called Date that is in the pandas datetime format
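A minimal sketch of the conversion, assuming the 'Month Day, Year' values look like "Jan 01, 1979" (check your own file, since the exact format is an assumption here). The `demo` DataFrame is made up for illustration.

```python
import pandas as pd

# Made-up rows in the assumed "Mon DD, YYYY" layout
demo = pd.DataFrame({"Month Day, Year": ["Jan 01, 1979", "Feb 15, 1979"]})

# Parse the text into pandas datetimes; the format string matches the assumed layout
demo["Date"] = pd.to_datetime(demo["Month Day, Year"], format="%b %d, %Y")
print(demo.dtypes)
```

Passing an explicit `format` makes the parse fail loudly if a row does not match, which can help catch leftover summary rows.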

Make columns for 'Year', 'Month', and 'Day' by splitting the column 'Month Day, Year'
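As a hedged alternative to literal string splitting, the sketch below derives the three columns from parsed dates with the `.dt` accessor. This is a swapped-in approach, chosen because a later cell calls `astype(int)` on 'Month', which needs month numbers rather than abbreviations such as "Jan". The `demo` DataFrame and the "Jan 01, 1979" layout are assumptions.

```python
import pandas as pd

# Made-up rows in the assumed "Mon DD, YYYY" layout
demo = pd.DataFrame({"Month Day, Year": ["Jan 01, 1979", "Feb 15, 1979"]})

# Parse once, then pull out the numeric parts of each date
parsed = pd.to_datetime(demo["Month Day, Year"], format="%b %d, %Y")
demo["Year"] = parsed.dt.year
demo["Month"] = parsed.dt.month  # 1 through 12
demo["Day"] = parsed.dt.day
print(demo)
```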

df['DateInt'] = df['Date'].astype('int64') / 10e10 # Date as a number (epoch time scaled down by 1e11); used later as the regression x values
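To make the units of 'DateInt' concrete, here is a small sketch on two made-up consecutive days. It assumes nanosecond-resolution timestamps (forced explicitly below, since that may not hold in every pandas release), so the integer view counts nanoseconds since 1970-01-01 and dividing by `10e10` (that is, 1e11) leaves units of 100 seconds.

```python
import pandas as pd

# Two consecutive days, forced to nanosecond resolution for this illustration
dates = pd.Series(pd.to_datetime(["1979-01-01", "1979-01-02"])).astype("datetime64[ns]")

# Integer view of the timestamps: nanoseconds since 1970-01-01
nanos = dates.astype("int64")

# Same scaling as the notebook cell: 10e10 equals 1e11
date_int = nanos / 10e10

# One day is 86,400 seconds, so consecutive days differ by 864 in these units
step = date_int.iloc[1] - date_int.iloc[0]
print(step)
```

Under that assumption, a slope fitted against 'DateInt' is a temperature change per 100 seconds; multiplying it by 864 gives the change per day.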

Generating a scatter plot

Use df.plot.scatter to plot 'Date' vs 'Avg Daily Max Air Temperature (F)'. You might want to add figsize=(50,5) as an argument to make it more clear what is happening.
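A minimal sketch of the call, run on made-up data so that it stands alone: the `demo` DataFrame and its smooth seasonal curve are assumptions, not NLDAS values. pandas relies on matplotlib for plotting, so matplotlib must be installed.

```python
import numpy as np
import pandas as pd

# Made-up two-year series with a smooth seasonal cycle (not real NLDAS values)
dates = pd.date_range("1979-01-01", periods=730, freq="D")
temps = 60 - 20 * np.cos(2 * np.pi * np.arange(730) / 365.25)
demo = pd.DataFrame({"Date": dates, "Avg Daily Max Air Temperature (F)": temps})

# A wide, short figure stretches the time axis so individual years are easier to see
ax = demo.plot.scatter(
    x="Date",
    y="Avg Daily Max Air Temperature (F)",
    figsize=(50, 5),
)
```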

Describe your plot.

Adding colors for our graph

# No need to edit this unless you want to try different colors or a pattern other than colors by month

cmap = plt.get_cmap("nipy_spectral", len(df['Month'].unique())) # Builds a discrete color mapping using a built-in matplotlib color map

c = []
for i in range(cmap.N): # Converts our discrete map into Hex Values
    rgba = cmap(i)
    c.append(matplotlib.colors.rgb2hex(rgba))

df['color']=[c[int(i-1)] for i in df['Month'].astype(int)] # Adds a column to our dataframe with the color we want for each row

Make the same scatter plot as before, but add color by passing the argument c=df['color'] to the plotting command.

Pick a subset of the data

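Select a 6-month period from the data and store it in a variable named subset, which the regression cells below use. Hint: combine boolean comparisons on the 'Date' column with pd.Timestamp(YYYY, MM, DD).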
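One way to build such a selection is sketched below with `pd.Timestamp`, which works where `pd.datetime` is not available in recent pandas releases. The `demo` DataFrame, the chosen dates, and the name `demo_subset` are assumptions for illustration.

```python
import pandas as pd

# Made-up daily dates spanning two years
demo = pd.DataFrame({"Date": pd.date_range("1990-01-01", "1991-12-31", freq="D")})

# Six months: March 1 (inclusive) up to September 1 (exclusive)
start = pd.Timestamp(1990, 3, 1)
end = pd.Timestamp(1990, 9, 1)

# Combine two comparisons with & ; each side needs its own parentheses
mask = (demo["Date"] >= start) & (demo["Date"] < end)
demo_subset = demo[mask]
print(len(demo_subset), demo_subset["Date"].min(), demo_subset["Date"].max())
```

Using a half-open interval (start included, end excluded) avoids having to know the last day of the final month.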

Plot the subset using the same code you used for the colored scatter plot above. You can change the figsize if needed.

Linear Regression

We are going to use a very simple linear regression model. You may implement a more complex model if you wish.

The method used here is the least squares method, which gives the slope $m$ and intercept $b$ of the best fit line $y = mx + b$ as:

$m = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$

$b = \bar{y} - m\bar{x}$

Here, $\bar{x}$ and $\bar{y}$ are the mean values of $x$ and $y$, respectively.
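To see the two formulas in action, the sketch below applies them to four made-up points and cross-checks the result against `np.polyfit`, used here only as an independent reference. The points are assumptions chosen to keep the arithmetic easy: the numerator comes to 8 and the denominator to 5, so the slope is 1.6 and the intercept is 1.1.

```python
import numpy as np

# Four made-up points, chosen only to keep the arithmetic easy to follow
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 4.0, 6.0])

x_bar, y_bar = x.mean(), y.mean()  # 1.5 and 3.5

num = np.sum((x - x_bar) * (y - y_bar))  # numerator of the slope formula
den = np.sum((x - x_bar) ** 2)           # denominator of the slope formula
m = num / den
b = y_bar - m * x_bar

# Independent check: a degree-1 polynomial fit returns (slope, intercept)
m_ref, b_ref = np.polyfit(x, y, 1)

print(f"m = {m:.2f}, b = {b:.2f}")  # prints "m = 1.60, b = 1.10"
```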

First we need to define our X and Y values.

X=subset['DateInt'].values
Y=subset['Avg Daily Max Air Temperature (F)'].values

def lin_reg(x,y):
    # Calculate the average x and y
    x_avg = np.mean(x)
    y_avg = np.mean(y)

    num = 0
    den = 0
    for i in range(len(x)): # This represents our sums
        num = num + (x[i] - x_avg)*(y[i] - y_avg) # Our numerator
        den = den + (x[i] - x_avg)**2 # Our denominator
    # Calculate slope
    m = num / den
    # Calculate intercept
    b = y_avg - m*x_avg

    print(m, b)
    
    # Calculate our predicted y values
    y_pred = m*x + b
    
    return y_pred

Y_pred = lin_reg(X,Y)

subset.plot.scatter(x='Date', y='Avg Daily Max Air Temperature (F)',c=subset['color'])
plt.plot(subset['Date'], Y_pred, color='red') # best fit line; plotting the predictions directly keeps the correct direction when the slope is negative
plt.show()

What are the slope and intercept of your best fit line?

What are the minimum and maximum Y values of your best fit line? Is your slope positive or negative?

Putting it all together

Generate a best fit line for the full data set and plot the line over top of the data.
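The sketch below shows one way the pieces could fit together, run on made-up data so that it stands alone. The `full` DataFrame, its seasonal cycle, and its small upward drift are assumptions, not NLDAS values, and the fit uses the same least squares formulas written as NumPy sums instead of a loop. 'DateInt' is built with nanosecond timestamps forced explicitly, which is also an assumption.

```python
import numpy as np
import pandas as pd

# Made-up data: a seasonal cycle plus a small upward drift (not real NLDAS values)
dates = pd.date_range("1979-01-01", "1982-12-31", freq="D")
days = np.arange(len(dates))
temps = 60 - 20 * np.cos(2 * np.pi * days / 365.25) + 0.005 * days
full = pd.DataFrame({"Date": dates, "Avg Daily Max Air Temperature (F)": temps})

# Numeric version of the date, scaled the same way as in the notebook
full["DateInt"] = full["Date"].astype("datetime64[ns]").astype("int64") / 10e10

x = full["DateInt"].to_numpy()
y = full["Avg Daily Max Air Temperature (F)"].to_numpy()

# Least squares slope and intercept over every row
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()
full["Fit"] = m * x + b

# Scatter of the data with the fitted line drawn on the same axes
ax = full.plot.scatter(x="Date", y="Avg Daily Max Air Temperature (F)", figsize=(50, 5), s=2)
ax.plot(full["Date"], full["Fit"], color="red")
print(m, b)
```

With real data, the sign of `m` depends on the temperatures themselves and on where the record starts and ends within the seasonal cycle, so it is worth checking the fitted line against the scatter before interpreting it.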

Is the slope positive or negative? What do you think that means?
