Project 5: California Wildfires

Carly Sharma, Jake Parker, & Adam Zucker

Problem Statement

Given California's history of terrible wildfires, we've set out to predict the potential severity of future fires based on past fire data, county-specific weather patterns, and global atmospheric carbon data. We'll use a variety of models, including Logistic Regression, K-Nearest Neighbors, a Random Forest Classifier, a Support Vector Classifier, and a Convolutional Neural Network. With these predictions, we hope to recommend mitigation measures and help the California Department of Forestry and Fire Protection (Cal Fire) stop the spread of wildfires in their earliest stages.

Background Research

As climate change becomes an ever-increasing concern for the health and longevity of the Earth, wildfires and their severity have been on the rise. Arid climates, such as those found in many parts of California, are at particularly high risk: 2019 and 2020 saw some of the most destructive fires in the state's history (Cal Fire). Cal Fire has implemented measures in an attempt to mitigate the spread of wildfires, but we would like to investigate whether more specific actions could help prevent the start, and especially the spread, of these fires.

Contents

Notebook 1: Data imports, cleaning, and merging into a dataframe for modeling
Notebook 2: A notebook devoted to EDA, feature selection and engineering, and further data cleaning
Notebook 3: Visualization notebook, containing graphs and additional EDA
Notebook 4: Modeling notebook containing all classification models we ran

Datasets Used

true_df.csv: Our engineered dataset based on the below:
- California Wildfire Incidents dataset (from Kaggle)
- Meteorological data (from NOAA)
- Atmospheric carbon level data (from NOAA/Mauna Loa Observatory)
dummy_df.csv: A dummified version of our true_df

Data Dictionary

Feature	Type	Dataset	Description
acres_burned	float	`true_df.csv`	Total acres burned in a given fire
admin_unit	object	`true_df.csv`	Responding fire unit
avg_monthly_temp	float	`true_df.csv`	Average monthly temperature in degrees Fahrenheit
avg_wind_speed	float	`true_df.csv`	Average monthly wind speed in MPH
co2_measured_mole_fraction	float	`true_df.csv`	Measured mole fraction of carbon dioxide in atmosphere, collected from Mauna Loa Observatory, HI
cooling_degree_days	float	`true_df.csv`	Cooling degree days, computed as the mean daily temperature minus 65 degrees Fahrenheit. Each day is summed to produce a monthly total.
county	object	`true_df.csv`	County where a given fire started
date	object	`true_df.csv`	The date corresponding to the month during which a given fire started
dp10	float	`true_df.csv`	Number of days in a month with at least 0.10 inches of precipitation
dt00	float	`true_df.csv`	Number of days in a month with a minimum temperature less than or equal to 0 degrees Fahrenheit
dt32	float	`true_df.csv`	Number of days in a month with a minimum temperature less than or equal to 32 degrees Fahrenheit
dx32	float	`true_df.csv`	Number of days in a month with a maximum temperature less than or equal to 32 degrees Fahrenheit
dx70	float	`true_df.csv`	Number of days in a month with a maximum temperature greater than or equal to 70 degrees Fahrenheit
dsnd	float	`true_df.csv`	Number of days in a month with a snow depth of at least 1 inch
duration	float	`true_df.csv`	The duration of a given fire in days
elevation	float	`true_df.csv`	The elevation, given in feet, at which the fire started
extinguished	object	`true_df.csv`	Date the fire was extinguished
extreme_max_temp	float	`true_df.csv`	Highest daily temperature from the month
extreme_min_temp	float	`true_df.csv`	Lowest minimum temperature from the month
fire_bins	integer	`true_df.csv`	Bins representing acres burned in a given fire
fire_name	object	`true_df.csv`	The name of a given fire
highest_daily_snowfall	float	`true_df.csv`	Highest snowfall on a given day in a month
latitude	float	`true_df.csv`	Latitude, given in decimal degrees
longitude	float	`true_df.csv`	Longitude, given in decimal degrees
major_incident	boolean	`true_df.csv`	Whether or not the fire was classified as a major incident
month	integer	`true_df.csv`	The month a given fire started
qc_flag	boolean	`true_df.csv`	Whether or not an atmospheric carbon dioxide measurement was considered viable
season	object	`true_df.csv`	Season of the year
started	object	`true_df.csv`	Date the fire started
temp_range	float	`true_df.csv`	The difference between the maximum and minimum temperature in a given month
total_monthly_precipitation	float	`true_df.csv`	Total precipitation in a given month, given in inches
total_monthly_snowfall	float	`true_df.csv`	Total snowfall in a month, given in inches
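
As an illustration of the two derived temperature features above, here is a minimal sketch of how `cooling_degree_days` and `temp_range` could be computed. The function names and sample values are hypothetical. Two assumptions are made: days at or below 65 degrees Fahrenheit contribute zero (the usual degree-day convention, which the description above does not spell out), and `temp_range` is taken as `extreme_max_temp` minus `extreme_min_temp`.

```python
BASE_TEMP_F = 65.0  # base temperature from the cooling_degree_days description


def cooling_degree_days(daily_mean_temps_f):
    """Sum each day's mean temperature above 65 F over a month.

    Assumption: days at or below the base contribute zero, following the
    usual degree-day convention.
    """
    return sum(max(0.0, t - BASE_TEMP_F) for t in daily_mean_temps_f)


def temp_range(extreme_max_temp, extreme_min_temp):
    """Difference between a month's highest and lowest temperatures."""
    return extreme_max_temp - extreme_min_temp


# Hypothetical daily mean temperatures (degrees F), not project data.
sample_daily_means = [60.0, 65.0, 70.0, 82.5]
cdd = cooling_degree_days(sample_daily_means)  # 0 + 0 + 5 + 17.5
print(cdd)                     # 22.5
print(temp_range(98.0, 41.0))  # 57.0
```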

Analysis Summary

We began with one dataset on California Fire Incidents from Kaggle, 12 California-specific meteorological datasets from NOAA spanning 2017 to 2019, and one dataset from the Mauna Loa Observatory in Hawaii measuring approximate global atmospheric carbon dioxide levels. In Notebook 1, we merged all 12 NOAA datasets, then cleaned the result by handling null values, odd or unexpected entries, and general formatting. We then cleaned our California Fire Incidents and Mauna Loa carbon dioxide datasets in preparation for merging all our data into a single dataframe. In this process, we imputed values for nulls based on monthly averages and converted longitude and latitude data to county names to enable proper merging of our data.
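
The monthly-average imputation described above can be sketched as follows. This is a library-free illustration under stated assumptions, not the code from Notebook 1: the function name `impute_with_monthly_mean`, the list-of-dicts layout, and the sample values are all hypothetical, and nulls are represented as `None`.

```python
from collections import defaultdict
from statistics import mean


def impute_with_monthly_mean(rows, column, month_key="month"):
    """Return copies of rows with None in `column` replaced by that month's mean."""
    # Collect the observed (non-null) values for each month.
    observed = defaultdict(list)
    for row in rows:
        if row[column] is not None:
            observed[row[month_key]].append(row[column])
    monthly_mean = {month: mean(values) for month, values in observed.items()}

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if new_row[column] is None:
            # Stays None if the month has no observed values at all.
            new_row[column] = monthly_mean.get(new_row[month_key])
        filled.append(new_row)
    return filled


# Hypothetical records; the values are illustrative, not project data.
records = [
    {"month": 7, "avg_wind_speed": 6.0},
    {"month": 7, "avg_wind_speed": None},
    {"month": 7, "avg_wind_speed": 8.0},
    {"month": 8, "avg_wind_speed": 5.0},
]
filled_records = impute_with_monthly_mean(records, "avg_wind_speed")
print(filled_records[1]["avg_wind_speed"])  # 7.0
```

Computing the mean only from observed values keeps the imputed rows from influencing the average they are filled with.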
In Notebook 2, we went on to explore the data, select our desired features, account for multicollinearity, and engineer potentially useful features (such as season and month of the fire). We plotted various Seaborn heatmaps, and generated a number of correlation matrices to examine features we wanted to include in our final models.
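
As a rough sketch of the month and season engineering mentioned above, the snippet below derives both from a fire's start date. The season boundaries (meteorological seasons) and the ISO `YYYY-MM-DD` date format are assumptions, since the exact cut-offs and date format used in Notebook 2 are not stated here; `month_and_season` is a hypothetical helper name.

```python
from datetime import date

# Assumed meteorological seasons; the project's exact cut-offs are not stated.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def month_and_season(date_string):
    """Derive (month, season) from an ISO 'YYYY-MM-DD' date string."""
    month = date.fromisoformat(date_string).month
    return month, SEASON_BY_MONTH[month]


# Hypothetical start date, not taken from the dataset.
print(month_and_season("2018-07-23"))  # (7, 'Summer')
```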
We visualized our data in Notebook 3, looking at graphs to compare acres burned by county, month, season, and year; total acres burned in California between 2017 and 2019; duration of fire by county; and average duration of fire based on severity.
We ultimately tested 5 classification models: Logistic Regression, K-Nearest Neighbors, a Random Forest Classifier, a Support Vector Classifier, and a Convolutional Neural Network. Our target feature was the total acres burned, classified into bins we defined as:
- Bin 1: 50 acres or below
- Bin 2: 100 acres or below, and greater than 50 acres
- Bin 3: 250 acres or below, and greater than 100 acres
- Bin 4: 500 acres or below, and greater than 250 acres
- Bin 5: 1000 acres or below, and greater than 500 acres
- Bin 6: Greater than 1000 acres
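
The bin definitions above can be expressed as a small lookup. This is a minimal sketch using only the standard library; the names `BIN_UPPER_EDGES` and `fire_bin` are hypothetical, and the notebooks may well have done this differently (for example with `pandas.cut` on the whole column).

```python
from bisect import bisect_left

# Upper edges (inclusive) of Bins 1-5, in acres; Bin 6 is everything above 1000.
BIN_UPPER_EDGES = [50, 100, 250, 500, 1000]


def fire_bin(acres_burned):
    """Map acres burned to a bin number from 1 to 6."""
    # bisect_left counts the edges strictly below the value, so a fire of
    # exactly 50 acres stays in Bin 1 and 50.1 acres moves to Bin 2.
    return bisect_left(BIN_UPPER_EDGES, acres_burned) + 1


print([fire_bin(a) for a in [10, 50, 50.1, 100, 250, 750, 1000, 1000.5]])
# [1, 1, 2, 2, 3, 5, 5, 6]
```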

Conclusions & Recommendations

From our EDA, we found that counties at higher elevations tend to have more and larger fires. As expected, precipitation levels and maximum temperatures also contribute heavily to the likelihood of wildfires: low rainfall and snowfall equate to a higher chance of a large fire, as does an extreme monthly temperature above 90 degrees Fahrenheit.
Wildfires are most prevalent in the Summer and Fall seasons. July in particular showed an especially high number of both small and large wildfires.
The most frequently occurring fires are small in terms of acres burned, but many small fires can add up to a large amount of damage.
We would recommend controlled burn-offs of dry foliage leading up to fire season (Summer into Fall), especially in arid and high-elevation areas.
We would also recommend reallocation of resources and firefighters based on environmental and weather conditions, such that more units can respond as quickly as possible to a fire threat, thus stopping the spread of the fire.

Further Research

Given more time, we'd like to look at wind speeds at the time each fire started, not just the monthly average, as this could inform how quickly a fire spreads.
We'd also like to compare rural versus urban areas, in order to try to get a feel for where and how the most severe fires start and spread.
Looking at a longer history of fires in California, and globally, would be interesting and informative if we had access to longer-term data.
We could research and gather more data on the concrete effects of climate change on already fire-prone environments.

Sources Cited:

Kaggle California wildfire data
NOAA meteorological data
NOAA/Mauna Loa Observatory atmospheric carbon data
Cal Fire
