This repository contains two Jupyter notebooks focusing on the analysis of US electricity disturbances from 2019 to 2023. The data is sourced from the US Department of Energy. The first notebook focuses on data cleaning, while the second notebook performs exploratory data analysis (EDA) to uncover insights and patterns in the data.
The dataset comes from the US Department of Energy and covers electrical disturbances reported across the US from 2019 to 2023, including their causes, locations, and impacts.
- content: Folder containing the data and plots.
  - Data: Contains the raw data files from the source in Excel format, along with some documentation.
  - Plots: Contains all the plots saved from the analysis.
  - electricity_disturbance_data.csv: Combined data after data cleaning and preparation.
- Data_Cleaning_US_Electricity_Disturbances.ipynb: Jupyter notebook containing the data cleaning process.
- EDA_US_Electricity_Disturbances.ipynb: Jupyter notebook containing the exploratory data analysis.
File: Data_Cleaning_US_Electricity_Disturbances.ipynb
This notebook outlines the process of cleaning the dataset to ensure it is ready for analysis. Key steps include:
- Importing Necessary Libraries: Libraries such as pandas, numpy, and others are imported to handle data operations and manipulations.
- Loading the Dataset: The raw dataset is loaded into a pandas DataFrame.
- Standardizing Data Formats: Ensuring consistency in data formats, such as dates and categorical variables.
- Removing Duplicates: Duplicate entries are identified and removed to avoid skewed results.
- Ensuring Data Consistency: Additional checks are performed to ensure data integrity and consistency.
- Adding or Removing Columns: Columns are added or removed based on the scope of the analysis.
- Saving to CSV: The cleaned DataFrame is saved to electricity_disturbance_data.csv.
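The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on toy data, not the notebook's actual code: the column names `Date Event Began` and `Event Type`, the derived `Month` column, and the helper `clean_disturbances` are assumptions made for the example.

```python
import pandas as pd


def clean_disturbances(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps to a raw frame (column names are assumed)."""
    df = raw.copy()
    # Standardize formats: parse dates and normalize categorical text.
    df["Date Event Began"] = pd.to_datetime(df["Date Event Began"], errors="coerce")
    df["Event Type"] = df["Event Type"].str.strip().str.title()
    # Remove exact duplicates that remain after standardization.
    df = df.drop_duplicates()
    # Consistency check: drop rows whose date could not be parsed.
    df = df.dropna(subset=["Date Event Began"]).reset_index(drop=True)
    # Add a derived column for later month-level analysis.
    df["Month"] = df["Date Event Began"].dt.month
    return df


# Toy stand-in for the raw Excel data.
raw = pd.DataFrame({
    "Date Event Began": ["2019-01-05", "2019-01-05", "not a date", "2020-07-14"],
    "Event Type": [" severe weather", "Severe Weather", "Vandalism", "vandalism "],
})
cleaned = clean_disturbances(raw)
print(cleaned)

# Final step from the list above (commented out to avoid writing a file here):
# cleaned.to_csv("electricity_disturbance_data.csv", index=False)
```

Standardizing before deduplicating matters in this sketch: the first two toy rows only become identical once whitespace and capitalization are normalized.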
File: EDA_US_Electricity_Disturbances.ipynb
This notebook provides a comprehensive analysis of the cleaned dataset to uncover trends, patterns, and insights. Key analyses include:
- General Observations:
  - Skewness: All three variables (demand loss, number of customers affected, and event duration) show strong positive skewness: most events have low values, and extreme values (high demand loss, many customers affected, long event duration) are infrequent.
  - Central Tendency: Because of the skew, the median of each variable is likely low, with the mean pulled upward by the few extreme events.
  - Variability: The values of all three variables span a wide range, so the impact of electrical disturbances varies considerably from event to event.
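As a rough illustration of the skewness observation, the sketch below computes skewness, mean, and median with pandas. The numbers are made-up toy values, not figures from the dataset; only the column name `Demand Loss (MW)` is taken from the text above.

```python
import pandas as pd

# Toy values standing in for the real column: mostly small events, one extreme.
demand_loss = pd.Series([5, 8, 10, 12, 15, 20, 900], name="Demand Loss (MW)")

skewness = demand_loss.skew()        # positive value means a long right tail
mean_value = demand_loss.mean()      # pulled upward by the extreme event
median_value = demand_loss.median()  # stays close to the typical event

print(f"skew={skewness:.2f}, mean={mean_value:.1f}, median={median_value:.1f}")
```

With a single extreme event in the toy series, the mean ends up far above the median, which is the pattern the observations describe.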
- Demand Loss (MW) and Number of Customers Affected:
  - Correlation Coefficient: 0.45
  - Interpretation: There is a moderate positive correlation between demand loss (MW) and the number of customers affected. This suggests that as the demand loss increases, the number of customers affected tends to increase as well, but the relationship is not very strong.
- Number of Customers Affected and Event Duration (hours):
  - Correlation Coefficient: 0.34
  - Interpretation: There is a moderate positive correlation between the number of customers affected and the event duration. This indicates that events affecting more customers tend to last longer, although the relationship is moderate and other factors may also play a significant role.
- Demand Loss (MW) and Event Duration (hours):
  - Correlation Coefficient: 0.15
  - Interpretation: There is a weak positive correlation between demand loss (MW) and event duration. This suggests that the relationship between the amount of demand loss and the duration of the event is weak, implying that factors other than the duration of the event are likely more significant in determining the extent of demand loss.
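Pairwise coefficients like these can be obtained from a correlation matrix. The sketch below uses `DataFrame.corr` with Pearson correlation, which is the pandas default; whether the notebook uses Pearson is an assumption, and the toy values will not reproduce the coefficients reported above.

```python
import pandas as pd

# Toy data; column names follow the variable names used in the text above.
df = pd.DataFrame({
    "Demand Loss (MW)": [10, 50, 20, 300, 80],
    "Number of Customers Affected": [1000, 20000, 5000, 90000, 15000],
    "Event Duration (hours)": [2.0, 30.0, 1.5, 12.0, 48.0],
})

# Symmetric matrix of pairwise Pearson coefficients, each in [-1, 1].
corr = df.corr(method="pearson")

# Look up a single pair by row and column label.
pair = corr.loc["Demand Loss (MW)", "Number of Customers Affected"]

print(corr.round(2))
```

Because the variables are heavily skewed, a rank-based alternative such as `method="spearman"` may be worth comparing, since Pearson coefficients can be dominated by a few extreme events.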
- Question: What is the most common type of event?
- Question: Which areas are most affected by these events?
- Question: What is the average duration of each event type?
- Question: How many customers are affected by different types of events?
- Question: Which event types have the highest average demand loss (MW)?
- Question: Are there specific months with higher frequencies of events?
- Question: Is there a trend in the number of events over the years?
- Question: How does the event duration vary over different months?
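Several of these questions reduce to counting or grouping. The following is a minimal sketch on toy data, assuming hypothetical column names `Event Type` and `Date Event Began` alongside `Event Duration (hours)`; the notebook's actual columns and approach may differ.

```python
import pandas as pd

# Toy events; values are illustrative only.
events = pd.DataFrame({
    "Event Type": ["Severe Weather", "Vandalism", "Severe Weather", "System Operations"],
    "Date Event Began": pd.to_datetime(
        ["2019-01-05", "2019-03-10", "2020-01-20", "2021-07-04"]
    ),
    "Event Duration (hours)": [10.0, 2.0, 30.0, 4.0],
})

# Most common type of event.
most_common = events["Event Type"].value_counts().idxmax()

# Average duration of each event type.
avg_duration = events.groupby("Event Type")["Event Duration (hours)"].mean()

# Event counts per calendar month and per year, in chronological order.
events_per_month = events["Date Event Began"].dt.month.value_counts().sort_index()
events_per_year = events["Date Event Began"].dt.year.value_counts().sort_index()

print(most_common)
print(avg_duration)
print(events_per_month)
print(events_per_year)
```

The same `groupby(...).mean()` pattern would apply to the customers-affected and demand-loss questions by swapping the aggregated column.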
This project is licensed under the MIT License.