srinidhi14vaddy/Heartbeat-Anomaly-Detection-from-ECG-Signals-in-R

Heartbeat-Anomaly-Detection-from-ECG-Signals-in-R

Problem Statement:

Perform statistical and exploratory data analysis on ECG data containing signals of single heartbeats, and deploy multivariate predictive models such as Logistic Regression, KNN, Decision Tree, Lasso Regression, Ridge Regression and Random Forests to classify each heartbeat as normal or abnormal (arrhythmic) with the highest possible accuracy.

Introduction:

The heart contracts rhythmically to pump blood throughout the body. Each contraction begins at the sinoatrial (sinus) node and propagates through the rest of the heart muscle. This electrical propagation follows a pattern, and the resulting currents cause variations in the electrical potential at the surface of the skin. These variations can be captured with electrodes and appropriate equipment and are called ECG signals. An ECG (electrocardiogram) thus measures the electrical activity of the heart and is used to assess heart function and detect heart disease. By analyzing the combination of action potential waveforms generated by the different specialized cardiac tissues, it is possible to detect some abnormalities. Abnormal beating is called an arrhythmia, and each type of arrhythmia is associated with a pattern that can be seen in the ECG signal.

About the dataset:

  1. The dataset consists of 12552 observations of single heartbeats.
  2. The predictor vector has length p = 187; these 187 variables hold the ECG signal value at consecutive time steps within a heartbeat.
  3. The diagnostic response vector y_train is categorical: 0 denotes a normal heartbeat and 1 an abnormal heartbeat.

Statistical Analysis:

1. Data Exploration: Loading and describing the data

● The dataset X_train has 12552 observations and 187 variables. The vector y_train has 12552 categorical values, 0 for normal and 1 for abnormal heartbeats. Together they form the training data for the different models.

● The dataset X_test has 2000 observations and 187 variables, but there is no y_test. This is unlabelled, unseen data on which the model’s accuracy can be checked.

● The table(y_train) function is used to check the number of observations in each class.

○ There are 3046 observations in the 0/normal class.

○ There are 9506 observations in the 1/abnormal class.

● The colSums(is.na(X_train)) function is used to check for missing values in each variable. There are no missing values in any of the columns.
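The checks above can be sketched as follows. This is a minimal illustration rather than the repository's own script: since the data files are not shown here, it builds a small synthetic stand-in for X_train and y_train (the reduced size, the random values and the seed are all assumptions).

```r
# Synthetic stand-in for the real data (assumption: the real X_train is a
# 12552 x 187 data frame and y_train a 0/1 vector; the row count is reduced here).
set.seed(1)
n <- 200
p <- 187
X_train <- as.data.frame(matrix(runif(n * p), nrow = n, ncol = p))
y_train <- rbinom(n, size = 1, prob = 0.75)

# Dimensions of the predictor data: rows = heartbeats, columns = time steps
dim(X_train)

# Number of observations in each class (0 = normal, 1 = abnormal)
class_counts <- table(y_train)
print(class_counts)

# Missing values per column, then the total across all columns
na_per_column <- colSums(is.na(X_train))
total_na <- sum(na_per_column)
print(total_na)
```

On the real data, the same calls would be run directly on the loaded X_train and y_train.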

2. Data Visualization:

A bar plot (barplot) shows the distribution of the observations across the two classes, normal and abnormal. There are about three times as many observations in the abnormal class as in the normal class.
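A minimal sketch of such a bar plot, using the class counts reported in the data exploration step. The titles, axis labels and colours are illustrative choices, not taken from the repository.

```r
# Class counts reported for y_train (0 = normal, 1 = abnormal)
class_counts <- c(normal = 3046, abnormal = 9506)

# Bar plot of the class distribution
barplot(class_counts,
        main = "Class distribution in y_train",
        xlab = "Class",
        ylab = "Number of observations",
        col  = c("steelblue", "tomato"))

# Ratio of abnormal to normal observations
imbalance_ratio <- class_counts[["abnormal"]] / class_counts[["normal"]]
print(round(imbalance_ratio, 2))
```

The ratio comes out at roughly 3.12, which is the imbalance to keep in mind when reading accuracy figures for the classifiers.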

A heartbeat from each class is plotted against time. The unlist() function converts a row of the data frame into a numeric vector for plotting.

The normal heartbeat shows less variability than the abnormal heartbeat, and its curve is smoother. The abnormal heartbeat also has higher peaks, which suggests that it goes beyond the normal range in the P wave, PR interval and QRS duration.
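The plotting step can be sketched as below. The two signals are synthetic stand-ins (an assumption, since the real rows are not shown here); only the use of unlist() to flatten a data frame row follows the text.

```r
# Synthetic stand-in for X_train and y_train (assumption: each row of the
# real data frame holds the 187 samples of one heartbeat).
set.seed(1)
p <- 187
time_step <- seq_len(p)
normal_beat   <- exp(-((time_step - 40)^2) / 50)
abnormal_beat <- exp(-((time_step - 40)^2) / 50) + 0.3 * runif(p)
X_train <- as.data.frame(rbind(normal_beat, abnormal_beat))
y_train <- c(0, 1)

# A single row of a data frame is itself a (one-row) data frame, i.e. a list.
# unlist() flattens it into a plain numeric vector that plot() can draw.
beat_normal   <- unlist(X_train[which(y_train == 0)[1], ])
beat_abnormal <- unlist(X_train[which(y_train == 1)[1], ])

# One panel per class, side by side
par(mfrow = c(1, 2))
plot(beat_normal, type = "l", main = "Normal heartbeat",
     xlab = "Time step", ylab = "Signal value")
plot(beat_abnormal, type = "l", main = "Abnormal heartbeat",
     xlab = "Time step", ylab = "Signal value")
```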

3. Data Processing and Transformation

Since all the variables are on the same scale, normalization isn’t required. The testing dataset provided is unlabelled. Therefore, to check the accuracy of the trained models, the given training dataset is split into train and test sets of fixed size: 11000 rows are used for training and the remaining 1552 rows are held out to check model performance.
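One way to make this split is sketched below. The text only fixes the sizes (11000 rows for training, the rest held out); drawing the rows at random, the seed and the variable names X_tr, X_val, y_tr and y_val are assumptions made for illustration.

```r
# Synthetic stand-in with the same number of rows as the real training data
# (assumption: only 5 columns instead of 187, to keep the example light).
set.seed(42)  # hypothetical seed, used only to make the split reproducible
n <- 12552
X_train <- as.data.frame(matrix(runif(n * 5), nrow = n))
y_train <- rbinom(n, size = 1, prob = 0.75)

# Randomly pick 11000 row indices for training; the rest form the hold-out set
train_idx <- sample(seq_len(n), size = 11000)

X_tr  <- X_train[train_idx, ]
y_tr  <- y_train[train_idx]
X_val <- X_train[-train_idx, ]
y_val <- y_train[-train_idx]

# Sizes of the two sets
print(c(train = nrow(X_tr), holdout = nrow(X_val)))
```

Because the classes are imbalanced, a stratified split would be a reasonable alternative, but that goes beyond what the text describes.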
