This is a submission for the Peer Reviewed Project of the "Getting and Cleaning Data" course:
- UCI HAR Dataset: The folder containing the original raw data
- run_analysis.R: The R script that extracts the relevant data from the raw data
- tidy_data_set.txt: The output of the R script
- Codebook.md: Description of the output data
- GettingAndCleaningData.Rproj: The RStudio project file (ignore it if you don't use RStudio)
- Fork this repo and download everything into a folder on your local machine. This repo also includes the raw data. You can provide your own data to save some bandwidth and skip the 'UCI HAR Dataset' folder.
- If you use RStudio, simply open the 'GettingAndCleaningData.Rproj' project file. The working directory will be set automatically for you. Otherwise, remember to use 'setwd()' to set the working directory to the folder of this repo.
- Open 'run_analysis.R' and make sure the list of file locations is correct, especially if you use your own data.
- Run this script using the source() command.
- A text file named 'tidy_data_set.txt' will be created (overwritten if it's already there) in the same directory as the script.
- The Subject columns are read from 'subject_train.txt' and 'subject_test.txt'
- The Activity columns are read from 'y_train.txt' and 'y_test.txt'. Then they are converted to readable strings using 'activity_labels.txt'
- Mean and Standard Deviation columns are read from 'X_train.txt' and 'X_test.txt'
- The columns of each data set are merged column-wise, creating 2 data sets: training and test
- Training and test data sets are merged row-wise to form the raw data set
- The raw data set is then broken into chunks by Subject and Activity columns
- For each chunk, the average of each column other than the Subject and Activity is calculated
- The results of the above calculation are merged to form the data set shown in 'tidy_data_set.txt'
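
The processing steps above can be sketched in R on a tiny in-memory example. This is only an illustration under stated assumptions, not the code in 'run_analysis.R': the helper `make_set`, the column names `mean_x` and `std_x`, and all the values are made up, and the real script reads its inputs from the 'UCI HAR Dataset' files instead of building them by hand.

```r
# Toy stand-in for 'activity_labels.txt' (values are made up for illustration).
activity_labels <- data.frame(id = 1:2, name = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)

# Hypothetical helper: builds one data set (training or test) by converting
# activity ids to readable strings and binding all columns together.
make_set <- function(subject, activity_id, x) {
  activity <- activity_labels$name[match(activity_id, activity_labels$id)]
  data.frame(Subject = subject, Activity = activity, x,
             stringsAsFactors = FALSE)
}

# Toy stand-ins for the subject, y and X files of each data set.
train <- make_set(
  subject = c(1, 1, 1),
  activity_id = c(1, 1, 2),
  x = data.frame(mean_x = c(0.2, 0.4, 0.1), std_x = c(1.0, 3.0, 0.5))
)
test <- make_set(
  subject = c(2, 2),
  activity_id = c(2, 2),
  x = data.frame(mean_x = c(0.6, 0.8), std_x = c(2.0, 4.0))
)

# Merge training and test row-wise to form the raw data set.
raw <- rbind(train, test)

# Break the raw data into chunks by Subject and Activity and average
# every other column within each chunk.
tidy <- aggregate(. ~ Subject + Activity, data = raw, FUN = mean)
tidy <- tidy[order(tidy$Subject, tidy$Activity), ]

print(tidy)
```

Here `aggregate()` does the chunking, averaging and merging in a single call; a split/apply/combine approach with `split()` and `lapply()` would give the same kind of result. To save the result the way the script does, `write.table(tidy, "tidy_data_set.txt", row.names = FALSE)` is one option.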