Coursera Online Course - Peer Assessment Project
This file describes how the run_analysis.R script works:
- Download and extract the data file from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip and rename the folder to 'data'
- Put the run_analysis.R file into the same folder as the data directory and make sure you set your working directory within R to that directory
- Next, source("run_analysis.R") within R to generate the tidy data set
- Within the working directory you should find the output data file called 'tidy_data_with_means.txt'
- Read the data into a new data frame within R: df <- read.table('tidy_data_with_means.txt')
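
The setup steps above can be sketched as a single helper. This is a minimal sketch, not part of run_analysis.R: the function name `fetch_and_run`, the local file name `dataset.zip`, and the extracted folder name "UCI HAR Dataset" are assumptions made for illustration.

```r
# Sketch of the setup steps; nothing is downloaded until fetch_and_run() is called.
data_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"

# Hypothetical helper: download, extract, rename, run the script, read the result.
fetch_and_run <- function(url = data_url, zip_file = "dataset.zip") {
  # Download the archive in binary mode (needed on Windows)
  download.file(url, destfile = zip_file, mode = "wb")
  # Extract into the current working directory
  unzip(zip_file)
  # Assumption: the archive extracts to a folder named "UCI HAR Dataset"
  file.rename("UCI HAR Dataset", "data")
  # Run the analysis script, which writes 'tidy_data_with_means.txt'
  source("run_analysis.R")
  # Read the tidy data back, as described above
  read.table("tidy_data_with_means.txt")
}
```

Calling `df <- fetch_and_run()` from the directory that contains run_analysis.R would then perform all of the steps in one go.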
The script itself works in the following way (see R script for inline comments):
- Read the training data, labels and subjects into separate data frames (X_train.txt, y_train.txt, subject_train.txt)
- Read the test data, labels and subjects into separate data frames (X_test.txt, y_test.txt, subject_test.txt)
- Join the respective data frames using row binding: rbind()
- Read the features into a data frame (features.txt)
- Find relevant columns for subsetting the combined data using grep and regular expression: grep("mean\\(.|std\\(.")
- Using the columns identified in the previous step, subset the data: df <- df[, columns]
- Sanitize the column names in the data by removing parentheses and capitalizing 'mean' and 'std'. Use make.names() to ensure all column names are syntactically valid
- Read the activities into a data frame (activity_labels.txt)
- Sanitize the activity names by lowercasing them and disallowing underscores
- Map labels to activities and rename first column to "activity"
- Rename first column in the subjects data frame to "subject"
- Construct the first tidy data set by column-binding the subjects, labels and data frames: tidyData1 <- cbind(subjects, labels, data)
- Aggregate the data from tidyData1 (calculate average of each variable for each activity and each subject): tidyData2 <- with(tidyData1, aggregate(tidyData1[,c(-1,-2)], list(subject, activity), mean))
- Sanitize aggregated data by renaming column 1 to "subject" and column 2 to "activity"
- Sort the tidy data by subject, then by activity, in increasing order
- Subset the tidy data since only the first 180 records will contain meaningful entries (30 subjects with 6 activities each)
- Write the tidy data to 'tidy_data_with_means.txt'
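
The steps above can be illustrated on a toy example. This is a minimal sketch, not the actual run_analysis.R: the small data frames stand in for the real files, all values are made up, and removing underscores with gsub() is only one assumed reading of "disallowing underscores". The final subsetting and writing steps are left out.

```r
# Toy stand-ins for features.txt, X_train.txt and X_test.txt (made-up values)
features <- data.frame(id = 1:4,
                       name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                                "tBodyAcc-meanFreq()-X", "tBodyAcc-max()-X"),
                       stringsAsFactors = FALSE)
train <- data.frame(matrix(1:8, nrow = 2))    # 2 observations, 4 features
test  <- data.frame(matrix(9:16, nrow = 2))
subjects <- data.frame(V1 = c(1, 1, 2, 2))    # train rows first, then test rows
labels   <- data.frame(V1 = c(1, 1, 2, 2))
activities <- data.frame(id = 1:2, name = c("WALKING", "WALKING_UPSTAIRS"),
                         stringsAsFactors = FALSE)

# Join training and test data by rows
data <- rbind(train, test)

# Keep only the mean() and std() columns; meanFreq() does not match
columns <- grep("mean\\(.|std\\(.", features$name)
data <- data[, columns]

# Sanitize column names: drop "()", capitalize, then make them valid R names
cleaned <- gsub("\\(\\)", "", features$name[columns])
cleaned <- gsub("mean", "Mean", cleaned)
cleaned <- gsub("std", "Std", cleaned)
names(data) <- make.names(cleaned)

# Sanitize activity names (assumption: underscores are simply removed)
activities$name <- tolower(gsub("_", "", activities$name))

# Map numeric labels to activity names and name the key columns
labels <- data.frame(activity = activities$name[labels$V1],
                     stringsAsFactors = FALSE)
names(subjects) <- "subject"

# First tidy data set, then the mean of each variable per subject and activity
tidyData1 <- cbind(subjects, labels, data)
tidyData2 <- with(tidyData1,
                  aggregate(tidyData1[, c(-1, -2)], list(subject, activity), mean))
names(tidyData2)[1:2] <- c("subject", "activity")

# Sort by subject, then activity
tidyData2 <- tidyData2[order(tidyData2$subject, tidyData2$activity), ]
```

In this toy case tidyData2 has one row per subject and activity combination that actually occurs in the data, with one averaged column per selected feature.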