Getting-and-Cleaning-Data

Coursera Online Course - Peer Assessment Project

This file describes how the run_analysis.R script works:

1. Download and extract the data file from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip and rename the extracted folder to 'data'
2. Put the run_analysis.R file into the same folder as the data directory and set your working directory within R to that folder
3. Run source("run_analysis.R") within R to generate the tidy data set
4. The working directory should now contain the output data file 'tidy_data_with_means.txt'
5. Read the data into a new data frame within R: df <- read.table('tidy_data_with_means.txt')
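
The setup steps above could also be scripted. The following is a minimal sketch, not part of run_analysis.R: the helper name fetch_har_data and its arguments are hypothetical, and it assumes the archive extracts to a single top-level folder.

```r
# URL taken from the instructions above.
data_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"

# Hypothetical helper (not part of run_analysis.R): download the archive,
# extract it into the working directory, and rename the folder to 'data'.
fetch_har_data <- function(url = data_url, target = "data") {
  # Skip the download if the target folder already exists
  if (dir.exists(target)) return(invisible(target))

  zip_file <- tempfile(fileext = ".zip")
  download.file(url, zip_file, mode = "wb")

  # Assumption: the archive has one top-level folder; read its name
  # from the archive listing instead of hard-coding it
  entries <- unzip(zip_file, list = TRUE)$Name
  top_dir <- unique(sub("/.*$", "", entries))[1]

  unzip(zip_file)
  file.rename(top_dir, target)
  invisible(target)
}

# Usage, from the folder that holds run_analysis.R:
# fetch_har_data()
# source("run_analysis.R")
# df <- read.table("tidy_data_with_means.txt")
```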

The script itself works in the following way (see R script for inline comments):

1. Read the training data, labels, and subjects into separate data frames (X_train.txt, y_train.txt, subject_train.txt)
2. Read the test data, labels, and subjects into separate data frames (X_test.txt, y_test.txt, subject_test.txt)
3. Join the respective training and test data frames using row binding: rbind()
4. Read the features into a data frame (features.txt)
5. Find the relevant columns for subsetting the combined data using grep and a regular expression: grep("mean\\(.|std\\(.")
6. Using the columns identified in 5, subset the data: df <- df[, columns]
7. Sanitize the column names in the data by removing parentheses and capitalizing 'mean' and 'std'. Use make.names() to make sure all column names are valid
8. Read the activities into a data frame (activity_labels.txt)
9. Sanitize the activity names by lowercasing them and removing underscores
10. Map labels to activities and rename the first column to "activity"
11. Rename the first column in the subjects data frame to "subject"
12. Construct the first tidy data set by column binding the subjects, labels and data frames: tidyData1 <- cbind(subjects, labels, data)
13. Aggregate the data from tidyData1 (calculate the average of each variable for each activity and each subject): tidyData2 <- with(tidyData1, aggregate(tidyData1[,c(-1,-2)], list(subject, activity), mean))
14. Sanitize the aggregated data by renaming column 1 to "subject" and column 2 to "activity"
15. Sort the tidy data by subject, then activity, in increasing order
16. Subset the tidy data, since only the first 180 records contain meaningful entries (30 subjects with 6 activities each)
17. Write the tidy data to 'tidy_data_with_means.txt'
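
The core of the steps above can be tried on a small toy example. This is a sketch under stated assumptions, not the code from run_analysis.R: the numeric values, the two-subject layout, and the exact gsub() calls used for sanitizing are made up for illustration. Only the grep pattern, the cbind() call, and the aggregate() call are taken from the description.

```r
# Toy stand-ins for the real files; all numeric values are made up.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "tBodyAcc-meanFreq()-X", "angle(X,gravityMean)")
data <- as.data.frame(matrix(1:32, nrow = 8, ncol = 4))
subjects <- data.frame(subject = c(1, 1, 1, 1, 2, 2, 2, 2))
raw_labels <- c(1, 1, 2, 2, 1, 1, 2, 2)
activities <- data.frame(id = 1:2, name = c("WALKING", "WALKING_UPSTAIRS"))

# Select the mean() and std() features with the pattern from the description
columns <- grep("mean\\(.|std\\(.", features)
data <- data[, columns]

# Sanitize column names (assumed gsub calls): drop parentheses,
# capitalize 'mean' and 'std', then validate with make.names()
clean <- gsub("[()]", "", features[columns])
clean <- gsub("mean", "Mean", clean, fixed = TRUE)
clean <- gsub("std", "Std", clean, fixed = TRUE)
names(data) <- make.names(clean)

# Sanitize activity names (assumed: lowercase, underscores removed)
activities$name <- gsub("_", "", tolower(activities$name))

# Map numeric labels to activity names
labels <- data.frame(activity = activities$name[raw_labels])

# First tidy data set: one row per observation
tidyData1 <- cbind(subjects, labels, data)

# Second tidy data set: mean of each variable per subject and activity
tidyData2 <- with(tidyData1,
                  aggregate(tidyData1[, c(-1, -2)], list(subject, activity), mean))
names(tidyData2)[1:2] <- c("subject", "activity")

# Sort by subject, then activity
tidyData2 <- tidyData2[order(tidyData2$subject, tidyData2$activity), ]

# Keep subjects x activities rows (2 * 2 here; 30 * 6 = 180 for the real data)
tidyData2 <- head(tidyData2, 2 * 2)

print(tidyData2)
```

Note that the pattern excludes meanFreq() and the angle(...) features, because neither contains 'mean(' or 'std(' literally.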
