Getting and Cleaning Data Course Project

Introduction

This repository contains files for the implementation of the course project for the Coursera class Getting and Cleaning Data.

Files In Repository

The repository contains the following files:-

README.md - This readme file.
CodeBook.md - The code book explaining the data contained in the data set.
run_analysis.R - The R script to generate a tidy data set based on the raw data.
tidy.data.txt - The output of the script. This is a tidy data set with the average of each variable for each activity and each subject.

Project Requirements

This Coursera project requires one R script called run_analysis.R that does the following:-

1. Merges the training and the test sets to create one data set.
2. Extracts only the measurements on the mean and standard deviation for each measurement.
3. Uses descriptive activity names to name the activities in the data set.
4. Appropriately labels the data set with descriptive variable names.
5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

Raw Data

The raw data for this project can be downloaded from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip.

For information on the contents of the raw data files, review the included Code Book.

The data should be extracted in its original structure into the project folder.

└───UCI HAR Dataset
        ├───test
        │   └───Inertial Signals
        └───train
            └───Inertial Signals
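The download and extraction can also be scripted. The following is a minimal sketch, not part of run_analysis.R: the helper name `fetch_raw_data` and its `project_dir` argument are hypothetical, and it assumes the archive unpacks into a top-level `UCI HAR Dataset` folder as shown in the tree above.

```r
# Hypothetical helper, not part of run_analysis.R.
# Downloads and unzips the raw data only if the folder is not already present.
fetch_raw_data <- function(project_dir = ".") {
  data_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
  data_dir <- file.path(project_dir, "UCI HAR Dataset")

  if (!file.exists(data_dir)) {
    zip_path <- tempfile(fileext = ".zip")
    download.file(data_url, zip_path, mode = "wb")  # binary mode for zip files
    unzip(zip_path, exdir = project_dir)            # keeps the archive's folder structure
  }

  # Return the two sub-folders that hold the test and training sets.
  file.path(data_dir, c("test", "train"))
}
```

Calling `fetch_raw_data()` from the project folder returns the paths of the `test` and `train` directories, downloading the archive first if `UCI HAR Dataset` is missing.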

Unused Data

The data in the 'UCI HAR Dataset\test\Inertial Signals' and 'UCI HAR Dataset\train\Inertial Signals' directories are not used in this implementation.

Script Usage

Dependencies

The following libraries are required:-

data.table
reshape2

These libraries can be installed by running the following commands in the R console.

install.packages('data.table')
install.packages('reshape2')
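To avoid reinstalling packages that are already present, the check below may be useful. It is a small sketch using base R only; the function name `missing_packages` is an illustrative assumption and does not appear in run_analysis.R.

```r
# Packages named in this README as required by the script.
required <- c("data.table", "reshape2")

# Hypothetical helper: returns the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
print(to_install)  # character(0) when everything is already installed
```

Passing `to_install` to `install.packages()` then installs only what is missing.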

Instructions

1. Download the raw data files from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
2. Unzip the files into the project folder, keeping the folder structure.
3. Download the file run_analysis.R from this repository into the project folder.
4. Run the script run_analysis.R:
```
 source('run_analysis.R')
```

Script Output

The script generates a text file called tidy.data.txt that contains a tidy data set with the average of each variable for each activity and each subject.
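The output file can be loaded back into R for inspection. The sketch below assumes that tidy.data.txt has a header row and whitespace-separated columns; this README does not state the exact format, so adjust the arguments if the file differs. The function name `read_tidy` is hypothetical.

```r
# Hypothetical helper for reading the script's output.
# Assumption: the file has a header row and whitespace-separated columns.
read_tidy <- function(path = "tidy.data.txt") {
  read.table(path, header = TRUE, stringsAsFactors = FALSE)
}
```

For example, `str(read_tidy())` run from the project folder shows the column names and types of the tidy data set.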

Codebook

The code book for this project is the file CodeBook.md in this repository.

Other Details

Project Environment

This script was written using R version 3.1.2 (2014-10-31) ("Pumpkin Helmet")

Script Details

The run_analysis.R script performs the following actions:-

1. Read in the raw data sets.
2. Combine the training and test sets into a single data set using the cbind and rbind functions. [Assignment step 1]
3. Keep only subject.id, activity and any column that contains a mean or standard deviation. Note: any variable whose name contains "mean" is included, so the meanFreq data is also kept. [Assignment step 2]
4. Replace the activity values with descriptive names. [Assignment step 3]
5. Change the variable names to descriptive labels. [Assignment step 4]
6. Melt the data and recast it into a data set showing the average of each variable for each activity and each subject, then write the result to a text file. [Assignment step 5]
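The combine, select and melt/recast steps can be illustrated on a toy example. This is a sketch under stated assumptions, not the code in run_analysis.R: the values and the shortened column names (`mean.x`, `meanFreq.x`, `max.x`) are made up, and the pattern `"mean|std"` is an assumed way of selecting columns that matches the behaviour described above (meanFreq is kept).

```r
library(reshape2)

# Toy stand-ins for the training and test sets (made-up values).
train <- data.frame(subject.id = c(1, 1),
                    activity   = c("WALKING", "SITTING"),
                    mean.x     = c(1, 5),
                    meanFreq.x = c(10, 50),
                    max.x      = c(9, 9))
test <- data.frame(subject.id = c(1, 2),
                   activity   = c("WALKING", "WALKING"),
                   mean.x     = c(3, 7),
                   meanFreq.x = c(30, 70),
                   max.x      = c(9, 9))

# Assignment step 1: stack the two sets.
combined <- rbind(train, test)

# Assignment step 2 (assumed pattern): keep mean and std columns.
# "mean" also matches "meanFreq", so that column is kept as well.
keep <- grep("mean|std", names(combined), value = TRUE)
selected <- combined[, c("subject.id", "activity", keep)]

# Assignment step 5: melt to long form, then average each variable
# for each subject and activity.
molten <- melt(selected, id.vars = c("subject.id", "activity"))
tidy <- dcast(molten, subject.id + activity ~ variable, mean)
print(tidy)
```

The result has one row per subject and activity combination and one column per retained variable, which is the shape described for tidy.data.txt.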

Descriptive Variable Names

The script renames the variables in the output file to make them more descriptive, as follows:-

Names starting with 't' are prefixed with 'TimeDomain'.
Names starting with 'f' are prefixed with 'FrequencyDomain'.
'std' is replaced with 'StandardDeviation'.
'Acc' is replaced with 'Acceleration'.
'Gyro' is replaced with 'Gyroscope'.
'Mag' is replaced with 'Magnitude'.
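The rules above can be expressed as a chain of `gsub` calls. This is a minimal sketch, not necessarily how run_analysis.R does it: the function name `describe_names` is hypothetical, and it assumes that the leading 't' or 'f' is replaced by the longer prefix rather than kept alongside it.

```r
# Hypothetical helper; the actual code in run_analysis.R may differ.
describe_names <- function(x) {
  x <- gsub("^t", "TimeDomain", x)       # assumption: leading 't' is replaced
  x <- gsub("^f", "FrequencyDomain", x)  # assumption: leading 'f' is replaced
  x <- gsub("std", "StandardDeviation", x, fixed = TRUE)
  x <- gsub("Acc", "Acceleration", x, fixed = TRUE)
  x <- gsub("Gyro", "Gyroscope", x, fixed = TRUE)
  x <- gsub("Mag", "Magnitude", x, fixed = TRUE)
  x
}

print(describe_names(c("tBodyAcc-std()-X", "fBodyGyroMag-mean()")))
```

Under these assumptions, `tBodyAcc-std()-X` becomes `TimeDomainBodyAcceleration-StandardDeviation()-X`.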
