aksmit94/getdata-101_Course_Project

# README

## Additional packages required: dplyr, stringr

### The following steps, applied in sequence, give the user the tidy, long-format dataset the project requires

  1. Fetching the required files into R with read.table():
     - x_test as xtest
     - x_train as xtrain
     - y_test as ytest
     - y_train as ytrain
     - subject_test as subtst
     - subject_train as subtrn
     - features as features
     - activity_labels as actlbl

  2. Joining the related datasets with rbind():
     - xtest and xtrain, stored in x
     - ytest and ytrain, stored in y
     - subtst and subtrn, stored in sub

  3. Removing the redundant first column with select() from each dataset whose first column is just a numeric index running from 1 to the number of rows

  4. Removing every "()" from the column names in features with gsub()

  5. Storing in feat the indices of the features whose names contain "mean" or "std" (which also picks up the much-debated "meanFreq" variables), using grep(..., fixed = F)

  6. Subsetting by feat with the [ ] operator:
     - Columns of x, stored back into x
     - Rows of features, stored back into features

  7. Setting the variable names (column names) of x from the features vector obtained above, using names()

  8. Joining sub, y and x (in this order) with cbind() into a data frame named xysub

  9. Assigning column names to xysub with names():
     - Column from sub, i.e. the 1st column: "Subject"
     - Column from y, i.e. the 2nd column: "Activity_Label"
     - Columns from x, i.e. the 3rd to 81st columns: features

  10. Making the name vector lblnames from actlbl with as.character()

  11. Replacing the numeric Activity_Label entries (2nd column of xysub) with the corresponding activity names from actlbl, using for(i in 1:6) {xysub$Activity_Label[xysub$Activity_Label == i] <- lblnames[i]}

  12. Grouping xysub by Subject and Activity_Label with group_by()

  13. Creating a new data frame tld (tidy long data) from xysub with summarise_each(), containing the "average of each variable for each activity and each subject"

  14. Prepending "Avg-" to all column names in tld except the first two, using ifelse(names(tld) %in% c("Subject", "Activity_Label"), str_c('', names(tld)), str_c('Avg-', names(tld)))

  15. Writing tld to "Tidy_Long_Data.txt" with write.table(tld, file = "Tidy_Long_Data.txt", row.names = F)
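
The steps from removing "()" onward can be tried on a few rows of toy data. This is only a minimal sketch, not the script in this repository. The toy values and the two-entry lblnames are made up for illustration. The grep pattern "mean|std" and the fixed = TRUE argument to gsub() are assumptions, since the steps above do not spell them out. summarise(across()) stands in for summarise_each(), which recent dplyr releases deprecate.

```r
# Toy stand-ins for the real inputs (hypothetical values, not the project data).
suppressPackageStartupMessages(library(dplyr))
library(stringr)

features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "tBodyAcc-max()-X", "fBodyAcc-meanFreq()-X")
lblnames <- c("WALKING", "SITTING")   # the real data has six labels
x   <- data.frame(V1 = c(1, 3, 5, 7), V2 = c(2, 2, 4, 4),
                  V3 = c(9, 9, 9, 9), V4 = c(0, 1, 0, 1))
y   <- data.frame(V1 = c(1, 1, 2, 2))
sub <- data.frame(V1 = c(1, 1, 1, 1))

# Drop "()" from the feature names, then keep only the mean/std features.
features <- gsub("()", "", features, fixed = TRUE)  # assumed literal match
feat     <- grep("mean|std", features)              # assumed pattern; also hits meanFreq
x        <- x[, feat]
features <- features[feat]
names(x) <- features

# Bind subject, activity and measurements, then name the columns.
xysub <- cbind(sub, y, x)
names(xysub) <- c("Subject", "Activity_Label", features)

# Swap the numeric activity codes for their names.
for (i in seq_along(lblnames)) {
  xysub$Activity_Label[xysub$Activity_Label == i] <- lblnames[i]
}

# Average every variable per subject and activity.
# across() is used here in place of summarise_each().
tld <- xysub %>%
  group_by(Subject, Activity_Label) %>%
  summarise(across(everything(), mean), .groups = "drop")

# Prefix the averaged columns with "Avg-", leaving the two key columns alone.
names(tld) <- ifelse(names(tld) %in% c("Subject", "Activity_Label"),
                     names(tld), str_c("Avg-", names(tld)))
```

With the toy input, tld has one row per subject and activity pair, and the "tBodyAcc-max-X" column is dropped because its name matches neither "mean" nor "std".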

About

This repository contains the files submitted for the Course Project of the Getting and Cleaning Data course, part of the Data Science Specialization on Coursera.
