Check the significance of contextual variables in prediction
When predicting the ratings of a business, we usually consider the variables that directly affect the outcome. I am interested in finding out whether contextual variables play a significant role in predicting the ratings more accurately.\
I chose the Yelp reviews data set, which contains information about businesses, users, ratings, and reviews.\
Import the data set
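Here is a minimal sketch of the import step. It assumes the Yelp files are in JSON Lines format (one object per line) with fields such as `review_id`, `user_id`, `business_id`, `stars` and `text`. The two inline strings are tiny made-up samples standing in for the real files. Joining reviews to businesses on `business_id` is one plausible way to combine them.

```python
import io

import pandas as pd

# Assumption: the Yelp files are JSON Lines (one JSON object per line).
# These tiny made-up samples stand in for the real review and business files.
REVIEWS_JSONL = """\
{"review_id": "r1", "user_id": "u1", "business_id": "b1", "stars": 4, "text": "Great brunch with friends on Sunday!"}
{"review_id": "r2", "user_id": "u2", "business_id": "b1", "stars": 2, "text": "Slow service during lunch."}
{"review_id": "r3", "user_id": "u1", "business_id": "b2", "stars": 5, "text": "Perfect date night, lovely patio."}
"""
BUSINESS_JSONL = """\
{"business_id": "b1", "name": "Cafe A", "city": "Phoenix"}
{"business_id": "b2", "name": "Bistro B", "city": "Las Vegas"}
"""


def load_jsonl(source):
    """Read a JSON Lines file (path or file-like object) into a DataFrame."""
    return pd.read_json(source, lines=True)


reviews = load_jsonl(io.StringIO(REVIEWS_JSONL))
business = load_jsonl(io.StringIO(BUSINESS_JSONL))

# Attach business attributes to each review through the shared business_id key.
data = reviews.merge(business, on="business_id", how="left")
print(data[["review_id", "name", "stars"]])
```

With the real files, the paths would simply replace the `io.StringIO` objects passed to `load_jsonl`.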
Identify possible contextual variables
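This sketch shows one possible way to shortlist contextual variables. It assumes context is read from the review text: each candidate context is described by a keyword set, and only contexts mentioned in enough reviews are kept. The context names, the keywords and the `min_coverage` threshold are all hypothetical choices for illustration.

```python
import re

# Hypothetical lexicon: each candidate contextual variable is described by a
# few keywords. Names and words are illustrative, not taken from the data set.
CONTEXT_LEXICON = {
    "companion": {"friends", "family", "kids", "date", "wife", "husband"},
    "time": {"breakfast", "brunch", "lunch", "dinner", "night", "weekend", "sunday"},
    "occasion": {"birthday", "anniversary", "party", "celebration"},
    "setting": {"patio", "outdoor", "bar", "view"},
}


def context_coverage(texts, lexicon):
    """Fraction of reviews that mention at least one keyword of each context."""
    token_sets = [set(re.findall(r"[a-z]+", t.lower())) for t in texts]
    return {
        name: sum(bool(tokens & words) for tokens in token_sets) / len(token_sets)
        for name, words in lexicon.items()
    }


def candidate_contexts(texts, lexicon, min_coverage=0.05):
    """Keep the contexts mentioned in at least `min_coverage` of the reviews."""
    coverage = context_coverage(texts, lexicon)
    return sorted(name for name, share in coverage.items() if share >= min_coverage)


texts = [
    "Great brunch with friends on Sunday!",
    "Slow service during lunch.",
    "Perfect date night, lovely patio.",
]
print(candidate_contexts(texts, CONTEXT_LEXICON, min_coverage=0.5))  # ['companion', 'time']
```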
Extract and clean the review text from the data set
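Here is a sketch of the cleaning step using only the standard library. It lower-cases the text, replaces non-letters with spaces and drops stop words. The short `STOP_WORDS` set is an illustrative assumption. A fuller list, such as scikit-learn's `ENGLISH_STOP_WORDS`, would normally be used instead.

```python
import re

# Small illustrative stop-word list (an assumption; real pipelines use a fuller one).
STOP_WORDS = {"a", "an", "and", "the", "on", "with", "during", "was", "is", "it", "of", "to", "in", "for"}


def clean_review(text):
    """Lower-case a review, replace non-letters with spaces, and drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


raw_reviews = [
    "Great brunch with friends on Sunday!!",
    "Slow service during lunch... 2/5",
    None,  # missing text, as can happen after merging tables
]

# Treat missing text as an empty review instead of failing.
cleaned = [clean_review(t or "") for t in raw_reviews]
print(cleaned)
# [['great', 'brunch', 'friends', 'sunday'], ['slow', 'service', 'lunch'], []]
```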
Generate a data frame by scoring each review against every contextual variable
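Here is a sketch of the scoring step. It assumes that a review's score for a context is the share of its tokens that fall in that context's keyword set. Both the hypothetical `CONTEXT_LEXICON` and this scoring rule are assumptions. Other scores, such as keyword counts or TF-IDF weights, would fit the same data-frame layout of one column per contextual variable.

```python
import re

import pandas as pd

# Hypothetical keyword sets, one per contextual variable.
CONTEXT_LEXICON = {
    "companion": {"friends", "family", "kids", "date"},
    "time": {"breakfast", "brunch", "lunch", "dinner", "night", "sunday"},
    "setting": {"patio", "outdoor", "bar", "view"},
}


def context_scores(text, lexicon):
    """Share of a review's tokens that belong to each context's keyword set."""
    tokens = re.findall(r"[a-z]+", text.lower())
    n = len(tokens) or 1  # avoid division by zero for empty reviews
    return {name: sum(tok in words for tok in tokens) / n for name, words in lexicon.items()}


reviews = pd.DataFrame({
    "review_id": ["r1", "r2", "r3"],
    "stars": [4, 2, 5],
    "text": [
        "Great brunch with friends on Sunday",
        "Slow service during lunch",
        "Perfect date night on the patio",
    ],
})

# One score column per contextual variable, aligned with the review rows.
scores = pd.DataFrame(
    [context_scores(t, CONTEXT_LEXICON) for t in reviews["text"]], index=reviews.index
)
features = pd.concat([reviews[["review_id", "stars"]], scores], axis=1)
print(features)
```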
Split the data frame into training and test sets
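A reproducible random split can be done with pandas alone. scikit-learn's `train_test_split` does the same job. The helper name `train_test_split_df`, the 25% test fraction and the seed are arbitrary choices for this sketch.

```python
import pandas as pd


def train_test_split_df(df, test_frac=0.2, seed=42):
    """Randomly hold out `test_frac` of the rows as a test set (reproducible via `seed`)."""
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(index=test.index)
    return train, test


df = pd.DataFrame({"stars": [1, 2, 3, 4, 5] * 4, "time": [0.1 * i for i in range(20)]})
train, test = train_test_split_df(df, test_frac=0.25)
print(len(train), len(test))  # 15 5
```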
Compute the average ratings for both the training and test data
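This sketch assumes one reading of this step: per-business and per-user average ratings are added as features. The averages are learned from the training set only and then mapped onto the test set. Businesses or users unseen in training fall back to the global training mean, so no test ratings leak into the features. The column names `business_id`, `user_id` and `stars` are assumed from the Yelp review schema.

```python
import pandas as pd


def add_average_ratings(train, test, keys=("business_id", "user_id"), target="stars"):
    """Add `<key>_avg` columns holding mean ratings learned on the training set.

    Keys unseen in training get the global training mean, so test ratings
    never leak into the features.
    """
    train, test = train.copy(), test.copy()
    global_mean = train[target].mean()
    for key in keys:
        means = train.groupby(key)[target].mean()
        train[f"{key}_avg"] = train[key].map(means)
        test[f"{key}_avg"] = test[key].map(means).fillna(global_mean)
    return train, test


# Tiny hypothetical example: "b3" and "u3" never appear in the training data.
train = pd.DataFrame({"business_id": ["b1", "b1", "b2"], "user_id": ["u1", "u2", "u1"], "stars": [4, 2, 5]})
test = pd.DataFrame({"business_id": ["b1", "b3"], "user_id": ["u3", "u1"], "stars": [3, 4]})

train_f, test_f = add_average_ratings(train, test)
print(test_f)
```

The training-set averages still include each row's own rating. A leave-one-out mean is a common refinement if that matters.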
Select the required features and run a Random Forest regression
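Here is a minimal scikit-learn sketch on synthetic data. It assumes the selected features are two average-rating features plus one context score (`time`). The data-generating formula and the `FEATURES` list are made up for illustration. `feature_importances_` gives a first, rough view of which features the forest relies on.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: business_avg drives the rating, time adds a small effect,
# and user_avg is pure noise.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "business_avg": rng.uniform(1, 5, n),
    "user_avg": rng.uniform(1, 5, n),
    "time": rng.uniform(0, 0.3, n),
})
df["stars"] = (0.8 * df["business_avg"] + 2 * df["time"] + rng.normal(0, 0.3, n)).clip(1, 5)

FEATURES = ["business_avg", "user_avg", "time"]  # hypothetical feature selection
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(df[FEATURES], df["stars"])
print(dict(zip(FEATURES, model.feature_importances_.round(3))))
```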
Compute the RMSE of Random Forest regressions trained with and without the contextual variables
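This sketch shows the comparison. The same Random Forest is fitted on a baseline feature set and on the baseline plus a context score, and the two test-set RMSEs are compared. The data are synthetic, with a context effect built in on purpose, so the printed numbers say nothing about the Yelp data. RMSE is taken as the square root of `mean_squared_error`, which works across scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error


def rmse_for(features, train, test, target="stars", seed=0):
    """Fit a Random Forest on `features` and return its RMSE on the test set."""
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(train[features], train[target])
    pred = model.predict(test[features])
    return float(np.sqrt(mean_squared_error(test[target], pred)))


# Synthetic data in which the hypothetical context score `time` does matter.
rng = np.random.default_rng(1)
n = 600
df = pd.DataFrame({"business_avg": rng.uniform(1, 5, n), "time": rng.uniform(0, 1, n)})
df["stars"] = 0.6 * df["business_avg"] + 1.5 * df["time"] + rng.normal(0, 0.3, n)
train, test = df.iloc[:450], df.iloc[450:]

base = ["business_avg"]
with_context = base + ["time"]
results = {
    "without context": rmse_for(base, train, test),
    "with context": rmse_for(with_context, train, test),
}
print(results)
```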
Verify the significance of the difference in RMSE
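A lower RMSE on a single test set could be due to chance. One option, assumed here and not necessarily the method used in this project, is a paired sign-flip permutation test on the per-review squared errors of the two models. Under the null hypothesis both models are equally accurate, so each paired difference is equally likely to be positive or negative. The error lists below are hypothetical. `scipy.stats.ttest_rel` or `scipy.stats.wilcoxon` on the same paired errors are common alternatives.

```python
import random


def paired_permutation_test(errors_a, errors_b, n_permutations=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired per-review errors.

    H0: both models are equally accurate, so each paired difference
    (errors_a[i] - errors_b[i]) is equally likely to be positive or negative.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly flip the sign of every difference and recompute the mean.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(flipped) >= observed:
            extreme += 1
    # The +1 terms keep the p-value away from exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical squared errors of the two models on the same 30 test reviews.
sq_err_without = [0.9, 1.1, 0.8, 1.3, 1.0, 0.7, 1.2, 0.95, 1.05, 0.85] * 3
sq_err_with = [e - 0.3 for e in sq_err_without]

p_value = paired_permutation_test(sq_err_without, sq_err_with, n_permutations=2000)
print(f"p = {p_value:.4f}")  # a small p suggests the RMSE gain is not just chance
```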
Check whether the contextual variables help predict the ratings more accurately