STATS 5243: Applied Data Science Final Project

Authors: Yaxin Deng, Xiaomeng Huang, Min Sun, Anderson Nelson

Description:

The data was scraped from WineEnthusiast on November 22nd, 2017, and contains 13 variables:

  • Country: The country that the wine is from
  • Description: The text of the wine review
  • Designation: The vineyard within the winery where the grapes that made the wine are from
  • Points: The number of points WineEnthusiast rated the wine on a scale of 1-100 (though they say they only post reviews for wines that score >=80)
  • Price: The cost for a bottle of the wine
  • Province: The province or state that the wine is from
  • Region_1: The wine growing area in a province or state (e.g., Napa)
  • Region_2: A more specific region within a wine growing area (e.g., Rutherford inside the Napa Valley); this value is sometimes blank
  • Taster_name: The name of the taster who reviewed the wine
  • Taster_twitter_handle: The Twitter handle of the taster
  • Title: The title of the wine review, which often contains the vintage if you're interested in extracting that feature
  • Variety: The type of grapes used to make the wine (e.g., Pinot Noir)
  • Winery: The winery that made the wine
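
The Title field often contains the vintage, so a regular expression is one way to pull it out as a feature. The snippet below is a minimal sketch, not the project's actual code: the helper name `extract_vintage`, the `max_year` cutoff, the choice to keep the last matching year, and the example titles are all assumptions made for illustration.

```python
import re

# Matches standalone four-digit years from 1900 to 2099.
YEAR_PATTERN = re.compile(r"\b(19\d{2}|20\d{2})\b")


def extract_vintage(title, max_year=2017):
    """Return a plausible vintage year found in a review title, or None.

    Hypothetical helper: max_year defaults to 2017 because the data was
    scraped in November 2017, so later years cannot be real vintages.
    When several years match, the last one is kept (an assumption).
    """
    years = [int(y) for y in YEAR_PATTERN.findall(title)]
    years = [y for y in years if y <= max_year]
    return years[-1] if years else None


# Made-up titles, for illustration only.
print(extract_vintage("Example Winery 2013 Reserve Pinot Noir (Napa Valley)"))
print(extract_vintage("Example Cellars Brut Sparkling (California)"))
```

Non-vintage wines return `None`, which can be kept as a missing value or encoded as its own category when modeling.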

Questions:

  • What are the factors that influence prices?
  • What kind of segmentation makes sense for this data?
  • Create a recommendation system
  • Text analysis of the data
  • Create models that predict price and points, evaluated by RMSE
    • Quantify the impact of the wine description on price and points. The hypothesis is that the description is significant in predicting points but not significant in predicting price.
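
For reference, RMSE (root mean squared error) is the square root of the average squared difference between actual and predicted values. The snippet below is a minimal sketch of that calculation; the function name `rmse` and the sample numbers are made up for illustration and are not project results.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up points values, for illustration only.
actual_points = [90, 85]
predicted_points = [88, 87]
print(rmse(actual_points, predicted_points))
```

One way to quantify the impact of the description would be to compare the RMSE of a model fit with description-derived features against the same model fit without them, separately for price and for points.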