-
Install Go. See the official Go installation instructions.
Install GCC, which is needed to run the tests.
Make sure your GOPATH (the Go workspace directory) is set. See the Go documentation for instructions.
-
Install Dep. See the Dep installation instructions.
Dep is used for package management in this application.
-
Download code into the correct directory.
Go is a little picky about where code lives. With Dep, the project has to sit inside your Go workspace, so the code needs to be at the following path:
$GOPATH/src/github.com/Iukekini/backend-coding-assessment-Iukekini-1052
-
Build and test the project.
I set up a Makefile that handles dependency loading, building, and testing.
make
If you want to run the steps manually, these are the commands the Makefile runs:
dep ensure
go build -o podium-backend-assessment -v
go test -v ./...
-
Run the application.
./podium-backend-assessment
Results Notes
The results are laid out in a table with the following columns:
- Probability - This is the classifier's probability that the review belongs to the class (1-5) it was assigned.
- Rating - This is the class returned by the classifier.
- User - User that authored the review
- Visit Type - Service / Sales / Used
- Score - This is the score the user gave in the review
- Date - Date of Visit
- Review - This is the title of the review. I didn't include the body as it was too long to display nicely.
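As a rough illustration of the column layout above, here is a minimal sketch that renders rows with Go's standard text/tabwriter package. The Review struct, its field types, and the renderTable helper are hypothetical names chosen for this example; they are not taken from the project's source, and the sample row is made-up data.

```go
package main

import (
	"fmt"
	"strings"
	"text/tabwriter"
)

// Review is a hypothetical row type mirroring the columns listed above.
// Field names and types are assumptions, not the project's actual struct.
type Review struct {
	Probability float64 // classifier's probability for the chosen class
	Rating      int     // class (1-5) returned by the classifier
	User        string
	VisitType   string // Service / Sales / Used
	Score       float64
	Date        string
	Title       string
}

// renderTable formats the reviews as an aligned plain-text table.
func renderTable(reviews []Review) string {
	var sb strings.Builder
	// minwidth 0, tabwidth 4, padding 2, pad with spaces, no flags.
	tw := tabwriter.NewWriter(&sb, 0, 4, 2, ' ', 0)
	fmt.Fprintln(tw, "Probability\tRating\tUser\tVisit Type\tScore\tDate\tReview")
	for _, r := range reviews {
		fmt.Fprintf(tw, "%.2f\t%d\t%s\t%s\t%.1f\t%s\t%s\n",
			r.Probability, r.Rating, r.User, r.VisitType, r.Score, r.Date, r.Title)
	}
	tw.Flush() // writes to a strings.Builder cannot fail
	return sb.String()
}

func main() {
	// Made-up sample row, for illustration only.
	fmt.Print(renderTable([]Review{
		{Probability: 0.91, Rating: 5, User: "example_user", VisitType: "Sales",
			Score: 5.0, Date: "2018-01-15", Title: "Example review title"},
	}))
}
```

Only the title is printed in the last column, which keeps each row on one line.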
If you want to see more reviews or pull more data (parse more pages), you can adjust that in the config.json file.
In order to rank the reviews based on their positivity, I set up a Bayes classifier. I used a set of Amazon reviews to train the classifier on what a positive review looks like. The classifier has 5 classes, based on the 5 stars of an Amazon review. After the classifier was trained, I checked each of the reviews that I had parsed from the site against it. I then used the result to sort the reviews and pick the three highest-rated reviews to show.
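The approach described above can be sketched roughly as follows. This is a from-scratch multinomial naive Bayes with Laplace smoothing, written against the standard library only; it does not use goml's API and is not the project's actual code. The type and function names (NaiveBayes, Scored, TopN), the whitespace tokenizer, and the sort order (predicted class first, then probability) are all assumptions made for illustration, and the training strings are made-up samples.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

const numClasses = 5 // one class per star rating, 1-5

// NaiveBayes is a from-scratch multinomial naive Bayes text classifier.
type NaiveBayes struct {
	docs       int                        // total training examples
	docCount   [numClasses]int            // training examples per class
	wordCount  [numClasses]map[string]int // word frequencies per class
	totalWords [numClasses]int            // total words seen per class
	vocab      map[string]bool            // distinct words across all classes
}

func NewNaiveBayes() *NaiveBayes {
	nb := &NaiveBayes{vocab: map[string]bool{}}
	for c := range nb.wordCount {
		nb.wordCount[c] = map[string]int{}
	}
	return nb
}

// Train adds one labelled example; stars must be in the range 1-5.
func (nb *NaiveBayes) Train(text string, stars int) {
	c := stars - 1
	nb.docs++
	nb.docCount[c]++
	for _, w := range strings.Fields(strings.ToLower(text)) {
		nb.wordCount[c][w]++
		nb.totalWords[c]++
		nb.vocab[w] = true
	}
}

// Predict returns the most likely class (1-5) and its posterior probability.
// Call it only after at least one Train call, so the vocabulary is non-empty.
func (nb *NaiveBayes) Predict(text string) (int, float64) {
	words := strings.Fields(strings.ToLower(text))
	var logp [numClasses]float64
	best := 0
	for c := 0; c < numClasses; c++ {
		// Smoothed log prior plus Laplace-smoothed log likelihood of each word.
		logp[c] = math.Log(float64(nb.docCount[c]+1) / float64(nb.docs+numClasses))
		for _, w := range words {
			logp[c] += math.Log(float64(nb.wordCount[c][w]+1) /
				float64(nb.totalWords[c]+len(nb.vocab)))
		}
		if logp[c] > logp[best] {
			best = c
		}
	}
	// Normalise relative to the best class to get a probability.
	sum := 0.0
	for c := range logp {
		sum += math.Exp(logp[c] - logp[best])
	}
	return best + 1, 1 / sum
}

// Scored pairs a review title with the classifier's output.
type Scored struct {
	Title       string
	Rating      int
	Probability float64
}

// TopN classifies every title and returns the n highest-ranked ones,
// ordered by predicted class and then by probability (an assumed ordering).
func TopN(nb *NaiveBayes, titles []string, n int) []Scored {
	scored := make([]Scored, 0, len(titles))
	for _, t := range titles {
		r, p := nb.Predict(t)
		scored = append(scored, Scored{Title: t, Rating: r, Probability: p})
	}
	sort.SliceStable(scored, func(i, j int) bool {
		if scored[i].Rating != scored[j].Rating {
			return scored[i].Rating > scored[j].Rating
		}
		return scored[i].Probability > scored[j].Probability
	})
	if n > len(scored) {
		n = len(scored)
	}
	return scored[:n]
}

func main() {
	nb := NewNaiveBayes()
	nb.Train("terrible rude awful service", 1)
	nb.Train("okay average nothing special", 3)
	nb.Train("great friendly amazing service", 5)
	titles := []string{"awful rude staff", "amazing friendly team", "average visit", "great amazing"}
	for _, s := range TopN(nb, titles, 3) {
		fmt.Printf("%d\t%.2f\t%s\n", s.Rating, s.Probability, s.Title)
	}
}
```

With only three training sentences the probabilities are not meaningful; the point is the flow of train, classify, sort, and take the top three.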
Notes
The classifier training data was not perfect for this scenario. Since an Amazon review is more of a love / neutral / hate type of review, the classifier had a harder time picking between a good review and an over-the-top review. This problem could be solved by creating a set of training data that better represents this problem.
Please feel free to open an issue.
goml for the classification algorithm
go-config for config loading and management
Testify for add-ons to the Go test suite. Enables assert and panic checks.
Log15 for logging
goquery, like jQuery but for Go. Used for searching and parsing the web pages.
Training Data: I used the Amazon review CSV to train the classifier. I only used the first 4k rows.
Dep for package management