The dataset contains 1.7 million name records, each a tuple {name, gender, count, year}. See how names change in popularity from 1890 to 2010. Source: US Social Security Administration
cd to workspace directory and clone this git repo:
git clone https://github.com/seahrh/sbac
Install frontend dependencies (npm packages)
npm install
gulp update
Install Java dependencies
mvn clean install
Install mongodb
Download the data dump (206MB)
Import the data dump (assuming data is stored at default path /data/db):
mongoimport --db sbac --collection Name Name.json --jsonArray
Start the mongo daemon
mongod --auth
Access is password protected. Create the following users in mongo shell:
user: dba, password: dba (user administrator on all databases)
use admin
db.createUser({user: "dba",pwd: "dba",roles: [ { role: "userAdminAnyDatabase", db: "admin" } ] })
user: sbacu, password: sbacu (read/write privilege on sbac database)
use sbac
db.createUser({user: "sbacu",pwd: "sbacu",roles: [ { role: "readWrite", db: "sbac" } ] })
Login as sbacu:
mongo --port 27017 -u "sbacu" -p "sbacu" --authenticationDatabase "sbac"
The frontend is built with gulp, which e.g. concatenates and minifies the JS and CSS.
gulp
The Java backend is built with Maven.
mvn clean package
Tests are run with Maven. Unit tests run in the test phase and integration tests run in the verify phase of the Maven build lifecycle. Tests live under src/test/java; *Test.java files are unit tests and *IT.java files are integration tests.
The REST API is tested as an integration test because it requires a live mongodb connection and the API must be deployed on a server.
If mongod is not already running, start it:
mongod --auth
To run the tests,
mvn verify
To build and test on localhost in one step, run the following batch script (Windows), dev.bat:
dev
This runs the gulp build, then the Maven build, including all unit and integration tests.
Open the following url in browser:
App configuration (e.g. database credentials) is stored in a configuration file separate from the source code, so changing the configuration does not require recompiling.
src/main/resources/config/config.properties
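A minimal sketch of how such a file could be read with java.util.Properties, assuming it is packaged on the classpath as config/config.properties. The class name ConfigSketch and the keys db.name, db.user and db.port are hypothetical, chosen only for illustration; the project's actual keys and loading code may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of the project.
class ConfigSketch {

    // Assumed classpath location of src/main/resources/config/config.properties.
    static final String RESOURCE = "config/config.properties";

    /** Parses key=value pairs from any reader (file, resource, or test string). */
    static Properties parse(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Loads the properties file from the classpath and fails loudly if it is missing. */
    static Properties loadFromClasspath() {
        ClassLoader cl = ConfigSketch.class.getClassLoader();
        try (InputStream in = cl.getResourceAsStream(RESOURCE)) {
            if (in == null) {
                throw new IllegalStateException("Missing resource: " + RESOURCE);
            }
            Properties props = new Properties();
            props.load(in);
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical keys; an in-memory string stands in for the real file.
        Properties p = parse(new StringReader("db.name=sbac\ndb.user=sbacu\n"));
        System.out.println(p.getProperty("db.name"));
        // The second argument is a default used when the key is absent.
        System.out.println(p.getProperty("db.port", "27017"));
    }
}
```

Passing a default to getProperty keeps optional settings out of the file while still allowing them to be overridden there.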
There are actually two tasks. First, autocomplete the search query. Second, given a search query, use mongodb to perform the full text search.
- Use MongoDB as the main database (4 hours)
  - database=sbac
  - collection=Name (same as the model class name Name.java, following the DRY principle)
  - Schema: {name, gender, count, year}
  - example query: db.Name.find()
  - Used Morphia as ORM and the mongodb Java driver
  - Did not shard as the data is not big
- Use MongoDB text search for autocomplete feature (1 hour)
  - Built a full text search index on two fields {name, year} as these are the likely fields to search
- Have a simple AutoComplete Search Bar (3 hours)
  - Handle the autocomplete on the client side to make the user experience responsive in real time
  - Used a 3rd party JavaScript widget: bootstrap-3-typeahead. It fetches a JSON file, src/main/webapp/names.json, from the server to populate the widget.
  - Autocomplete suggestions are limited to names that have a count of at least 10. This reduces the size of the JSON file, and with it the network latency of the autocomplete.
  - Reduce load on the datastore by linking the autocomplete to the JSON file
    - Basically the results of the query are cached in the JSON file.
    - The data is static (names), so the JSON file does not need to be updated frequently
  - Call the search API when the button is clicked
- A simple result page for the search terms (3 hours)
  - Made with jQuery/Bootstrap
  - When data is returned from the API, use JavaScript to show the search results
- REST API done with Jersey (8 hours)
  - See the sbac.api package
  - No API key as the API is public. Instead, log the session id. If a user misbehaves, rate limit by session id (or IP address).
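The rate limiting mentioned in the last item could be sketched as a fixed-window counter keyed by session id. This is an illustration under assumptions, not the project's code: the class SessionRateLimiter, its constructor parameters and the limits used in main are all hypothetical, and the README does not say which algorithm (if any) is implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical fixed-window rate limiter keyed by session id (or IP address).
class SessionRateLimiter {

    /** Immutable request count for one key within one time window. */
    private static final class Window {
        final long start;
        final long count;

        Window(long start, long count) {
            this.start = start;
            this.count = count;
        }
    }

    private final int maxRequests;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    SessionRateLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /**
     * Returns true if the request is allowed. The current time is passed in
     * so the class can be tested without sleeping.
     */
    boolean allow(String key, long nowMillis) {
        // compute() runs atomically per key, so concurrent requests are counted once each.
        Window updated = windows.compute(key, (k, w) -> {
            if (w == null || nowMillis - w.start >= windowMillis) {
                return new Window(nowMillis, 1);      // first request in a fresh window
            }
            return new Window(w.start, w.count + 1);  // same window: count this request
        });
        return updated.count <= maxRequests;
    }

    public static void main(String[] args) {
        // Assumed limit for illustration: 2 requests per second per session.
        SessionRateLimiter limiter = new SessionRateLimiter(2, 1000L);
        System.out.println(limiter.allow("session-a", 0L));    // true
        System.out.println(limiter.allow("session-a", 10L));   // true
        System.out.println(limiter.allow("session-a", 20L));   // false, over the limit
        System.out.println(limiter.allow("session-a", 1000L)); // true, new window
    }
}
```

The sketch never evicts old keys, so a long-running server would also need to expire idle sessions from the map. A fixed window is the simplest choice; it allows short bursts at window boundaries, which a sliding window or token bucket would smooth out.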