Road network partitioning scheme for faster queries in recommender systems. This project was part of the authors' R&D project during their undergraduate, under the guidance of Prof. Prosenjit Gupta.
Recommender systems have been successfully used to provide items of interest to the user, from an overwhelming set of choices. However, where items being recommended have an associated location, by Tobler's First Law of Geography, there is a general preference for things nearby. This aspect of locality can be leveraged not only to improve the quality and accuracy of recommendations, but also the scalability of the recommendation algorithm by suitably partitioning the road network used for travelling to the places being recommended.
Queries of the form - find k POIs nearby, find distances from all the POIs present in the entire region and return the nearest k results. Distances to POIs that are very far away and which the user would never travel to, are also calculated. Hence, this method turns out to be inefficient. A possible solution to this is to partition the region into smaller grids, locate the user in a grid and provide recommendations from the POIs present in that grid. This would improve the running time but decrease the accuracy in some cases.
Unpartitoned network
Partitoned network
However, this method gives inaccurate results when the user lies near the boundary of a grid. This is due to the fact that
the nearest POI could lie in a neighbouring grid, which would never be considered.
To solve this limitation, the case when the user lies near a boundary is detected and the grids are shifted, in order to include the POIs
that lie in the neighbouring grids. As shifting just horizontally or vertically does not work in all cases (see fig), the grids are
shifted diagonally.
The partitioning strategy is tested by comparing three approaches:
- Unpartitioned: Original road network without any partitions.
- Paritioned: Partitioned road network.
- Partitioned with shifts: Partitioned road network with shifts.
Let the time taken by an approach be represented as ‘T’ and the accuracy as ‘A’. Then the following is observed:
- TUnpartitioned > TPartitioned with shifts > TPartitioned
- AUnpartitioned >= APartitioned with shifts >= APartitioned
where,
rapproach = Set of POIs returned by an approach
roptimal = Set of POIs returned by the Unpartitioned approach
The dataset used for this project was obtained from FourSqaure for the city of New York. This can be found here under NYC and Tokyo Check-in Dataset
.
Columns used for the project:
- Venue category ID: POI identifier
- Venue category name: Interest identifier
- Latitude
- Longitude
The cleaned dataset can be found in data/unique.csv
.
sudo apt-get install nodejs
Distances from user location to the various POIs are obtained using the Google Distance matrix API.
The code to consume the web service can be found in googleDistanceAPI.js
.
In order to use the API, an API key is required.
This key can be obtained by following this tutorial.
- Paste the API key as the value of
API_KEY
global variable ingoogleDistanceAPI.js
. - Type
make
to view available commands. - Type
make run
to run the project.