Live Website -> https://graph-gg.herokuapp.com/
A graph network of Indian celebrities built by scraping Wikipedia and other websites. The graph also includes nodes representing States, Movies, Political Parties, etc., and uses the Neo4j DBMS to explore celebrities' shortest paths, political connections, and so on.
- Neo4j Database Management System
- React libraries for the frontend
- Django for the backend
The project's database was populated by extensively scraping each person's Wikipedia page. Additional information, such as a person's political affiliations, was scraped from websites like https://www.myneta.info/.
- The database has been built in a BFS fashion. The program iterates through the Wikipedia page of the starting node (a person) and adds the other people mentioned there to a priority queue, ordered by the "importance" of these people. Links are added between these people and the original node. The original node is then popped from the queue, and the same process is repeated for the next node in the queue.
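The crawl described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scraper: `get_mentions` and `importance` are hypothetical callables standing in for the Wikipedia parsing and the "importance" score, and the toy data at the bottom is made up.

```python
import heapq
from typing import Callable, Iterable, List, Set, Tuple


def crawl(
    start: str,
    get_mentions: Callable[[str], Iterable[str]],  # hypothetical: people mentioned on a page
    importance: Callable[[str], float],            # hypothetical: score used to order the queue
    max_nodes: int = 100,
) -> Tuple[List[str], Set[Tuple[str, str]]]:
    """Return the order in which people were processed and the links found."""
    order: List[str] = []
    edges: Set[Tuple[str, str]] = set()
    # heapq is a min-heap, so the importance is negated to pop the most important first.
    queue = [(-importance(start), start)]
    queued = {start}
    while queue and len(order) < max_nodes:
        _, person = heapq.heappop(queue)
        order.append(person)
        for other in get_mentions(person):
            if other == person:
                continue
            # Store each link once, regardless of direction.
            edges.add(tuple(sorted((person, other))))
            if other not in queued:
                queued.add(other)
                heapq.heappush(queue, (-importance(other), other))
    return order, edges


# Toy stand-ins for scraped pages and importance scores (assumed data).
PAGES = {"A": ["B", "C"], "B": ["A", "D"], "C": [], "D": []}
SCORES = {"A": 5, "B": 1, "C": 3, "D": 2}

order, edges = crawl("A", lambda p: PAGES.get(p, []), lambda p: SCORES.get(p, 0))
print(order)          # ['A', 'C', 'B', 'D']
print(sorted(edges))  # [('A', 'B'), ('A', 'C'), ('B', 'D')]
```

Note that "C" is processed before "B" because it has the higher score, which is where the priority queue differs from a plain first-in, first-out BFS queue.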
- The graph database consists of nodes (vertices) interconnected by edges. There are three types of nodes in the database:
- Person
- State (e.g. Maharashtra, Delhi, etc.): persons are linked to the states they are associated with using edges.
- Movie: actors are linked to the movies they acted in using edges.
- An example of viewing the relationships of a 'Person' with other Persons.
The main objective of the project was to answer complex queries, such as:
- What is the "shortest path" between two celebrities, e.g. Sachin Tendulkar and Sushant Singh Rajput?
- How many "mutual" friends do two celebrities have, e.g. Narendra Modi and Atal Bihari Vajpayee?
- Can we find "friend recommendations" for a celebrity, i.e. people who do not currently have any relationship with the person, but are most likely to know him/her?
The Shortest Path algorithm in Neo4j allows us to specify the nodes between which we want the shortest path, as well as the types of edges and nodes that may appear on that path. The hosted site exposes the same options through its React frontend.
- An example query: Shortest Path between Sachin Tendulkar and Sushant Singh Rajput.
- The same query in the backend:
- As we can see, the path goes through MS Dhoni, since Sushant Singh Rajput acted in MS Dhoni's biopic.
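As a rough sketch of what such a query could look like, the helper below builds a Cypher `shortestPath` query string together with its parameters. The `Person` label, the `name` property, the `ACTED_IN` relationship type and the hop limit are assumptions for illustration; the project's real schema and backend code may differ.

```python
import re
from typing import Dict, Optional, Sequence, Tuple

# Relationship types cannot be passed as Cypher parameters, so they are
# interpolated into the query text and validated first.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_shortest_path_query(
    source: str,
    target: str,
    rel_types: Optional[Sequence[str]] = None,  # e.g. ["ACTED_IN"] (assumed type name)
    max_hops: int = 10,                         # assumed upper bound on path length
) -> Tuple[str, Dict[str, str]]:
    """Return a Cypher query and its parameters for a shortest-path lookup."""
    if int(max_hops) < 1:
        raise ValueError("max_hops must be at least 1")
    rel_filter = ""
    if rel_types:
        for rel in rel_types:
            if not _IDENTIFIER.match(rel):
                raise ValueError(f"invalid relationship type: {rel!r}")
        rel_filter = ":" + "|".join(rel_types)
    query = (
        "MATCH (a:Person {name: $source}), (b:Person {name: $target}), "
        f"p = shortestPath((a)-[{rel_filter}*..{int(max_hops)}]-(b)) "
        "RETURN p"
    )
    return query, {"source": source, "target": target}


query, params = build_shortest_path_query("Sachin Tendulkar", "Sushant Singh Rajput")
print(query)
print(params)
```

The returned query and parameters would then be handed to whichever Neo4j client the backend uses. Keeping the names in parameters rather than in the query text avoids quoting problems with names that contain apostrophes.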
A modification we have tried to implement in our project is to show the shortest path between two actors using only the movies they have acted in as links.
- A sample query: the shortest path between Aamir Khan and Shah Rukh Khan using film links (the two have never worked together).
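The idea of a film-only path can be illustrated without a database: treat actors and movies as two kinds of nodes and run a breadth-first search that alternates between them. The sketch below assumes a simple in-memory cast list with placeholder names; it is not how the hosted site computes the path.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Set, Tuple

Node = Tuple[str, str]  # ("actor", name) or ("movie", title)


def shortest_film_path(
    cast: Dict[str, Sequence[str]], start: str, goal: str
) -> Optional[List[Node]]:
    """Shortest actor-movie-actor chain between two actors, or None if there is none."""
    # Build an undirected bipartite graph. Tagging each node with its kind
    # keeps a movie and a person with the same name apart.
    graph: Dict[Node, Set[Node]] = {}
    for movie, actors in cast.items():
        for actor in actors:
            graph.setdefault(("actor", actor), set()).add(("movie", movie))
            graph.setdefault(("movie", movie), set()).add(("actor", actor))

    source, target = ("actor", start), ("actor", goal)
    if source not in graph or target not in graph:
        return None

    parents: Dict[Node, Optional[Node]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the start to recover the path.
            path: List[Node] = []
            current: Optional[Node] = node
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


# Placeholder cast list (assumed data, not real filmographies).
CAST = {
    "Film 1": ["Actor A", "Actor B"],
    "Film 2": ["Actor B", "Actor C"],
    "Film 3": ["Actor D"],
}

film_path = shortest_film_path(CAST, "Actor A", "Actor C")
print(film_path)
```

Here "Actor A" and "Actor C" never share a film, yet they are connected through "Actor B", who appears in both "Film 1" and "Film 2".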
We envisioned the mutual-friends query to be useful when two celebrities have a large number of contacts in common.
- In such cases, we would like to view who these "common friends" are.
- An example query: Atal Bihari Vajpayee and Narendra Modi, both Prime Ministers of India from the Bharatiya Janata Party.
- As we can see, the answer is a wide range of political figures across the spectrum.
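Conceptually, the "common friends" of two people are the intersection of their neighbour sets. A minimal in-memory sketch, assuming person-to-person links are available as plain pairs (the names below are placeholders):

```python
from typing import Dict, Iterable, List, Set, Tuple


def mutual_friends(edges: Iterable[Tuple[str, str]], a: str, b: str) -> List[str]:
    """People directly linked to both a and b, in alphabetical order."""
    neighbours: Dict[str, Set[str]] = {}
    for x, y in edges:
        # Links are treated as undirected.
        neighbours.setdefault(x, set()).add(y)
        neighbours.setdefault(y, set()).add(x)
    common = neighbours.get(a, set()) & neighbours.get(b, set())
    return sorted(common - {a, b})


# Placeholder links (assumed data).
LINKS = [
    ("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
    ("P2", "P4"), ("P1", "P4"), ("P4", "P5"),
]

print(mutual_friends(LINKS, "P1", "P2"))  # ['P3', 'P4']
```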
- Here, the friend recommendations for a person are his/her "2nd degree connections", i.e. the people he/she does not know directly but can reach through his/her own direct connections.
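A minimal sketch of second-degree recommendations over an in-memory list of links is shown below. Ranking candidates by the number of shared connections is an assumption made here to approximate "most likely to know"; the project may order its results differently. The names are placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def recommend_friends(
    edges: Iterable[Tuple[str, str]], person: str, limit: int = 5
) -> List[Tuple[str, int]]:
    """Second-degree connections of person with their shared-connection counts."""
    neighbours: Dict[str, Set[str]] = {}
    for x, y in edges:
        neighbours.setdefault(x, set()).add(y)
        neighbours.setdefault(y, set()).add(x)

    direct = neighbours.get(person, set())
    counts: Counter = Counter()
    for friend in direct:
        for candidate in neighbours[friend]:
            # Skip the person themselves and anyone they already know.
            if candidate != person and candidate not in direct:
                counts[candidate] += 1
    # Assumed ranking: most shared connections first, then alphabetical.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Placeholder links (assumed data).
CONNECTIONS = [
    ("P1", "P2"), ("P1", "P3"), ("P2", "P4"),
    ("P3", "P4"), ("P3", "P5"), ("P5", "P6"),
]

print(recommend_friends(CONNECTIONS, "P1"))  # [('P4', 2), ('P5', 1)]
```

"P6" is not suggested because it is three steps away from "P1", while "P4" ranks first because it is reachable through both "P2" and "P3".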
Various other features have been implemented, and they can be tried out at http://graph-gg.herokuapp.com/