A program to find the shortest path between two Wikipedia articles.
- Clone the repository
- Run
make
.
On the first run, this will download and parse the Wikipedia dump. This will take a long time. - Run
./WikiHopper
This will first load the aforementioned file, which will take a long time.
This process will take a long time, so it is recommended to run the following install script:
git clone https://github.com/MRegirouard/WikiHopper.git && cd WikiHopper && make -j3 && ./WikiHopper
Note that only three make
jobs need to be run, as there are only a few files to build.
The program will promt for two article names, and then will find the shortest path from the start to the end article, traversing via page links.
Note that some paths may not still work, as Wikipedia is often changed, and the dataset is from 2018.
The program uses a iterative breadth-first search to traverse the article graph, and a hashmap for fast retreival. Articles are first loaded from the text file into an Article
object, which contains the article name, and a vector of pointers to other Article
objects that this one links to within Wikipedia. These objects are then put into an unordered_map
, so they can be found in constant time to determine if an article exists or not. The breadth-first search then finds the shortest path between nodes, in a relatively fast manner.