This project was my exam for a basic Python programming course at my university. I release it mainly for educational purposes.
Requirements:

- Python 3.8+ (older versions might work but are untested)
- `requests` (HTTP interaction with Wikipedia)
- `igraph` + `cairo` (save graph representations to images and documents)
Features:

- breadth-first search starting from any given Wikipedia article to every linked article
  - This must be implemented without any libraries except `requests`!
- command line parameters (see the parameter table below):
  - output as adjacency matrix (see `-m`)
  - basic graph operations:
    - export as image (see `--png`)
    - highlight keywords (see `-h`)
  - simple HTTP cache (see `--cache`)
    - The cache is a directory, specified via a command line parameter, in which the HTML code of every requested article is saved. Cache validity is only checked very simply, and cached pages never expire.
  - maximum references per article limit (see `-R`)
  - exclude articles from parsing (see `-e`)
    - When using the `GraphBuilder` object from your own code, you can pass an exclude function instead of a keyword list to allow for fine-grained filtering.
  - export as PDF (see `--pdf`)
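The breadth-first search and its limits can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's `GraphBuilder` code: the names `extract_links`, `bfs_graph` and `fetch_html`, the regex-based link extraction and the `{title: [linked titles]}` result format are all hypothetical. The parameters only mirror the meaning of `-D`, `-K`, `-R` and `-e`, with the exclusion given as a function as described above. Downloading is passed in as a function so the crawl can run offline; only standard-library modules are used.

```python
import re
from collections import deque
from typing import Callable, Dict, List, Optional

# Internal article links look like href="/wiki/Tesla,_Inc.". Links containing
# ':' (e.g. /wiki/File:...), '#' fragments or '?' queries do not match.
WIKI_LINK = re.compile(r'href="/wiki/([^":#?]+)"')


def extract_links(html: str) -> List[str]:
    """Return the linked article titles in page order, without duplicates."""
    seen = set()
    titles = []
    for title in WIKI_LINK.findall(html):
        if title not in seen:
            seen.add(title)
            titles.append(title)
    return titles


def bfs_graph(
    start: str,
    fetch_html: Callable[[str], str],
    max_depth: int = 10,             # like -D
    max_nodes: int = 500,            # like -K
    max_refs: Optional[int] = None,  # like -R (None = unlimited)
    exclude: Callable[[str], bool] = lambda title: False,  # like -e
) -> Dict[str, List[str]]:
    """Breadth-first crawl; returns {title: [linked titles]} (hypothetical format)."""
    graph: Dict[str, List[str]] = {start: []}
    depth = {start: 0}
    queue = deque([start])
    while queue:
        title = queue.popleft()
        if depth[title] >= max_depth:
            continue  # articles at the depth limit are kept but not expanded
        links = [t for t in extract_links(fetch_html(title)) if not exclude(t)]
        if max_refs is not None:
            links = links[:max_refs]  # only the first max_refs links are used
        for linked in links:
            if linked == title:
                continue  # ignore self-links
            if linked not in graph:
                if len(graph) >= max_nodes:
                    continue  # node limit reached: drop links to new articles
                graph[linked] = []
                depth[linked] = depth[title] + 1
                queue.append(linked)
            graph[title].append(linked)
    return graph
```

For a real crawl, `fetch_html` could be `lambda t: requests.get("https://en.wikipedia.org/wiki/" + t).text`, and skipping `SpaceX` would roughly correspond to `exclude=lambda t: t == "SpaceX"`. Because every article is dequeued in order of its distance from the start, stopping at `max_nodes` keeps the articles closest to the start article.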
```sh
# create and activate a virtual Python environment
python -m venv env
source env/bin/activate
# install Python dependencies
pip install requests python-igraph cairocffi
# run main.py
python main.py --png musk.png https://en.wikipedia.org/wiki/Elon_Musk
```
Command line parameters:

| parameter | expects | default | description |
|---|---|---|---|
| `--help` | | | show help |
| `--help-md` | | | show help formatted as markdown |
| `-v`, `--verbose` | | | set log level to info |
| `-D`, `--maximum-depth` | number | 10 | maximum distance from start article |
| `-K`, `--maximum-nodes` | number | 500 | maximum nodes in graph |
| `-R`, `--maximum-references` | number | | maximum references used per article |
| `-e`, `--exclude` | identifier | | exclude article from result graph |
| `--png` | path | | save graph to given PNG file |
| `--pdf` | path | | save graph to given PDF file |
| `-h`, `--highlight` | keyword | | highlight articles containing a given phrase |
| `--cache` | directory | | directory to store downloaded HTML files in |
| `-p`, `--properties` | | | print graph properties to stdout |
| `-m`, `--matrix` | | | print adjacency matrix to stdout |
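A directory cache of the kind `--cache` describes can be sketched like this. The README does not say how cached files are named or which validity checks are made, so both are assumptions here: file names are a SHA-256 hash of the URL, and any existing non-empty file counts as valid. `cached_fetch` is a hypothetical name, and the download function is passed in so the sketch runs without network access.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_fetch(url: str, cache_dir: str, fetch: Callable[[str], str]) -> str:
    """Return the HTML for url, reading it from cache_dir if it was saved before.

    There is no expiry: once a page is stored, it is reused forever.
    """
    directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Hash the URL so characters such as '/' never end up in the file name.
    path = directory / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")
    # Very simple validity check (assumption): the file exists and is non-empty.
    if path.is_file() and path.stat().st_size > 0:
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html
```

With `requests`, the fetcher could be `lambda u: requests.get(u).text`. Hashing keeps file names valid on every platform at the cost of making the cache directory hard to browse by hand.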
Print properties and adjacency matrix for a limited graph:

```sh
python main.py -pm -K 12 -R 3 https://en.wikipedia.org/wiki/Elon_Musk
```

Save graph to `musk.pdf`, highlight `Tesla` and `Bitcoin`, skip `SpaceX`:

```sh
python main.py -e SpaceX -h Tesla -h BitCoin --pdf musk.pdf https://en.wikipedia.org/wiki/Elon_Musk
```
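The `-m` flag prints the graph as an adjacency matrix. The exact output format of `main.py` is not documented here, so the following is only a minimal sketch, assuming the graph is held as a dict that maps each article title to the titles it links to (a hypothetical representation) and that links are directed. Row `i`, column `j` is `1` when article `i` links to article `j`.

```python
from typing import Dict, List


def adjacency_matrix(graph: Dict[str, List[str]]) -> List[List[int]]:
    """Build a 0/1 matrix; row/column order follows the dict's insertion order."""
    titles = list(graph)
    index = {title: i for i, title in enumerate(titles)}
    matrix = [[0] * len(titles) for _ in titles]
    for source, targets in graph.items():
        for target in targets:
            if target in index:  # ignore links to articles outside the graph
                matrix[index[source]][index[target]] = 1
    return matrix


def print_matrix(matrix: List[List[int]]) -> None:
    """Print one matrix row per line, cells separated by spaces."""
    for row in matrix:
        print(" ".join(str(cell) for cell in row))
```

For example, `print_matrix(adjacency_matrix({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))` prints the rows `0 1 1`, `0 0 1` and `1 0 0`. The matrix is not symmetric because a Wikipedia link from one article to another does not imply a link back.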