WikiGraph

This project was the exam assignment for a basic Python programming course at my university. I am releasing it mainly for educational purposes.

Dependencies

Python 3.8+ (older versions might work but remain untested)
requests (HTTP interaction with Wikipedia)
igraph + cairo (save graph representations to images and documents)

Project Goals

From Exam Task

breadth-first search starting from any given Wikipedia article to every linked article
- This must be implemented without any libraries except requests!
command line parameters (see Command Line Parameters):
- maximum node limit K (see -K)
- maximum depth limit D (see -D)
output as adjacency matrix (see -m)
basic graph operations:
- get all neighbours / edges
- search for the node with highest / lowest degree (see -p)
- calculate graph density (see -p)
export as image (see --png)
highlight keywords (see -h)
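
The breadth-first search with the two limits K and D can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: `bfs_limited` and `get_links` are hypothetical names, and `get_links` stands in for whatever downloads an article and extracts its links. How the real tool treats articles beyond the limits is an assumption here.

```python
from collections import deque


def bfs_limited(start, get_links, max_nodes=500, max_depth=10):
    """Breadth-first crawl; returns {article: [linked articles kept in the graph]}."""
    depth = {start: 0}   # distance of every known article from the start
    edges = {start: []}  # adjacency lists of the resulting graph
    queue = deque([start])
    while queue:
        article = queue.popleft()
        if depth[article] >= max_depth:
            continue  # articles at the depth limit are not expanded
        for target in get_links(article):
            if target not in depth:
                if len(depth) >= max_nodes:
                    continue  # node budget used up: ignore unseen articles
                depth[target] = depth[article] + 1
                edges[target] = []
                queue.append(target)
            if target not in edges[article]:
                edges[article].append(target)
    return edges


# Toy link structure instead of real Wikipedia requests (assumption).
links = {"A": ["B", "C"], "B": ["C", "D"], "C": ["A"], "D": ["E"], "E": []}
graph = bfs_limited("A", lambda a: links.get(a, []), max_nodes=3)
print(graph)
```

Using a queue guarantees that articles are visited in order of increasing distance, so the depth limit can be checked with a single comparison per article.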

Additional Self Defined

simple http cache (see --cache)
- This is a directory, specified via a command line parameter, in which the HTML code of every requested article is saved. There are only very simple checks to ensure cache validity, and cached entries never expire.
maximum references per article limit (see -R)
exclude articles from parsing (see -e)
- It is possible to define an exclude function instead of a keyword list when using the GraphBuilder object from your own code to allow for fine-grained filtering.
export as PDF (see --pdf)
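
A directory-based cache like the one described above could look roughly like this. It is only a sketch under assumptions: `cached_get` and the `fetch` parameter are hypothetical names, the file naming scheme (SHA-256 of the URL) is made up for illustration, and the "non-empty file" check merely stands in for the simple validity checks mentioned above.

```python
import hashlib
from pathlib import Path


def cached_get(url, cache_dir, fetch):
    """Return the HTML for url, reading it from cache_dir when a copy exists."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # One file per URL; hashing avoids problems with characters in file names.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    path = cache_dir / name
    # Very simple validity check: the file exists and is not empty. No expiry.
    if path.is_file() and path.stat().st_size > 0:
        return path.read_text(encoding="utf-8")
    html = fetch(url)  # only hit the network on a cache miss
    path.write_text(html, encoding="utf-8")
    return html
```

With requests installed, `fetch` could be something like `lambda u: requests.get(u, timeout=10).text`. Passing the download function in as a parameter keeps the cache logic testable without network access.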

Getting Started

create and activate a Python virtual environment

python -m venv env
source env/bin/activate

install Python dependencies

pip install requests python-igraph cairocffi

run main.py

python main.py --png musk.png https://en.wikipedia.org/wiki/Elon_Musk

Command Line Parameters

parameter	expects	default	description
--help			show help
--help-md			show help formatted as markdown
-v, --verbose			set log level to info
-D, --maximum-depth	number	10	maximum distance from start article
-K, --maximum-nodes	number	500	maximum nodes in graph
-R, --maximum-references	number		maximum references used per article
-e, --exclude	identifier		exclude article from result graph
--png	path		save graph to given png file
--pdf	path		save graph to given pdf file
-h, --highlight	keyword		highlight articles containing a given phrase
--cache	directory		directory to store downloaded HTML files in
-p, --properties			print graph properties to stdout
-m, --matrix			print adjacency matrix to stdout
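
The output of -m and -p can be illustrated with a few lines of plain Python. This is a sketch under assumptions, not the project's implementation: the function names are hypothetical, the graph is treated as directed without self-loops, every link target is assumed to appear as a key, and a node's degree is counted as in-degree plus out-degree.

```python
def adjacency_matrix(edges):
    """Turn {node: [targets]} into (node list, 0/1 matrix)."""
    nodes = list(edges)
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, targets in edges.items():
        for target in targets:
            matrix[index[source]][index[target]] = 1
    return nodes, matrix


def degrees(matrix):
    """Degree per node: outgoing edges (row sum) plus incoming edges (column sum)."""
    n = len(matrix)
    return [sum(matrix[i]) + sum(matrix[j][i] for j in range(n)) for i in range(n)]


def density(matrix):
    """Share of existing edges among all n * (n - 1) possible directed edges."""
    n = len(matrix)
    if n < 2:
        return 0.0
    return sum(sum(row) for row in matrix) / (n * (n - 1))


edges = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
nodes, matrix = adjacency_matrix(edges)
for row in matrix:
    print(" ".join(str(cell) for cell in row))
print(max(zip(degrees(matrix), nodes)), min(zip(degrees(matrix), nodes)))
print(density(matrix))
```

For an undirected interpretation the density would instead be the number of edges divided by n * (n - 1) / 2.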

Further Examples

print properties and adjacency matrix for a limited graph:

python main.py -pm -K 12 -R 3 https://en.wikipedia.org/wiki/Elon_Musk

save graph to musk.pdf, highlight Tesla and Bitcoin, skip SpaceX:

python main.py -e SpaceX -h Tesla -h Bitcoin --pdf musk.pdf https://en.wikipedia.org/wiki/Elon_Musk
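
When building the graph from your own code, the same kind of filtering can be expressed as a function instead of a keyword list. The sketch below only shows what such predicates might look like. All names are hypothetical, and the exact signature GraphBuilder expects is not documented here, so treat "takes an article identifier, returns True to exclude it" as an assumption.

```python
def exclude_by_keywords(keywords):
    """Build a predicate that rejects identifiers equal to one of the keywords."""
    blocked = set(keywords)
    return lambda identifier: identifier in blocked


def exclude_meta_pages(identifier):
    """Custom predicate: reject namespaced pages such as 'Category:...'."""
    return ":" in identifier


def keep(identifiers, exclude):
    """Apply an exclude predicate to a list of article identifiers."""
    return [i for i in identifiers if not exclude(i)]


found = ["SpaceX", "Tesla,_Inc.", "Category:Living_people"]
print(keep(found, exclude_by_keywords(["SpaceX"])))
print(keep(found, exclude_meta_pages))
```

A function is more flexible than a list because it can decide based on patterns rather than on exact names.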

erictroebs/wikigraph
