A topic model for the debates of the 18th German Bundestag as a showcase example

Markus Konrad markus.konrad@wzb.eu, May 2018

Important note: If you want to git clone this project, you need to install git lfs first.

Overview

For a workshop on practical topic modeling, I created this topic model as a showcase example that demonstrates the steps needed to arrive at a usable, informative model:

Preprocessing the raw data (preproc_raw.py)
Generating the document-term-matrix from the data (generate_tokens.py)
Evaluating topic models for a set of hyperparameters (tm_eval.py and tm_eval_plot.py)
Generating the final model using the best combination of hyperparameters (generate_model.py)
Visualizing, interpreting and analysing the model (report1.ipynb, report2.ipynb and example_analyses.py) – note that this was not the focus of the workshop and hence only exemplary analyses are given
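
To make the second step more concrete, here is a minimal sketch of what a document-term matrix is, using only the Python standard library. It is not the code from generate_tokens.py; the function name build_dtm and the toy tokens are assumptions made purely for illustration, and the real pipeline works on preprocessed, lemmatized debate speeches.

```python
from collections import Counter


def build_dtm(tokenized_docs):
    """Return (vocab, dtm), where dtm[i][j] counts vocab[j] in document i.

    Hypothetical helper for illustration only, not part of this repository.
    """
    # The vocabulary is the sorted set of all tokens across all documents.
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}

    dtm = []
    for doc in tokenized_docs:
        row = [0] * len(vocab)
        for tok, n in Counter(doc).items():
            row[index[tok]] = n
        dtm.append(row)
    return vocab, dtm


# Toy "speeches" that are already tokenized (made-up data).
docs = [
    ["haushalt", "steuer", "haushalt"],
    ["klima", "energie", "steuer"],
]
vocab, dtm = build_dtm(docs)
print(vocab)
print(dtm)
```

Each row corresponds to one document and each column to one vocabulary term. For a corpus of real size, a sparse matrix would likely be a better fit than nested lists.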

Used software packages

This example uses Python 2.7 because of a dependency issue: the pattern package, which is used for better lemmatization of German texts, does not support Python 3.

These are the main software packages in use:

tmtoolkit for evaluating models in parallel, calculating some model statistics and visualizations
lda for topic modeling with LDA using Gibbs sampling
pyLDAvis and Jupyter Notebooks for interactive visualizations

All software dependencies can be installed via pip install -r requirements.txt.
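
As a rough illustration of how evaluating a set of hyperparameters and fitting a model with the lda package fit together, the sketch below enumerates parameter combinations and fits one LDA model per combination. It is not the code from tm_eval.py or generate_model.py: the helper names param_grid and fit_lda and all parameter values are assumptions chosen for the example, and the models are fitted one after another here rather than in parallel as tmtoolkit does.

```python
from itertools import product


def param_grid(n_topics_values, alpha_values, eta_values):
    """Return one parameter dict per hyperparameter combination.

    Hypothetical helper for illustration only.
    """
    return [
        dict(n_topics=k, alpha=a, eta=e)
        for k, a, e in product(n_topics_values, alpha_values, eta_values)
    ]


def fit_lda(dtm, params, n_iter=1000, seed=1):
    """Fit one LDA model with Gibbs sampling via the lda package.

    dtm is expected to be an integer NumPy array or SciPy sparse matrix
    of shape (n_documents, n_vocabulary).
    """
    # Imported lazily so that param_grid works without lda installed.
    import lda

    model = lda.LDA(n_iter=n_iter, random_state=seed, **params)
    model.fit(dtm)
    return model


# Illustrative values only, not the grid used for the Bundestag model.
grid = param_grid([20, 50, 100], [0.1, 0.5], [0.01])
print(len(grid))
print(grid[0])
```

After fitting, model.topic_word_ and model.doc_topic_ hold the topic-word and document-topic distributions, which can then be compared across the candidate models to choose the final hyperparameters.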

Data

The data for the debates comes from offenesparlament.de.

License

Licensed under the Apache License 2.0 (except for the data from offenesparlament.de, which has its own license). See the LICENSE file.
