topic-arena

An arena for comparing topic model quality.

What do we want from a topic model:

Sensible overview:
- Should be representative of the documents: it should cover the content of as many documents as possible. (Task 1)
- Should be informative: you should learn as much about the corpus as possible. (TODO find relevant task)
- Should be corpus-specific: you should learn about this corpus in particular, and preferably not just about general themes. (Task 3)

Getting into the weeds - Filtering/Retrieval:
- Should return relevant documents when retrieving based on a single topic. (Task 2)

Discourse analysis (dynamic models, optional)

Tasks

Task 1

A document is annotated by two topic models. You are shown each model's highest-scoring topics for that document, together with their topic descriptions, and you choose which model's output you prefer. The sampled documents should include atypical ones as well, so that the results show how representative each model is.
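
As a rough sketch of how Task 1 votes could be aggregated, the snippet below turns pairwise preferences into a per-model win rate. The vote format, the tie handling (half a win each) and the model names are assumptions made for illustration; the project does not prescribe any of them.

```python
from collections import Counter


def win_rates(votes):
    """Compute each model's share of won comparisons.

    `votes` is an iterable of (model_a, model_b, winner) tuples, where
    `winner` is one of the two model names, or None for a tie.
    This format is a hypothetical one chosen for the example.
    """
    wins = Counter()
    comparisons = Counter()
    for model_a, model_b, winner in votes:
        if winner not in (model_a, model_b, None):
            raise ValueError(f"winner {winner!r} is not part of the comparison")
        comparisons[model_a] += 1
        comparisons[model_b] += 1
        if winner is None:
            # Assumption: a tie counts as half a win for each model.
            wins[model_a] += 0.5
            wins[model_b] += 0.5
        else:
            wins[winner] += 1
    return {model: wins[model] / comparisons[model] for model in comparisons}


# Hypothetical votes between two placeholder models.
example_votes = [
    ("model_a", "model_b", "model_a"),
    ("model_a", "model_b", "model_b"),
    ("model_a", "model_b", "model_a"),
    ("model_a", "model_b", None),
]
rates = win_rates(example_votes)
```

To see how representative the models are, the same tally could be computed separately for typical and atypical documents.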

Task 2

Take a single topic model and sample four documents whose scores on a given topic differ sufficiently. The participant ranks the four documents by how relevant they think each one is to the topic. The participant's ranking can then be compared with the model's ranking using a suitable ranking metric.

Task 3

Fit two topic models on the same corpus.
Choose a very similar intruder corpus (perhaps an adjacent category in the Wiki hierarchy).
Display the topic descriptions from both models.
Draw an intruder document from the intruder corpus.
Ask the participant to choose which model's topics the intruder document fits best.
If one model is assigned more intruder documents than the other, it is less specific to the original corpus.
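
The comparison in the last step can be sketched as a simple tally. The input format (one model name per participant choice) and the model names are assumptions made for this example.

```python
from collections import Counter


def intruder_share(assignments):
    """Return the fraction of intruder documents assigned to each model.

    `assignments` is a list with one model name per participant choice.
    A higher share suggests the model is less specific to the corpus.
    """
    if not assignments:
        raise ValueError("no assignments to tally")
    counts = Counter(assignments)
    total = sum(counts.values())
    return {model: count / total for model, count in counts.items()}


# Hypothetical choices made by participants for four intruder documents.
choices = ["model_a", "model_b", "model_a", "model_a"]
shares = intruder_share(choices)
least_specific = max(shares, key=shares.get)
```

With only a handful of choices the difference between models may well be noise, so a larger sample is needed before drawing conclusions.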

centre-for-humanities-computing/topic-arena
