Bookworm 📚

Most novels are, in some way, a description of a social network. Bookworm ingests novels, builds a solid version of their implicit character network, and spits out an intuitively understandable and deeply analysable graph.

Navigation

bookworm for the code itself.
Notebooks, in Jupyter notebook form, with example usage and a load of interwoven description of how the thing actually works. Start here.
data for a description of how to get hold of data so that you can run bookworm yourself.

Usage

Command Line Usage

The bookworm('path/to/book.txt') function wraps the steps described under Detailed API Usage into one simple command, so the entire analysis can be run from the command line:

python run_bookworm.py --path 'path/to/book.txt'

Add --d3 to format the output for interpretation by the d3.js force directed graph
Add --threshold n where n is an integer to specify the minimum character interaction strength to be included in the output (default 2)
Add --output_file 'path/to/file' to specify where the .json or .csv should be left
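
To make the --threshold and --d3 options more concrete, here is a minimal sketch of what thresholding and d3-style formatting could look like. It is not taken from run_bookworm.py: the function names apply_threshold and to_d3_json, the example interactions, the 'value' field, and the nodes/links layout (a shape commonly fed to d3.js force-directed graphs) are all assumptions made for illustration.

```python
import json


def apply_threshold(interactions, threshold=2):
    """Keep only interactions whose strength is at least `threshold`."""
    return [row for row in interactions if row["value"] >= threshold]


def to_d3_json(interactions):
    """Format interactions as a nodes/links JSON string (assumed d3 layout)."""
    names = sorted({row["source"] for row in interactions}
                   | {row["target"] for row in interactions})
    graph = {
        "nodes": [{"id": name} for name in names],
        "links": [dict(row) for row in interactions],
    }
    return json.dumps(graph, indent=2)


# Made-up interaction strengths, purely for illustration.
interactions = [
    {"source": "Alice", "target": "Bob", "value": 5},
    {"source": "Alice", "target": "Carol", "value": 1},
    {"source": "Bob", "target": "Carol", "value": 2},
]

strong = apply_threshold(interactions, threshold=2)
d3_output = to_d3_json(strong)
```

With the default threshold of 2, the weak Alice/Carol link is dropped, while Carol still appears as a node because of her link to Bob.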

Detailed API Usage

Start by loading in a book

book = load_book('path/to/book.txt')

Split the book into individual sentences, sequences of n words, or sequences of n characters by respectively running

sequences = get_sentence_sequences(book)
sequences = get_word_sequences(book, n=50)
sequences = get_character_sequences(book, n=200)
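
For intuition, the three splitting strategies can be sketched in plain Python. These helpers are hypothetical stand-ins, not bookworm's own implementations: the names, the non-overlapping chunking, and the naive sentence-splitting rule are all assumptions.

```python
import re


def sentence_sequences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_sequences(text, n=50):
    """Split text into consecutive, non-overlapping runs of n words."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]


def character_sequences(text, n=200):
    """Split text into consecutive, non-overlapping runs of n characters."""
    return [text[i:i + n] for i in range(0, len(text), n)]


text = "Call me Ishmael. Some years ago I went to sea! Why not?"
sentences = sentence_sequences(text)
words = word_sequences(text, n=4)
chars = character_sequences(text, n=10)
```

Shorter sequences give a stricter notion of two characters appearing together, while longer ones pick up looser associations at the cost of more noise.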

Manually input a list of character names or automatically extract a list of 'plausible' character names by respectively using

characters = load_characters('path/to/character_list.csv')
characters = extract_character_names(book)

Find instances of each character in each sequence with find_connections(), count their co-occurrences with calculate_cooccurence(), and transform the result into a more easily interpretable format using get_interaction_df():

df = find_connections(sequences, characters)
cooccurence = calculate_cooccurence(df)
interaction_df = get_interaction_df(cooccurence, characters)
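
The idea behind these three steps can be illustrated with a small standalone sketch. It is an assumption-laden approximation rather than bookworm's code: the helper names count_cooccurrences and to_interaction_rows are hypothetical, matching is a naive case-insensitive substring test, and the 'value' column name is invented (only 'source' and 'target' appear in this README).

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(sequences, characters):
    """Count how many sequences mention each pair of characters together."""
    counts = Counter()
    for sequence in sequences:
        lowered = sequence.lower()
        # Naive substring match; sorting makes each pair's order stable.
        present = sorted(name for name in characters if name.lower() in lowered)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


def to_interaction_rows(counts):
    """Flatten pair counts into source/target/value rows, strongest first."""
    rows = [{"source": a, "target": b, "value": n} for (a, b), n in counts.items()]
    return sorted(rows, key=lambda row: (-row["value"], row["source"], row["target"]))


# Made-up sequences and character list, purely for illustration.
sequences = [
    "Alice met Bob at the market.",
    "Bob and Alice argued about Carol.",
    "Carol walked home alone.",
]
characters = ["Alice", "Bob", "Carol"]

counts = count_cooccurrences(sequences, characters)
rows = to_interaction_rows(counts)
```

Each row corresponds to one weighted edge of the character network, which is why a source/target table converts so directly into a graph.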

The resulting dataframe can easily be transformed into a networkx graph using the call below (newer networkx releases rename this function to from_pandas_edgelist)

nx.from_pandas_dataframe(interaction_df,
                         source='source',
                         target='target')

From there, all sorts of interesting analysis can be done. See the project's associated jupyter notebooks and the networkx documentation for more details.
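
As one small example of such analysis, a character's weighted degree (the sum of the strengths of all their interactions) gives a rough measure of how central they are. The sketch below uses plain Python rather than networkx so that it stands alone; the weighted_degree helper, the 'value' field, and the rows are assumptions made for illustration.

```python
from collections import defaultdict


def weighted_degree(rows):
    """Sum interaction strengths over every edge touching each character."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["source"]] += row["value"]
        totals[row["target"]] += row["value"]
    return dict(totals)


# Made-up interaction rows, purely for illustration.
rows = [
    {"source": "Alice", "target": "Bob", "value": 5},
    {"source": "Alice", "target": "Carol", "value": 1},
    {"source": "Bob", "target": "Carol", "value": 2},
]

strengths = weighted_degree(rows)
most_connected = max(strengths, key=strengths.get)
```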

Slides

I presented a bunch of this stuff at

🗽 PyData NYC 17
🍻 Databeers London


harrisonpim/bookworm

Folders and files

Latest commit

History

Repository files navigation

Bookworm 📚

Navigation

Usage

Command Line Usage

Detailed API Usage

Slides

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages