story-dna

Tropes are storytelling tools, i.e. the building blocks that make up a story. Every story can be described as a combination of tropes, just as every person can be described by their DNA. There also seems to be a trend in data science of calling this kind of thing "DNA", so I'm going with it.

The idea is to produce, for each work on tvtropes.org, an array that describes which tropes it contains. With this tropelist data set, a variety of analyses become possible:

Obtain a set of parameters that, given a work's tropelist, can predict the quality of a new, unrated work. This could help artists decide which tropes to use in order to be successful.
Run a clustering algorithm on the same data to find patterns and similarities. This might help define genres quantitatively.
Build a recommender system from one's own personal ratings of each show, based on the features (tropes) of different works.
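To make the similarity and recommender ideas more concrete, here is a minimal sketch that ranks works by how much their trope arrays overlap. Jaccard similarity is my own choice for illustration; the repository does not say which measure or clustering algorithm it would use. The function names, show titles, and arrays are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two equal-length binary trope arrays."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def most_similar(target, works):
    """Return (title, score) pairs sorted from most to least similar to target."""
    scores = [(title, jaccard(target, vec)) for title, vec in works.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical works and trope arrays, for illustration only.
works = {
    "Show A": [1, 1, 0, 0],
    "Show B": [1, 0, 1, 1],
    "Show C": [0, 0, 1, 1],
}
liked = [1, 0, 1, 1]  # trope array of a show the user rated highly

ranking = most_similar(liked, works)
print([title for title, _ in ranking])  # ['Show B', 'Show C', 'Show A']
```

A real recommender would presumably weight the scores by the user's ratings; this only shows the overlap step.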

How it works: Provide some sort of rating (e.g. IMDb) to go with each show's data, then run the trope data and rating data for each show through a machine learning algorithm. Right now I'm looking at stochastic gradient descent. While the classifier requires integer labels for ratings, I'm just gonna express every rating as an integer out of 100 or 1000 and call it a day. Yes, this means there would be hundreds of labels to classify into, and tens of thousands of features for that matter.
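One way to read the rating conversion is sketched below: a rating on a 0 to 10 scale is rescaled to an integer out of 100 or 1000 so it can serve as a class label. This is my interpretation rather than the project's confirmed behavior, and `rating_to_label`, `scale`, and `buckets` are hypothetical names.

```python
def rating_to_label(rating, scale=10.0, buckets=100):
    """Map a rating in the range 0..scale to an integer label in 0..buckets."""
    if not 0 <= rating <= scale:
        raise ValueError("rating out of range")
    return round(rating / scale * buckets)


print(rating_to_label(8.7))                # 87
print(rating_to_label(8.7, buckets=1000))  # 870
```

With `buckets=100` there are up to 101 distinct labels, which is where the "hundreds of labels" trade-off comes from.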

============

mastertropelistmaker.py

Scrapes all of tvtropes.org for a list of every single trope. There are supposed to be 25866, but my tool consistently finds only 17183. It's designed to let you scrape a little at a time, interrupting from the keyboard whenever you feel like it. You could run this yourself, but I'm also providing mastertropelist.csv.
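The "scrape a little at a time" behavior could look roughly like the sketch below, which saves progress to a JSON file and skips finished pages on the next run. This is not the script's actual code: `scrape_resumable`, the state file layout, and the `fake_fetch` stand-in (used here instead of real network requests) are all assumptions.

```python
import json
import os
import tempfile


def scrape_resumable(pages, fetch_tropes, state_path):
    """Scrape pages in order, saving progress so an interrupted run can resume."""
    state = {"done": [], "tropes": []}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    done = set(state["done"])
    try:
        for page in pages:
            if page in done:
                continue  # already scraped in an earlier run
            for trope in fetch_tropes(page):
                if trope not in state["tropes"]:
                    state["tropes"].append(trope)
            state["done"].append(page)
    except KeyboardInterrupt:
        print("Interrupted; progress saved.")
    finally:
        # Runs on normal completion and on interruption alike.
        with open(state_path, "w") as f:
            json.dump(state, f)
    return state


# Hypothetical stand-in for a real page fetch; raises once to mimic Ctrl+C.
fake_site = {"A": ["TheHero"], "B": ["BigBad"], "C": ["TheHero", "RedHerring"]}
interrupted = {"yet": False}


def fake_fetch(page):
    if page == "B" and not interrupted["yet"]:
        interrupted["yet"] = True
        raise KeyboardInterrupt
    return fake_site[page]


path = os.path.join(tempfile.mkdtemp(), "progress.json")
first = scrape_resumable(["A", "B", "C"], fake_fetch, path)
second = scrape_resumable(["A", "B", "C"], fake_fetch, path)
print(first["done"], second["done"])
```

The first call stops at page B and records only A; the second call picks up from B and finishes the list.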

findallworks.py

Automatically produces a list of all links for a given medium. Ideally you'd run this before tlm.py.

tlm.py (trope list maker)

Right now this takes a single work's page along with some sort of rating, scrapes the tropes from the page, and appends a new row to the "masterarraylist" containing the work's title, the binary tropelist array, the rating label, and the total number of tropes found. It also contains a function to scrape IMDb based on tvtropes titles.
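The row described above might be assembled along these lines. This is a sketch under assumptions: the column order follows the description (title, binary array, rating label, trope count), but the actual file layout, the function names, and the sample tropes are hypothetical, and an in-memory buffer stands in for a file opened in append mode.

```python
import csv
import io


def make_row(title, master_tropes, work_tropes, rating_label):
    """Build one row: title, one 0/1 flag per master trope, rating label, trope count."""
    found = set(work_tropes)
    bits = [1 if trope in found else 0 for trope in master_tropes]
    return [title, *bits, rating_label, sum(bits)]


def append_row(csv_file, row):
    """Write a single row to an open CSV file object."""
    csv.writer(csv_file).writerow(row)


# Hypothetical sample data, for illustration only.
master = ["ChekhovsGun", "TheHero", "RedHerring", "BigBad"]

buffer = io.StringIO()  # stands in for open("masterarraylist.csv", "a", newline="")
append_row(buffer, make_row("Show A", master, ["BigBad", "TheHero"], 87))
append_row(buffer, make_row("Show B", master, ["ChekhovsGun"], 62))
print(buffer.getvalue())
```

Each flag lines up with the trope at the same position in the master list, so every row has the same length regardless of how many tropes a work uses.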

classifier.py

Takes all the "masterarraylist" data and runs it through a scikit-learn classifier.

nottropes.csv

I noticed that some of the tropes I was scraping are not really storytelling tools so much as concepts that didn't fit well elsewhere on the site, e.g. "Sequel" and "TVTropesWillRuinYourLife".
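Applying such an exclusion list could be as simple as the sketch below. It assumes both CSV files hold one trope name per row in the first column, which the README does not confirm; `load_names` and `drop_non_tropes` are hypothetical helpers, and in-memory strings stand in for the real files.

```python
import csv
import io


def load_names(csv_file):
    """Read names from the first column of a CSV, skipping blank rows."""
    return [row[0].strip() for row in csv.reader(csv_file) if row and row[0].strip()]


def drop_non_tropes(master_tropes, not_tropes):
    """Return master_tropes without the excluded names, keeping the original order."""
    excluded = set(not_tropes)
    return [trope for trope in master_tropes if trope not in excluded]


# In-memory stand-ins for mastertropelist.csv and nottropes.csv (layout assumed).
master_csv = io.StringIO("ChekhovsGun\nSequel\nBigBad\nTVTropesWillRuinYourLife\n")
not_csv = io.StringIO("Sequel\nTVTropesWillRuinYourLife\n")

cleaned = drop_non_tropes(load_names(master_csv), load_names(not_csv))
print(cleaned)  # ['ChekhovsGun', 'BigBad']
```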

austincap/story-dna
