A project for comparing the captions of popular YouTubers' apology videos and scoring them for similarity.
Structure the project folders like this:
PARENTFOLDER/
  TRANSCRIPTS/
    TEXT/
    JSON/
    GRAPHS/
    ratio_list.json
    youtube_vids.json
  TranscriptNLP/
    graph.py
    main.py
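If you set the folders up by hand, a small script can create them for you. This is a minimal sketch that assumes TEXT/, JSON/ and GRAPHS/ sit inside TRANSCRIPTS/ as listed above. `create_layout` is a hypothetical helper, not part of this repo, and it only creates empty folders; files such as ratio_list.json (generated by TranscriptCollect) are not created.

```python
from pathlib import Path


def create_layout(parent):
    """Create the empty folder layout under `parent` (hypothetical helper).

    Assumes TEXT/, JSON/ and GRAPHS/ live inside TRANSCRIPTS/. Files such as
    ratio_list.json, youtube_vids.json and the TranscriptNLP scripts are not
    created here.
    """
    parent = Path(parent)
    folders = [
        parent / "TRANSCRIPTS" / "TEXT",
        parent / "TRANSCRIPTS" / "JSON",
        parent / "TRANSCRIPTS" / "GRAPHS",
        parent / "TranscriptNLP",
    ]
    for folder in folders:
        # parents=True also creates PARENTFOLDER/ and TRANSCRIPTS/;
        # exist_ok=True makes re-running the script harmless.
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Calling `create_layout("PARENTFOLDER")` from the directory that should hold the project creates the folders, and running it again changes nothing.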
- Clone the repo.
- Run pip install -r requirements.txt
- Go to main.py and change score_word to True.
- Run main.py.
- Profit!
Using the files in the TEXT/ folder, it will train a gensim model with "wiki-gigaword-100" and all the sentences from the transcripts. It will then generate a similarity score for the transcripts and save it to similarity_score.json.
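The README does not say how the score is computed from the model. One common approach, assumed here, is to average the word vectors of each transcript and compare the averages with cosine similarity. The sketch below uses a tiny made-up `TOY_VECTORS` table in place of the trained gensim model; the function names and the "nameA|nameB" key layout are hypothetical, not this repo's actual code.

```python
import math
from itertools import combinations

# Toy 3-dimensional vectors standing in for the trained gensim model.
# The words and values are made up for illustration.
TOY_VECTORS = {
    "sorry": [0.9, 0.1, 0.0],
    "apologize": [0.8, 0.2, 0.1],
    "fans": [0.1, 0.9, 0.2],
    "video": [0.2, 0.3, 0.9],
}


def mean_vector(text, vectors):
    """Average the vectors of all known words in `text`; None if none are known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    return [sum(dim) / len(known) for dim in zip(*known)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_scores(transcripts, vectors):
    """Score every pair of transcripts, keyed "nameA|nameB" (hypothetical layout).

    Transcripts with no known words are skipped.
    """
    means = {name: mean_vector(text, vectors) for name, text in transcripts.items()}
    scores = {}
    for a, b in combinations(sorted(means), 2):
        if means[a] is not None and means[b] is not None:
            scores[f"{a}|{b}"] = round(cosine(means[a], means[b]), 4)
    return scores
```

For example, `similarity_scores({"creator_a": "sorry sorry fans", "creator_b": "I apologize to my fans"}, TOY_VECTORS)` returns one "creator_a|creator_b" entry, which `json.dump` could then write to similarity_score.json. gensim's `KeyedVectors` (such as a trained model's `.wv`) supports `word in kv` and `kv[word]`, so it should be usable in place of `TOY_VECTORS`.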
Using the ratio_list.json generated by TranscriptCollect, it will create a .csv file and save it as ratios.csv.
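The layout of ratio_list.json is not documented here, so this is a minimal sketch that assumes it holds a list of flat JSON objects sharing the same keys. `ratio_json_to_csv` is a hypothetical helper; it uses only the standard `json` and `csv` modules.

```python
import csv
import json


def ratio_json_to_csv(json_path, csv_path):
    """Flatten ratio_list.json into ratios.csv and return the row count.

    Assumes (hypothetically) that the JSON holds a list of flat objects that
    all share the same keys, e.g. [{"video": "...", "likes": 1, "dislikes": 2}].
    """
    with open(json_path, encoding="utf-8") as f:
        rows = json.load(f)
    if not rows:
        raise ValueError(f"{json_path} contains no records")
    # Column order follows the keys of the first record.
    fieldnames = list(rows[0].keys())
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `ratio_json_to_csv("ratio_list.json", "ratios.csv")` writes one CSV row per JSON record. `DictWriter` raises a `ValueError` if a later record has a key the first one lacks, which surfaces inconsistent input early.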
Using similarity_score.json, it will create a heatmap PNG named similarity_graph.png. This requires the transcripts in the TEXT/ folder to match the entries in similarity_score.json.
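One way to honour the TEXT/ requirement is to build the heatmap matrix only from names that exist in TEXT/ and fail loudly on any mismatch. This is a minimal sketch that assumes one .txt file per video in TEXT/ and pairwise keys of the form "nameA|nameB" in similarity_score.json; both formats and both helper names are hypothetical, since the README does not describe them.

```python
from pathlib import Path


def transcript_names(text_dir):
    """Sorted transcript names in TEXT/, assuming one .txt file per video."""
    return sorted(p.stem for p in Path(text_dir).glob("*.txt"))


def score_matrix(scores, names):
    """Turn pairwise scores into a symmetric matrix ordered by `names`.

    Assumes (hypothetically) keys of the form "nameA|nameB". The diagonal is
    1.0 because each transcript is identical to itself; missing pairs stay 0.0.
    """
    index = {name: i for i, name in enumerate(names)}
    size = len(names)
    matrix = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for key, value in scores.items():
        a, b = key.split("|")
        if a not in index or b not in index:
            raise KeyError(f"{key!r} has no matching transcript in TEXT/")
        matrix[index[a]][index[b]] = value
        matrix[index[b]][index[a]] = value
    return matrix
```

The resulting matrix could then be drawn with matplotlib's `imshow` and written out with `savefig("similarity_graph.png")`.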
Using ratios.csv, it will create a grouped bar graph named ratio_graph.png.
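A grouped bar graph places one bar per numeric column side by side around each video's tick. This is a minimal sketch of that layout step, assuming ratios.csv has one label column plus numeric columns; `grouped_bar_layout` and its `label_column` parameter are hypothetical, not taken from graph.py.

```python
import csv


def grouped_bar_layout(csv_path, label_column, width=0.8):
    """Compute bar positions for a grouped bar chart from ratios.csv.

    Returns the group labels and, for every non-label column, the bar
    x-positions, heights and bar width. Each group is centred on its integer
    tick. Assumes every non-label column holds numbers.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    labels = [row[label_column] for row in rows]
    series_names = [name for name in rows[0] if name != label_column] if rows else []
    # The group's total width is shared evenly between the series.
    bar_width = width / max(len(series_names), 1)
    layout = {}
    for i, name in enumerate(series_names):
        # Shift series i so the whole group stays centred on each tick.
        offset = (i - (len(series_names) - 1) / 2) * bar_width
        layout[name] = {
            "x": [tick + offset for tick in range(len(labels))],
            "height": [float(row[name]) for row in rows],
            "width": bar_width,
        }
    return labels, layout
```

With matplotlib, each series could be drawn with `ax.bar(s["x"], s["height"], s["width"], label=name)`, followed by `ax.set_xticks(range(len(labels)))`, `ax.set_xticklabels(labels)`, `ax.legend()` and `fig.savefig("ratio_graph.png")`.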
Try this repo out!