TranscriptNLP

Compares the captions of popular YouTubers' apology videos and scores them for similarity.

Get Started

Structure the project like this:

PARENTFOLDER/
    TRANSCRIPTS/
        TEXT/
        JSON/
        GRAPHS/
        ratio_list.json
        youtube_vids.json
        
    TranscriptNLP/
        graph.py
        main.py
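
The TRANSCRIPTS/ side of that layout can be created by hand, or with a few lines of Python. The sketch below is only a convenience and is not part of the repo; `create_layout` is a hypothetical helper name, and it only creates the three subfolders (the two .json files are not created here).

```python
from pathlib import Path


def create_layout(parent):
    """Create TRANSCRIPTS/TEXT, TRANSCRIPTS/JSON and TRANSCRIPTS/GRAPHS under `parent`.

    Hypothetical helper, not part of TranscriptNLP. Existing folders are left alone.
    """
    transcripts = Path(parent) / "TRANSCRIPTS"
    created = []
    for name in ("TEXT", "JSON", "GRAPHS"):
        folder = transcripts / name
        # parents=True also creates TRANSCRIPTS/ (and `parent`) when missing.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Example usage (commented out so importing this file has no side effects):
# create_layout("PARENTFOLDER")
```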

Clone the repo.
Run pip install -r requirements.txt.
Open main.py and set score_word to True.
Run main.py.
Profit!

Overview

main.py

Using the files in the TEXT/ folder, main.py trains a gensim model with "wiki-gigaword-100" and all the sentences from the transcripts. It then generates a similarity score and saves it to similarity_score.json.
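
To give a feel for what a similarity score between two transcripts can look like, here is a minimal sketch that averages word vectors per transcript and compares the averages with cosine similarity. This is an illustration only and is not necessarily how main.py computes its score. The tiny `TOY_VECTORS` table is made up and stands in for real pretrained vectors, and `mean_vector`, `cosine_similarity` and `transcript_similarity` are hypothetical names.

```python
import math

# Made-up 3-dimensional vectors standing in for pretrained word embeddings.
TOY_VECTORS = {
    "sorry": [0.9, 0.1, 0.0],
    "apologize": [0.8, 0.2, 0.1],
    "mistake": [0.6, 0.5, 0.1],
    "subscribe": [0.0, 0.2, 0.9],
}


def mean_vector(text):
    """Average the vectors of the known words in `text`; None if no word is known."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between vectors `a` and `b` (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def transcript_similarity(text_a, text_b):
    """Score two transcripts by the cosine similarity of their mean word vectors."""
    a, b = mean_vector(text_a), mean_vector(text_b)
    if a is None or b is None:
        return 0.0  # Nothing to compare when a transcript has no known words.
    return cosine_similarity(a, b)
```

With these toy vectors, `transcript_similarity("I am sorry", "I apologize")` comes out higher than `transcript_similarity("I am sorry", "please subscribe")`, which is the kind of ordering a similarity score is meant to capture.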

Using the ratio_list.json generated by TranscriptCollect, it creates a .csv file and saves it as ratios.csv.
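
The JSON-to-CSV step can be sketched with the standard library alone. The layout of ratio_list.json is not documented here, so the sketch assumes it is a list of flat objects (for example `[{"video": "a", "ratio": 0.5}]`); that schema, the field names, and the helper name `json_records_to_csv` are all assumptions, and main.py may do this differently.

```python
import csv
import json
from pathlib import Path


def json_records_to_csv(json_path, csv_path):
    """Write a JSON list of flat objects to a CSV file and return the row count.

    Assumes the JSON file holds a list of dicts; this schema is a guess.
    """
    records = json.loads(Path(json_path).read_text(encoding="utf-8"))

    # Collect column names in first-seen order so every key gets a column.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)  # Missing keys are written as empty cells.
    return len(records)
```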

graph.py

Using similarity_score.json, graph.py creates a heatmap PNG named similarity_graph.png. The contents of the TEXT/ folder must match the file.
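
One way to check that requirement before plotting is to compare the transcript file names with the entries in the score file. The sketch below assumes that similarity_score.json is a JSON object whose top-level keys are transcript names, and that transcripts are stored as .txt files named after those keys. Both are assumptions about the data, and `find_mismatches` is a hypothetical helper that is not part of graph.py.

```python
import json
from pathlib import Path


def find_mismatches(text_dir, score_path):
    """Return (files without a score entry, score entries without a file).

    Assumes top-level JSON keys correspond to .txt file stems; this is a guess.
    """
    stems = {path.stem for path in Path(text_dir).glob("*.txt")}
    scores = json.loads(Path(score_path).read_text(encoding="utf-8"))
    keys = set(scores)
    return sorted(stems - keys), sorted(keys - stems)
```

Two empty lists mean the folder and the file agree under these assumptions.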

Using ratios.csv, it creates a grouped bar graph named ratio_graph.png.

Need to get the captions?

Try this repo out!
