d-vignesh/ComedianTranscriptAnalysis

This repo contains scripts that use NLP techniques to obtain insights from a set of comedian transcripts.

step 1 : Scrape the transcript data, apply text cleaning techniques, and create the document-term matrix.
step 2 : Run some exploratory data analysis on the dataset, such as constructing word clouds and measuring word frequency and profanity, to verify that the data makes sense.
step 3 : Perform sentiment analysis on the transcripts using TextBlob and see how each comedian's sentiment varies over the routine.
step 4 : Perform topic modelling using Latent Dirichlet Allocation and try to come up with the topics each comedian mostly uses in their comedy.
step 5 : As a fun task, try text generation, using both a Markov chain technique and an RNN to generate similar transcripts.
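
The Markov chain idea in step 5 can be illustrated with a small, self-contained sketch. This is not the code from this repo's scripts: the function names `build_chain` and `generate` and the sample text are hypothetical, chosen only for illustration, and the sketch assumes a first-order, word-level chain built from whitespace-separated tokens.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        chain[current_word].append(next_word)
    return dict(chain)


def generate(chain, length=15, seed=None):
    """Walk the chain to produce `length` words of generated text."""
    rng = random.Random(seed)
    word = rng.choice(list(chain))
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if followers:
            # Repeated followers are kept in the list, so frequent
            # continuations are proportionally more likely to be picked.
            word = rng.choice(followers)
        else:
            # Dead end (the word only appears at the very end of the text):
            # restart from a random word that has followers.
            word = rng.choice(list(chain))
        output.append(word)
    return " ".join(output)


# Made-up sample text standing in for a cleaned transcript (assumption).
sample = "i went to the store and i went to the bank and the bank was closed"
chain = build_chain(sample)
print(generate(chain, length=12, seed=0))
```

On real transcripts the same approach would be applied to the full cleaned text of one comedian. A first-order chain only looks one word back, so the output tends to be locally plausible but globally incoherent, which is one reason to compare it against an RNN.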
