Files:
- `py/`: notebooks and scripts for analysis
- `dev/`: deprecated scripts
- `analysis.ipynb`: setting scope of analysis and data cleaning
- `eda.ipynb`: exploratory analysis, creating term-document matrix, frequencies, and similarity scores
- `lda_general.ipynb`: topic modeling with LDA on the first set of comments pulled from Reddit, not restricted to a specific date range
- `tm-bert.ipynb`: topic modeling with pre-trained language models using the `BERTopic` library, plus sentiment analysis
- `dat/`: S1 and S2 sub ids & comments in `.pkl` format
- `img/`: images created by notebooks and other sources for presentation
  - `plots/`: plots created by scripts (word clouds, bar charts, etc.)
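For readers unfamiliar with the term-document matrix and similarity scores mentioned for `eda.ipynb`, here is a minimal standard-library sketch of the idea. It is not the notebook's code: the function names, the whitespace tokenizer, and the toy comments are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, rows), where rows[i][j] counts vocab[j] in docs[i]."""
    # Naive lowercase/whitespace tokenization (an assumption, not the notebook's method).
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy comments, not taken from the dataset.
comments = ["the finale was great", "the finale was slow", "great soundtrack"]
vocab, tdm = term_document_matrix(comments)
sim_01 = cosine_similarity(tdm[0], tdm[1])  # comments sharing three terms
sim_02 = cosine_similarity(tdm[0], tdm[2])  # comments sharing one term
```

Each row of `tdm` is one comment and each column one term, so term frequencies are column sums and pairwise similarity is computed between rows.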
Data:
- Reddit comments from 'r/euphoria' using the `psaw` library
- S1: June 16 - Aug 4, 2019
- S2: Jan 9 - Feb 27, 2022
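
The S1 and S2 windows above can be turned into Unix timestamps, which is the form that `psaw` search calls typically take for their `after` and `before` arguments. The sketch below only computes the bounds. Treating the dates as UTC and the end date as inclusive are assumptions, and the names `WINDOWS` and `epoch_bounds` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Date windows as listed above; UTC and end-inclusive handling are assumptions.
WINDOWS = {
    "S1": ("2019-06-16", "2019-08-04"),
    "S2": ("2022-01-09", "2022-02-27"),
}


def epoch_bounds(start, end):
    """Return (after, before) Unix timestamps covering start..end inclusive, in UTC."""
    start_dt = datetime.strptime(start, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    # Add one day so the whole end date falls inside the window.
    end_dt = datetime.strptime(end, "%Y-%m-%d").replace(tzinfo=timezone.utc) + timedelta(days=1)
    return int(start_dt.timestamp()), int(end_dt.timestamp())


bounds = {name: epoch_bounds(*dates) for name, dates in WINDOWS.items()}
```

Under these assumptions, each window spans exactly 50 days, and a pair from `bounds` could be passed as `after=` and `before=` when querying comments.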