⭐ A template repository for linguistic analysis of conversations from scratch using ConvoKit.
This repository provides a quick way to analyze custom conversational datasets by first converting them into a ConvoKit Corpus object. ConvoKit offers a range of linguistic analyses that you can run on your conversation data. This template covers creating a corpus from your data (MakeConvokitCorpus.ipynb), basic descriptive statistics (DescriptiveStatistics.ipynb), linguistic coordination (LinguisticCoordination.ipynb), and the politeness strategies speakers employ (Politeness.ipynb).
- Step 1: Click the green `Use this template` button at the top.
- Step 2: Follow the on-screen instructions to create your repository.
- Step 3: Clone it to your local machine (you can use GitHub Desktop).
- Step 4: Open `.gitignore` and uncomment `data/` to keep your data private at all times.
- Step 5: Add your data (`master.csv` and `transcripts`) to your local repository under `data/raw`. There is an example `master.csv` in this template to help you with formatting.
- Step 6: Run MakeConvokitCorpus.ipynb to create your corpus.
- Step 7: Check out the Python notebooks in the `analysis` folder.
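Step 6 can be pictured roughly as follows. This is a minimal sketch, not the code in MakeConvokitCorpus.ipynb: the column names (`conversation_id`, `speaker`, `text`), the sample rows, and the helper names `rows_to_utterance_fields` and `build_corpus` are assumptions made for illustration, so adapt them to the layout of your own `master.csv` and transcripts. Only `Corpus`, `Utterance`, and `Speaker` come from ConvoKit.

```python
import csv
import io


def rows_to_utterance_fields(rows):
    """Map transcript rows to the fields a ConvoKit Utterance takes.

    Assumes (hypothetically) that every row has 'conversation_id',
    'speaker', and 'text' keys, and that rows are in speaking order.
    """
    fields = []
    roots = {}  # source conversation -> id of its first utterance
    last = {}   # source conversation -> id of its most recent utterance
    for i, row in enumerate(rows):
        conv = row["conversation_id"]
        utt_id = f"utt_{i}"
        roots.setdefault(conv, utt_id)
        fields.append({
            "id": utt_id,
            "speaker": row["speaker"],
            # ConvoKit identifies a conversation by its root utterance id.
            "conversation_id": roots[conv],
            # Assumption: each turn replies to the previous turn.
            "reply_to": last.get(conv),
            "text": row["text"],
            "meta": {"source_conversation": conv},
        })
        last[conv] = utt_id
    return fields


def build_corpus(fields):
    """Build a ConvoKit Corpus from the field dicts (requires convokit)."""
    # Imported lazily so the helper above works without ConvoKit installed.
    from convokit import Corpus, Speaker, Utterance

    speakers = {f["speaker"]: Speaker(id=f["speaker"]) for f in fields}
    utterances = [
        Utterance(
            id=f["id"],
            speaker=speakers[f["speaker"]],
            conversation_id=f["conversation_id"],
            reply_to=f["reply_to"],
            text=f["text"],
            meta=f["meta"],
        )
        for f in fields
    ]
    return Corpus(utterances=utterances)


# Made-up sample data standing in for a real transcript file.
sample_csv = """conversation_id,speaker,text
c1,alice,Hi there
c1,bob,Hello!
c2,carol,Any updates?
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
fields = rows_to_utterance_fields(rows)
```

With ConvoKit installed, `build_corpus(fields).dump("corpus", base_path="data/processed")` would write the result to `data/processed/corpus`.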
.
├── data
│   ├── processed
│   │   └── corpus
│   └── raw
│       ├── master.csv
│       └── transcripts
├── processing
│   └── MakeConvokitCorpus.ipynb
├── analysis
│   ├── LinguisticCoordination.ipynb
│   ├── DescriptiveStatistics.ipynb
│   ├── Politeness.ipynb
│   └── utils.py
├── results
├── viz
└── README.md
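
To check that a clone matches the layout above before running the notebooks, something like the following could be used. It is only a sketch: the helper names `missing_dirs` and `create_dirs` are hypothetical, and it assumes that `transcripts` is a folder rather than a single file.

```python
import tempfile
from pathlib import Path

# Directories taken from the tree above; data/processed/corpus is left out
# because it is produced when the corpus is created.
EXPECTED_DIRS = [
    "data/processed",
    "data/raw/transcripts",
    "processing",
    "analysis",
    "results",
    "viz",
]


def missing_dirs(root):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


def create_dirs(root):
    """Create any expected directories that are missing under root."""
    for d in EXPECTED_DIRS:
        (Path(root) / d).mkdir(parents=True, exist_ok=True)


# Demonstration in a throwaway directory, so no real data is touched.
with tempfile.TemporaryDirectory() as tmp:
    before = missing_dirs(tmp)
    create_dirs(tmp)
    after = missing_dirs(tmp)
```

Calling `missing_dirs(".")` from the repository root lists whatever still needs to be created.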