A Python program for pre-processing the Brooklyn Multi-Interaction Corpus (BMIC).
The dataset can be requested from the original authors (https://github.com/andreas-weise/bmic).
The program extracts metadata from the BMIC corpus and combines the output into a single file.
The program reads the following files:
- session_metadata JSON files from each folder
- ipu_data files from each session
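As a rough illustration of the combining step, the sketch below merges per-session metadata with per-session IPU rows into one CSV file. It is not the code in Brooklyn.ipynb. The file names session_metadata.json and ipu_data.csv, the one-folder-per-session layout, the CSV format of the IPU data, and the function name combine_sessions are all assumptions made for this example; adjust them to match the corpus as delivered.

```python
import csv
import json
from pathlib import Path


def combine_sessions(corpus_root, output_path):
    """Merge per-session metadata and IPU rows into one CSV file.

    Assumed (hypothetical) layout:
      <corpus_root>/<session>/session_metadata.json
      <corpus_root>/<session>/ipu_data.csv   (with a header row)
    Returns the number of IPU rows written.
    """
    rows = []
    for session_dir in sorted(Path(corpus_root).iterdir()):
        meta_file = session_dir / "session_metadata.json"  # assumed name
        ipu_file = session_dir / "ipu_data.csv"            # assumed name and format
        if not (meta_file.is_file() and ipu_file.is_file()):
            continue  # skip entries that lack either input file
        with meta_file.open(encoding="utf-8") as f:
            meta = json.load(f)
        # Keep only scalar metadata values so each one fits in a single CSV cell.
        flat_meta = {
            f"meta_{key}": value
            for key, value in meta.items()
            if isinstance(value, (str, int, float, bool))
        }
        with ipu_file.open(newline="", encoding="utf-8") as f:
            for ipu in csv.DictReader(f):
                rows.append({"session": session_dir.name, **flat_meta, **ipu})

    # Union of all column names, in first-seen order.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling combine_sessions("path/to/corpus", "combined.csv") would write one row per IPU, each carrying its session name and that session's scalar metadata fields. A flat CSV is used here because it can be loaded directly in R with read.csv.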
The program can be executed in the following steps.
Step 0: Download the Brooklyn Multi-Interaction Corpus: Part 1.
Step 1: Open the Brooklyn.ipynb file.
Step 2: Run each program in the notebook in order, following the instructions provided there.
Step 3: Save the output and analyze the data using R.