Corpus Statistics Analyses (corstatana)

By Julien Karadayi and Alejandrina Cristia For JSALT 2019 Speaker detection from single microphone in adverse scenarios

The goal is to link voice, role, or talker diarization/detection system performance to characteristics of files and/or speakers on files. Look inside computation/scripts for the scripts, computation/results for the csv's describing files and speakers in files, and description/ for examples of analyses (e.g. example, visualization on p. 5)

Explanation

We aimed to provide trial, file, and corpus level descriptors that can allow us to describe what different pipelines or systems are doing right or wrong.

By trial we mean the 60 second regions that have been identified as trials for the talker detection task. By file we mean a sound file, which is typically a recording (in all corpora except for babytrain) or extract from a recording (in the daylong sections of babytrain). By corpus we mean AMI, SRI, Chime5, and the subcorpora within babytrain.

Typically, the descriptors explained next will be measured only at the trial and file level, with descriptors for corpora being derived from those at the trial and file level.

Note that some of this information is already in Marvin's README for the datasets (notably child age, proportion overlapping and non-overlapping).

Properties derived from metadata

if babytrain, key child's age

Properties derived from the reference rttm

This is done at the file level for the whole file and for the trials, this is derived from that file.

file length
number of different speakers
number of speakers in each category: child, female adult, male adult, uncertain
proportion of speech that is overlapping
proportion of speech that is not overlapping
average vocalization duration
for each speaker, their total speech duration that is overlapping, total speech duration that is not overlapping

Properties derived from the reference rttm + the audio

signal to noise ratio (average loudness in speech portions divided by average loudness in nonspeech portions)
for each speaker, their average signal to noise ratio (average loudness during their non-overlapping speech portions divided by average loudness in nonspeech portions)

Properties derived from the reference rttm + the system rttm

This is done at the file level for the whole file and for the trials, this is derived from that file.

for each file and for the trials derived from that file, diarization: miss rate, false alarm rate, confusion rate, correct rate
for each speaker in each file, the amount of time in each of the following cases: correct (duration that was correctly recovered by the system); missed (duration that was missed ie classified as non-speech); confused (duration that was attributed to a different speaker)

Instructions to reproduce analyses

To generate your own report by reusing the code in description/, you will need RStudio. For further information on using Rmd for transparent (knittable) analyses, see Mike Frank & Chris Hartgerink's tutorial.

Clone the folder with git clone https://github.com/jsalt-coml/corstatana/ then cd corstatana
Launch RStudio by double-clicking on system.Rmd -- (or otherwise ensure that your working directory points to the Rmd location by doing set.wd("RIGHT PATH").
Click on the button "knit".

If you have any errors, the most likely reason is that you are missing a package. Do install.packages("MISSING PACKAGE NAME") in the RStudio terminal. If you are having trouble knitting to a pdf, you are probably missing a latex package. To avoid that hassle, just click on the small triangle to the right of knit and choose "knit to html".

Name		Name	Last commit message	Last commit date
Latest commit History 81 Commits
computation		computation
description		description
system_eval		system_eval
.gitignore		.gitignore
BabyTrain_ages.csv		BabyTrain_ages.csv
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Corpus Statistics Analyses (corstatana)

Explanation

Properties derived from metadata

Properties derived from the reference rttm

Properties derived from the reference rttm + the audio

Properties derived from the reference rttm + the system rttm

Instructions to reproduce analyses

About

Releases

Packages

Contributors 2

Languages

jsalt-coml/corstatana

Folders and files

Latest commit

History

Repository files navigation

Corpus Statistics Analyses (corstatana)

Explanation

Properties derived from metadata

Properties derived from the reference rttm

Properties derived from the reference rttm + the audio

Properties derived from the reference rttm + the system rttm

Instructions to reproduce analyses

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages