Lazily extracting utterances

Douglas Bagnall edited this page Aug 3, 2012 · 2 revisions

Extracting good-sized utterances from a long transcribed recording

Manually cutting up and aligning audio files is a fiddly nuisance. The method described here tries to minimise effort at the expense of missing some perfectly good utterances. It works for the likes of Hansard.

Cutting in Audacity

  1. Open the audio file in Audacity
  2. Select Analyse->Silence Finder or Analyse->Sound Finder
  3. Reduce the silence length a little bit (around 0.3 seconds is a good start).
  4. The silence finder will put labels at each silence that size or bigger.
  5. Select File->Export Multiple, choose WAV and prefix-and-number naming.

Actually, before Export Multiple, you might want to visit Edit->Preferences->Import/Export and turn off the metadata editing pop-up.

Are there enough of the right size?

Festival recommends utterances between 5 and 30 seconds long. WAV file sizes are proportional to their length, but depend on the number of channels and sample rate.
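If you are not sure which sample rate and channel count your files have, the WAV header will tell you. The following is a minimal sketch rather than a robust parser: the function name wav_params is made up for this page, and it assumes a canonical 44-byte header (the "fmt " chunk straight after the RIFF header) and a little-endian machine.

```shell
# Print "CHANNELS RATE_HZ BITS" for a WAV file.
# Assumption: canonical header layout, so the fields sit at fixed offsets
# (channels at byte 22, sample rate at byte 24, bits per sample at byte 34).
# Assumption: od decodes integers in host byte order, which matches WAV's
# little-endian fields on x86 and most ARM systems.
wav_params() {
    local channels rate bits
    channels=$(od -An -tu2 -j22 -N2 "$1")
    rate=$(od -An -tu4 -j24 -N4 "$1")
    bits=$(od -An -tu2 -j34 -N2 "$1")
    # Left unquoted on purpose so the padding od adds is dropped.
    echo $channels $rate $bits
}
```

For a 44.1 kHz, 16-bit stereo file, wav_params some-file.wav should print "2 44100 16". If SoX is installed, soxi reports the same information.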

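Here are the approximate file sizes in decimal kilobytes (1 kB = 1000 bytes) for 16-bit samples, by sample rate in kHz and number of channels.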

kHz  channels   1 s    5 s    30 s
----------------------------------
16   mono        32    160     960
44.1 mono        88    441    2646
44.1 stereo     176    882    5292
48   mono        96    480    2880
48   stereo     192    960    5760
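The numbers in the table are just sample rate x channels x bytes per sample x seconds. For a combination that is not listed, a small sketch like this does the arithmetic. The function name wav_kb is made up for this page, 16-bit samples are assumed unless a fourth argument says otherwise, and the 44-byte header is ignored.

```shell
# Expected size in decimal kilobytes of uncompressed PCM audio.
# Usage: wav_kb RATE_HZ CHANNELS SECONDS [BITS]   (BITS defaults to 16)
# Integer arithmetic, so the result is rounded down.
wav_kb() {
    local rate=$1 channels=$2 seconds=$3 bits=${4:-16}
    echo $(( rate * channels * (bits / 8) * seconds / 1000 ))
}

wav_kb 44100 2 5     # prints 882
wav_kb 44100 2 30    # prints 5292
```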

Find out your files' sample rate and number of channels, then see whether a good number of files are the right size:

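# for 44.1 kHz stereo
ls | wc -l                             # total number of files
find . -size -900k -type f | wc -l     # too short (under about 5 seconds)
find . -size +5000k -type f | wc -l    # too long (over about 30 seconds)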

This tells you how many you are throwing out. If not many are the right size, go back to Audacity and search again for silences using different parameters. Then when it is right:

find . -size -900k -type f -print0 | xargs -0 rm
find . -size +5000k -type f -print0 | xargs -0 rm
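Since rm cannot be undone, a more cautious variation is to move the rejects aside and delete them later. This is only a sketch: the function name set_aside_outliers and the directory name rejected are made up for this page, and it assumes a find that supports -maxdepth (GNU and BSD find both do).

```shell
# Move .wav files smaller than MIN_K or larger than MAX_K out of the
# current directory into ./rejected instead of deleting them.
# Sizes are in find's "k" units (1024 bytes, rounded up).
# Usage: set_aside_outliers MIN_K MAX_K
set_aside_outliers() {
    local min_k=$1 max_k=$2
    mkdir -p rejected
    find . -maxdepth 1 -type f -name '*.wav' \
        \( -size "-${min_k}k" -o -size "+${max_k}k" \) \
        -exec mv {} rejected/ \;
}
```

For 44.1 kHz stereo that would be set_aside_outliers 900 5000. Once you are happy with what is left, remove the rejected directory.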

Fixing the numbering

Audacity numbers the files without zero padding, so they sort in the wrong order. Fix that with:

rename 's/(\d+\.wav)$/00000$1/' *.wav
rename 's/0+(\d{4}\.wav)$/$1/' *.wav

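(This is the Perl rename, which is what /usr/bin/rename is on Debian-style systems. The util-linux rename found on some other distributions uses a different syntax and will not accept these expressions.)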
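If the Perl rename is not available, the same padding can be done with a plain shell loop. This is a sketch under a few assumptions: the function name pad_wav_numbers is made up for this page, names end in a number followed by .wav (as prefix-and-number naming produces), and four digits are enough.

```shell
# Zero-pad the trailing number in names like prefix-7.wav to four digits.
pad_wav_numbers() {
    local f stem num n new
    for f in *.wav; do
        [ -e "$f" ] || continue        # no .wav files: the glob stays literal
        stem=${f%.wav}
        num=${stem##*[!0-9]}           # trailing run of digits, if any
        [ -n "$num" ] || continue      # name does not end in a number
        # Drop leading zeros so printf does not read the number as octal.
        n=$(printf '%s' "$num" | sed 's/^0*//')
        new=$(printf '%s%04d.wav' "${stem%"$num"}" "${n:-0}")
        [ "$f" = "$new" ] || mv -- "$f" "$new"
    done
}
```

Run pad_wav_numbers in the directory holding the exported files. Names that are already padded are left alone.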

Listen to audio and find text

Open the transcript in a text editor. Keep a backup. Listen to the files with:

 mkdir done
 # After each file, press Enter to move it to done/; Ctrl-D leaves it in place.
 for x in *.wav; do play "$x"; echo "$x"; read && mv "$x" done/; done

For each file, delete everything in the transcript up to the first word you hear, write in the file number, then insert a line break after the last word.

Look out for any numerals -- you'll need to spell them out.
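A quick way to catch numerals you have missed is to grep the transcript for digits. This is a small sketch: the function name find_numerals is made up for this page, and transcript.txt in the usage line stands in for whatever your transcript file is called.

```shell
# List transcript lines that still contain digits, with line numbers.
# Usage: find_numerals transcript.txt
find_numerals() {
    grep -n '[0-9]' "$1"
}
```

Each line of output is a line number, a colon, and the offending line. No output means no digits are left.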