This repository has been archived by the owner on Jun 27, 2023. It is now read-only.

Python vs. pocketsphinx_continuous/_batch - same config, different results #60

Open
JindrichSindelar-eaton opened this issue Jun 4, 2020 · 0 comments


JindrichSindelar-eaton commented Jun 4, 2020

Hello all,

I've been playing with Pocketsphinx for a few days and was curious about the differences in behavior between the Python library and the available executables (pocketsphinx_continuous, pocketsphinx_batch).
I enabled the verbose flag for the Python version and adjusted the three settings that differed from the logs I got from the executables (vad_threshold, kws_threshold, allphone_ci). I expected the output of my Python code below to match one of the outputs generated by the bash scripts that call the executables, but it doesn't.

Could you please give me some hints about what else is different and what causes these differences? The audio files are the same for all the programs, and all are mono, 16 kHz, 16-bit signed little-endian.

(Changing the ps.decode() arguments: no_search = True has no effect on the output, while full_utt = True produces no output at all. Where can I find out what exactly these two flags mean?)

Below I'm attaching the code and the files with the corresponding transcription outputs and configuration logs.

Python code (corresponding attachments: python_output_tuned.hyp.txt, python_tuned.log.txt):

import os
from pocketsphinx import Pocketsphinx, get_model_path

model_path = get_model_path()
config = {
    # using the default values - see https://pypi.org/project/pocketsphinx/
    'hmm': os.path.join(model_path, 'en-us'),
    'lm': os.path.join(model_path, 'en-us.lm.bin'),
    'dict': os.path.join(model_path, 'cmudict-en-us.dict'),
    'sampling_rate': 16000,
    'verbose': True,
    # with the following settings, the configuration should exactly match the one logged by the wrapped executables
    'vad_threshold': 2.0,
    'kws_threshold': 1.0,
    'allphone_ci': False
}

ps = Pocketsphinx(**config)

# path to the directory where the .wav's are stored
directory = "../my_records/jindra/converted"

out_hyp_file_path = "./python_output_github.hyp"
file_list = os.listdir(directory)
# sort alphabetically (os.listdir() returns entries in arbitrary order) so the output is diff-able with that of pocketsphinx_batch
file_list.sort()

# the with-block closes the output file even if decoding raises an exception
with open(out_hyp_file_path, "w") as out_hyp_file:
    for entry in file_list:
        entry_file = os.path.join(directory, entry)
        if os.path.isfile(entry_file) and entry.endswith(".wav"):
            ps.decode(audio_file=entry_file, buffer_size=2048, no_search=False, full_utt=False)

            hypothesis = ps.hypothesis()
            # format similar to the output of pocketsphinx_batch
            out_hyp_file.write(hypothesis + " (" + entry[:-4] + ")\n")

Pocketsphinx_cont_wrapper.sh (corresponding attachments: output_continuous.hyp.txt, continuous.log.config.txt):

#!/bin/bash

# make sure you're running from the .venv where your pocketsphinx is installed
model_dir=$(python3 -c "from pocketsphinx import get_model_path; print(get_model_path())")

curr_dir=$(pwd)
# stop here if the target directory is missing, instead of running in the wrong place
cd "$1" || exit 1

out_file=output_continuous.hyp

# rm -f does not complain about a missing file, so no existence test is needed
rm -f "$out_file"

for f in *.wav
do
    hyp=$(pocketsphinx_continuous   -infile "$f" \
                                    -hmm "${model_dir}/en-us" \
                                    -lm "${model_dir}/en-us.lm.bin" \
                                    -dict "${model_dir}/cmudict-en-us.dict" \
                                    -samprate 16000)
    f_name=$(basename "$f" .wav)
    # this gives an output format similar to that of pocketsphinx_batch, so we can simply diff it
    echo "${hyp} (${f_name})" >> "$out_file"
done

cd "$curr_dir"

pocketsphinx_batch_wrapper.sh (corresponding attachments: output_batch.hyp.txt, batch.log.config.txt):

#!/bin/bash

# make sure you're running from the .venv where your pocketsphinx is installed
model_dir=$(python3 -c "from pocketsphinx import get_model_path; print(get_model_path())")

curr_dir=$(pwd)
# stop here if the target directory is missing, instead of running in the wrong place
cd "$1" || exit 1

ctl_filename="ctlfile.txt"

# rm -f does not complain about a missing file, so no existence test is needed
rm -f "$ctl_filename"

for f in *.wav
do
    # one utterance ID (file name without extension) per line
    basename "$f" .wav >> "$ctl_filename"
done

# -adcin seems to be important here
# https://cmusphinx.github.io/wiki/tutorialtuning/
pocketsphinx_batch  -adcin yes \
                    -cepdir . \
                    -cepext .wav \
                    -ctl "$ctl_filename" \
                    -hmm "${model_dir}/en-us" \
                    -lm "${model_dir}/en-us.lm.bin" \
                    -dict "${model_dir}/cmudict-en-us.dict" \
                    -samprate 16000 \
                    -hyp output_batch.hyp

cd "$curr_dir"
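
To compare the resulting .hyp files per utterance rather than with a plain diff, a minimal sketch along these lines could be used. The helpers parse_hyp and diff_hyps are hypothetical names, not part of pocketsphinx, and the sketch assumes every line ends with "(utt_id)" or "(utt_id score)":

```python
import re

# Assumed line format: "<text> (<utt_id>)" or "<text> (<utt_id> <score>)".
HYP_LINE = re.compile(
    r"^(?P<text>.*?)\s*\((?P<utt>[^\s()]+)(?:\s+[^\s()]+)?\)\s*$"
)


def parse_hyp(lines):
    """Map each utterance ID to its transcription (hypothetical helper)."""
    result = {}
    for line in lines:
        match = HYP_LINE.match(line)
        if match:  # lines that do not fit the assumed format are skipped
            result[match.group("utt")] = match.group("text").strip()
    return result


def diff_hyps(first, second):
    """Return {utt_id: (text_in_first, text_in_second)} for differing entries.

    An utterance missing from one side is reported with None on that side.
    """
    return {
        utt: (first.get(utt), second.get(utt))
        for utt in sorted(set(first) | set(second))
        if first.get(utt) != second.get(utt)
    }
```

Reading the files with open(path).read().splitlines() and passing the two parsed dictionaries to diff_hyps lists only the recordings on which the two programs disagree.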