This repository has been archived by the owner on Mar 8, 2023. It is now read-only.

Remove # from dataset (#14)
Workaround to remove '#' from the dataset, since the sentence collector erroneously allowed it.
Fixed issue #13
mone27 authored and Mte90 committed Oct 2, 2019
1 parent b9dc010 commit 3c80cdd
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions DeepSpeech/generate_alphabet.sh
@@ -6,6 +6,11 @@ pushd $HOME/ds/
all_train_csv="$(find /mnt/extracted/data/ -type f -name '*train.csv' -printf '%p,' | sed -e 's/,$//g')"
all_dev_csv="$(find /mnt/extracted/data/ -type f -name '*dev.csv' -printf '%p,' | sed -e 's/,$//g')"
all_test_csv="$(find /mnt/extracted/data/ -type f -name '*test.csv' -printf '%p,' | sed -e 's/,$//g')"

#replace '#' with '' in the whole dataset due to an error in sentence validator that allowed this char
sed -i 's/#//g' /mnt/extracted/data/*test.csv
sed -i 's/#//g' /mnt/extracted/data/*train.csv
sed -i 's/#//g' /mnt/extracted/data/*dev.csv

if [ ! -f "/mnt/models/alphabet.txt" ]; then
if [ "${ENGLISH_COMPATIBLE}" = "1" ]; then
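
A note on the hunk above: the three `find` calls search `/mnt/extracted/data/` recursively, while the `sed` globs only match CSV files directly inside that directory. If the CSVs live in subdirectories, the globs would miss them. The following is a minimal sketch of a recursive variant, not the code from this commit. The function name `strip_hash_from_csvs` is hypothetical, and GNU `find` and `sed` are assumed (the script already relies on GNU `find -printf`).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the commit: remove every '#' from all
# train/dev/test CSV files under a data directory, including subdirectories.
# Assumes GNU sed (for -i without a backup suffix).
strip_hash_from_csvs() {
  # Default to the path used in generate_alphabet.sh when no argument is given.
  local data_dir="${1:-/mnt/extracted/data}"
  find "$data_dir" -type f \
    \( -name '*train.csv' -o -name '*dev.csv' -o -name '*test.csv' \) \
    -exec sed -i 's/#//g' {} +
}
```

Calling `strip_hash_from_csvs /mnt/extracted/data` would edit the matching files in place. Like the original workaround, this removes `#` from every column of each CSV, not only from the transcript text, so a `#` in a file path column would be stripped as well.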