I am using Ossian to train a Bangla (Bengali) voice. My data-set consists of ~4000 sentences (7 hours of speech). The error graph I obtained after training the acoustic model looks like this:
I used (almost) all the default settings, changing only the following hyper-parameters:
batch_size : 128
training_epochs : 15
L2_regularization : 0.003
The synthesized speech does not sound bad, but judging from the error graph I think there is a lot of room for improvement. Can someone suggest changes to improve the acoustic model? Do I need more data (I am working on it), or should I reduce the size or number of layers of the NN? Any suggestions about the hyper-parameters? Thanks.
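In case it helps to put numbers on what the graph shows, here is a minimal sketch of how the per-epoch training and validation errors could be summarized. The function name `summarize_curves` and the error values in the usage example are hypothetical placeholders, not my actual results; the real values would have to be copied from the training log.

```python
def summarize_curves(train_err, valid_err):
    """Summarize per-epoch training/validation errors (lists of equal length).

    Returns the 1-based epoch with the lowest validation error, that error,
    the validation-minus-training gap at the last epoch, and how many epochs
    were run after the best validation epoch.
    """
    if not train_err or len(train_err) != len(valid_err):
        raise ValueError("need two non-empty lists of equal length")

    # Index of the epoch with the lowest validation error (first one on ties).
    best = min(range(len(valid_err)), key=valid_err.__getitem__)

    return {
        "best_epoch": best + 1,
        "best_valid": valid_err[best],
        "final_gap": valid_err[-1] - train_err[-1],
        "epochs_past_best": len(valid_err) - 1 - best,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only (not real training output).
    train = [10.0, 8.0, 6.0, 5.0, 4.0]
    valid = [11.0, 9.0, 8.0, 8.5, 9.0]
    print(summarize_curves(train, valid))
```

A large `final_gap` together with several `epochs_past_best` would suggest the model is overfitting (so more data, stronger regularization, or a smaller network might help), whereas both errors still falling at the last epoch would suggest training for more epochs.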