
Train myself #56

Open
blastbeng opened this issue Aug 8, 2024 · 4 comments

@blastbeng
I am trying to use process.py, but I get the error below. Is this code still maintained?

I also had to edit some Python imports related to hubert.

python process.py --path /opt/docker/compose/discord-tts-bot/bark/config/vocab.txt --mode prepare2
2024-08-08 17:14:34 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
WARNING: einx with PyTorch requires PyTorch version >= 2, but found
Traceback (most recent call last):
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/process.py", line 4, in <module>
    from prepare import prepare, prepare2
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/prepare.py", line 8, in <module>
    from bark_hubert_quantizer.pre_kmeans_hubert import CustomHubert
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/bark_hubert_quantizer/pre_kmeans_hubert.py", line 20, in <module>
    from audiolm_pytorch.utils import curtail_to_multiple
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/audiolm_pytorch/__init__.py", line 8, in <module>
    from audiolm_pytorch.audiolm_pytorch import AudioLM
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/audiolm_pytorch/audiolm_pytorch.py", line 23, in <module>
    from audiolm_pytorch.soundstream import SoundStream
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/audiolm_pytorch/soundstream.py", line 20, in <module>
    from vector_quantize_pytorch import GroupedResidualVQ
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/__init__.py", line 1, in <module>
    from vector_quantize_pytorch.vector_quantize_pytorch import VectorQuantize
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py", line 12, in <module>
    import einx
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/einx/__init__.py", line 5, in <module>
    from . import backend
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/einx/backend/__init__.py", line 1, in <module>
    from .register import register_for_module, register, get, backends, numpy
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/einx/backend/register.py", line 51, in <module>
    register_for_module("torch", _torch.create)
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/einx/backend/register.py", line 19, in register_for_module
    register(backend_factory())
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/einx/backend/register.py", line 30, in register
    raise ValueError("Backend must be an instance of einx.backend.Backend")
ValueError: Backend must be an instance of einx.backend.Backend
@gitmylo (Owner) commented Aug 8, 2024

Looks like the environment you're running this in has an old PyTorch version.
Make sure you're using PyTorch 2.0.0 or newer.
pip install -U torch torchvision torchaudio will update PyTorch to the latest version.
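
As a quick sanity check (a minimal sketch added here, not part of the original reply), you can confirm which PyTorch version the virtualenv actually resolves before rerunning process.py:

# Minimal check: einx's torch backend (and this repo) expect PyTorch >= 2.0.0
import torch

print('PyTorch version:', torch.__version__)
major = int(torch.__version__.split('.')[0])
assert major >= 2, 'PyTorch is older than 2.0.0; upgrade with: pip install -U torch torchvision torchaudio'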

Also, looking at your command: the --path flag takes a directory, not a .txt file. If you want to train, you'll need to generate a dataset first (https://github.com/gitmylo/bark-data-gen).
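Once that dataset exists, the invocation from the original report would point at its directory instead of vocab.txt, e.g. python process.py --path /path/to/dataset_dir --mode prepare2 (only the path is changed here; the mode simply mirrors the original command, and the directory name is illustrative).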

And one last thing: going off the project name "discord-tts-bot", FYI you don't need to train a whole new quantizer model if you just want to clone a voice. The different models just add support for different languages. The Hugging Face repo for the official models can be found here, but the HuBERTManager will also download them automatically.
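
For reference, pulling the pretrained models through the manager looks roughly like this (a minimal sketch; the import path and method names are assumptions based on the bark_hubert_quantizer package seen in the traceback, so verify them against the repo's README):

# Sketch only: download the pretrained HuBERT + quantizer models instead of training new ones.
# Import path and method names below are assumptions; check the repo's README for the exact API.
from bark_hubert_quantizer.hubert_manager import HuBERTManager

manager = HuBERTManager()
manager.make_sure_hubert_installed()     # fetch the base HuBERT checkpoint if it's missing
manager.make_sure_tokenizer_installed()  # fetch the quantizer (tokenizer) model if it's missing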

@blastbeng (Author) commented Aug 15, 2024

> Looks like the environment you're running this in has an old PyTorch version. Make sure you're using PyTorch 2.0.0 or newer; pip install -U torch torchvision torchaudio will update PyTorch to the latest version.
>
> Also, looking at your command: the --path flag takes a directory, not a .txt file. If you want to train, you'll need to generate a dataset first (https://github.com/gitmylo/bark-data-gen).
>
> And one last thing: going off the project name "discord-tts-bot", FYI you don't need to train a whole new quantizer model if you just want to clone a voice. The different models just add support for different languages. The Hugging Face repo for the official models can be found here, but the HuBERTManager will also download them automatically.

Thank you. So if I want to use Italian, do I have to train another quantizer or not?
I am trying to clone my voice (I am Italian), and I don't understand whether these quantizers also work for Italian:
https://huggingface.co/GitMylo/bark-voice-cloning/tree/main

@blastbeng (Author)

Guess I solved it by doing this:

gitmylo/bark-data-gen#3

Now I just have to wait for the npy and wav generation; the next step is to train on my dataset.
I am doing this with 1600 ebooks... I hope it won't take ages!

@blastbeng (Author) commented Aug 28, 2024

@gitmylo Just one question: in your opinion, how much data do I need to generate with https://github.com/gitmylo/bark-data-gen to train another language?

I tried with 5000 npy/wav pairs, but the result was bad, maybe just because I stopped the training at 0.3 loss. I don't know...

By the way, I made a new version of create_data with a corresponding launch script. Do you think we could integrate this script?
If yes, I'll fork the repo and open a PR.

create_data_and_wavs.py

import os.path
import random
import sys
import hashlib

import numpy

from bark import text_to_semantic
from bark.generation import load_model, SAMPLE_RATE
from bark.api import semantic_to_waveform

from data import load_books, random_split_chunk
from scipy.io import wavfile

loaded_data = load_books()

print('Loading semantics model')
load_model(use_gpu=True, use_small=False, force_reload=False, model_type='text')
print('Loading coarse model')
load_model(use_gpu=True, use_small=False, force_reload=False, model_type='coarse')
print('Loading fine model')
load_model(use_gpu=True, use_small=False, force_reload=False, model_type='fine')

output = '/mnt/t/projects/npy_ita'
output_wav = '/mnt/t/projects/wav_ita'

# Make sure both output directories exist before writing into them
os.makedirs(output, exist_ok=True)
os.makedirs(output_wav, exist_ok=True)

# Number of chunks to generate; can be overridden from the command line
loop_count = 10
if len(sys.argv) > 1:
    try:
        loop_count = int(sys.argv[1])
        print('Loop count found in cmdline:', str(loop_count))
    except ValueError:
        pass

for i in range(loop_count):
    text = ''
    while not text:
        text = random_split_chunk(loaded_data)  # Obtain a short chunk of text
    text = text.strip()
    # Name the output file after the chunk's hash so reruns skip already-generated chunks
    filename = hashlib.md5(text.encode('utf-8')).hexdigest() + '.npy'
    file_name = os.path.join(output, filename)
    if not os.path.isfile(file_name):
        print('Loop count:', str(i))
        print('Generating semantics for text:', text)
        semantics = text_to_semantic(text, temp=round(random.uniform(0.6, 0.8), ndigits=2))
        numpy.save(file_name, semantics)

        real_name = '.'.join(os.path.basename(file_name).split('.')[:-1])  # Cut off the extension
        out_file = os.path.join(output_wav, f'{real_name}.wav')
        if not os.path.isfile(out_file):  # Don't process files that have already been processed
            print(f'Processing {file_name}')
            wav = semantic_to_waveform(numpy.load(file_name), temp=round(random.uniform(0.6, 0.8), ndigits=2))
            wavfile.write(out_file, SAMPLE_RATE, wav)

print('Done!')

create_data_and_wavs.sh

#!/usr/bin/bash
cd /opt/projects/bark-data-gen
source .venv/bin/activate; python create_data_and_wavs.py "$1"
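
With the loop count passed through as $1, the launcher would then be invoked as, for example, bash create_data_and_wavs.sh 5000 to generate 5000 chunks (the count here just mirrors the 5000 npy/wavs mentioned above).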

blastbeng reopened this Aug 28, 2024