
Train myself #56

Open
blastbeng opened this issue Aug 8, 2024 · 4 comments

@blastbeng
I am trying to use process.py, but I get the error below. Is this code still maintained?

I also had to edit some Python imports related to hubert.

python process.py --path /opt/docker/compose/discord-tts-bot/bark/config/vocab.txt --mode prepare2
2024-08-08 17:14:34 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
WARNING: einx with PyTorch requires PyTorch version >= 2, but found
Traceback (most recent call last):
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/process.py", line 4, in <module>
    from prepare import prepare, prepare2
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/prepare.py", line 8, in <module>
    from bark_hubert_quantizer.pre_kmeans_hubert import CustomHubert
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/bark_hubert_quantizer/pre_kmeans_hubert.py", line 20, in <module>
    from audiolm_pytorch.utils import curtail_to_multiple
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/audiolm_pytorch/__init__.py", line 8, in <module>
    from audiolm_pytorch.audiolm_pytorch import AudioLM
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/audiolm_pytorch/audiolm_pytorch.py", line 23, in <module>
    from audiolm_pytorch.soundstream import SoundStream
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/audiolm_pytorch/soundstream.py", line 20, in <module>
    from vector_quantize_pytorch import GroupedResidualVQ
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/__init__.py", line 1, in <module>
    from vector_quantize_pytorch.vector_quantize_pytorch import VectorQuantize
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py", line 12, in <module>
    import einx
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/einx/__init__.py", line 5, in <module>
    from . import backend
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/einx/backend/__init__.py", line 1, in <module>
    from .register import register_for_module, register, get, backends, numpy
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/einx/backend/register.py", line 51, in <module>
    register_for_module("torch", _torch.create)
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/einx/backend/register.py", line 19, in register_for_module
    register(backend_factory())
  File "/opt/projects/bark-voice-cloning-HuBERT-quantizer/.venv/lib/python3.10/site-packages/einx/backend/register.py", line 30, in register
    raise ValueError("Backend must be an instance of einx.backend.Backend")
ValueError: Backend must be an instance of einx.backend.Backend
@gitmylo (Owner) commented Aug 8, 2024

Looks like the environment you're running this in has an old PyTorch version.
Make sure you're using PyTorch 2.0.0 or newer.
pip install -U torch torchvision torchaudio will update PyTorch to the latest version.
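
As a quick sanity check (a minimal sketch added here, not part of the original reply), you can confirm which PyTorch version the virtualenv actually resolves before rerunning process.py:

# Minimal check: einx's torch backend (and this repo) expect PyTorch >= 2.0.0
import torch

print('PyTorch version:', torch.__version__)
major = int(torch.__version__.split('.')[0])
assert major >= 2, 'PyTorch is older than 2.0.0; upgrade with: pip install -U torch torchvision torchaudio'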

Also, looking at your command: the --path flag takes a directory, not a .txt file. If you want to train, you'll need to generate a dataset first (https://github.com/gitmylo/bark-data-gen).
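Once that dataset exists, the invocation from the original report would point at its directory instead of vocab.txt, e.g. python process.py --path /path/to/dataset_dir --mode prepare2 (only the path is changed here; the mode simply mirrors the original command, and the directory name is illustrative).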

And one last thing: going off the project name "discord-tts-bot", FYI you don't need to train a whole new quantizer model if you just want to clone a voice. The different models just add support for different languages. The Hugging Face repo for the official models can be found here, but the HuBERTManager will also download them automatically.
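
For reference, pulling the pretrained models through the manager looks roughly like this (a minimal sketch; the import path and method names are assumptions based on the bark_hubert_quantizer package seen in the traceback, so verify them against the repo's README):

# Sketch only: download the pretrained HuBERT + quantizer models instead of training new ones.
# Import path and method names below are assumptions; check the repo's README for the exact API.
from bark_hubert_quantizer.hubert_manager import HuBERTManager

manager = HuBERTManager()
manager.make_sure_hubert_installed()     # fetch the base HuBERT checkpoint if it's missing
manager.make_sure_tokenizer_installed()  # fetch the quantizer (tokenizer) model if it's missing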

@blastbeng (Author) commented Aug 15, 2024

> Looks like the environment you're running this in has an old PyTorch version. Make sure you're using PyTorch 2.0.0 or newer; pip install -U torch torchvision torchaudio will update PyTorch to the latest version.
>
> Also, looking at your command: the --path flag takes a directory, not a .txt file. If you want to train, you'll need to generate a dataset first (https://github.com/gitmylo/bark-data-gen).
>
> And one last thing: going off the project name "discord-tts-bot", FYI you don't need to train a whole new quantizer model if you just want to clone a voice. The different models just add support for different languages. The Hugging Face repo for the official models can be found here, but the HuBERTManager will also download them automatically.

Thank you. So if I want to use Italian, do I have to train another quantizer or not?
I am trying to clone my voice (I am Italian), and I don't understand whether these quantizers also work for Italian:
https://huggingface.co/GitMylo/bark-voice-cloning/tree/main

@blastbeng (Author)

Guess I solved it by doing this:

gitmylo/bark-data-gen#3

Now I just have to wait for the npy and wav generation; the next step is to train on my dataset.
I am doing this with 1600 ebooks... I hope it won't take ages!

@blastbeng (Author) commented Aug 28, 2024

@gitmylo Just one question: in your opinion, how much data do I need to generate with https://github.com/gitmylo/bark-data-gen to train another language?

I tried with 5000 npy/wav pairs, but the result was bad, maybe just because I stopped the training at 0.3 loss. I don't know...

By the way, I made a new version of create_data with a corresponding launch script. Do you think we could integrate this script?
If yes, I'll fork the repo and open a PR.

create_data_and_wavs.py

import os.path
import random
import sys
import hashlib

import numpy

from bark import text_to_semantic
from bark.generation import load_model, SAMPLE_RATE
from bark.api import semantic_to_waveform

from data import load_books, random_split_chunk
from scipy.io import wavfile

loaded_data = load_books()

print('Loading semantics model')
load_model(use_gpu=True, use_small=False, force_reload=False, model_type='text')
print('Loading coarse model')
load_model(use_gpu=True, use_small=False, force_reload=False, model_type='coarse')
print('Loading fine model')
load_model(use_gpu=True, use_small=False, force_reload=False, model_type='fine')

output = '/mnt/t/projects/npy_ita'
output_wav = '/mnt/t/projects/wav_ita'

# Make sure both output directories exist before writing into them
os.makedirs(output, exist_ok=True)
os.makedirs(output_wav, exist_ok=True)

# Number of chunks to generate; can be overridden from the command line
loop_count = 10
if len(sys.argv) > 1:
    try:
        loop_count = int(sys.argv[1])
        print('Loop count found in cmdline:', str(loop_count))
    except ValueError:
        pass

for i in range(loop_count):
    text = ''
    while not text:
        text = random_split_chunk(loaded_data)  # Obtain a short chunk of text
    text = text.strip()
    # Name the output file after the chunk's hash so reruns skip already-generated chunks
    filename = hashlib.md5(text.encode('utf-8')).hexdigest() + '.npy'
    file_name = os.path.join(output, filename)
    if not os.path.isfile(file_name):
        print('Loop count:', str(i))
        print('Generating semantics for text:', text)
        semantics = text_to_semantic(text, temp=round(random.uniform(0.6, 0.8), ndigits=2))
        numpy.save(file_name, semantics)

        real_name = '.'.join(os.path.basename(file_name).split('.')[:-1])  # Cut off the extension
        out_file = os.path.join(output_wav, f'{real_name}.wav')
        if not os.path.isfile(out_file):  # Don't process files that have already been processed
            print(f'Processing {file_name}')
            wav = semantic_to_waveform(numpy.load(file_name), temp=round(random.uniform(0.6, 0.8), ndigits=2))
            wavfile.write(out_file, SAMPLE_RATE, wav)

print('Done!')

create_data_and_wavs.sh

#!/usr/bin/bash
cd /opt/projects/bark-data-gen
source .venv/bin/activate; python create_data_and_wavs.py "$1"
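
With the loop count passed through as $1, the launcher would then be invoked as, for example, bash create_data_and_wavs.sh 5000 to generate 5000 chunks (the count here just mirrors the 5000 npy/wavs mentioned above).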

blastbeng reopened this Aug 28, 2024