Training has no effect #144

aled93 · 2023-09-26T03:16:46Z

First of all, I should say that I have about the least deep-learning-friendly GPU there is, a 1660 Super. I know PyTorch doesn't like this GPU and that it produces NaN when dealing with fp16 (half). But I see you fixed that (#90) so that feature extraction uses fp32.

But what about training? Does it use fp16 anywhere?

My problem is that no matter how many epochs I train the model for, the resulting voice is the same as with the base f0 checkpoint. I tried training on my own dataset and on a crystal-clear dataset from Hugging Face, and both sound exactly like f0. The training config is mangio-crepe, hop length 128, filter radius 3, learning rate 1e-4. After training stops, I refresh the checkpoint list, select the new checkpoint, and click "copy to rvc models"; then in the RVC tab I first unload, refresh, and select the model.

I also tested Mangio-RVC-Fork: after a small modification (using the CPU for feature extraction), the result is noticeable even after 10 epochs on both datasets, especially on the dataset from Hugging Face.

gitmylo · 2023-09-26T12:32:50Z

gitmylo
Sep 26, 2023
Maintainer

I'm not sure. If it were related to the NaN issue, it would just not work at all.

And have you properly prepared the dataset? data/training/RVC/name should have 2 folders starting with "0_" and 3 starting with "1_".
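For anyone who wants to run the same check without eyeballing the folder, here is a minimal sketch. It assumes only what is stated above (2 subfolders starting with "0_" and 3 starting with "1_") and additionally flags empty ones. The function name `check_rvc_dataset` is made up for this example and is not part of the webui.

```python
from pathlib import Path


def check_rvc_dataset(root):
    """Return a list of problems found in a prepared RVC training folder.

    Hypothetical helper, not part of the project. An empty list means the
    layout matches the description above.
    """
    root = Path(root)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    subdirs = [p for p in root.iterdir() if p.is_dir()]
    # Expected counts are taken from the comment above: 2x "0_", 3x "1_".
    for prefix, expected in (("0_", 2), ("1_", 3)):
        matches = sorted(p for p in subdirs if p.name.startswith(prefix))
        if len(matches) != expected:
            problems.append(
                f"expected {expected} folders starting with {prefix!r}, "
                f"found {len(matches)}"
            )
        for folder in matches:
            # A folder with no entries means that preprocessing step produced nothing.
            if not any(folder.iterdir()):
                problems.append(f"{folder.name} is empty")
    return problems


if __name__ == "__main__":
    # "name" stands for your own workspace name.
    issues = check_rvc_dataset("data/training/RVC/name")
    print("\n".join(issues) if issues else "layout looks fine")
```

An empty result only says the folders are present and non-empty. It says nothing about whether the extracted features themselves are valid.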

4 replies

aled93 Sep 26, 2023
Author

The folders exist, and none of them is empty:

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----          26.09.2023    11:00                0_16k
d----          26.09.2023    11:09                0_gt
d----          26.09.2023    11:02                1_f0
d----          26.09.2023    11:02                1_f0nsf
d----          26.09.2023    11:02                1_feat
d----          26.09.2023    11:46                models
-a---          26.09.2023    11:03      102305419 black_house_2_added.index
-a---          26.09.2023    11:03        2549899 black_house_2_trained.index
-a---          26.09.2023    11:03       99489920 total_fea.npy
-a---          26.09.2023    11:00            210 workspace.json

gitmylo Sep 26, 2023
Maintainer

Odd. Can you try training on the Google Colab (if it's currently working, since TTS had another possibly breaking update) and see if it works there?

aled93 Sep 26, 2023
Author

Training on Colab is OK. The loss graph on Colab dropped from ~8 to <2 in 10 steps, while locally it fluctuates between 6 and 8 the whole time. Same dataset, same config.

[Images: loss graphs, Colab vs. local]
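If you only have the logged loss values rather than a graph, a rough way to tell the two behaviours apart is to compare the average loss at the start of the run with the average at the end. This is only a sketch: `loss_trend`, the window size, and the 25% threshold are arbitrary choices for illustration, and the two lists are made-up numbers shaped like the description above, not the actual logs.

```python
def loss_trend(losses, window=3, min_drop=0.25):
    """Classify a loss curve as "decreasing" or "flat".

    Compares the mean of the first `window` values with the mean of the
    last `window` values. `min_drop` is the relative drop (0.25 = 25%)
    required to count as "decreasing". Both defaults are arbitrary.
    """
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    start = sum(losses[:window]) / window
    end = sum(losses[-window:]) / window
    if start <= 0:
        raise ValueError("expected positive loss values")
    relative_drop = (start - end) / start
    return "decreasing" if relative_drop >= min_drop else "flat"


# Illustrative values only (NOT taken from the real runs):
# one curve falling from about 8 to below 2, one hovering between 6 and 8.
colab_like = [8.0, 6.5, 5.0, 4.0, 3.2, 2.6, 2.2, 2.0, 1.9, 1.8]
local_like = [7.5, 6.2, 7.9, 6.8, 7.1, 6.4, 7.8, 6.1, 7.3, 6.9]

print(loss_trend(colab_like))
print(loss_trend(local_like))
```

A "flat" result over many steps with the same dataset and config points at the training step itself rather than at the data.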

gitmylo Sep 26, 2023
Maintainer

Yeah, it really looks like something's going wrong during training then. I'm not sure where exactly the issue lies.
