add function of am streaming inference #84

Open: EricFuma wants to merge 1 commit into main


@EricFuma commented Oct 7, 2023

  1. Add a chunk_forward function to the FsmnEncoderV2 and MemoryBlockV2 modules. It is cache-based and implements streaming inference chunk by chunk.

  2. Refactor the forward function of KanTtsSAMBERT: the shared logic is extracted into a pre_forward function, which serves as the common front module for both the forward and forward_chunk functions and reduces duplicated code. chunk_forward implements frame-level streaming inference; the mel length produced by each inference step is controlled by the mel_chunk_size parameter.

  3. In the infer_sambert.py script, add the --inference_type and --mel_chunk_size parameters. --inference_type selects the AM inference method, and --mel_chunk_size specifies the chunk size for streaming inference (it requires --inference_type to be set to "streaming" at the same time).

  4. This is an incremental update: existing training and inference scripts and commands still run normally. The results of streaming and non-streaming inference have passed a consistency test, and the code has passed the pre-commit check.
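
To illustrate the cache idea from item 1 and the consistency check from item 4, here is a minimal sketch in plain Python. It is not the code in this PR: `causal_fir`, `causal_fir_chunk`, and `stream` are hypothetical names, the "memory block" is reduced to a one-dimensional filter that only looks at past frames, and the real MemoryBlockV2 may also use right context, which this toy ignores.

```python
from typing import List, Tuple


def causal_fir(x: List[float], taps: List[float]) -> List[float]:
    """Non-streaming reference: y[t] = sum_k taps[k] * x[t - k], with zeros before t = 0."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(taps):
            if t - k >= 0:
                acc += w * x[t - k]
        out.append(acc)
    return out


def causal_fir_chunk(
    chunk: List[float], taps: List[float], cache: List[float]
) -> Tuple[List[float], List[float]]:
    """One streaming step. `cache` holds the last len(taps) - 1 input frames."""
    ctx = len(taps) - 1
    buf = cache + chunk  # past context followed by the new frames
    out = []
    for i in range(len(chunk)):
        t = ctx + i  # position of the current frame inside buf
        acc = 0.0
        for k, w in enumerate(taps):
            acc += w * buf[t - k]
        out.append(acc)
    new_cache = buf[len(buf) - ctx:]  # keep only what the next chunk needs
    return out, new_cache


def stream(x: List[float], taps: List[float], chunk_size: int) -> List[float]:
    """Run the filter chunk by chunk, threading the cache through each step."""
    cache = [0.0] * (len(taps) - 1)  # zero history before the first frame
    out: List[float] = []
    for start in range(0, len(x), chunk_size):
        y, cache = causal_fir_chunk(x[start:start + chunk_size], taps, cache)
        out.extend(y)
    return out
```

Because each step sees exactly the history it would have seen in a full pass, `stream(x, taps, n)` should match `causal_fir(x, taps)` for any chunk size `n`. That property is what a streaming versus non-streaming consistency test checks.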

@lancelee98 commented

I tried it, and the synthesized audio has some popping noises. Does the vocoder need any additional configuration?

@wawaa commented Nov 13, 2023

@lancelee98 It sounded fine when I tried it, with no popping noises. Which model version are you using?

@lancelee98 commented

@wawaa I am using a model I fine-tuned myself. It may be that my model produces some noise even in non-streaming mode.

@wawaa commented Dec 19, 2023

@lancelee98 I listened again, and mine has popping noises after all.

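@EricFuma (Author) commented Dec 22, 2023

> @lancelee98 I listened again, and mine has popping noises after all.

This PR only converts AM model inference to streaming. The vocoder needs a corresponding streaming conversion as well before the result matches non-streaming quality.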

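@wawaa commented Dec 22, 2023

> This PR only converts AM model inference to streaming. The vocoder needs a corresponding streaming conversion as well before the result matches non-streaming quality.

Yes. On the vocoder side I also pad the AM output first, but the popping noises are still there. Could you recommend a chunk size and pad setting for the vocoder input?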

@EricFuma (Author) commented

> Yes. On the vocoder side I also pad the AM output first, but the popping noises are still there. Could you recommend a chunk size and pad setting for the vocoder input?

Set the pad to 12 frames or more, and make sure your vocoder uses causal CNN layers rather than ordinary CNN layers. The chunk size does not actually matter.
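
The advice above (a left pad of at least 12 frames, with causal convolutions) can be sketched with a toy model in plain Python. Everything here is an assumption for illustration: `toy_causal_vocoder` and `synth_chunked` are hypothetical names, each mel frame is reduced to a single number, and the toy's 12-frame history is chosen to match the figure quoted in this thread rather than taken from the actual HiFi-GAN code.

```python
from typing import List


def toy_causal_vocoder(mel: List[float], taps: List[float], hop: int) -> List[float]:
    """Stand-in for a causal vocoder: each frame yields `hop` samples that
    depend only on the current frame and the len(taps) - 1 frames before it."""
    samples: List[float] = []
    for t in range(len(mel)):
        acc = 0.0
        for k, w in enumerate(taps):
            if t - k >= 0:
                acc += w * mel[t - k]
        samples.extend(acc * (j + 1) / hop for j in range(hop))
    return samples


def synth_chunked(
    mel: List[float], taps: List[float], hop: int, chunk_size: int, pad: int
) -> List[float]:
    """Synthesize chunk by chunk, prepending up to `pad` past frames as left
    context and discarding the audio produced for those context frames."""
    audio: List[float] = []
    for start in range(0, len(mel), chunk_size):
        left = max(0, start - pad)
        segment = mel[left:start + chunk_size]
        wav = toy_causal_vocoder(segment, taps, hop)
        audio.extend(wav[(start - left) * hop:])  # drop samples of the pad frames
    return audio
```

In this toy, chunked output matches full-utterance output once `pad` covers the model's history (`len(taps) - 1` frames), whatever the chunk size. With a smaller pad, the first frames of every chunk are computed from incomplete history, which is one plausible source of audible discontinuities at chunk boundaries.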

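@EricFuma (Author) commented

> Yes. On the vocoder side I also pad the AM output first, but the popping noises are still there. Could you recommend a chunk size and pad setting for the vocoder input?

You can also run a test: feed all of the mel features generated by this script into the vocoder at once and check whether the popping noises remain. That verifies whether the AM streaming inference part is correct. Please report back with the result. The streaming conversion of the vocoder will be uploaded later.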
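
A numeric companion to the listening test suggested above could look like the following sketch. The helper names `concat_chunks` and `max_abs_diff` are hypothetical, and mel frames are assumed to be plain lists of floats; how the chunks and the reference mel are obtained from infer_sambert.py is left out because the thread does not show it.

```python
from typing import List

Frame = List[float]  # one mel frame: a list of mel-bin values


def concat_chunks(chunks: List[List[Frame]]) -> List[Frame]:
    """Join streamed mel chunks back into one utterance-level frame list."""
    frames: List[Frame] = []
    for chunk in chunks:
        frames.extend(chunk)
    return frames


def max_abs_diff(a: List[Frame], b: List[Frame]) -> float:
    """Largest element-wise difference between two mel sequences."""
    if len(a) != len(b):
        raise ValueError(f"frame count differs: {len(a)} vs {len(b)}")
    worst = 0.0
    for frame_a, frame_b in zip(a, b):
        if len(frame_a) != len(frame_b):
            raise ValueError("mel dimension differs between sequences")
        for x, y in zip(frame_a, frame_b):
            worst = max(worst, abs(x - y))
    return worst
```

If the concatenated streaming mel and the non-streaming mel differ only by floating-point noise, and the full concatenated mel sounds clean through the vocoder, the remaining artifacts most likely come from how the vocoder is fed chunks.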

@wawaa commented Dec 22, 2023

> You can also run a test: feed all of the mel features generated by this script into the vocoder at once and check whether the popping noises remain. That verifies whether the AM streaming inference part is correct. Please report back with the result. The streaming conversion of the vocoder will be uploaded later.

Thanks for the reminder about causal CNN. After adjusting that:
1. Feeding all mel features into the vocoder produces normal audio.
2. Feeding padded mel chunks into the vocoder produces normal audio.

@yuanmaitian commented

> Set the pad to 12 frames or more, and make sure your vocoder uses causal CNN layers rather than ordinary CNN layers. The chunk size does not actually matter.

Is the pad change made in hifigan.py under models\hifigan? Could you explain exactly how to change it? Thanks a lot!

@yuanmaitian commented

Is the pad change made in hifigan.py under models\hifigan? Could you explain exactly how to change it? Thanks a lot!


4 participants