1. Correct Usage
The reference image and the pose (gesture) sequence need to be aligned; the alignment code and the gesture extraction code are in the notebook demo.ipynb.
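As a quick sanity check before running inference, you can overlay the extracted pose keypoints on the reference image. The sketch below is a hypothetical helper: it assumes the keypoints are stored as an (N, 2) array of pixel coordinates in a .npy file, which may differ from the format demo.ipynb actually produces.

```python
# Visual alignment check: draw pose keypoints on top of the reference image.
# Sketch only; adjust the keypoint format to whatever demo.ipynb emits.
import cv2
import numpy as np

image = cv2.imread("reference.png")        # hypothetical reference image path
keypoints = np.load("pose_frame_000.npy")  # assumed (N, 2) pixel coordinates

for x, y in keypoints:
    cv2.circle(image, (int(x), int(y)), radius=3, color=(0, 255, 0), thickness=-1)

cv2.imwrite("alignment_check.png", image)
# If the drawn points do not land on the corresponding body parts,
# the image and the pose sequence are misaligned.
```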
The 40 Flux-generated images provided on the homepage are already aligned. The simplest approach is to use them as references and generate similar portraits with ControlNet or other image-generation tools (see section 4 below).
Currently, we trained strictly on frontal, half-body data, so the algorithm does not support side views or other non-standard images. The following are some unsupported types.
2. Inference Speed
The SD-based algorithm currently has high GPU requirements: it needs around 16 GB of VRAM to run and is slow on ordinary consumer cards.
An accelerated version is currently in training and is expected to be released soon. Its speedup should be comparable to that of V1's accelerated version.
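Until the accelerated version lands, if the inference pipeline is diffusers-based (an assumption; adapt to the actual EchoMimicV2 code), standard memory-saving switches may help it fit on smaller cards, usually at some speed cost:

```python
# Generic diffusers-style memory savers; a sketch under the assumption that
# the inference pipeline is a diffusers Pipeline object. Whether EchoMimicV2's
# pipeline supports each switch must be checked against the actual code.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "some/model-id",            # hypothetical placeholder, not the real repo id
    torch_dtype=torch.float16,  # half precision roughly halves VRAM usage
)
pipe.enable_attention_slicing()  # trade speed for lower peak memory
pipe.enable_model_cpu_offload()  # keep idle submodules in CPU RAM
```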
3. Video Noise Issue
Noise in the generated video can be reduced by adjusting the CFG parameter within the range 1.5 to 3.0: a lower CFG value gives better video quality but poorer lip-sync accuracy, while a higher value gives better lip-sync at the cost of video quality. Sample clips at several CFG values are listed below, followed by a parameter-sweep sketch for comparing the trade-off.
Sample clips at different CFG values: cfg1-5.mp4, cfg1-8.mp4, cfg2-0.mp4, cfg2-5.mp4.
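To compare CFG settings on your own input, a small sweep is enough. The `run_inference` function below is a hypothetical placeholder; substitute the project's real inference entry point and its actual CFG argument name.

```python
# Sweep the CFG parameter across the recommended 1.5-3.0 range and save
# one clip per value, named like the sample clips above.
def run_inference(ref_image: str, audio: str, cfg: float, out_path: str) -> None:
    # Placeholder: call the real EchoMimicV2 inference here.
    print(f"would generate {out_path} with cfg={cfg}")

for cfg in (1.5, 1.8, 2.0, 2.5):
    # Lower CFG: cleaner video, weaker lip-sync; higher CFG: the reverse.
    run_inference(
        ref_image="reference.png",
        audio="speech.wav",
        cfg=cfg,
        out_path=f"cfg{str(cfg).replace('.', '-')}.mp4",
    )
```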
4. How to Generate High-Quality Reference Images
The 40 images on the homepage were generated by Flux and are already aligned; the simplest method is to use them as references and generate similar portraits with ControlNet or other image-generation tools.
The images provided on the homepage were generated by Flux together with LoRA models.
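As an illustration of the ControlNet route, the sketch below uses the diffusers library (plus controlnet_aux) with an OpenPose-conditioned SD 1.5 ControlNet. The model ids and prompt are examples, not the project's recommended setup; any pose-conditioned ControlNet should work similarly.

```python
# Generate a new portrait that matches the pose of an aligned reference image.
# Sketch only: model ids and prompt are illustrative, not the project's setup.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image
from controlnet_aux import OpenposeDetector

ref = load_image("aligned_reference.png")  # e.g. one of the homepage images

# Extract the pose from the aligned reference, then condition generation on it.
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose = openpose(ref)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "half-body frontal portrait photo of a person, studio lighting",
    image=pose,
    num_inference_steps=30,
).images[0]
image.save("new_reference.png")
```

Because the pose is lifted from an already-aligned image, the generated portrait inherits the same framing and should need little or no further alignment.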
5. Generated Video Length
EchoMimicV2 can in principle generate video of unlimited length: as long as GPU memory allows, the length can be increased.
A common question from users is why their attempts to extend the video length are capped at 13 seconds. This happens because they are using our provided test pose sequences, which are 13 seconds long; with longer custom pose sequences, the cap does not apply.
We take this issue very seriously and are developing a Jupyter notebook, to be released soon, that will include custom pose sequences, reference image alignment, and segmented inference. Please stay tuned.
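Until that notebook is out, one simple workaround is to tile an existing pose sequence to the target length. The sketch below assumes pose frames are stored as sequentially numbered per-frame .npy files in a directory, and assumes a 25 fps frame rate; both are guesses, so check the actual layout of the provided pose samples.

```python
# Tile a pose-frame directory so inference can run past the sample's length.
# Assumes one .npy file per frame, named so lexicographic order == time order;
# adjust to the real pose-sequence layout shipped with the repo.
from pathlib import Path
import shutil

src = Path("pose_samples/demo_13s")  # hypothetical provided sample
dst = Path("pose_samples/demo_extended")
dst.mkdir(parents=True, exist_ok=True)

frames = sorted(src.glob("*.npy"))
target_frames = 30 * 25              # e.g. 30 seconds at an assumed 25 fps

for i in range(target_frames):
    # Loop the source frames; a ping-pong order would avoid a visible jump
    # at the seam, at the cost of slightly more code.
    shutil.copy(frames[i % len(frames)], dst / f"{i:06d}.npy")
```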