real-time chat #2

Open

jingli-wtbox opened this issue Nov 16, 2023 · 11 comments

@jingli-wtbox commented Nov 16, 2023

Thank you for sharing such great work. It's awesome.

Going through some of the examples, such as "Examples on Image-based Chat Persona" on the page below, it feels like a real-time chat:

[example image]

May I know if ChatAnything supports real-time chat?

Thanks.

@ermu2001 (Collaborator)

Setting up the conversation usually takes around 60 sec.

Afterwards, chatting usually takes about 6 sec to get a response from ChatGPT.

In my test on a single GPU (an RTX A5000), rendering takes around 8 sec, but SadTalker's rendering could be parallelized.

You can try running it locally and see whether it feels real-time. :)
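For reference, a rough sketch of that parallelization idea: round-robin the frames across GPUs and render the chunks concurrently. `render_chunk` here is a hypothetical stand-in for SadTalker's renderer, not the project's actual code.

```python
import torch.multiprocessing as mp

def render_chunk(rank, chunks, return_dict):
    # Hypothetical stand-in for the face renderer; in a real setup you
    # would load the SadTalker renderer on cuda:{rank} and process here.
    return_dict[rank] = [f"rendered({f}, gpu={rank})" for f in chunks[rank]]

def parallel_render(frames, n_gpus):
    # Split the frame list round-robin so each GPU gets an equal share.
    chunks = [frames[i::n_gpus] for i in range(n_gpus)]
    manager = mp.Manager()
    return_dict = manager.dict()
    mp.spawn(render_chunk, args=(chunks, return_dict), nprocs=n_gpus, join=True)
    # Re-interleave the rendered chunks back into original frame order.
    ordered = [None] * len(frames)
    for rank in range(n_gpus):
        for j, frame in enumerate(return_dict[rank]):
            ordered[rank + j * n_gpus] = frame
    return ordered

if __name__ == "__main__":
    print(parallel_render(list(range(10)), n_gpus=2))
```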

@jingli-wtbox (Author)

Thank you. I will give it a try on other types of GPU.

@puffy310

Could you theoretically just run this on 8x H100 and have it work in "real time"? Maybe a real-time conversation version of this software should be looked into.

@zhoudaquan (Owner)

> Could you theoretically just run this on 8x H100 and have it work in "real time"? Maybe a real-time conversation version of this software should be looked into.

Hi, thanks for your interest in the work! We do not have an H100 at hand right now... However, based on our observations on A100 GPUs, the total time cost excluding GPT API calls is within 10 s, and the face rendering process takes 1-2 s. We will try to replace the ChatGPT APIs for real-time chat in the coming month.

@tolecy commented Nov 24, 2023

Great project!
I replaced ChatGPT with my own small model and tested on my 3080 Ti graphics card; the timing details are as follows:

```
Face Renderer:: 100%|80/80 [00:22<00:00, 3.49it/s]
fps:25.0
OpenCV: FFMPEG: tag 0x44495658/'XVID' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
seamlessClone:: 100%|318/318 [00:15<00:00, 20.66it/s]
```
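As a side note, the two OpenCV lines are only a codec-tag warning: the writer was opened with the XVID tag inside an .mp4 container, and FFMPEG falls back to mp4v. Opening the writer with the mp4v fourcc up front avoids the fallback; a minimal sketch, where the output path and frame size are placeholder values:

```python
import cv2
import numpy as np

# Use the mp4v fourcc that FFMPEG falls back to anyway.
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
writer = cv2.VideoWriter("out.mp4", fourcc, 25.0, (512, 512))
writer.write(np.zeros((512, 512, 3), dtype=np.uint8))  # one black frame
writer.release()
```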

I wonder if anyone has an efficient implementation or ideas for accelerating the video generation process; I have been interested in this recently. What I want to do now is output the generated face video in sync with the voice once TTS has completed. However, because face generation is relatively slow, the streaming output is actually very jerky.

(My goal is to be as smooth as D-ID: input any image and voice, and quickly generate video or a smooth streaming output.)
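One rough sketch of that chunked-streaming idea: synthesize the TTS audio first, then render and emit short clips per audio segment instead of waiting for the full video. `synthesize_tts` and `render_segment` are hypothetical stubs, not ChatAnything's actual functions; playback only stays smooth if rendering keeps up with real time.

```python
def synthesize_tts(text, sample_rate=16000):
    # Hypothetical stub: a real TTS would return waveform samples.
    return [0.0] * (sample_rate * 3), sample_rate  # 3 s of "audio"

def render_segment(samples, fps=25, sample_rate=16000):
    # Hypothetical stub for the per-segment face renderer (the slow step).
    n_frames = int(len(samples) / sample_rate * fps)
    return [f"frame{i}" for i in range(n_frames)]

def stream_talking_head(text, chunk_seconds=1.0, fps=25):
    samples, sr = synthesize_tts(text)
    step = int(chunk_seconds * sr)
    for start in range(0, len(samples), step):
        chunk = samples[start:start + step]
        # Yield each clip as soon as it is rendered; playback only stays
        # smooth if rendering keeps up with real time (>= fps frames/sec).
        yield render_segment(chunk, fps=fps, sample_rate=sr)

for clip in stream_talking_head("hello"):
    print(len(clip), "frames ready")
```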

@tolecy commented Nov 24, 2023
By the way, this is the message used for the face rendering run above: "Thank you for the kind words. It is a pleasure to meet you as well. I am here to share the magic and beauty of the world around us. If you have any questions or need any guidance, I am always here to help."

@puffy310

How did you replace ChatGPT: with another OpenAI model, or a locally hosted OpenAI-API-compatible program?

@tolecy commented Nov 24, 2023

> How did you replace ChatGPT: with another OpenAI model, or a locally hosted OpenAI-API-compatible program?

I simply wrapped my local model as a service (with an input/output format similar to OpenAI's), deployed it locally, and then made some modifications to /chat_anything/chatbot/chat.py.
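For anyone trying the same, here is a minimal sketch of such a wrapper, assuming FastAPI; `local_generate` is a placeholder for your own model, and only the subset of the /v1/chat/completions response schema that most clients read is returned:

```python
import time

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Message(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    model: str = "local-model"
    messages: list[Message]
    temperature: float = 0.7

def local_generate(prompt: str) -> str:
    # Hypothetical placeholder: call your own local model here.
    return "Hello from the local model."

@app.post("/v1/chat/completions")
def chat_completions(req: ChatRequest):
    # Flatten the chat history into a single prompt for the local model.
    prompt = "\n".join(f"{m.role}: {m.content}" for m in req.messages)
    reply = local_generate(prompt)
    # Mimic the pieces of the OpenAI response that client code reads.
    return {
        "id": "chatcmpl-local",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": req.model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": reply},
            "finish_reason": "stop",
        }],
    }

# Run with: uvicorn server:app --port 8000
# then point the OpenAI client's base URL at http://localhost:8000/v1
```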

@ermu2001 (Collaborator) commented Nov 24, 2023

> Great project! I replaced ChatGPT with my own small model and tested it on my 3080 Ti graphics card; the timing details are as follows: […]

The facial image generation only executes once, at the first round of conversation ("..., Bot: how are you doing..."), so I think the latency would be acceptable.

And by the way, the "seamlessClone:: 100%|318/318 [00:15<00:00, 20.66it/s]" step is an option for SadTalker to avoid cropping the face out for rendering: it renders the cropped face and then pastes it back onto the full image. You can disable it by unchecking "Use full body instead of a face." on the settings tab. It seems unoptimized and takes up a lot of time. O.o

https://github.com/zhoudaquan/ChatAnything/blob/main/chat_anything/sad_talker/utils/paste_pic.py#L59-L65
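For context, the linked paste-back loop appears to run OpenCV's Poisson blending once per frame (318 calls in the log above), which is why it dominates runtime. A minimal sketch of a single paste-back step; the file names and crop box are placeholder values, not the project's actual code:

```python
import cv2
import numpy as np

# Placeholder inputs: the original full image and a rendered face crop.
full_frame = cv2.imread("full_body_frame.png")
rendered_face = cv2.imread("rendered_face.png")
x, y, w, h = 100, 50, 256, 256  # assumed crop box of the face region

# Solid mask over the rendered crop; seamlessClone solves a Poisson
# blend for every frame, which is the expensive part.
mask = np.full(rendered_face.shape, 255, dtype=np.uint8)
center = (x + w // 2, y + h // 2)
blended = cv2.seamlessClone(rendered_face, full_frame, mask, center,
                            cv2.NORMAL_CLONE)
cv2.imwrite("blended.png", blended)
```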

@puffy310

Very excited to see more progress in this area!

@tolecy commented Nov 29, 2023

> The facial image generation only executes once, at the first round of conversation, so I think the latency would be acceptable. […] You can disable it by unchecking "Use full body instead of a face." on the settings tab. […]

Yep. When running on a 4090 (considering only the face render), the time required for video generation is not significantly different from the video length, so theoretically a streaming output (at 25 fps) could feel relatively smooth.

Currently I am trying to integrate Live2D, and beyond that I hope to input a custom full-body image for full-body driving (that is my next plan), just like a prepared Live2D model. But I don't have much experience in this area of CV; any suggestions?
