Workflow - I2V & T2V Multi-character "Talking Avatar/Digital Human" consistent voices with Fish Audio S2 Pro

#42
by RuneXX - opened

Create dialogs with consistent voices on the fly by prompting your dialog and using reference audio for voice cloning.
Great for "talking avatar" style videos, since the output is dialog only (no ambient sound, background sound or sound fx).

(you can of course add any ambient sound or sound fx in a video editor later should you want ;-)

Needed nodes (in addition to KJNodes etc):
https://github.com/Saganaki22/ComfyUI-FishAudioS2
It supports a wide range of emotional tags for expressive audio, 80+ languages, etc.

Feel free to try it out https://huggingface.co/RuneXX/LTX-2.3-Workflows/ ;-)

Added a workflow for Qwen TTS as well (single character).

You can of course use any other TTS ComfyUI nodes should you prefer. Some good ones are Microsoft VibeVoice, CosyVoice 3, Chatterbox, IndexTTS and more (several of them are multi-character).
And if you want an all-in-one TTS suite that has multiple variants: https://github.com/diodiogod/TTS-Audio-Suite

Any of them are easy to connect to the workflow, since they all give an audio output to connect.
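
To illustrate what "connect the audio output" amounts to, here is a minimal sketch, assuming the workflow has been exported in ComfyUI's API format, where every node is an entry with "class_type" and "inputs", and a link is written as [source_node_id, output_index]. The node ids, class names and the swap_audio_source helper are hypothetical and only for illustration; they are not taken from the actual workflow files.

```python
import copy


def swap_audio_source(prompt, consumer_id, input_name, new_source_id, output_index=0):
    """Return a copy of an API-format prompt with one audio input rewired.

    A link in the API format is a two-item list: [source_node_id, output_index].
    """
    if new_source_id not in prompt:
        raise KeyError(f"unknown source node: {new_source_id}")
    rewired = copy.deepcopy(prompt)  # leave the original prompt untouched
    rewired[consumer_id]["inputs"][input_name] = [new_source_id, output_index]
    return rewired


# Hypothetical node ids and class names, for illustration only.
prompt = {
    "1": {"class_type": "FishAudioTTS_Hypothetical", "inputs": {"text": "Hello there."}},
    "2": {"class_type": "OtherTTS_Hypothetical", "inputs": {"text": "Hello there."}},
    "3": {"class_type": "VideoSampler_Hypothetical", "inputs": {"audio": ["1", 0]}},
}

# Point the video node's audio input at the other TTS node instead.
rewired = swap_audio_source(prompt, "3", "audio", "2")
print(rewired["3"]["inputs"]["audio"])  # ['2', 0]
```

In the ComfyUI graph editor the same thing is done by dragging the new TTS node's audio output onto the audio input that the Fish Audio node was feeding.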

Any idea, by any chance, why I get 1 frame or so and then a garbled mess? The default Comfy workflow has no issue for me. I played around for a bit without any luck: tried disabling the 2nd sampler and a bunch of other stuff, and also updating Comfy and the required nodes. With the workflow included in Comfy I'm getting pretty good results. It's so weird, since there are no errors or anything. The processing time for the video is fine and it looks like it's working normally, but the output...

Any tips appreciated thanks!

Perhaps wrong models? Maybe post a screenshot of the model setup?
