Workflow - I2V & T2V 3-Pass workflow for higher detail and motions

#79
by RuneXX - opened

By request, a workflow that goes through 3 samplers (one regular + 2x upscale).
It can produce more detail and more motion.

Feel free to try it out ;-)

Might make more sense in a DEV model workflow.
The first pass runs at a very low res, so it doesn't "take forever" and is rather quick.
That pass lays the groundwork for the initial motion (and gives more motion than the distilled model typically does).
After that, 2 upscale samplers refine the look.

It seems to work pretty well in a Dev workflow ;-) I uploaded that too, to try out.

I tried one of these 3-Pass workflows a while ago, but I always ran out of VRAM using them. Even making a small 15-second clip would eat all my VRAM and crash Comfy.

I've been sticking to using all your workflows on here, seeing as they always worked out great for me. Running your new 3-Pass, will it require more VRAM than the 2-pass? In other words, if I can make a 50-second clip on your other workflows with the 2-pass at 1280x720, can I expect to accomplish the same length on this new 3-pass workflow of yours?

I was also curious: why does the quality still look good on your "Add_Sound_To_Any_Video" with just a single pass, yet the regular I2V and T2V workflows look bad when doing the same? Even when I extend the clips on the "Add_Sound_To_Any_Video" by 20 to 40 seconds, the quality is still great. The other two regular workflows require a second pass in order to look decent. Found that strange. The same applies to your recut workflow. A single pass is good enough with that one as well.

Also, thanks a lot for all you do here. You're always updating and sharing your workflows with everyone. It is greatly appreciated.

It should not use much more VRAM; my 3-pass workflow probably differs from what you tried earlier.
In other words, if you set 1280x720 as the target resolution, that's what you get, and not 2x + 2x on top of it, ending up at 4K-something ;-) with each step dramatically up-resing your video.

(Of course you can opt for 4K too, if you set that as the target resolution.)
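
To make the target-resolution point concrete, here is a rough sketch of the arithmetic (not taken from the workflow itself). It assumes both upscale passes are exactly 2x and that the workflow simply divides the target back down; the function name `pass_resolutions` is made up for illustration, and a real workflow may also snap sizes to values the model prefers.

```python
def pass_resolutions(target_w, target_h, upscale_factors):
    """Return the (width, height) of each pass, first pass first.

    Works backwards from the target: the last pass lands on the target
    resolution, and every earlier pass is smaller by its upscale factor.
    """
    sizes = [(target_w, target_h)]
    for factor in reversed(upscale_factors):
        w, h = sizes[0]
        sizes.insert(0, (round(w / factor), round(h / factor)))
    return sizes


# 3-pass: one regular sampler followed by two 2x upscale samplers.
print(pass_resolutions(1280, 720, [2, 2]))
# [(320, 180), (640, 360), (1280, 720)]
```

So with a 1280x720 target, the first pass would be around 320x180 under these assumptions, which is why it is quick and why the final output stays at 1280x720.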

As for the quality of the older single-pass, I'll take a look. Over time, some lessons learned might have changed which sampler to use, etc. The older single-pass ones might need some love ;-) Will check; perhaps just setting a better sampler is all that's needed.

Sorry, my mistake. I was using your custom audio workflow as a basic workflow. I would turn off the audio addon with your added off/on switch and select single pass. This would produce bad quality output for image to video and text to video.

I just tried your basic single pass workflow and that works very well (no need to alter it). Must be something you added to the single workflow that makes it give better results as opposed to just turning it off in your Custom Audio workflow. Strange. I compared the two but don't see any differences aside from the single not having the upscaler included in it.

By the way, I tried using other samplers but that didn't fix the problem in the custom audio workflow. I thought I could stick to just one workflow and just use your off/on switches. Apparently this doesn't work so well for the single pass.

edit

It turns out using your single pass workflow takes wayyyy too long. It takes me 29 minutes for a 40-second video at 1280x720.

Using your custom audio workflow and just turning off that audio feature is faster generating image and text to video. Having the 2-pass workflow only takes 15 minutes for the same resolution and seconds. Very odd. Your single pass workflow takes double the time but the quality does seem to be better after more testing.

I have the t2v option selected and it's still trying to load the image, please help!

Just click the image box that has your loaded image in it and click the arrow pointing to the right at the top of the box (second icon from the right side). This will stop the workflow from using the image, and it will default to t2v. You will know it's disabled when the box turns purple after clicking that arrow icon.

Update on your 3-pass: that baby needs some serious VRAM in order to increase the seconds past 15. I also find there is a lot of ghosting/distortion with this workflow. Too bad, since the detail of the image doesn't get much better than this.

Having the 2-pass workflow only takes 15 minutes for the same resolution and seconds. Very odd. Your single pass workflow takes double the time but the quality does seem to be better after more testing.

Depends on the resolution you aim for. But it can be significantly slower if your VRAM/RAM is low and you set the video resolution high.
The 2-pass workflow is made for speed: it first generates a low-resolution first pass, then upscales it in the 2nd pass "magically" in just a few steps (using the upscale model).
It can often be much better for higher resolutions (all depending on what your PC can handle).

The single-pass is perhaps mostly for a bit of a "Wan mode", where you'd generate 5-10 second clips at 832x480 pixels or so...
Or, as LTX says, for testing out ideas at 758x512 res.
Or just a way for people who have no chance of generating high-res videos to go for lower-res videos in a single pass, a bit "Wan" (better than nothing ;-))

For everything else, the 2-pass workflow is usually the one to use ;-)

I have the t2v option selected and it's still trying to load the image, please help!

Yes, ComfyUI asks for inputs even if they are not really used. So for T2V mode in a combo workflow that can also do I2V, you still need an image loaded.
You can just write "example" in the image loader and get that default ComfyUI clipart as a placeholder ;-) And thinking of it, perhaps I should do that too before uploading...

I did also upload a couple of pure T2V workflows, though, that don't have this "issue" where, as long as an image loader exists, an image must be loaded ;-)

Update on your 3-pass: that baby needs some serious VRAM in order to increase the seconds past 15. I also find there is a lot of ghosting/distortion with this workflow. Too bad, since the detail of the image doesn't get much better than this.

It's a bit experimental at this stage, since it's not really intended to be 3 samplers. There might be ways to improve it though.
I've been thinking of trying the 1.5x upscaler instead of the 2x, so that the first pass is not extremely tiny and the difference in resolution between each step is smaller.
I think this might give better results in this particular workflow. But I haven't tested it yet ;-)
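
As a back-of-the-envelope check on the "not extremely tiny" idea, this sketch compares how small the first pass would be with two 2x upscales versus two 1.5x upscales. It assumes a 1280x720 target and plain division with rounding; `first_pass_size` is a hypothetical helper, not something from the workflow.

```python
def first_pass_size(target_w, target_h, factor, upscales=2):
    """Estimate the first-pass size when `upscales` passes each scale by `factor`.

    Returns (width, height, fraction_of_target_pixels).
    """
    scale = factor ** upscales          # total linear scale from first to last pass
    w = round(target_w / scale)
    h = round(target_h / scale)
    return w, h, 1 / (scale * scale)    # pixel count shrinks with the square


for factor in (2.0, 1.5):
    w, h, frac = first_pass_size(1280, 720, factor)
    print(f"{factor}x upscaler: first pass ~{w}x{h}, {frac:.1%} of target pixels")
```

Under these assumptions the 2x route starts from roughly 320x180 (about 6% of the final pixels), while the 1.5x route would start from roughly 569x320 (about 20%), so each step has less to invent.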

edit:

  • Tried some 1.5x upscale workflows, but there seems to be a bug with the 1.5x upscaler, the same one the first version of the 2x upscaler had (random text and other stuff on screen).
    (Removed them, unless I can find a way to remedy that.)

Thanks very much for explaining the differences between the two and why it takes so long with just the single pass. I initially thought there was something wrong with that workflow.

Didn't realize until yesterday that your single pass workflow really is the best when it comes to staying true to the image. If using image to video, nothing beats it. I don't recommend anyone use the 2-pass for your recut workflow. That can really mess up the scene and change all the characters.

It's too bad the 1.5x is bugged. I doubt Lightricks will get around to fixing it like they did with their latest 2x 1.1.

True, the single pass can stay more faithful to the input image, since it's just one sampler (any sampler is a "re-imagination" or re-creation, a point of potential changes).
There are in-place and guider nodes that influence the result, but with a low-res first pass and a quick upscale with few steps, the 2-pass workflow is perhaps a bit more "creative" and less faithful.

Just FYI on these workflows you used the Comfy-LTXVideo version of LTXVImgToVideoConditionOnly instead of the native LTXVImgToVideoInplace. Adds an extra dependency.
