Video Depth Anything — Large

Mirror of depth-anything/Video-Depth-Anything-Large for use with ComfyUI-FFMPEGA.

What is Video Depth Anything?

Video Depth Anything is a state-of-the-art model for temporally consistent monocular depth estimation in videos. It extends Depth Anything V2 with temporal modules for smooth, flicker-free depth maps across video frames.

Key features:

Temporal consistency — smooth depth maps without frame-to-frame flickering
Multiple encoder sizes — Small (24.8M), Base (97.5M), and Large (335.3M) variants
Apache 2.0 license — fully open source
Colormap output — supports multiple colormap visualizations (inferno, magma, plasma, etc.)
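
To make the temporal consistency claim concrete, here is a minimal sketch of one rough way to quantify flicker: the mean absolute per-pixel change between consecutive depth maps. This is an illustrative proxy only, not the evaluation metric used by the model's authors, and it is only meaningful for a static scene and camera, since real motion also changes depth. The function name `mean_abs_frame_diff` and the sample values are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # one depth map as rows of depth values


def mean_abs_frame_diff(frames: Sequence[Frame]) -> List[float]:
    """Return the mean absolute per-pixel change between consecutive frames."""
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        total = 0.0
        count = 0
        for row_prev, row_curr in zip(prev, curr):
            for a, b in zip(row_prev, row_curr):
                total += abs(b - a)
                count += 1
        # An empty frame contributes a difference of 0.0 rather than dividing by zero.
        diffs.append(total / count if count else 0.0)
    return diffs


# Hypothetical 2x2 depth maps of a static scene (illustrative values only).
steady = [[[1.0, 2.0], [3.0, 4.0]]] * 3
flickery = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.5, 2.5], [3.5, 4.5]],
    [[1.0, 2.0], [3.0, 4.0]],
]

print(mean_abs_frame_diff(steady))    # [0.0, 0.0]
print(mean_abs_frame_diff(flickery))  # [0.5, 0.5]
```

Lower values indicate less frame-to-frame change; on a static shot, a temporally consistent model should stay close to zero.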

Files

model.safetensors
config.json

Usage

With ComfyUI-FFMPEGA (recommended)

Set no_llm_mode to video_depth on the FFMPEG Agent node
Select encoder size (small, base, large) under Advanced Options
Choose colormap for visualization
The model auto-downloads on first use

Manual download

huggingface-cli download AEmotionStudio/Video-Depth-Anything-Large --local-dir ./video_depth_anything
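
The same download can be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed. The helper names `download_mirror` and `missing_files` are hypothetical, and the expected file list is taken from the Files section above.

```python
from pathlib import Path
from typing import List

REPO_ID = "AEmotionStudio/Video-Depth-Anything-Large"
EXPECTED_FILES = ("model.safetensors", "config.json")  # from the Files section


def download_mirror(local_dir: str = "./video_depth_anything") -> str:
    """Download the repository snapshot into local_dir and return its path."""
    # Imported lazily so missing_files() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


def missing_files(local_dir: str) -> List[str]:
    """List the expected files that are absent from local_dir."""
    root = Path(local_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `download_mirror()` and then passing the returned path to `missing_files()` should yield an empty list if both files arrived.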

Available Sizes

Variant	Parameters	File size	Trade-off
Small	24.8M	~102 MB	Fastest
Base	97.5M	~390 MB	Balanced
Large	335.3M	~670 MB	Best quality
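
As a quick sanity check on the table, the sketch below divides each listed size by its parameter count to estimate the bytes stored per parameter. A ratio near 4 would be consistent with 32-bit weights and a ratio near 2 with 16-bit weights, but this is an inference from the approximate figures above, not something stated in the model files or the upstream repository. The function name `bytes_per_parameter` is hypothetical.

```python
# Parameter counts (millions) and approximate sizes (MB) copied from the table above.
VARIANTS = {
    "Small": (24.8, 102),
    "Base": (97.5, 390),
    "Large": (335.3, 670),
}


def bytes_per_parameter(params_millions: float, size_mb: float) -> float:
    """Approximate bytes stored per parameter, treating 1 MB as 10**6 bytes."""
    return (size_mb * 1e6) / (params_millions * 1e6)


for name, (params, size) in VARIANTS.items():
    print(f"{name}: {bytes_per_parameter(params, size):.2f} bytes/parameter")
# Small: 4.11 bytes/parameter
# Base: 4.00 bytes/parameter
# Large: 2.00 bytes/parameter
```

If the ratios hold, the Large checkpoint may be stored at half the precision of the other two, which would explain why it is less than twice the size of Base despite having more than three times the parameters.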

License

Apache 2.0 — see the upstream repository for full license terms.

Credits

Original model by: Depth Anything team
Paper: "Video Depth Anything: Consistent Depth Estimation for Super-Long Videos"
Upstream HuggingFace: depth-anything/Video-Depth-Anything-Large
Redistributed by: Æmotion Studio for use with ComfyUI-FFMPEGA
