Multiple Subject Reference IC-LoRA (Test Version)

⚠️ This is a test version, released to collect feedback that will guide future optimization.

Overview

This model implements a novel approach to multi-reference video generation using Multiple Subject Reference (MSR). Instead of introducing additional encoder branches or fusion modules, we transform multiple static reference images into a pseudo-video sequence that shares the same representation space as the target video.
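The card does not publish its packing code. The following is only a minimal sketch of the pseudo-video idea in plain Python. It assumes that each reference image has already been encoded into latent tokens by the same encoder as the target video, and that reference frames are placed ahead of the target frames. `pack_pseudo_video`, `PackedSequence` and that ordering are hypothetical. The 2 to 5 limit mirrors the range stated under Key Features.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical types: a latent token is a feature vector, and a latent frame
# is the list of spatial tokens produced by encoding one image or video frame.
Token = List[float]
Frame = List[Token]


@dataclass
class PackedSequence:
    tokens: List[Token]       # one flat sequence fed to the video transformer
    is_reference: List[bool]  # True where the token came from a reference image


def pack_pseudo_video(references: Sequence[Frame], target: Sequence[Frame]) -> PackedSequence:
    """Treat each reference latent as one frame of a pseudo-video and place those
    frames ahead of the target video frames (the ordering is an assumption)."""
    if not 2 <= len(references) <= 5:
        raise ValueError("expected 2 to 5 reference images")
    tokens: List[Token] = []
    is_reference: List[bool] = []
    for frames, flag in ((references, True), (target, False)):
        for frame in frames:
            for tok in frame:
                tokens.append(list(tok))
                is_reference.append(flag)
    # Shared representation space: every token must have the same width.
    width = len(tokens[0]) if tokens else 0
    if any(len(tok) != width for tok in tokens):
        raise ValueError("reference and target latents must share one token dimension")
    return PackedSequence(tokens, is_reference)


refs = [[[1.0, 0.0]], [[0.0, 1.0]]]   # two reference images, one toy token each
video = [[[0.5, 0.5]], [[0.2, 0.8]]]  # two target frames, one toy token each
packed = pack_pseudo_video(refs, video)
print(packed.is_reference)  # [True, True, False, False]
```

Because references and target end up in one sequence with one token width, the downstream transformer does not need a separate conditioning path. The `is_reference` flags are kept only so the two parts can be told apart afterwards.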

Usage

This LoRA requires the ComfyUI-Licon-MSR plugin for ComfyUI. A sample workflow is included in the model files for easy testing and experimentation.

Key Features

Multi-Reference Visual Memory

Token-level reference preservation: Multiple reference images are encoded as video latents, preserving fine-grained visual information at the token level rather than compressing it into a single embedding.
Native self-attention retrieval: Target video tokens access reference tokens directly through the model's existing self-attention mechanism, so no new architectural components are needed.
In-context conditioning: References act as "visual memory" within the main token sequence rather than as external conditioning inputs.
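The following minimal single-head self-attention sketch in plain Python illustrates why retrieval needs no extra module. Once reference and target tokens share one sequence, ordinary attention lets a target token pull from whichever reference token matches it. The sketch uses identity query/key/value projections and toy 2-D tokens purely for readability. `self_attention` and `reference_share` are hypothetical helpers and are not part of the plugin or the model.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Sequence[Vector]) -> Tuple[List[List[float]], List[Vector]]:
    """Scaled dot-product self-attention over the whole packed sequence.
    Identity Q/K/V projections are a simplifying assumption."""
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    # Row i holds the attention weights of query token i over every key token.
    weights = [
        softmax([sum(q_i * k_i for q_i, k_i in zip(q, k)) * scale for k in tokens])
        for q in tokens
    ]
    # Each output is the attention-weighted sum of the value vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return weights, outputs


def reference_share(weights: List[List[float]], is_reference: Sequence[bool], query: int) -> float:
    """Fraction of one query token's attention mass that lands on reference tokens."""
    return sum(w for w, ref in zip(weights[query], is_reference) if ref)


tokens = [[4.0, 0.0], [0.0, 4.0], [4.0, 0.0]]  # ref A, ref B, one target token
is_reference = [True, True, False]
weights, outputs = self_attention(tokens)
print(round(reference_share(weights, is_reference, query=2), 2))  # → 0.5
```

In this toy setup the target token sends almost all of its reference attention to ref A, because its key aligns with the query. In a trained model, learned projections would decide which reference each target token draws from. This is the kind of retrieval the card relies on for selecting attributes across references.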

Flexible Reference Composition

2 to 5 reference images: Supports a varying number of reference inputs, covering compositions of increasing complexity as more references are added
Complementary semantic roles: Each reference image can carry different information:
- Subject identity
- Object/prop details
- Scene/background
- Local textures
- Multiple viewpoints

What It Can Do

Identity Preservation Across References

Generate videos where multiple reference identities are simultaneously preserved:

Multiple characters from different reference images
Character + object combinations
Object + scene compositions

Relation-Based Composition

Beyond identity preservation, the model can compose references according to relations described in the text prompt:

Action interactions (handing, picking up, pushing)
Spatial relationships (left-right, foreground-background)
Temporal event structures (start → process → result)

Cross-Reference Attribute Selection

The model learns to selectively retrieve attributes from different references:

Face from reference A, clothing from reference B
Object identity from one reference, pose/position from another
Background elements from scene references

Current Issues (Test Version)

High-motion limb distortion: Significant degradation in limb quality during fast or complex motion sequences
Slight object consistency loss: Minor identity drift in objects over the course of the video

Results Showcase

2-Reference Comparison

[Comparison videos: Reference Images | Our Model | Seedance2.0]

4-Reference Comparison

[Comparison videos: Reference Images | Our Model | Seedance2.0]
