Interest - an Exclibur Collection

Updated May 20, 2025

CompCap: Improving Multimodal Large Language Models with Composite Captions
Paper • 2412.05243 • Published Dec 6, 2024

LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
Paper • 2412.04814 • Published Dec 6, 2024

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
Paper • 2412.05237 • Published Dec 6, 2024

Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
Paper • 2412.05939 • Published Dec 8, 2024

Chimera: Improving Generalist Model with Domain-Specific Experts
Paper • 2412.05983 • Published Dec 8, 2024

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
Paper • 2412.06673 • Published Dec 9, 2024

Video Motion Transfer with Diffusion Transformers
Paper • 2412.07776 • Published Dec 10, 2024

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
Paper • 2412.03548 • Published Dec 4, 2024

Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation
Paper • 2412.07334 • Published Dec 10, 2024

StreamChat: Chatting with Streaming Video
Paper • 2412.08646 • Published Dec 11, 2024

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
Paper • 2412.05552 • Published Dec 7, 2024

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation
Paper • 2412.09585 • Published Dec 12, 2024

Multimodal Latent Language Modeling with Next-Token Diffusion
Paper • 2412.08635 • Published Dec 11, 2024

Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
Paper • 2412.08737 • Published Dec 11, 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Paper • 2412.09596 • Published Dec 12, 2024

VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding
Paper • 2412.02186 • Published Dec 3, 2024

Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
Paper • 2412.17739 • Published Dec 23, 2024

Large Concept Models: Language Modeling in a Sentence Representation Space
Paper • 2412.08821 • Published Dec 11, 2024

The GAN is dead; long live the GAN! A Modern GAN Baseline
Paper • 2501.05441 • Published Jan 9, 2025

Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding
Paper • 2501.07888 • Published Jan 14, 2025

Temporal Preference Optimization for Long-Form Video Understanding
Paper • 2501.13919 • Published Jan 23, 2025

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
Paper • 2501.13106 • Published Jan 22, 2025

PaSa: An LLM Agent for Comprehensive Academic Paper Search
Paper • 2501.10120 • Published Jan 17, 2025

PokerBench: Training Large Language Models to become Professional Poker Players
Paper • 2501.08328 • Published Jan 14, 2025

MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published Jan 14, 2025

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
Paper • 2501.09732 • Published Jan 16, 2025

Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published Jan 17, 2025

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
Paper • 2501.09781 • Published Jan 16, 2025

Transformers without Normalization
Paper • 2503.10622 • Published Mar 13, 2025

Towards Understanding Camera Motions in Any Video
Paper • 2504.15376 • Published Apr 21, 2025