CLIP
YOLO-World: Real-Time Open-Vocabulary Object Detection
Paper
• 2401.17270
• Published
• 43
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Paper
• 2401.14405
• Published
• 13
Improving fine-grained understanding in image-text pre-training
Paper
• 2401.09865
• Published
• 18
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper
• 2404.15653
• Published
• 29
Multi-Head Mixture-of-Experts
Paper
• 2404.15045
• Published
• 60
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Paper
• 2404.08197
• Published
• 29
MoDE: CLIP Data Experts via Clustering
Paper
• 2404.16030
• Published
• 15
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
Paper
• 2405.20204
• Published
• 37
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper
• 2406.09415
• Published
• 51
Differential Transformer
Paper
• 2410.05258
• Published
• 180
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
Paper
• 2410.17243
• Published
• 92