AnyModal/flickr30k
Multimodal LLMs for all! AnyModal is a modular, extensible framework for integrating diverse input modalities (e.g., images, audio) into large language models (LLMs). It handles tokenization, encoding, and language generation, using pre-trained models for each modality.
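The encode-then-generate pattern described above can be illustrated with a small plain-Python sketch. This is not AnyModal's actual API. `LinearProjector` and `build_llm_inputs` are hypothetical names. The sketch assumes three things: a modality encoder has already turned the input (for example, image patches) into a list of feature vectors; a single linear layer maps those features into the LLM's embedding space, which is a common design in vision-language models; and the projected vectors are prepended to the embedded text prompt before generation.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class LinearProjector:
    """Hypothetical projector: maps encoder features (dim d_in) to the LLM embedding size (dim d_out)."""
    weights: List[List[float]]  # shape: d_out x d_in
    bias: List[float]           # shape: d_out

    def __call__(self, x: Vector) -> Vector:
        # Reject feature vectors whose size does not match the projection matrix.
        if any(len(row) != len(x) for row in self.weights):
            raise ValueError("feature dimension does not match projector input size")
        # y = W x + b, computed row by row.
        return [
            sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(self.weights, self.bias)
        ]


def build_llm_inputs(
    modality_features: List[Vector],
    projector: LinearProjector,
    text_token_ids: List[int],
    embed_token: Callable[[int], Vector],
) -> List[Vector]:
    """Hypothetical helper: prepend projected modality 'tokens' to the embedded text prompt."""
    modality_tokens = [projector(f) for f in modality_features]
    text_tokens = [embed_token(t) for t in text_token_ids]
    return modality_tokens + text_tokens


# Toy example: two 3-dim "image" features projected into a 2-dim embedding space,
# followed by a two-token text prompt looked up in a toy embedding table.
projector = LinearProjector(weights=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], bias=[0.0, 0.5])
features = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
vocab_embeddings = {0: [0.1, 0.2], 1: [0.3, 0.4]}
inputs = build_llm_inputs(features, projector, [0, 1], vocab_embeddings.__getitem__)
print(len(inputs))  # 4
print(inputs[0])    # [1.0, 5.5]
```

In a real system, the projector would be a trained neural layer and the resulting sequence would be passed to the LLM as input embeddings rather than token IDs. Keeping the projector separate from the encoder is what lets a framework of this kind swap in a different pre-trained encoder for each modality.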