Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention Paper • 2310.07911 • Published Oct 11, 2023
Deconstructing Attention: Investigating Design Principles for Effective Language Modeling Paper • 2510.11602 • Published Oct 13, 2025
HashFormers: Towards Vocabulary-independent Pre-trained Transformers Paper • 2210.07904 • Published Oct 29, 2022
MultiHashFormer: Hash-based Generative Language Models Paper • 2606.28057 • Published 4 days ago