Abstract
Compact pretrained bidirectional encoders based on the Avey architecture outperform Transformer-based models on token-classification and information-retrieval tasks while scaling more efficiently to long contexts.
Compact pretrained bidirectional encoders remain the backbone of industrial NLP under tight compute and memory budgets. Their effectiveness stems from self-attention's ability to deliver high-quality bidirectional contextualization with sequence-level parallelism, as popularized by BERT-style architectures. Recently, Avey was introduced as an autoregressive, attention-free alternative that naturally admits an encoder-only adaptation. In this paper, we reformulate Avey for the encoder-only paradigm and propose several innovations to its architecture, including decoupled static and dynamic parameterizations, stability-oriented normalization, and neural compression. Results show that this reformulated architecture compares favorably to four widely used Transformer-based encoders, consistently outperforming them on standard token-classification and information-retrieval benchmarks while scaling more efficiently to long contexts.
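Below is a minimal, illustrative sketch of what an attention-free, encoder-only block combining the ideas named in the abstract might look like. The component names and wiring (the static path, the gated dynamic path, RMS-style normalization, and a linear compression bottleneck) are assumptions for illustration only and are not taken from the paper.

```python
# Hypothetical sketch: bidirectional, attention-free encoder block with
# decoupled static and dynamic parameterizations, stability-oriented
# normalization, and neural compression. Design choices here are assumed,
# not the paper's actual architecture.
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Stability-oriented normalization (assumed RMS-style for this sketch)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


class EncoderBlock(nn.Module):
    """One bidirectional block: a static (input-independent) position-wise path
    plus a dynamic (input-dependent, gated) path over a compressed context."""
    def __init__(self, dim: int, compressed_dim: int):
        super().__init__()
        self.norm = RMSNorm(dim)
        # Static parameterization: fixed position-wise transformation.
        self.static_path = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        # Neural compression (assumed linear) of the sequence context.
        self.compress = nn.Linear(dim, compressed_dim)
        self.expand = nn.Linear(compressed_dim, dim)
        # Dynamic parameterization: input-dependent gate on the expanded summary.
        self.gate = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); no causal mask, so context is bidirectional.
        h = self.norm(x)
        static_out = self.static_path(h)
        # Pool the compressed sequence into a summary, then mix it back in.
        summary = self.compress(h).mean(dim=1, keepdim=True)  # (batch, 1, c_dim)
        dynamic_out = torch.sigmoid(self.gate(h)) * self.expand(summary)
        return x + static_out + dynamic_out


if __name__ == "__main__":
    block = EncoderBlock(dim=256, compressed_dim=64)
    tokens = torch.randn(2, 128, 256)   # (batch, seq_len, hidden)
    print(block(tokens).shape)          # torch.Size([2, 128, 256])
```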
Community
This is an automated message from Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- RexBERT: Context Specialized Bidirectional Encoders for E-commerce (2026)
- LMK>CLS: Landmark Pooling for Dense Embeddings (2026)
- ViCA: Efficient Multimodal LLMs with Vision-Only Cross-Attention (2026)
- LinMU: Multimodal Understanding Made Linear (2026)
- STILL: Selecting Tokens for Intra-Layer Hybrid Attention to Linearize LLMs (2026)
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head (2026)
- KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs (2026)