arxiv:2604.18199

Linear-Time and Constant-Memory Text Embeddings Based on Recurrent Language Models

Published on Apr 20
Abstract

Recurrent architectures with vertically chunked inference offer efficient text embedding generation with constant memory usage and competitive performance compared to transformer-based models.

AI-generated summary

Transformer-based embedding models suffer from quadratic computational and linear memory complexity, limiting their utility for long sequences. We propose recurrent architectures as an efficient alternative, introducing a vertically chunked inference strategy that enables fast embedding generation with memory usage that becomes constant in the input length once it exceeds the vertical chunk size. By fine-tuning Mamba2 models, we demonstrate their viability as general-purpose text embedders, achieving competitive performance across a range of benchmarks while maintaining a substantially smaller memory footprint compared to transformer-based counterparts. We empirically validate the applicability of our inference strategy to Mamba2, RWKV, and xLSTM models, confirming consistent runtime-memory trade-offs across architectures and establishing recurrent models as a compelling alternative to transformers for efficient embedding generation.
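The vertically chunked inference strategy described above can be illustrated with a toy sketch: the input is processed one fixed-size chunk at a time, each chunk is pushed through all recurrent layers before the next chunk is loaded, and only the per-layer carry states persist between chunks, so peak memory is bounded by the chunk size regardless of sequence length. The recurrent update, layer structure, and mean pooling below are illustrative placeholders, not the paper's actual Mamba2 kernels.

```python
import numpy as np

def layer_scan(state, chunk, W_state, W_in):
    """Run one toy recurrent layer over a chunk, returning the
    updated carry state and the layer's outputs for the chunk."""
    outs = []
    for x in chunk:
        state = np.tanh(state @ W_state + x @ W_in)
        outs.append(state)
    return state, np.stack(outs)

def vertically_chunked_embed(tokens, chunk_size, layers, d):
    """Embed `tokens` (shape [T, d]) chunk by chunk: each chunk is run
    through every layer before the next chunk is touched, so only one
    chunk of activations plus one carry state per layer is ever held."""
    states = [np.zeros(d) for _ in layers]   # all that persists between chunks
    pooled, n = np.zeros(d), 0
    for start in range(0, len(tokens), chunk_size):
        h = tokens[start:start + chunk_size]  # only this chunk in memory
        for i, (W_state, W_in) in enumerate(layers):
            states[i], h = layer_scan(states[i], h, W_state, W_in)
        pooled += h.sum(axis=0)
        n += len(h)
    return pooled / n                         # mean-pooled text embedding
```

Because the carry state makes the recurrence exact across chunk boundaries, chunked and unchunked inference produce the same embedding; only the memory-runtime trade-off changes with the chunk size.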



Models citing this paper: 2

Datasets citing this paper: 0


Spaces citing this paper: 0


Collections including this paper: 1