chunkable-sentence-transformer
Custom Sentence Transformers modules that add support for vertically chunked inference to encode arbitrarily long texts with constant memory by processing fixed-size vertical chunks through all model layers sequentially.
What this repository provides
- ChunkableTransformer: extends the Transformer module from Sentence Transformers with a vertical_chunk_size parameter. When it is set, input sequences are split into fixed-size chunks. Each chunk is processed through the full depth of the model, and the recurrent states are carried from one chunk to the next, so the entire sequence never has to be materialized in memory at once.
- LastIndexPooling: pools the embedding from the last token position regardless of padding. With left padding, the last position holds the final real token of every sequence, so chunked inference only needs to retain the outputs of the final chunk.
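The chunking mechanism can be illustrated with a toy, pure-Python sketch. It is not the repository's implementation: each "layer" is a scalar linear recurrence h_t = decay * h_{t-1} + x_t, an assumption standing in for a recurrent layer such as Mamba2, and layer_scan, full_forward and chunked_forward are hypothetical helpers. The sketch shows that pushing fixed-size chunks through all layers while carrying one state per layer produces the same last-token output as processing the whole sequence, while only the final chunk's outputs are retained. Real Mamba2 layers carry more state than a single scalar (for example convolution state), which the toy omits.

```python
from typing import List, Tuple


def layer_scan(xs: List[float], decay: float, h0: float) -> Tuple[List[float], float]:
    """Run one toy recurrent layer over xs from state h0; return outputs and final state."""
    out, h = [], h0
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out, h


def full_forward(tokens: List[float], decays: List[float]) -> List[float]:
    """Process the whole sequence layer by layer (activations grow with sequence length)."""
    acts = tokens
    for d in decays:
        acts, _ = layer_scan(acts, d, 0.0)
    return acts


def chunked_forward(tokens: List[float], decays: List[float], chunk_size: int) -> List[float]:
    """Process fixed-size chunks through all layers, carrying one state per layer.

    Only the outputs of the most recent chunk are kept, so activation memory is
    bounded by chunk_size rather than by the sequence length.
    """
    states = [0.0] * len(decays)
    last_chunk_out: List[float] = []
    for start in range(0, len(tokens), chunk_size):
        acts = tokens[start:start + chunk_size]
        for i, d in enumerate(decays):
            acts, states[i] = layer_scan(acts, d, states[i])
        last_chunk_out = acts  # outputs of earlier chunks are discarded
    return last_chunk_out


tokens = [float(i % 7) for i in range(20)]
decays = [0.9, 0.5, 0.75]
full = full_forward(tokens, decays)
chunked = chunked_forward(tokens, decays, chunk_size=8)
# The last-token output matches whether or not the sequence was chunked.
print(abs(full[-1] - chunked[-1]) < 1e-9)  # prints True
```

Because the recurrence is causal, each chunk only needs the states left behind by the previous chunk, which is what makes the per-chunk memory independent of the total length.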
Usage
This repository is designed to be referenced directly from Hugging Face model configs via modules.json, so that models can be loaded with trust_remote_code=True without any local installation:
```json
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "dynatrace-oss/chunkable-sentence-transformer--models.ChunkableTransformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_LastIndexPooling",
    "type": "dynatrace-oss/chunkable-sentence-transformer--models.LastIndexPooling"
  }
]
```
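Each "type" entry points at code hosted in this repository rather than at a class inside the sentence_transformers package. Assuming the "repo_id--module.ClassName" convention used by the dynamic module loading in Hugging Face transformers, the reference splits into a repository ID, a Python file, and a class name. The sketch below decomposes the entries from the example above; parse_remote_ref is a hypothetical helper for illustration, not part of either library.

```python
import json

# Copy of the modules.json example above.
MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "",
   "type": "dynatrace-oss/chunkable-sentence-transformer--models.ChunkableTransformer"},
  {"idx": 1, "name": "1", "path": "1_LastIndexPooling",
   "type": "dynatrace-oss/chunkable-sentence-transformer--models.LastIndexPooling"}
]
"""


def parse_remote_ref(type_ref: str) -> tuple[str, str, str]:
    """Split 'repo_id--module.ClassName' into (repo_id, module file, class name)."""
    repo_id, sep, class_path = type_ref.partition("--")
    if not sep:
        raise ValueError(f"not a remote code reference: {type_ref!r}")
    module, _, class_name = class_path.rpartition(".")
    return repo_id, module + ".py", class_name


# Modules run in idx order: the transformer first, then the pooling module.
for entry in sorted(json.loads(MODULES_JSON), key=lambda e: e["idx"]):
    repo_id, module_file, class_name = parse_remote_ref(entry["type"])
    print(entry["idx"], repo_id, module_file, class_name, repr(entry["path"]))
```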
Constant-memory inference is then available through the vertical_chunk_size parameter of encode:
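```python
from sentence_transformers import SentenceTransformer
# trust_remote_code=True lets the custom modules referenced in modules.json be loaded from the Hub
model = SentenceTransformer("dynatrace-oss/llama-embed-mamba2-7b", trust_remote_code=True)
# Process each input in vertical chunks of 512 tokens so memory stays constant as texts grow
embeddings = model.encode(["Your long document text here..."], vertical_chunk_size=512)
```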
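When several texts of different lengths are encoded together, they are padded to a common length, and the padding side determines what sits at the last position. The toy sketch below uses token IDs in place of per-token hidden states (an assumption for brevity); left_pad, right_pad, last_index_pool and PAD_ID are hypothetical names. It shows why last-index pooling relies on left padding: only then is the final position the last real token of every text.

```python
from typing import List

PAD_ID = 0  # hypothetical padding token id


def left_pad(batch: List[List[int]], pad_id: int = PAD_ID) -> List[List[int]]:
    """Prepend padding so every sequence ends at the same final position."""
    width = max(len(seq) for seq in batch)
    return [[pad_id] * (width - len(seq)) + seq for seq in batch]


def right_pad(batch: List[List[int]], pad_id: int = PAD_ID) -> List[List[int]]:
    """Append padding; the final position may then be a pad token."""
    width = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (width - len(seq)) for seq in batch]


def last_index_pool(rows: List[List[int]]) -> List[int]:
    """Take the value at the last position of each row, without consulting a mask."""
    return [row[-1] for row in rows]


batch = [[5, 6, 7], [8, 9], [4]]  # token ids standing in for per-token outputs
print(last_index_pool(left_pad(batch)))   # [7, 9, 4]: last real token of each text
print(last_index_pool(right_pad(batch)))  # [7, 0, 0]: padding leaks into the pooled values
```

Since the last position of a left-padded batch always falls inside the final chunk, the outputs of earlier chunks never need to be kept for pooling.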
Requirements
```bash
pip install sentence-transformers
```
Models
This code was created for the following embedding models:
Open Source Integration Roadmap
Our goal is to integrate the necessary changes upstream so that vertically chunked inference is easier to adopt for other models:
⚪ Planned | 🟡 In Progress | 🟢 Integrated
- ⚪ sentence-transformers: Last index pooling
- ⚪ sentence-transformers: Native vertical chunking support for transformers
This list will be updated as integration progresses.
License
Apache-2.0