chunkable-sentence-transformer

Custom Sentence Transformers modules that add support for vertically chunked inference. Vertical chunking encodes arbitrarily long texts with constant memory by passing fixed-size chunks through all model layers, one chunk at a time.

What this repository provides

ChunkableTransformer: extends the Sentence Transformers Transformer module with a vertical_chunk_size parameter. When set, input sequences are split into chunks that are each processed through the full model depth, carrying the recurrent states across chunks instead of materializing the entire sequence in memory at once.
LastIndexPooling: takes the embedding at the last sequence position regardless of padding. With left padding, that position always holds the final real token, so chunked inference only needs to retain the outputs of the final chunk.
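
The idea behind the two modules can be illustrated with a toy model. The sketch below is not the repository's implementation: it assumes a hypothetical stack of scalar linear recurrences (LAYERS, run_layer, encode_full and encode_chunked are all made-up names) in place of real Mamba2 layers. It only shows that carrying one state per layer across chunks yields the same last-position output as processing the whole sequence at once.

```python
import math
from typing import List, Tuple

# Hypothetical toy "layers": (decay, input_gain) of a scalar linear recurrence.
# These stand in for real recurrent layers and are assumptions for illustration.
LAYERS: List[Tuple[float, float]] = [(0.9, 0.5), (0.7, 1.2), (0.5, 0.8)]


def run_layer(decay: float, gain: float, xs: List[float], state: float) -> Tuple[List[float], float]:
    """Apply h_t = decay * h_{t-1} + gain * x_t; return all outputs and the final state."""
    ys = []
    for x in xs:
        state = decay * state + gain * x
        ys.append(state)
    return ys, state


def encode_full(xs: List[float]) -> float:
    """Layer-by-layer pass: every layer materializes outputs for the whole sequence."""
    hidden = list(xs)
    for decay, gain in LAYERS:
        hidden, _ = run_layer(decay, gain, hidden, 0.0)
    return hidden[-1]  # last-index pooling


def encode_chunked(xs: List[float], chunk_size: int) -> float:
    """Vertical chunking: each chunk goes through all layers before the next chunk starts."""
    states = [0.0] * len(LAYERS)  # one carried recurrent state per layer
    last = 0.0
    for start in range(0, len(xs), chunk_size):
        hidden = xs[start:start + chunk_size]
        for i, (decay, gain) in enumerate(LAYERS):
            hidden, states[i] = run_layer(decay, gain, hidden, states[i])
        # Only the newest chunk's outputs are kept; earlier chunks are discarded.
        last = hidden[-1]
    return last


tokens = [0.1 * i for i in range(1, 38)]
print(math.isclose(encode_full(tokens), encode_chunked(tokens, chunk_size=8)))
```

In this toy setting, the working memory of encode_chunked depends on chunk_size and the number of layers, not on the sequence length, which is the property the modules above aim for in real models.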

Usage

This repository is designed to be referenced directly from Hugging Face model configs via modules.json, so that models can be loaded with trust_remote_code=True without any local installation:

[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "dynatrace-oss/chunkable-sentence-transformer--models.ChunkableTransformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_LastIndexPooling",
    "type": "dynatrace-oss/chunkable-sentence-transformer--models.LastIndexPooling"
  }
]
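
As a reading aid, the type strings above appear to follow a repo_id--module.ClassName pattern. The sketch below only takes such a string apart under that assumption; split_type is a hypothetical helper and not part of Sentence Transformers, whose actual loading logic may differ.

```python
import json

# The modules.json content from above, embedded as a string for illustration.
MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "",
   "type": "dynatrace-oss/chunkable-sentence-transformer--models.ChunkableTransformer"},
  {"idx": 1, "name": "1", "path": "1_LastIndexPooling",
   "type": "dynatrace-oss/chunkable-sentence-transformer--models.LastIndexPooling"}
]
"""


def split_type(type_ref: str):
    """Split 'repo_id--module.ClassName' into (repo_id, module, class name).

    Hypothetical helper; assumes the '--' separator convention described above.
    """
    repo_id, _, class_path = type_ref.partition("--")
    module_name, _, class_name = class_path.rpartition(".")
    return repo_id, module_name, class_name


for entry in json.loads(MODULES_JSON):
    repo_id, module_name, class_name = split_type(entry["type"])
    print(f"module {entry['idx']}: class {class_name} in {module_name}.py of {repo_id}")
```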

Constant-memory inference is then available via the vertical_chunk_size encode parameter:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("dynatrace-oss/llama-embed-mamba2-7b", trust_remote_code=True)
embeddings = model.encode(["Your long document text here..."], vertical_chunk_size=512)

Requirements

pip install sentence-transformers

Models

This code was created for Mamba2-based embedding models such as dynatrace-oss/llama-embed-mamba2-7b.

Open Source Integration Roadmap

Our goal is to integrate all necessary changes to simplify the adoption of vertically chunked inference for other models:

⚪ Planned | 🟡 In Progress | 🟢 Integrated

⚪ sentence-transformers: Last index pooling
⚪ sentence-transformers: Native vertical chunking support for transformers

This list will be updated as integration progresses.

License

Apache-2.0
