Bolt Embedding Models
Bolt Embedding is a family of high-performance embedding models optimized for
enterprise Retrieval-Augmented Generation (RAG).
These models are fine-tuned from IBM Granite embedding models and
are designed to produce strong semantic embeddings for knowledge
retrieval, search, and document understanding.
Bolt models map text (queries, sentences, or documents) into a dense vector space suitable for similarity search, clustering, and retrieval pipelines.
Model Overview
Bolt embeddings are purpose-built for enterprise RAG workloads, where retrieval quality and robustness across heterogeneous documents are critical.
Key design goals:
- Strong query → document retrieval quality
- Robust performance on long enterprise documents
- Optimized for large-scale vector search
- Trained using large-batch contrastive learning to replicate real RAG retrieval conditions
These models are fine-tuned from IBM Granite embedding models using contrastive training on RAG-style data.
Model Details
Model Type
Sentence Transformer embedding model
Base Model
Fine-tuned from:
- ibm-granite/granite-embedding-small-english-r2 (small variant)
- ibm-granite/granite-embedding-english-r2 (large variant)
Output
- Embedding dimension: 384 (small), 768 (large)
- Similarity metric: Cosine similarity
- Max sequence length: 4096 tokens
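Once a Bolt checkpoint is loaded with Sentence Transformers (see Usage below), these properties can be checked directly. The snippet is a small sketch and assumes the aisquared/bolt-embedding-small model ID from the Usage section.

```python
from sentence_transformers import SentenceTransformer

# Load the small variant; the large variant reports a 768-dimensional output.
model = SentenceTransformer("aisquared/bolt-embedding-small")

print(model.get_sentence_embedding_dimension())  # 384 for the small variant
print(model.max_seq_length)                      # 4096
print(model.similarity_fn_name)                  # "cosine"
```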
Architecture
```
SentenceTransformer(
  (0): Transformer(ModernBertModel)
  (1): Pooling(CLS)
)
```
Bolt uses CLS pooling to produce a single embedding vector per input.
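For illustration, the same two-module layout can be assembled by hand from Sentence Transformers building blocks. This is a sketch of the module configuration, not the actual Bolt initialization code, and it assumes the small Granite base model.

```python
from sentence_transformers import SentenceTransformer, models

# Transformer encoder followed by CLS pooling: only the [CLS] token embedding
# is kept as the sentence representation.
word_embedding_model = models.Transformer(
    "ibm-granite/granite-embedding-small-english-r2",
    max_seq_length=4096,
)
cls_pooling = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode="cls",
)
model = SentenceTransformer(modules=[word_embedding_model, cls_pooling])
```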
Training Objective
Bolt embeddings are trained specifically for retrieval scenarios using contrastive learning.
Loss Function
CachedMultipleNegativesRankingLoss
This loss is widely used for training embedding models for retrieval tasks.
Key properties:
- Efficient training with very large effective batch sizes
- Uses in-batch negatives
- Encourages queries to be close to their relevant passages while far from irrelevant ones
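In Sentence Transformers this loss is exposed as `CachedMultipleNegativesRankingLoss`. The snippet below is a minimal sketch of how it is instantiated; the `mini_batch_size` value is illustrative, not the Bolt training setting.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

model = SentenceTransformer("ibm-granite/granite-embedding-small-english-r2")

# The cached (GradCache-style) variant runs the forward pass in mini-batches
# while still using every other example in the full batch as an in-batch negative.
loss = CachedMultipleNegativesRankingLoss(model, mini_batch_size=32)
```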
Large Batch Training
Bolt models were trained using batch sizes of 1024.
Large batches simulate realistic retrieval scenarios, where each training example effectively consists of:
- a query
- its positive document
- ~2000 unrelated in-batch documents, including hard negatives
This closely approximates production RAG retrieval environments, where each query must rank the correct document among many candidates.
The result is improved:
- retrieval accuracy
- semantic separation
- ranking robustness
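A training run with a 1024-example batch could be configured roughly as follows. This is a hedged sketch: the dataset ID, output path, and hyperparameters other than the batch size are placeholders, not the actual Bolt training configuration.

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

model = SentenceTransformer("ibm-granite/granite-embedding-small-english-r2")
loss = CachedMultipleNegativesRankingLoss(model, mini_batch_size=32)

# Placeholder dataset ID; expects anchor/positive/negative columns (see Training Data).
train_dataset = load_dataset("your-org/your-rag-triplets", split="train")

args = SentenceTransformerTrainingArguments(
    output_dir="bolt-embedding-small",  # placeholder output path
    per_device_train_batch_size=1024,   # each query is contrasted against ~2000 in-batch candidates
    num_train_epochs=1,                 # illustrative value
    learning_rate=2e-5,                 # illustrative value
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```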
Training Data
Training was performed on a custom dataset we collected. This dataset includes hand-curated examples as well as examples drawn from datasets with commercially acceptable licenses. To curate hard negatives for some examples, LLMs with commercially permissible licenses were used to generate negatives.
Dataset format:
| Column | Description |
|---|---|
| anchor | Query or input text |
| positive | Relevant document/passage |
| negative | Unrelated document/passage; some negatives were generated with LLMs to provide hard negatives, others were sampled at random from existing negatives |
Training size:
- 500,000 training samples
- 20,000 evaluation samples
The dataset contains a mixture of:
- question → answer pairs
- query → document matches
- semantic similarity examples
These samples are designed to mimic real RAG retrieval workloads.
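As an illustration of the expected column layout (not the actual training data), a tiny triplet dataset could be constructed like this, reusing the example sentences from the Usage section:

```python
from datasets import Dataset

# Toy anchor / positive / negative triplet in the training schema described above.
triplets = Dataset.from_dict({
    "anchor": ["What are the tax implications of employee stock options?"],
    "positive": ["Employee stock options may have tax consequences depending on exercise timing."],
    "negative": ["The Eiffel Tower is located in Paris."],
})
print(triplets)
```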
Intended Use
Bolt embeddings are designed for:
- Retrieval-Augmented Generation (RAG)
- Enterprise document search
- Semantic search
- Knowledge base retrieval
- Question answering
- Duplicate detection
- Similarity scoring
Typical pipeline:
```
User query
   ↓
Bolt embedding
   ↓
Vector search
   ↓
Top-k documents
   ↓
LLM generation
```
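A stripped-down version of the retrieval half of this pipeline might look like the following sketch; the in-memory cosine-similarity ranking stands in for a real vector database, and the document list is illustrative.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("aisquared/bolt-embedding-small")

documents = [
    "Employee stock options may have tax consequences depending on exercise timing.",
    "Our vacation policy grants 20 days of paid leave per year.",
    "The Eiffel Tower is located in Paris.",
]
query = "What are the tax implications of employee stock options?"

# Embed the corpus and the query, then rank documents by cosine similarity.
doc_embeddings = model.encode(documents)
query_embedding = model.encode([query])
scores = model.similarity(query_embedding, doc_embeddings)[0]

top_k = 2
for idx in scores.argsort(descending=True)[:top_k].tolist():
    print(f"{float(scores[idx]):.3f}  {documents[idx]}")
```

The top-k documents retrieved this way would then be passed to the LLM as grounding context.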
Usage
Install Sentence Transformers:
```bash
pip install -U sentence-transformers
```
Load the Model
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("aisquared/bolt-embedding-small")
```

or

```python
model = SentenceTransformer("aisquared/bolt-embedding-large")
```
Generate Embeddings
```python
sentences = [
    "What are the tax implications of employee stock options?",
    "Employee stock options may have tax consequences depending on exercise timing.",
    "The Eiffel Tower is located in Paris.",
]

embeddings = model.encode(sentences)
print(embeddings.shape)
```
Compute Similarity
```python
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```
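For larger corpora, the generic `semantic_search` utility from Sentence Transformers can handle the top-k lookup; this is standard library functionality rather than anything Bolt-specific, and it reuses the `embeddings` array from the previous step.

```python
from sentence_transformers import util

# Treat the first sentence as the query and rank all embeddings against it.
hits = util.semantic_search(embeddings[:1], embeddings, top_k=3)
print(hits[0])  # list of {"corpus_id": ..., "score": ...} dicts, best match first
```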
Why Bolt?
Many embedding models are trained on general semantic similarity tasks.
Bolt is optimized for enterprise retrieval, where queries must locate the correct information among thousands of unrelated documents.
Key differentiators:
- Large-batch contrastive training
- RAG-specific dataset
- Long-context support (trained with sequences up to 4096 tokens)
- Optimized for vector database retrieval
Framework Versions
Training was performed using:
- Python 3.12
- Sentence Transformers
- Transformers
- PyTorch
- Hugging Face Datasets
- Hugging Face Jobs (1× A100 GPU)
Citation
If you use Bolt embeddings in research or production systems, please cite the underlying Sentence-BERT and cached contrastive loss work below.
Sentence-BERT
```bibtex
@inproceedings{reimers-2019-sentence-bert,
  title     = {Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
  author    = {Reimers, Nils and Gurevych, Iryna},
  booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
  year      = {2019}
}
```
Cached Multiple Negatives Ranking Loss
```bibtex
@misc{gao2021scaling,
  title  = {Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
  author = {Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
  year   = {2021}
}
```
License
Bolt embeddings are released under the AI Squared Community License.