
Bolt Embedding Models

Bolt Embedding is a family of high-performance embedding models optimized for enterprise Retrieval-Augmented Generation (RAG).
These models are fine-tuned from IBM Granite embedding models and are designed to produce strong semantic embeddings for knowledge retrieval, search, and document understanding.

Bolt models map text (queries, sentences, or documents) into a dense vector space suitable for similarity search, clustering, and retrieval pipelines.


Model Overview

Bolt embeddings are purpose-built for enterprise RAG workloads, where retrieval quality and robustness across heterogeneous documents are critical.

Key design goals:

  • Strong query → document retrieval quality
  • Robust performance on long enterprise documents
  • Optimized for large-scale vector search
  • Trained using large-batch contrastive learning to replicate real RAG retrieval conditions

These models are fine-tuned from IBM Granite embedding models using contrastive training on RAG-style data.


Model Details

Model Type

Sentence Transformer embedding model

Base Model

Fine-tuned from:

  • ibm-granite/granite-embedding-small-english-r2 (small)
  • ibm-granite/granite-embedding-english-r2 (large)

(depending on the Bolt variant)

Output

  • Embedding dimension: 384 (small), 768 (large)
  • Similarity metric: Cosine similarity
  • Max sequence length: 4096 tokens
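Cosine similarity between two embeddings is just the dot product of the L2-normalized vectors. A minimal NumPy sketch (the vectors here are toy values, not real Bolt embeddings):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot product of L2-normalized vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors standing in for 384-/768-dim Bolt embeddings.
query = np.array([0.1, 0.3, -0.2, 0.9])
doc = np.array([0.2, 0.25, -0.1, 0.8])

print(cosine_similarity(query, doc))  # a value in [-1, 1]; near 1 for similar texts
```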

Architecture

SentenceTransformer(
  (0): Transformer(ModernBertModel)
  (1): Pooling(CLS)
)

Bolt uses CLS pooling to produce a single embedding vector per input.
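CLS pooling simply takes the hidden state of the first ([CLS]) token as the sentence embedding. A sketch with toy token embeddings (hidden size 8 here; the real models use 384/768):

```python
import numpy as np

def cls_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """CLS pooling: the first token's embedding represents the whole input."""
    return token_embeddings[0]

# Toy transformer output: 5 tokens, hidden size 8.
tokens = np.random.rand(5, 8)
sentence_embedding = cls_pool(tokens)
print(sentence_embedding.shape)  # (8,)
```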


Training Objective

Bolt embeddings are trained specifically for retrieval scenarios using contrastive learning.

Loss Function

CachedMultipleNegativesRankingLoss

This loss is widely used for training embedding models for retrieval tasks.

Key properties:

  • Efficient training with very large effective batch sizes
  • Uses in-batch negatives
  • Encourages queries to be close to their relevant passages while far from irrelevant ones
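Conceptually, multiple negatives ranking loss treats each (query, positive) pair in a batch as a classification problem: the query must pick out its own positive against every other positive in the batch. A simplified NumPy sketch (the cached variant adds gradient caching so very large batches fit in memory, which is omitted here):

```python
import numpy as np

def mnr_loss(query_emb: np.ndarray, pos_emb: np.ndarray, scale: float = 20.0) -> float:
    """In-batch multiple negatives ranking loss.

    query_emb, pos_emb: (batch, dim) L2-normalized embeddings.
    Each query's positive is the matching row; all other rows act as negatives.
    """
    scores = scale * (query_emb @ pos_emb.T)  # (batch, batch) scaled cosine scores
    # Cross-entropy with the diagonal (matching pair) as the correct class.
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8)); q /= np.linalg.norm(q, axis=1, keepdims=True)
p = rng.normal(size=(4, 8)); p /= np.linalg.norm(p, axis=1, keepdims=True)
print(mnr_loss(q, p))
```

When queries and positives align (e.g. `mnr_loss(q, q)`), the diagonal dominates each row and the loss drops toward zero, which is exactly the behavior the training objective rewards.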

Large Batch Training

Bolt models were trained using batch sizes of 1024.

Large batches simulate realistic retrieval scenarios:

  • One query
  • One positive document
  • ~2000 unrelated in-batch documents, including hard negatives

This closely approximates production RAG retrieval environments, where each query must rank the correct document among many candidates.

The result is improved:

  • retrieval accuracy
  • semantic separation
  • ranking robustness

Training Data

Training was performed on a custom dataset we collected. It includes hand-curated examples as well as examples drawn from datasets with commercially acceptable licenses. For some examples, LLMs with commercially permissible licenses were used to generate hard negatives.

Dataset format:

Column     Description
anchor     Query or input text
positive   Relevant document/passage
negative   Unrelated document/passage; some were generated by LLMs as hard negatives, others sampled at random from existing negatives

Training size:

  • 500,000 training samples
  • 20,000 evaluation samples

The dataset contains a mixture of:

  • question → answer pairs
  • query → document matches
  • semantic similarity examples

These samples are designed to mimic real RAG retrieval workloads.
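A single training triplet in this format might look like the following (the text values are purely illustrative, not taken from the actual dataset):

```python
# Illustrative (anchor, positive, negative) triplet in the training format.
example = {
    "anchor": "How do I reset my corporate VPN password?",
    "positive": "To reset your VPN password, open the IT self-service portal and choose Reset credentials.",
    "negative": "The quarterly sales report is due at the end of the month.",
}

print(sorted(example))  # ['anchor', 'negative', 'positive']
```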


Intended Use

Bolt embeddings are designed for:

  • Retrieval-Augmented Generation (RAG)
  • Enterprise document search
  • Semantic search
  • Knowledge base retrieval
  • Question answering
  • Duplicate detection
  • Similarity scoring

Typical pipeline:

User query
      ↓
Bolt embedding
      ↓
Vector search
      ↓
Top-k documents
      ↓
LLM generation
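The retrieval step of this pipeline reduces to a top-k nearest-neighbor search over document embeddings. A minimal sketch with toy vectors (a production system would use a vector database, and the embeddings would come from a Bolt model rather than random data):

```python
import numpy as np

def top_k(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 3) -> list:
    """Return indices of the k documents most cosine-similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q
    return list(np.argsort(-scores)[:k])

rng = np.random.default_rng(42)
docs = rng.normal(size=(100, 8))              # stand-in for Bolt document embeddings
query = docs[17] + 0.01 * rng.normal(size=8)  # a query very close to document 17
print(top_k(query, docs, k=3))                # document 17 should rank first
```

The top-k document texts are then passed to the LLM as grounding context for generation.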

Usage

Install Sentence Transformers:

pip install -U sentence-transformers

Load the Model

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("aisquared/bolt-embedding-small")

or

model = SentenceTransformer("aisquared/bolt-embedding-large")

Generate Embeddings

sentences = [
    "What are the tax implications of employee stock options?",
    "Employee stock options may have tax consequences depending on exercise timing.",
    "The Eiffel Tower is located in Paris."
]

embeddings = model.encode(sentences)

print(embeddings.shape)  # (3, 384) for bolt-embedding-small; (3, 768) for bolt-embedding-large

Compute Similarity

similarities = model.similarity(embeddings, embeddings)

print(similarities)

Why Bolt?

Many embedding models are trained on general semantic similarity tasks.

Bolt is optimized for enterprise retrieval, where queries must locate the correct information among thousands of unrelated documents.

Key differentiators:

  • Large-batch contrastive training
  • RAG-specific dataset
  • Long context support (4096-token maximum sequence length)
  • Optimized for vector database retrieval

Framework Versions

Training was performed using:

  • Python 3.12
  • Sentence Transformers
  • Transformers
  • PyTorch
  • HuggingFace Datasets
  • HuggingFace Jobs (1× A100 GPU)

Citation

If you use Bolt embeddings in research or production systems, please cite the underlying Sentence-BERT work.

Sentence-BERT

@inproceedings{reimers-2019-sentence-bert,
  title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
  author = "Reimers, Nils and Gurevych, Iryna",
  year = 2019
}

Cached Multiple Negatives Ranking Loss

@misc{gao2021scaling,
  title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
  author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
  year={2021}
}

License

Bolt embeddings are released under the AI Squared Community License.
