electrical-embeddinggemma-ir_q4_k_m

Model Description

This model is the GGUF q4_k_m (4-bit K-quant) variant of the gemma-300m-electrical-electronics-ir family, fine-tuned from unsloth/embeddinggemma-300m for dense Information Retrieval (IR) in the electrical and electronics engineering domain. It is the recommended production build: at about 236 MB it is roughly 4× smaller than the f16 GGUF, runs on a laptop CPU without a GPU, and gives up only 0.0008 MAP@100 relative to the f16 variant.

Training Data

The model was trained on the disham993/ElectricalElectronicsIR dataset — 20,000 question-passage pairs covering electrical engineering, electronics, power systems, and communications.

  • 16k train / 2k validation / 2k test
  • Queries: 133–822 characters; passages: 586–5,590 characters
  • Topics include phased array antennas, IEC 61850 protocols, Josephson junctions, OTDR measurements, MIMO channel estimation, FPGA partial reconfiguration, and more

Model Details

Base Model: unsloth/embeddinggemma-300m (308M params)
Format: GGUF q4_k_m (4-bit K-quant)
Task: Feature Extraction (Dense IR / Semantic Search)
Language: English (en)
Dataset: disham993/ElectricalElectronicsIR
Approx. size: ~236 MB
Backend: llama.cpp / llama-cpp-python
License: MIT

Training Procedure

Training Hyperparameters

Method: LoRA via Unsloth's FastSentenceTransformer, exported to GGUF q4_k_m
LoRA rank / alpha: r=32, α=64
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Loss: MultipleNegativesRankingLoss (in-batch negatives)
Batch size: 128 per device × 2 gradient accumulation = 256 effective
Learning rate: 2e-5 (linear schedule, 3% warmup)
Max steps: 100
Max sequence length: 1024
Precision: bf16 (training) → q4_k_m GGUF (export)
Batch sampler: NO_DUPLICATES
Hardware: NVIDIA RTX 5090
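
To make the loss row above concrete, here is a minimal pure-Python sketch of an in-batch-negatives ranking loss. It is not the training code from this card: the function names are illustrative, and scale=20.0 is an assumed temperature (it mirrors the sentence-transformers default, but the value used for this model is not stated here). Each query treats its own passage as the positive and every other passage in the batch as a negative, which is also why a NO_DUPLICATES batch sampler matters: a repeated passage in the same batch would be wrongly scored as a negative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_negatives_loss(query_embs, passage_embs, scale=20.0):
    """Mean cross-entropy where passage i is the only positive for query i.

    scale is an assumed temperature, not a value confirmed by the model card.
    """
    losses = []
    for i, query in enumerate(query_embs):
        logits = [scale * cosine(query, passage) for passage in passage_embs]
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D embeddings: "aligned" pairs each query with a nearby passage,
# "swapped" pairs each query with the other query's passage.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = in_batch_negatives_loss(queries, aligned)
loss_swapped = in_batch_negatives_loss(queries, swapped)
print(f"aligned={loss_aligned:.4f}  swapped={loss_swapped:.4f}")
```

The aligned batch yields a loss near zero while the swapped batch is heavily penalised, which is the behaviour the optimiser exploits during fine-tuning.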

Evaluation Results

Evaluated on the held-out test split (2,000 queries) of disham993/ElectricalElectronicsIR using sentence_transformers.evaluation.InformationRetrievalEvaluator.

Model                                               MAP@100  NDCG@10  MRR@10  Recall@10
unsloth/embeddinggemma-300m (baseline)              0.5753   0.6221   0.5682  0.7925
electrical-embeddinggemma-ir_lora                   0.9795   0.9847   0.9795  1.0000
electrical-embeddinggemma-ir_finetune_16bit         0.9797   0.9849   0.9797  1.0000
electrical-embeddinggemma-ir_f16                    0.9849   0.9887   0.9849  0.9995
electrical-embeddinggemma-ir_q8_0                   0.9844   0.9883   0.9844  0.9995
electrical-embeddinggemma-ir_q4_k_m (this model) ⭐  0.9841   0.9879   0.9840  0.9990
electrical-embeddinggemma-ir_q5_k_m                 0.9824   0.9866   0.9823  0.9990

Recommended production build. MAP@100 delta vs f16: only −0.0008 at ~4× smaller size. Runs on CPU.
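
For readers unfamiliar with the metrics in the table, the sketch below shows textbook definitions of reciprocal rank, recall, average precision, and NDCG at a cutoff k for a single query with binary relevance. The function names and document IDs are hypothetical, and the exact conventions of InformationRetrievalEvaluator may differ in detail, so treat this as an illustration rather than a reimplementation. If each query has exactly one relevant passage (an assumption, not something this card states), average precision reduces to reciprocal rank, which would be consistent with the near-identical MAP@100 and MRR@10 columns.

```python
import math


def reciprocal_rank_at_k(ranked_ids, relevant_ids, k):
    """1 / rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top k."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def average_precision_at_k(ranked_ids, relevant_ids, k):
    """Mean of precision values taken at each relevant hit in the top k."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(k, len(relevant_ids))


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg


# Hypothetical ranking for one query: the relevant passage "d2" is ranked second.
ranked = ["d7", "d2", "d9"]
relevant = {"d2"}

rr = reciprocal_rank_at_k(ranked, relevant, k=10)
rec = recall_at_k(ranked, relevant, k=10)
ap = average_precision_at_k(ranked, relevant, k=100)
ndcg = ndcg_at_k(ranked, relevant, k=10)
print(f"RR@10={rr:.4f} Recall@10={rec:.4f} AP@100={ap:.4f} NDCG@10={ndcg:.4f}")
```

The reported table values are these per-query scores averaged over the 2,000 test queries.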

Usage

LM Studio (OpenAI-compatible API)

Load this model in LM Studio and use it via the built-in OpenAI-compatible server:

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1234/v1", api_key="lm-studio")

texts = [
    "What is impedance matching?",
    "Impedance matching maximises power transfer by equalising source and load impedance.",
    "An LLC resonant converter achieves zero-voltage switching using an LC tank circuit.",
]

response = client.embeddings.create(
    model="text-embedding-electrical-embeddinggemma-ir",
    input=texts,
)

for item in response.data:
    print(f"[{item.index}] dim={len(item.embedding)}  first5={item.embedding[:5]}")
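
The snippet above only prints the embedding dimensions. To turn the returned vectors into a search ranking, one option is to treat the first vector as the query and score the rest by cosine similarity. The sketch below uses small stand-in vectors so it runs without a server; in practice the list would come from something like [item.embedding for item in response.data]. The helper names are illustrative and not part of the OpenAI client or LM Studio.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted by descending similarity."""
    scored = [(i, cosine_similarity(query_vec, doc)) for i, doc in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stand-in 3-D vectors (real embeddings from this model are much longer).
vectors = [
    [0.9, 0.1, 0.0],  # query
    [0.8, 0.2, 0.1],  # passage pointing in a similar direction
    [0.0, 0.3, 0.9],  # passage pointing in a different direction
]

ranking = rank_documents(vectors[0], vectors[1:])
for doc_index, score in ranking:
    print(f"doc {doc_index}: cosine={score:.4f}")
```

For larger corpora it is usually cheaper to normalise the passage vectors once and store them, so each query needs only dot products.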

llama-cpp-python

# Install dependencies (shell)
pip install huggingface_hub torch
CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 pip install llama-cpp-python  # NVIDIA GPU build; for CPU-only use plain `pip install llama-cpp-python`

# Python
import torch
import torch.nn.functional as F
from huggingface_hub import hf_hub_download, HfApi
from llama_cpp import Llama

class DummyModelCardData:
    def set_evaluation_metrics(self, *args, **kwargs): pass

class GGUFEmbeddingWrapper:
    def __init__(self, repo_id):
        self.repo_id = repo_id
        # Automatically detect the GGUF file in the repo
        api = HfApi()
        files = api.list_repo_files(repo_id)
        gguf_file = next((f for f in files if f.endswith('.gguf')), None)
        if not gguf_file:
            raise ValueError(f"No .gguf file found in {repo_id}")

        print(f"Downloading/Using {gguf_file} from {repo_id}...")
        model_path = hf_hub_download(repo_id=repo_id, filename=gguf_file)
        
        self.llm = Llama(
            model_path=model_path,
            embedding=True,       # CRITICAL: Required for dense extraction
            n_gpu_layers=-1,      # Offload completely to GPU (Optional)
            n_ctx=1024,           # Constrain context window
            verbose=False
        )
        self.dtype = torch.float16
        self.model_card_data = DummyModelCardData() # Bypasses evaluator metadata crashes
        
    def encode(self, sentences, batch_size=None, **kwargs):
        convert_to_tensor = kwargs.pop('convert_to_tensor', True)
        if isinstance(sentences, str): sentences = [sentences]
            
        # Handling list of dicts for corpus evaluations
        if isinstance(sentences, list) and len(sentences) > 0 and isinstance(sentences[0], dict):
            sentences = [(doc.get("title", "") + " " + doc.get("text", "")).strip() for doc in sentences]
            
        embeddings = []
        for text in sentences:
            res = self.llm.create_embedding(text)
            embeddings.append(res['data'][0]['embedding'])
            
        tensors = torch.tensor(embeddings, dtype=torch.float32)
        if convert_to_tensor:
            if torch.cuda.is_available(): tensors = tensors.cuda()
            return tensors
        return tensors.cpu().numpy()

    # Dynamic alias interceptor to satisfy strict evaluator engines
    def __getattr__(self, name):
        if name.startswith("encode_"):
            def wrapper(*args, **kwargs):
                kwargs['convert_to_tensor'] = True
                return self.encode(*args, **kwargs)
            return wrapper
        raise AttributeError(f"'{self.__class__.__name__}' object has no attribute '{name}'")


# === SEMANTIC SEARCH EXAMPLE ===
if __name__ == "__main__":
    # Boot the wrapper dynamically against this Hub Repo
    model = GGUFEmbeddingWrapper("disham993/electrical-embeddinggemma-ir_q4_k_m")
    
    query = "How do transformers step up voltage?"
    
    # A miniature corpus of 10 engineering documents
    documents = [
        "Ohm's law defines the relationship between voltage, current, and resistance.",
        "AC circuits use alternating current which changes direction periodically.",
        "A step-up transformer has more turns on its secondary coil than its primary, increasing voltage.",
        "Capacitors store electrical energy in an electric field.",
        "Inductors resist changes in electric current passing through them.",
        "Transformers operate on Faraday's law of induction to transfer energy between circuits.",
        "Diodes allow current to pass in only one direction.",
        "Voltage is the electric potential difference between two points.",
        "A step-down transformer decreases voltage for safe residential use.",
        "Power is the rate at which electrical energy is transferred by a circuit."
    ]
    
    print("Embedding query and documents...")
    # The wrapper directly outputs torch tensors, making matrix math a breeze!
    query_emb = model.encode(query)      # Shape: [1, 768]
    doc_embs = model.encode(documents)   # Shape: [10, 768]
    
    # Calculate Cosine Similarities between the query and all 10 documents
    similarities = F.cosine_similarity(query_emb, doc_embs)
    
    # Retrieve the top 3 highest scoring documents
    top_3_idx = torch.topk(similarities, k=3).indices.tolist()
    
    print(f"\n--- Top 3 Documents for Query: '{query}' ---")
    for rank, idx in enumerate(top_3_idx, 1):
        print(f"Rank {rank} (Score: {similarities[idx]:.4f}) | {documents[idx]}")

Limitations and Bias

This model performs well in the electrical and electronics engineering domain but is not designed for use in other domains. In particular, it may:

  • Underperform on queries that mix electrical engineering with unrelated domains (e.g., biomedical, legal, financial)
  • Show reduced performance on non-English text or highly colloquial phrasing

GPU-accelerated inference requires llama-cpp-python built with CUDA support; CPU inference is supported and practical given the small model size.

This model is intended for research, educational, and production IR applications in the electrical engineering domain.

Training Infrastructure

For the complete fine-tuning and evaluation pipeline — from data loading to GGUF export — refer to the GitHub repository and the notebooks Finetuning_EmbeddingGemma_EEIR_RTX_5090.ipynb and Evaluate_All_Models.ipynb.

Last Update

2026-04-18

Citation

@misc{electrical-embeddinggemma-ir,
  author       = {disham993},
  title        = {Electrical \& Electronics Engineering Embedding Models},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/collections/disham993/electrical-and-electronics-engineering-embedding-models}},
}