QueryShield: Multilingual Prompt Optimizer
QueryShield-1.5B is a fine-tuned version of Qwen2.5-1.5B-Instruct trained to rewrite raw, messy user queries into detailed, structured instruction prompts for downstream LLMs, across 5 languages and 30 professional domains.
Given a raw user question, it outputs an expert-level optimized prompt that tells a downstream LLM how to answer.
What it does
Most LLMs perform significantly better when given structured, detailed prompts rather than raw user input. QueryShield sits between the user and the LLM: it takes the raw query and rewrites it into a high-quality instruction prompt automatically.
```text
User: "menga diabetni boshqarish uchun ovqat rejimi ayting"
      ("give me a diet plan for managing diabetes")

        ↓ QueryShield

Optimized: "As a Medical Expert, the user is asking in Uzbek about dietary
            management for diabetes with high blood sugar. Provide a structured
            3-tier response covering: diabetes basics, dietary assessment, and
            an actionable meal plan. Respond entirely in Uzbek. Avoid jargon..."

        ↓ Downstream LLM

Final answer in Uzbek ✅
```
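The two-stage flow above can be sketched as a simple composition. The stage functions below are stand-in stubs for illustration only; the Quick Start section shows the real QueryShield call, and the downstream LLM is whatever model you forward the optimized prompt to.

```python
# Conceptual sketch of the two-stage pipeline. Both stage functions here are
# stubs; in practice stage 1 calls QueryShield and stage 2 calls a downstream LLM.

def pipeline(raw_query, queryshield, downstream_llm):
    optimized = queryshield(raw_query)   # stage 1: rewrite the raw query
    return downstream_llm(optimized)     # stage 2: answer the optimized prompt

# Stub stages, just to show the data flow.
def stub_queryshield(q):
    return f"As a Medical Expert, answer: {q}"

def stub_llm(p):
    return f"[answer to: {p}]"

result = pipeline("diet plan for diabetes", stub_queryshield, stub_llm)
print(result)  # [answer to: As a Medical Expert, answer: diet plan for diabetes]
```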
Model Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Training data | QueryShield Multilingual Dataset |
| Training rows | 19,530 |
| Epochs | 3 |
| Train loss | 0.88 → 0.47 |
| Eval loss | 0.967 (best checkpoint) |
| GPU | NVIDIA RTX 3090 24GB |
| Training time | ~3.7 hours |
| Parameters | 1.5B total / 147M trainable (8.7%) |
| Live demo | Kaggle Notebook (see Live Demo section) |
Languages
| Language | Code | Support |
|---|---|---|
| English | en | Full |
| Uzbek | uz | Full |
| Russian | ru | Full |
| Kazakh | kk | Full |
| Karakalpak | kaa | Good |
Cross-lingual scenarios are supported: the user can write in one language and request output in another (e.g., Uzbek input → Russian output).
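The cross-lingual pair is encoded purely through the system-prompt fields; a minimal sketch using the SYSTEM template from the Quick Start section below:

```python
# Same template as in Quick Start; only the format fields change per request.
SYSTEM = (
    "You are QueryShield, a multilingual prompt optimizer. "
    "Given a raw user question, rewrite it into a detailed instruction "
    "prompt for a downstream LLM expert. "
    "User language: {in_lang}. Response language: {out_lang}. "
    "Expert role: {role}."
)

# Uzbek input, Russian output: no other change is needed.
system_msg = SYSTEM.format(in_lang="Uzbek", out_lang="Russian", role="Medical Expert")
print(system_msg)
```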
Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "nickoo004/queryshield-1.5b"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

SYSTEM = (
    "You are QueryShield, a multilingual prompt optimizer. "
    "Given a raw user question, rewrite it into a detailed instruction "
    "prompt for a downstream LLM expert. "
    "User language: {in_lang}. Response language: {out_lang}. "
    "Expert role: {role}."
)

def optimize_prompt(user_question, input_language, output_language, role):
    messages = [
        {"role": "system", "content": SYSTEM.format(
            in_lang=input_language,
            out_lang=output_language,
            role=role,
        )},
        {"role": "user", "content": user_question},
    ]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=512,
            temperature=0.7,
            do_sample=True,
            repetition_penalty=1.1,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example 1: Uzbek monolingual
# ("tell me the best diet plan for managing diabetes")
result = optimize_prompt(
    user_question="menga diabetni boshqarish uchun eng yaxshi ovqatlanish rejimini ayting",
    input_language="Uzbek",
    output_language="Uzbek",
    role="Medical Expert",
)
print(result)

# Example 2: cross-lingual, Kazakh -> Uzbek
# ("the soil quality on my farm is poor, what should I do?")
result = optimize_prompt(
    user_question="менің фермамда топырақ сапасы нашар, не істеуім керек?",
    input_language="Kazakh",
    output_language="Uzbek",
    role="Agricultural Scientist",
)
print(result)
```
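Since the optimized prompt is consumed by another model rather than a human, a light sanity check before forwarding it can catch degenerate generations. The helper below is hypothetical (not part of QueryShield), and the thresholds are illustrative assumptions:

```python
def looks_like_optimized_prompt(text, expected_role=None, min_words=20):
    """Heuristic sanity check on QueryShield output before forwarding it.

    Hypothetical helper: the word-count floor and role check are
    illustrative assumptions, not part of the model's contract.
    """
    text = text.strip()
    if len(text.split()) < min_words:
        return False  # too short to be a detailed instruction prompt
    if expected_role is not None and expected_role not in text:
        return False  # expert role was dropped from the rewrite
    return True

good = ("As a Medical Expert, the user is asking in Uzbek about dietary "
        "management for diabetes. Provide a structured three-tier response "
        "covering basics, assessment, and an actionable meal plan. "
        "Respond entirely in Uzbek.")
print(looks_like_optimized_prompt(good, expected_role="Medical Expert"))       # True
print(looks_like_optimized_prompt("too short", expected_role="Medical Expert"))  # False
```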
Live Demo
Run on Kaggle: no setup needed, free GPU included.
The notebook tests all 7 cases: English, Uzbek, Russian, Kazakh, Karakalpak, plus 2 cross-lingual pairs.
Supported Domains (30 total)
| Domain | Expert Role |
|---|---|
| Software Engineering | Senior Software Engineer |
| Healthcare & Medicine | Medical Expert |
| Finance & Banking | Financial Analyst |
| Legal & Law | Legal Advisor |
| Data Science & AI | Data Scientist |
| Cybersecurity | Cybersecurity Specialist |
| Aviation & Aerospace | Aerospace Engineer |
| Agriculture | Agricultural Scientist |
| Education & Teaching | Experienced Educator |
| Automotive | Automotive Engineer |
| Pharmaceuticals | Pharmaceutical Researcher |
| Manufacturing | Manufacturing Expert |
| Civil / Mechanical / Electrical Engineering | Domain Engineer |
| Business & Marketing | Business Strategist |
| Creative Writing | Professional Writer |
| … and 15 more | … |
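For routing, the table above maps naturally onto a simple lookup with a generic fallback for domains outside the trained set. This is a hypothetical helper, and the fallback role `"Subject-Matter Expert"` is an assumption, not something the model defines:

```python
# Partial domain -> expert-role map from the table above (hypothetical helper).
DOMAIN_ROLES = {
    "Software Engineering": "Senior Software Engineer",
    "Healthcare & Medicine": "Medical Expert",
    "Finance & Banking": "Financial Analyst",
    "Legal & Law": "Legal Advisor",
    "Agriculture": "Agricultural Scientist",
    # ... remaining domains from the table above
}

def role_for(domain):
    # Fall back to a generic role for domains outside the trained set
    # (per Limitations, novel domains may produce generic prompts anyway).
    return DOMAIN_ROLES.get(domain, "Subject-Matter Expert")

print(role_for("Agriculture"))     # Agricultural Scientist
print(role_for("Marine Biology"))  # Subject-Matter Expert
```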
Training Details
Dataset
- Source: nickoo004/queryshield-multilingual
- 19,530 rows across 5 languages and 30 domains
- Generated by DeepSeek, Gemini, and Qwen2.5-14B
Loss Curve
```text
Epoch 1.0 -> train: 1.023 | eval: 0.997
Epoch 2.5 -> train: 0.731 | eval: 0.967  <- best checkpoint
```
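Assuming the reported losses are mean token-level cross-entropy in nats (the usual Hugging Face Trainer convention), they convert to perplexities via `exp(loss)`:

```python
import math

# Perplexity = exp(cross-entropy loss), assuming the losses above are mean
# token-level cross-entropy in nats (an assumption about the training setup).
eval_ppl = math.exp(0.967)
train_ppl = math.exp(0.731)
print(f"eval perplexity  ~ {eval_ppl:.2f}")   # ~2.63
print(f"train perplexity ~ {train_ppl:.2f}")  # ~2.08
```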
Limitations
- Karakalpak support is functional but may be less consistent than the other languages, due to limited training data for this low-resource language.
- The `optimized_prompt` output is always structured as an English instruction; this is by design.
- Best results come on domains covered in the training data; novel domains may produce generic prompts.
- Not suitable for harmful, illegal, or unethical query optimization.
Citation
```bibtex
@misc{queryshield_1_5b_2026,
  author    = {nickoo004},
  title     = {QueryShield-1.5B: Multilingual Prompt Optimizer},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/nickoo004/queryshield-1.5b}
}
```
License
This model is released under the MIT License. Base model license: Qwen License