---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
base_model: ibm-granite/granite-4.0-micro
tags:
- granite
- guardian
- requirement-checking
- instruction-following
- lora
- peft
---
# Requirement Checker (LoRA)

**Model Summary:** Requirement Checker is a lightweight LoRA adapter that adds requirement-checking capabilities to the ibm-granite/granite-4.0-micro base model. The adapter is trained to judge whether an assistant's generation satisfies a set of user-specified requirements, emitting a JSON object `{"score": "yes"}` or `{"score": "no"}` to indicate whether the given constraints are satisfied.
- Developers: IBM Research
- HF Collection: Granite Libraries
- Github Repository: ibm-granite
- Release Date: March 18th, 2026
- License: Apache 2.0
- Paper: Granite Guardian
## Usage

**Intended Use:** The Requirement Checker adapter is designed for evaluating whether LLM-generated responses satisfy user-specified constraints. Key use cases include:
- Instruction-following evaluation: Assessing whether a model's output adheres to specific formatting, content, or structural requirements provided in the prompt.
- Constraint satisfaction checking: Verifying that generated text meets multiple simultaneous constraints (e.g., length limits, style requirements, content inclusion/exclusion).
- Quality assurance pipelines: Automated checking of LLM outputs against predefined acceptance criteria.
### Installation

```shell
pip install transformers peft torch
```
### Quickstart Example (LoRA)

```python
import json

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "ibm-granite/granite-4.0-micro"
adapter_repo = "ibm-granite/granitelib-core-r1.0"
adapter_subfolder = "requirement-check/granite-4.0-micro/lora"

# Load base model and LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_repo, subfolder=adapter_subfolder)
model.eval()

# Define the user prompt, assistant response, and constraints to check
user_text = "Invite for an IBM office party."
response_text = """
Dear Team,

To celebrate our recent successes and take a well-deserved moment to recharge,
you are cordially invited to a team social. Please join us for an evening of
live music, appetizers, and drinks as we recognize our collective wins.

Event Details

* **Date:** Saturday, April 25, 2026
* **Time:** 6:00 PM
* **Location:** Ryan’s Bar, Chelsea, NY
* **Highlights:** Live entertainment and refreshments

RSVP

To ensure we have an accurate headcount for catering, please confirm your
attendance by **Friday, April 10, 2026**.

We look forward to seeing everyone there and celebrating our hard work together.

**Best regards,**
[Your Name/Management Team]
"""
constraints = "Use a professional tone."

# Build the evaluation prompt
evaluation_prompt = (
    "Please verify if the assistant's generation satisfies the user's "
    "requirements or not and reply with a binary label accordingly. "
    'Respond with a json {"score": "yes"} if the constraints are '
    'satisfied or respond with {"score": "no"} if the constraints are not '
    "satisfied."
)
messages = [
    {"role": "user", "content": user_text},
    {"role": "assistant", "content": response_text},
    {"role": "user", "content": f"<requirements> {constraints}\n{evaluation_prompt}"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Generate, stopping at EOS or at the closing brace of the JSON object
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
input_len = inputs["input_ids"].shape[1]
stop_token_id = tokenizer.encode("}", add_special_tokens=False)[0]
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=20,
        do_sample=False,
        eos_token_id=[tokenizer.eos_token_id, stop_token_id],
    )

# Decode only the newly generated tokens and truncate at the closing brace
response = tokenizer.decode(output[0][input_len:], skip_special_tokens=True).strip()
brace_idx = response.find("}")
if brace_idx != -1:
    response = response[: brace_idx + 1]
print(f"Response: {response}")  # {"score": "yes"}

result = json.loads(response)
print(f"Constraints satisfied: {result['score']}")  # yes
```
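The brace-truncation and JSON-parsing steps at the end of the quickstart can be factored into a small helper for reuse across calls. This is our own sketch; the function name `parse_score` is illustrative and not part of any library:

```python
import json


def parse_score(raw: str) -> str:
    """Extract the binary score from a raw model generation.

    The adapter is trained to emit {"score": "yes"} or {"score": "no"};
    we truncate at the first closing brace in case extra tokens follow.
    """
    raw = raw.strip()
    brace_idx = raw.find("}")
    if brace_idx == -1:
        raise ValueError(f"No JSON object found in: {raw!r}")
    result = json.loads(raw[: brace_idx + 1])
    score = result.get("score")
    if score not in ("yes", "no"):
        raise ValueError(f"Unexpected score value: {score!r}")
    return score


print(parse_score('{"score": "yes"}'))       # yes
print(parse_score('{"score": "no"} extra'))  # no
```

Raising on malformed output (rather than defaulting to "no") makes parsing failures visible in an automated pipeline.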
### Prompt Template

The requirement checker uses a structured prompt format in which the user and assistant messages are followed by a requirement-evaluation turn. The `<requirements>` tag marks the constraints to be evaluated:

```
<user message>
<assistant response>
<requirements> <constraints>
<evaluation prompt>
```
Typical usage checks the last assistant generation; the following evaluation prompt can be used:

```python
evaluation_prompt = (
    "Please verify if the assistant's generation satisfies the user's "
    "requirements or not and reply with a binary label accordingly. "
    'Respond with a json {"score": "yes"} if the constraints are '
    'satisfied or respond with {"score": "no"} if the constraints are not '
    "satisfied."
)
```
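The template can also be assembled programmatically. The helper below is our own illustration (the name `build_messages` is not part of the adapter's API); it mirrors the message structure used in the quickstart:

```python
EVALUATION_PROMPT = (
    "Please verify if the assistant's generation satisfies the user's "
    "requirements or not and reply with a binary label accordingly. "
    'Respond with a json {"score": "yes"} if the constraints are '
    'satisfied or respond with {"score": "no"} if the constraints are not '
    "satisfied."
)


def build_messages(user_text: str, response_text: str, constraints: str) -> list[dict]:
    """Assemble the chat messages for a single requirement-check call."""
    return [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": response_text},
        {"role": "user", "content": f"<requirements> {constraints}\n{EVALUATION_PROMPT}"},
    ]


messages = build_messages(
    "Write a haiku.", "An old silent pond...", "Use exactly three lines."
)
print(messages[-1]["content"].startswith("<requirements>"))  # True
```

The resulting list can be passed directly to `tokenizer.apply_chat_template` as in the quickstart.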
## Evaluations

Binary classification performance on instruction-following benchmarks: HelpSteer3, IFEval Multi-Constraint, and InfoBench (GPT-4-annotated and human-annotated splits).
| Benchmark | AUC | Accuracy | Bal. Acc. | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| HelpSteer3 | 0.8260 | 0.7463 | 0.7434 | 0.7686 | 0.7598 | 0.7776 |
| IFEval Multi-Constraint | 0.9186 | 0.8459 | 0.8403 | 0.8092 | 0.8071 | 0.8114 |
| InfoBench (GPT-4 Annotated) | 0.7660 | 0.7304 | 0.6950 | 0.8207 | 0.9260 | 0.7394 |
| InfoBench (Human Annotated) | 0.7311 | 0.7523 | 0.6799 | 0.8034 | 0.8918 | 0.7409 |
The HelpSteer3 and IFEval Multi-Constraint evaluations above use the validation set and a held-out set, respectively.
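As a sanity check on the table, F1 is the harmonic mean of precision and recall; recomputing it from the reported HelpSteer3 precision and recall reproduces the listed value to rounding:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# HelpSteer3 row from the table above
f1 = f1_score(0.7598, 0.7776)
print(round(f1, 4))  # 0.7686
```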
## Training Data

The Requirement Checker adapter is fine-tuned on top of ibm-granite/granite-4.0-micro using a combination of instruction-following evaluation data. Training samples pair explicit user requirements and constraints with assistant responses, annotated for whether the constraints are satisfied.

The two sources of training data are:
- IF-RLVR training data, with up to 5 constraints per instruction. Each prompt (with its constraints) was passed through Mixtral 8x22B Instruct to generate a response. The resulting [prompt, constraints, response] triple was then passed through a programmatic evaluation pipeline that assigns a yes/no label indicating whether the constraints were followed.
- HelpSteer3-Preference. The original dataset consists of [prompt, response1, response2, feedback]. The `feedback` column contains individual ratings (a score from 1-5) of each response from multiple annotators. We average the annotators' scores for each response and map averages of 4-5 to a label of yes and averages of 1-3 to a label of no.
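A minimal sketch of the HelpSteer3-Preference label mapping described above. The exact threshold handling in the original pipeline is not specified beyond "averages of 4-5 map to yes"; here we assume any average of at least 4 maps to yes:

```python
from statistics import mean


def preference_label(annotator_scores: list[int]) -> str:
    """Map a response's 1-5 annotator ratings to a binary yes/no label.

    Assumption: an average rating of 4 or above counts as "yes";
    anything below maps to "no".
    """
    return "yes" if mean(annotator_scores) >= 4 else "no"


print(preference_label([5, 4, 4]))  # yes
print(preference_label([3, 2, 4]))  # no
```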
## Adapter Configuration
| Property | LoRA |
|---|---|
| Base Model | ibm-granite/granite-4.0-micro |
| PEFT Type | LORA |
| Rank (r) | 64 |
| Alpha | 64 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, input_linear, output_linear |
| vLLM Support | Yes |
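Since the adapter is vLLM-compatible, it can also be served through vLLM's LoRA support. A sketch, assuming the adapter files have been downloaded locally (e.g. with `huggingface_hub.snapshot_download`) to a directory of your choosing; the adapter name `requirement-check` and the path are our own placeholders:

```shell
# Serve the base model with the LoRA adapter attached.
vllm serve ibm-granite/granite-4.0-micro \
  --enable-lora \
  --lora-modules requirement-check=./requirement-check-lora
```

Requests can then target the adapter by passing `"model": "requirement-check"` in the OpenAI-compatible API.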
## Infrastructure
Training was completed using 8 H100 GPUs. Evaluation (and inference) requires 1 H100 GPU.
## Ethical Considerations / Limitations

- The model must only be used in the prescribed evaluation mode, outputting JSON responses (`{"score": "yes"}` / `{"score": "no"}`) based on the specified prompt template. Any deviation from this intended use may lead to unexpected outputs.
- The model is designed for evaluating constraint satisfaction in instruction-following scenarios.
- The model is only trained and tested on English data.
- The LoRA adapter is compatible with both vLLM (for efficient batched inference) and HuggingFace Transformers + PEFT.
## Citation

```bibtex
@misc{padhi2024graniteguardian,
      title={Granite Guardian},
      author={Inkit Padhi and Manish Nagireddy and Giandomenico Cornacchia and Subhajit Chaudhury and Tejaswini Pedapati and Pierre Dognin and Keerthiram Murugesan and Erik Miehling and Mart\'{i}n Santill\'{a}n Cooper and Kieran Fraser and Giulio Zizzo and Muhammad Zaid Hameed and Mark Purcell and Michael Desmond and Qian Pan and Zahra Ashktorab and Inge Vejsbjerg and Elizabeth M. Daly and Michael Hind and Werner Geyer and Ambrish Rawat and Kush R. Varshney and Prasanna Sattigeri},
      year={2024},
      eprint={2412.07724},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.07724},
}
```
## Resources
- ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite
- 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
- 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources