---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
base_model: ibm-granite/granite-4.0-micro
tags:
- granite
- guardian
- requirement-checking
- instruction-following
- lora
- peft
---
# Requirement Checker (LoRA)

**Model Summary:** Requirement Checker is a lightweight LoRA adapter that adds requirement-checking capabilities to the ibm-granite/granite-4.0-micro base model. The adapter is trained to judge whether an assistant's generation satisfies a set of user-specified requirements, emitting a JSON object `{"score": "yes"}` or `{"score": "no"}` to indicate whether the given constraints are satisfied.
- Developers: IBM Research
- HF Collection: Granite Libraries
- Github Repository: ibm-granite
- Release Date: March 18th, 2026
- License: Apache 2.0
- Paper: Granite Guardian
## Usage

**Intended Use:** The Requirement Checker adapter is designed for evaluating whether LLM-generated responses satisfy user-specified constraints. Key use cases include:
- Instruction-following evaluation: Assessing whether a model's output adheres to specific formatting, content, or structural requirements provided in the prompt.
- Constraint satisfaction checking: Verifying that generated text meets multiple simultaneous constraints (e.g., length limits, style requirements, content inclusion/exclusion).
- Quality assurance pipelines: Automated checking of LLM outputs against predefined acceptance criteria.
### Installation

```shell
pip install transformers peft torch
```
### Quickstart Example (LoRA)

```python
import json

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "ibm-granite/granite-4.0-micro"
adapter_repo = "ibm-granite/granitelib-core-r1.0"
adapter_subfolder = "requirement-check/granite-4.0-micro/lora"

# Load base model and LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_repo, subfolder=adapter_subfolder)
model.eval()

# Define the user prompt, assistant response, and constraints to check
user_text = "Invite for an IBM office party."
response_text = """
Dear Team,

To celebrate our recent successes and take a well-deserved moment to recharge,
you are cordially invited to a team social. Please join us for an evening of
live music, appetizers, and drinks as we recognize our collective wins.

Event Details

* **Date:** Saturday, April 25, 2026
* **Time:** 6:00 PM
* **Location:** Ryan’s Bar, Chelsea, NY
* **Highlights:** Live entertainment and refreshments

RSVP

To ensure we have an accurate headcount for catering, please confirm your
attendance by **Friday, April 10, 2026**.

We look forward to seeing everyone there and celebrating our hard work together.

**Best regards,**
[Your Name/Management Team]
"""
constraints = "Use a professional tone."

# Build the evaluation prompt
evaluation_prompt = (
    "Please verify if the assistant's generation satisfies the user's "
    "requirements or not and reply with a binary label accordingly. "
    'Respond with a json {"score": "yes"} if the constraints are '
    'satisfied or respond with {"score": "no"} if the constraints are not '
    "satisfied."
)
messages = [
    {"role": "user", "content": user_text},
    {"role": "assistant", "content": response_text},
    {"role": "user", "content": f"<requirements> {constraints}\n{evaluation_prompt}"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Generate, stopping at EOS or at the closing brace of the JSON object
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
input_len = inputs["input_ids"].shape[1]
stop_token_id = tokenizer.encode("}", add_special_tokens=False)[0]
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=20,
        do_sample=False,
        eos_token_id=[tokenizer.eos_token_id, stop_token_id],
    )

# Decode only the newly generated tokens and truncate at the closing brace
response = tokenizer.decode(output[0][input_len:], skip_special_tokens=True).strip()
brace_idx = response.find("}")
if brace_idx != -1:
    response = response[: brace_idx + 1]
print(f"Response: {response}")  # {"score": "yes"}

result = json.loads(response)
print(f"Constraints satisfied: {result['score']}")  # yes
```
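The brace-truncation and JSON-parsing steps at the end of the quickstart can be factored into a small helper for reuse across calls. This is our own sketch; the function name `parse_score` is illustrative and not part of any library:

```python
import json


def parse_score(raw: str) -> str:
    """Extract the binary score from a raw model generation.

    The adapter is trained to emit {"score": "yes"} or {"score": "no"};
    we truncate at the first closing brace in case extra tokens follow.
    """
    raw = raw.strip()
    brace_idx = raw.find("}")
    if brace_idx == -1:
        raise ValueError(f"No JSON object found in: {raw!r}")
    result = json.loads(raw[: brace_idx + 1])
    score = result.get("score")
    if score not in ("yes", "no"):
        raise ValueError(f"Unexpected score value: {score!r}")
    return score


print(parse_score('{"score": "yes"}'))       # yes
print(parse_score('{"score": "no"} extra'))  # no
```

Raising on malformed output (rather than defaulting to "no") makes parsing failures visible in an automated pipeline.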
### Prompt Template

The requirement checker uses a structured prompt format in which the user and assistant messages are followed by a requirement-evaluation turn. The `<requirements>` tag marks the constraints to be evaluated:

```
<user message>
<assistant response>
<requirements> <constraints>
<evaluation prompt>
```
Typical usage checks the last assistant generation; the following evaluation prompt can be used:

```python
evaluation_prompt = (
    "Please verify if the assistant's generation satisfies the user's "
    "requirements or not and reply with a binary label accordingly. "
    'Respond with a json {"score": "yes"} if the constraints are '
    'satisfied or respond with {"score": "no"} if the constraints are not '
    "satisfied."
)
```
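The template can also be assembled programmatically. The helper below is our own illustration (the name `build_messages` is not part of the adapter's API); it mirrors the message structure used in the quickstart:

```python
EVALUATION_PROMPT = (
    "Please verify if the assistant's generation satisfies the user's "
    "requirements or not and reply with a binary label accordingly. "
    'Respond with a json {"score": "yes"} if the constraints are '
    'satisfied or respond with {"score": "no"} if the constraints are not '
    "satisfied."
)


def build_messages(user_text: str, response_text: str, constraints: str) -> list[dict]:
    """Assemble the chat messages for a single requirement-check call."""
    return [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": response_text},
        {"role": "user", "content": f"<requirements> {constraints}\n{EVALUATION_PROMPT}"},
    ]


messages = build_messages(
    "Write a haiku.", "An old silent pond...", "Use exactly three lines."
)
print(messages[-1]["content"].startswith("<requirements>"))  # True
```

The resulting list can be passed directly to `tokenizer.apply_chat_template` as in the quickstart.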
## Evaluations

Binary classification performance on instruction-following benchmarks: HelpSteer3, IFEval Multi-Constraint, and InfoBench (GPT-4-annotated and human-annotated splits).
| Benchmark | AUC | Accuracy | Bal. Acc. | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| HelpSteer3 | 0.8260 | 0.7463 | 0.7434 | 0.7686 | 0.7598 | 0.7776 |
| IFEval Multi-Constraint | 0.9186 | 0.8459 | 0.8403 | 0.8092 | 0.8071 | 0.8114 |
| InfoBench (GPT-4 Annotated) | 0.7660 | 0.7304 | 0.6950 | 0.8207 | 0.9260 | 0.7394 |
| InfoBench (Human Annotated) | 0.7311 | 0.7523 | 0.6799 | 0.8034 | 0.8918 | 0.7409 |
The HelpSteer3 and IFEval Multi-Constraint evaluations above use the validation set and a held-out set, respectively.
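As a sanity check on the table, F1 is the harmonic mean of precision and recall; recomputing it from the reported HelpSteer3 precision and recall reproduces the listed value to rounding:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# HelpSteer3 row from the table above
f1 = f1_score(0.7598, 0.7776)
print(round(f1, 4))  # 0.7686
```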
## Training Data

The Requirement Checker adapter is fine-tuned on top of ibm-granite/granite-4.0-micro using a combination of instruction-following evaluation data. Training samples pair explicit user requirements and constraints with assistant responses, annotated for whether the constraints are satisfied.

The two sources of training data are:
- IF-RLVR training data, with up to 5 constraints per instruction. Each prompt (with its constraints) was passed through Mixtral 8x22B Instruct to generate a response. The resulting [prompt, constraints, response] triple was then passed through a programmatic evaluation pipeline that assigns a yes/no label indicating whether the constraints were followed.
- HelpSteer3-Preference. The original dataset consists of [prompt, response1, response2, feedback]. The `feedback` column contains individual ratings (a score from 1-5) of each response from multiple annotators. We average the annotators' scores for each response and map averages of 4-5 to a label of yes and averages of 1-3 to a label of no.
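A minimal sketch of the HelpSteer3-Preference label mapping described above. The exact threshold handling in the original pipeline is not specified beyond "averages of 4-5 map to yes"; here we assume any average of at least 4 maps to yes:

```python
from statistics import mean


def preference_label(annotator_scores: list[int]) -> str:
    """Map a response's 1-5 annotator ratings to a binary yes/no label.

    Assumption: an average rating of 4 or above counts as "yes";
    anything below maps to "no".
    """
    return "yes" if mean(annotator_scores) >= 4 else "no"


print(preference_label([5, 4, 4]))  # yes
print(preference_label([3, 2, 4]))  # no
```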
## Adapter Configuration
| Property | LoRA |
|---|---|
| Base Model | ibm-granite/granite-4.0-micro |
| PEFT Type | LORA |
| Rank (r) | 64 |
| Alpha | 64 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, input_linear, output_linear |
| vLLM Support | Yes |
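Since the adapter is vLLM-compatible, it can also be served through vLLM's LoRA support. A sketch, assuming the adapter files have been downloaded locally (e.g. with `huggingface_hub.snapshot_download`) to a directory of your choosing; the adapter name `requirement-check` and the path are our own placeholders:

```shell
# Serve the base model with the LoRA adapter attached.
vllm serve ibm-granite/granite-4.0-micro \
  --enable-lora \
  --lora-modules requirement-check=./requirement-check-lora
```

Requests can then target the adapter by passing `"model": "requirement-check"` in the OpenAI-compatible API.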
## Infrastructure
Training was completed using 8 H100 GPUs. Evaluation (and inference) requires 1 H100 GPU.
## Ethical Considerations / Limitations

- The model must only be used in the prescribed evaluation mode, outputting JSON responses (`{"score": "yes"}` / `{"score": "no"}`) based on the specified prompt template. Any deviation from this intended use may lead to unexpected outputs.
- The model is designed for evaluating constraint satisfaction in instruction-following scenarios.
- The model is only trained and tested on English data.
- The LoRA adapter is compatible with both vLLM (for efficient batched inference) and HuggingFace Transformers + PEFT.
## Citation

```bibtex
@misc{padhi2024graniteguardian,
      title={Granite Guardian},
      author={Inkit Padhi and Manish Nagireddy and Giandomenico Cornacchia and Subhajit Chaudhury and Tejaswini Pedapati and Pierre Dognin and Keerthiram Murugesan and Erik Miehling and Mart\'{i}n Santill\'{a}n Cooper and Kieran Fraser and Giulio Zizzo and Muhammad Zaid Hameed and Mark Purcell and Michael Desmond and Qian Pan and Zahra Ashktorab and Inge Vejsbjerg and Elizabeth M. Daly and Michael Hind and Werner Geyer and Ambrish Rawat and Kush R. Varshney and Prasanna Sattigeri},
      year={2024},
      eprint={2412.07724},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.07724},
}
```
## Resources
- ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite
- 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
- 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources