Activation Oracle: gemma-4-E2B-it

This is a LoRA adapter that turns gemma-4-E2B-it into an activation oracle -- an LLM that can read and interpret the internal activations of other LLMs (or itself) in natural language.

What is an activation oracle?

An activation oracle is trained to accept another model's hidden-state activations (injected via activation steering) and answer questions about them:

  • "What topic is the model thinking about?" -- classification from activations
  • "What token will come next?" -- next-token prediction from hidden states
  • "Is this SAE feature active?" -- sparse autoencoder feature detection

This enables interpretability research that needs only the target model's internal representations, not its logits or generated text.
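The activation injection described above can be pictured as a forward hook that adds captured activation vectors to the oracle's residual stream at chosen token positions. The sketch below assumes norm-matched addition: each injected vector is rescaled to the norm of the hidden state it is added to, then multiplied by a coefficient. The function names, the coefficient, the norm-matching rule, and which layer to hook are all assumptions for illustration, not this adapter's confirmed recipe.

```python
import torch


def inject_activations(
    hidden: torch.Tensor,  # (batch, seq, d_model) residual stream of the oracle
    acts: torch.Tensor,    # (n, d_model) activations captured from the target model
    positions: list,       # n token positions (e.g. placeholder tokens) to steer
    coeff: float = 1.0,    # steering strength (hypothetical default)
) -> torch.Tensor:
    """Return a copy of `hidden` with one activation vector added per position.

    Assumption: each vector is rescaled to the norm of the hidden state it is
    added to, so the injection neither vanishes nor dominates.
    """
    if acts.shape[0] != len(positions):
        raise ValueError("need one activation vector per position")
    out = hidden.clone()
    for vec, pos in zip(acts, positions):
        h = out[:, pos, :]                                     # (batch, d_model)
        direction = vec / vec.norm().clamp_min(1e-8)           # unit vector
        direction = direction.to(h.device, h.dtype)
        scale = h.norm(dim=-1, keepdim=True)                   # (batch, 1)
        out[:, pos, :] = h + coeff * scale * direction
    return out


def make_injection_hook(acts, positions, coeff=1.0):
    """Forward hook for a layer whose output is a tensor or a tuple."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        # With a KV cache, later decoding steps only see the newest token(s);
        # skip injection once the prompt positions are no longer present.
        if hidden.shape[1] <= max(positions):
            return output
        steered = inject_activations(hidden, acts, positions, coeff)
        if isinstance(output, tuple):
            return (steered,) + tuple(output[1:])
        return steered
    return hook
```

Registering it would look like `handle = layer.register_forward_hook(make_injection_hook(acts, positions))` before generation and `handle.remove()` afterwards, where `layer` is whichever oracle layer you choose to steer (an assumption; the card does not say which one).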

Paper: Activation Oracles (arXiv:2512.15674)

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-E2B-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-E2B-it")

# Load the activation oracle LoRA
model = PeftModel.from_pretrained(base_model, "EvilScript/activation-oracle-gemma-4-E2B-it-step-10000")
model.eval()
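When the target is gemma-4-E2B-it itself, one way to obtain its activations from the same `PeftModel` is to switch the LoRA weights off with PEFT's `disable_adapter()` context manager and request `output_hidden_states=True`. The helper below is a hypothetical sketch (the name `collect_activations` is not part of any library). It relies on the Hugging Face convention that `hidden_states[0]` is the embedding output, so the output of 0-based decoder layer `i` is `hidden_states[i + 1]`.

```python
import contextlib

import torch


@torch.no_grad()
def collect_activations(model, input_ids: torch.Tensor, layers: list) -> dict:
    """Return {layer: (batch, seq, d_model)} hidden states from the target model.

    If `model` is a PEFT model, the LoRA adapter is disabled for this forward
    pass so the activations come from the unmodified base model.
    """
    if hasattr(model, "disable_adapter"):
        ctx = model.disable_adapter()
    else:
        ctx = contextlib.nullcontext()
    with ctx:
        outputs = model(input_ids=input_ids, output_hidden_states=True)
    # hidden_states[0] is the embedding output; hidden_states[i + 1] follows layer i.
    return {layer: outputs.hidden_states[layer + 1] for layer in layers}
```

A typical call would tokenize a prompt with `tokenizer(text, return_tensors="pt").input_ids.to(model.device)` and pass the resulting ids together with the layer indices of interest.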

Training Details

Parameter              Value
Base model             google/gemma-4-E2B-it
Adapter                LoRA
Training tasks         LatentQA, classification, PastLens (next-token), SAE features
Activation injection   Steering vectors at intermediate layers
Layer coverage         25%, 50%, and 75% of model depth
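The depth percentages become concrete layer indices once the number of decoder layers is known. This sketch assumes 0-based indexing and truncation toward zero; the rounding rule is an assumption, and the layer count should be read from the model's config rather than hard-coded.

```python
def depth_to_layers(num_layers: int, fractions=(0.25, 0.5, 0.75)) -> list:
    """Map fractional depths to 0-based decoder-layer indices (truncating)."""
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    # Clamp so a fraction of 1.0 still maps to the last valid layer.
    return [min(int(num_layers * f), num_layers - 1) for f in fractions]


print(depth_to_layers(24))  # [6, 12, 18]
```

For a hypothetical 30-layer model this yields `[7, 15, 22]`.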

Training Data

The oracle is trained on a mixture of:

  1. LatentQA -- open-ended questions about hidden states
  2. Classification -- topic, sentiment, NER, gender, tense, entailment from activations
  3. PastLens -- predicting upcoming tokens from hidden states
  4. SAE features -- identifying active sparse autoencoder features
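To make the PastLens objective concrete: a training pair takes the activation at some position in a text and asks the oracle for the tokens that follow it. The builder below is a hypothetical sketch; the dataclass, its field names, the question wording, and the choice of `k` are assumptions, not the dataset format used to train this adapter.

```python
import random
from dataclasses import dataclass


@dataclass
class PastLensExample:
    context_ids: list    # tokens fed to the target model (activation taken from here)
    act_position: int    # position whose hidden state is injected into the oracle
    question: str        # question posed to the oracle (hypothetical wording)
    target_ids: list     # the k tokens that actually follow act_position


def make_pastlens_example(token_ids: list, k: int, rng: random.Random) -> PastLensExample:
    """Pick a random position that has at least k tokens after it."""
    if k <= 0 or len(token_ids) <= k:
        raise ValueError("need k > 0 and a sequence longer than k")
    pos = rng.randrange(0, len(token_ids) - k)  # pos + k <= len - 1
    return PastLensExample(
        context_ids=token_ids[: pos + 1],
        act_position=pos,
        question=f"What are the next {k} tokens?",
        target_ids=token_ids[pos + 1 : pos + 1 + k],
    )
```

Only the prefix up to `act_position` goes to the target model, so the answer cannot be read off the activations of later tokens.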

Related Resources
