AlignmentResearch's Collections

Diverse Deception Probes

updated Mar 18

Linear probes trained on diverse deception data to detect dishonest completions across model families (OLMo, Qwen, Gemma).

  • AlignmentResearch/diverse-deception-probe-olmo-3-7b-think

    Updated Mar 18

  • AlignmentResearch/diverse-deception-probe-olmo-3-7b-instruct

    Updated Mar 18

  • AlignmentResearch/diverse-deception-probe-qwen3-8b

    Updated Mar 18

  • AlignmentResearch/diverse-deception-probe-gemma-3-12b-it

    Updated Mar 18

  • AlignmentResearch/diverse-deception-probe-olmo-3-32b-think

    Updated Mar 18
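
For readers unfamiliar with linear probes, the following is a minimal sketch of how a probe of this general kind could score a completion. It is not taken from these repositories: the logistic form, the mean-over-tokens aggregation, the function names `probe_scores` and `completion_score`, and the random stand-in data are all assumptions made for illustration. The actual file format, layer choice, and scoring rule used by the listed probes are not stated on this page.

```python
import numpy as np


def probe_scores(activations: np.ndarray, weight: np.ndarray, bias: float = 0.0) -> np.ndarray:
    """Per-token scores in (0, 1) from a logistic linear probe.

    activations: (num_tokens, hidden_dim) hidden states from some model layer.
    weight:      (hidden_dim,) probe direction (hypothetical values here).
    """
    logits = activations @ weight + bias
    return 1.0 / (1.0 + np.exp(-logits))


def completion_score(activations: np.ndarray, weight: np.ndarray, bias: float = 0.0) -> float:
    """Aggregate per-token scores into one number.

    Averaging over tokens is an assumption; other aggregations are possible.
    """
    return float(probe_scores(activations, weight, bias).mean())


# Random stand-ins for real activations and probe weights (illustration only).
rng = np.random.default_rng(0)
hidden_dim = 16
weight = rng.normal(size=hidden_dim)
activations = rng.normal(size=(5, hidden_dim))

score = completion_score(activations, weight)
print(f"probe score: {score:.3f}")
```

In practice the activations would come from a fixed layer of the corresponding base model, and the weights from the probe repository, once its format is known.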