Article OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments 1 day ago • 13
DavidAU/Mistral-Nemo-Inst-2407-12B-Thinking-Uncensored-HERETIC-HI-Claude-Opus Text Generation • 12B • Updated Jan 12 • 713 • 18
Article Red Teaming with RL: Exploiting Tinker API for Harmful RL on 235B Model Jan 1 • 18
AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models Paper • 2506.14682 • Published Jun 17, 2025
MAIF: Enforcing AI Trust and Provenance with an Artifact-Centric Agentic Paradigm Paper • 2511.15097 • Published Nov 19, 2025