When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents
Abstract
LLM agents frequently select higher-privilege tools unnecessarily, and while safety alignment doesn't ensure least-privilege choices, a post-training defense can reduce excessive privilege use without sacrificing performance.
As LLM agents increasingly select tools autonomously, their choices among tools with different privileges become safety-relevant. However, prior tool-selection studies focus on safety-agnostic metadata preferences, leaving privilege-sensitive choices underexplored. To address this gap, we study over-privileged tool selection, in which an agent selects or escalates to a higher-privilege tool despite a sufficient lower-privilege alternative. We introduce ToolPrivBench to evaluate whether agents choose higher-privilege tools despite sufficient lower-privilege alternatives, measuring both initial selection and escalation after transient tool failures. Across eight domains and five recurring risk patterns, we find that over-privileged tool selection is common among mainstream LLM agents and is further amplified by transient failures. We further find that general safety alignment does not reliably transfer to least-privilege tool choice, while prompt-level controls provide only limited mitigation under transient failures. We therefore introduce a privilege-aware post-training defense that teaches agents to prefer sufficient lower-privilege tools and escalate only when necessary. Our mitigation experiments show that this defense substantially reduces unnecessary high-privilege tool use while preserving general capabilities.
Community
Interesting finding, but have you considered that dynamic tool provisioning eliminates this entirely? If the model only sees tools appropriate to its current privilege tier and has to explicitly escalate through a separate evaluation gate to access higher-privilege tools, over-privileged selection becomes structurally impossible regardless of model behaviour. No post-training needed.
Good point — structural controls are genuinely complementary. I'm not sure they'd fully eliminate the problem, though. Tool availability is typically defined by user role, whereas over-privileged selection is task-relative: a workspace-admin tool may be legitimate for the user, yet unnecessary for a calendar query. Filtering it out still requires the same task-to-minimum-privilege inference we evaluate—the decision is shifted to the gate rather than removed. And whoever implements that gate inherits the issue: a human breaks agent autonomy, another LLM just relocates the same bias, and static policies generally can't express per-task minimum privilege. Plus, many real deployments (MCP, IDE agents) already expose a broad tool superset under a single identity — which is the setting we benchmark.
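To make the role-versus-task distinction concrete, here is a minimal sketch of the over-privilege condition as a predicate. The `Tool` fields, the ordinal privilege scale, and the tool names are assumptions for illustration, not ToolPrivBench's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    privilege: int    # hypothetical ordinal scale: higher means more privileged
    sufficient: bool  # whether this tool can complete the current task


def is_over_privileged(chosen: Tool, permitted: list[Tool]) -> bool:
    """True if a sufficient tool with strictly lower privilege was available."""
    return any(t.sufficient and t.privilege < chosen.privilege for t in permitted)


# Both tools are permitted for the user's role; only the task makes one excessive.
calendar_read = Tool("calendar.read", privilege=1, sufficient=True)
workspace_admin = Tool("workspace.admin", privilege=3, sufficient=True)
permitted = [calendar_read, workspace_admin]
```

Note that `sufficient` is task-dependent: whoever fills it in, agent or gate, is doing the task-to-minimum-privilege inference discussed above.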
Fair points, and you're right that the inference problem doesn't vanish — it relocates. But I'd argue it becomes a much simpler problem at the gate than inside the agent.
Your own findings show that agents fail at least-privilege selection while simultaneously trying to complete the task. Those are competing objectives in the same forward pass. Separating them means the gate only needs to solve task classification to tool subset, without the completion pressure that drives privilege escalation.
One concrete pattern that eliminates most of the inference entirely: named agents with fixed permission scopes. Instead of one agent choosing from a full tool superset, you route by name — "Jeff handles email, Brian handles calendar." Same underlying model, different tool sets loaded based on invocation. The gatekeeper reduces to name classification, which is trivial.
The read/write distinction simplifies it further. All agents get read access universally — the risk lives in write, delete, and execute operations. Those are scoped to the domain-specific agent. Cross-domain requests that need elevated permissions get bounced to the agent that owns them.
This doesn't require per-task minimum privilege inference. It requires a static permission map and a name router. The hard problem your benchmark measures — selecting minimum privilege from a broad superset — disappears because the superset is never presented.
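A rough sketch of what the static permission map and name router could look like, assuming two named agents and made-up tool identifiers (none of these names come from the paper or from any real framework):

```python
from typing import Optional

# Hypothetical tool identifiers, for illustration only.
READ_TOOLS = frozenset({"email.read", "calendar.read"})

# Static map: write/delete/execute tools are scoped to the owning agent.
WRITE_SCOPES = {
    "jeff": frozenset({"email.send", "email.delete"}),
    "brian": frozenset({"calendar.create_event", "calendar.delete_event"}),
}


def tools_for(agent_name: str) -> frozenset:
    """Tools loaded for a named agent: universal reads plus its own writes."""
    key = agent_name.strip().lower()
    if key not in WRITE_SCOPES:
        raise KeyError(f"unknown agent: {agent_name!r}")
    return READ_TOOLS | WRITE_SCOPES[key]


def owner_of(tool: str) -> Optional[str]:
    """Name of the agent whose write scope contains the tool, if any."""
    for name, scope in WRITE_SCOPES.items():
        if tool in scope:
            return name
    return None


def route(agent_name: str, tool: str) -> str:
    """Return the agent that should run the call, bouncing cross-domain writes."""
    key = agent_name.strip().lower()
    if tool in tools_for(key):
        return key
    owner = owner_of(tool)
    if owner is None:
        raise PermissionError(f"no agent owns tool: {tool!r}")
    return owner
```

Here `route("Jeff", "calendar.delete_event")` returns `"brian"`: Jeff never sees the calendar write tools, so the request is handed to the agent that owns them.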
I'll admit this pattern applies to tool-calling architectures specifically. For unconstrained code generation where privilege is expressed in the output text itself — like generating sudo commands in a terminal — tool-layer gating can't intercept it. That's a different problem requiring execution-layer sandboxing or confirmation gates before anything with elevated privileges runs. Your benchmark's concern fully stands in that context.
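For the terminal case, a confirmation gate might look roughly like the sketch below. It is a naive token heuristic under my own assumptions (the `ELEVATED_COMMANDS` set and the `confirm` callback are hypothetical), and it is not a substitute for real execution-layer sandboxing.

```python
import os
import re
from typing import Callable

# Hypothetical set of commands treated as privilege-elevating.
ELEVATED_COMMANDS = frozenset({"sudo", "su", "doas"})


def needs_confirmation(command_line: str) -> bool:
    """Flag a command line if any word-like token names an elevating command."""
    # Split on anything that is not a word character, dot, slash, or hyphen,
    # so separators such as ';' or quotes do not hide a token.
    tokens = re.findall(r"[\w./-]+", command_line)
    return any(os.path.basename(tok) in ELEVATED_COMMANDS for tok in tokens)


def gate(command_line: str, confirm: Callable[[str], bool]) -> bool:
    """Return True if the command may run; elevated ones need confirm() to agree."""
    if needs_confirmation(command_line):
        return confirm(command_line)
    return True
```

The check errs on the side of asking: `echo sudo` would also trigger a confirmation, which seems an acceptable cost for a gate whose job is to stop elevated commands from running unreviewed.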
But for the growing number of agent deployments built around structured tool calls — MCP, function calling, plugin systems — the architectural fix is available and arguably simpler than trying to train the behaviour out of the model.
I firmly believe that, while models can do a lot, they need external scaffolding to provide structure and limits.
