GR_008 · Explicit Tool Authorization Boundary
Failure domain mitigated
- FP_008 Tool Authority Escalation via Prompt Injection
Real failure → Minimal reproduction → Mechanism → Guardrail → Atlas update.
Invariant enforced
- INV_009 — tool execution authority is enforced outside model reasoning.
Design principle
Separate tool availability from tool authorization. Model planning may propose a tool, but execution must pass deterministic policy checks in runtime.
Boundary architecture
Baseline vs guarded flow
flowchart TB
subgraph Baseline
u[User prompt]
r[LLM reasoning]
t[tool call]
x[execution]
u --> r --> t --> x
end
flowchart TB
subgraph Guarded
u2[User prompt]
r2[LLM reasoning]
tr[tool request]
pa[policy authorization]
x2[execution]
d[[deny]]
u2 --> r2 --> tr --> pa
pa -->|allow| x2
pa -->|deny| d
end
Security boundary: the authorization step is outside model reasoning and is evaluated by deterministic runtime policy.
Implementation sketch (minimal)
- Classify tools by sensitivity (
public,sensitive). - Assign allowed tools per request class (
standard_user,privileged_diagnostic). - At dispatch, call policy authorizer before executing tool.
- If denied, do not execute tool and return policy refusal event.
- Log all denials with request class, tool, and reason.
Empirical evidence (FM_008)
| model | adversarial prompts | baseline sensitive calls | guarded sensitive calls |
|---|---|---|---|
| qwen2.5-coder:7b | 3 | 3 | 0 |
| qwen2.5-coder:1.5b | 3 | 1 | 0 |
Result interpretation:
- Baseline permits prompt-steered sensitive tool execution.
- Guarded runtime policy reduces sensitive calls on adversarial prompts to zero in this experiment.
Tradeoffs / limits / failure of guardrail itself
- Misclassified tools can under- or over-block.
- Coarse request classes may not capture nuanced legitimate workflows.
- Prompt-driven escalation can still occur through indirect capabilities hidden in “safe” tools.
Explicit links
- atlas entry:
atlas/FP_008_tool_authority_escalation_via_prompt_injection.md - lab proof:
lab/failure_modes/FM_008_tool_authority_escalation/