HarmActionsEval: AI Agent Safety Benchmark
Measuring the propensity of autonomous LLM agents to execute harmful actions when provided with powerful toolsets.
Benchmark Average: 4.54% SafeActions@1 across all tested models
Performance Rankings
| Rank | Model Name | Provider | SafeActions@1 |
|---|---|---|---|
| 01 | Qwen3.5-397b-a17b | Alibaba Cloud | 23.40% |
| 02 | GPT-5.3 | OpenAI | 12.77% |
| 03 | Claude Sonnet 4.6 | Anthropic | 2.84% |
| 04 | Phi 4 Mini Reasoning | Microsoft | 2.84% |
| 05 | Ministral 3 (3B) | Mistral AI | 2.13% |
| 06 | GPT-5.4 Mini | OpenAI | 0.71% |
| 07 | Gemini 3.1 Flash Lite | Google | 0.71% |
| 08 | Claude Haiku 4.5 | Anthropic | 0.00% |
| 09 | Phi 4 Mini Instruct | Microsoft | 0.00% |
| 10 | Granite 4-H-Tiny | IBM | 0.00% |
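The benchmark average reported at the top of the page can be cross-checked against the ranking table. The sketch below assumes the average is a plain unweighted mean of the ten listed SafeActions@1 scores; the page does not state how it is computed, so treat the `scores` dictionary and the averaging rule as illustrative.

```python
from statistics import mean

# SafeActions@1 scores (in percent) copied from the ranking table.
scores = {
    "Qwen3.5-397b-a17b": 23.40,
    "GPT-5.3": 12.77,
    "Claude Sonnet 4.6": 2.84,
    "Phi 4 Mini Reasoning": 2.84,
    "Ministral 3 (3B)": 2.13,
    "GPT-5.4 Mini": 0.71,
    "Gemini 3.1 Flash Lite": 0.71,
    "Claude Haiku 4.5": 0.00,
    "Phi 4 Mini Instruct": 0.00,
    "Granite 4-H-Tiny": 0.00,
}

# Assumption: unweighted mean over all listed models.
average = mean(scores.values())
print(f"Average SafeActions@1: {average:.2f}%")
```

Under that assumption the result rounds to the 4.54% figure shown in the page header.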
Safety Insights
"AI agents with harmful tools will use them - even the most capable LLMs."
Our research indicates that, as agentic capabilities increase, the probability of a safety policy violation grows non-linearly when agents are presented with actionable tool interfaces.
Updated: March 2026
Methodology Preview
SafeActions@1
Single-turn tool execution safety under adversarial pressure.
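The page gives no formal definition of the metric, so the following is only a sketch of one plausible reading: SafeActions@1 as the percentage of adversarial prompts on which the agent's single attempt does not invoke the harmful tool. The `Trial` record, its field names, and the `safe_actions_at_1` function are hypothetical and are not taken from the benchmark's code.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One single-turn evaluation of an agent (hypothetical record)."""
    prompt_id: str
    harmful_tool_called: bool  # True if the first attempt invoked the harmful tool


def safe_actions_at_1(trials: list[Trial]) -> float:
    """Return the percentage of trials in which no harmful tool was called.

    Assumed definition: one attempt per prompt, scored as safe only if the
    agent did not execute the harmful action.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    safe = sum(1 for t in trials if not t.harmful_tool_called)
    return 100.0 * safe / len(trials)


# Made-up example data: one safe outcome out of four prompts.
example = [
    Trial("p1", harmful_tool_called=True),
    Trial("p2", harmful_tool_called=False),
    Trial("p3", harmful_tool_called=True),
    Trial("p4", harmful_tool_called=True),
]
score = safe_actions_at_1(example)
print(f"SafeActions@1: {score:.2f}%")
```

Under this reading a higher score is better, which matches the leaderboard ordering from highest to lowest percentage.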
Want a free solution for agent safety?
Secure your LLM agents today with our open-source framework designed to block harmful actions in real-time.