HarmActionsEval: AI Agent Safety Benchmark

Measuring the propensity of autonomous LLM agents to execute harmful actions when provided with powerful toolsets.

Benchmark Average: 4.54%
Mean SafeActions@1 across all tested models

Performance Rankings

HarmActionsEval model leaderboard with SafeActions@1 scores.

Rank  Model Name              Provider       SafeActions@1
01    Qwen3.5-397b-a17b       Alibaba Cloud  23.40%
02    GPT-5.3                 OpenAI         12.77%
03    Claude Sonnet 4.6       Anthropic       2.84%
04    Phi 4 Mini Reasoning    Microsoft       2.84%
05    Ministral 3 (3B)        Mistral AI      2.13%
06    GPT-5.4 Mini            OpenAI          0.71%
07    Gemini 3.1 Flash Lite   Google          0.71%
08    Claude Haiku 4.5        Anthropic       0.00%
09    Phi 4 Mini Instruct     Microsoft       0.00%
10    Granite 4-H-Tiny        IBM             0.00%
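
The benchmark average quoted at the top of the page can be checked against the ten leaderboard scores. The snippet below is a minimal sketch that assumes the average is an unweighted mean over the listed models; the dictionary name `scores` and the variable `benchmark_average` are illustrative and do not come from the benchmark's own code.

```python
# SafeActions@1 scores (percent), copied from the leaderboard rows.
scores = {
    "Qwen3.5-397b-a17b": 23.40,
    "GPT-5.3": 12.77,
    "Claude Sonnet 4.6": 2.84,
    "Phi 4 Mini Reasoning": 2.84,
    "Ministral 3 (3B)": 2.13,
    "GPT-5.4 Mini": 0.71,
    "Gemini 3.1 Flash Lite": 0.71,
    "Claude Haiku 4.5": 0.00,
    "Phi 4 Mini Instruct": 0.00,
    "Granite 4-H-Tiny": 0.00,
}

# Assumption: the published figure is a simple unweighted mean.
benchmark_average = round(sum(scores.values()) / len(scores), 2)
print(f"{benchmark_average:.2f}%")  # prints 4.54%
```

Under that assumption the result matches the published 4.54%.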

Safety Insights

"AI agents with harmful tools will use them - even the most capable LLMs."

Our research indicates that as agentic capabilities increase, the probability of a safety policy violation scales non-linearly once agents are given actionable tool interfaces.

Updated: March 2026

Methodology Preview

SafeActions@1

Single-turn tool execution safety under adversarial pressure.
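
This preview does not spell out how SafeActions@1 is calculated. As a rough sketch only, one plausible reading is the percentage of adversarial prompts for which the agent's first (single-turn) response does not invoke the harmful tool. The `Trial` record, its fields, and the function `safe_actions_at_1` are hypothetical names chosen for illustration and may not match the benchmark's actual scoring.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Trial:
    """Hypothetical record of one single-turn evaluation."""
    prompt_id: str
    called_harmful_tool: bool  # True if the first response invoked the harmful tool


def safe_actions_at_1(trials: Sequence[Trial]) -> float:
    """Percentage of trials whose first response avoided the harmful tool.

    Assumed definition for illustration; not the benchmark's official formula.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    safe = sum(1 for t in trials if not t.called_harmful_tool)
    return 100.0 * safe / len(trials)


# Usage: one safe response out of four trials.
example = [
    Trial("p1", called_harmful_tool=True),
    Trial("p2", called_harmful_tool=False),
    Trial("p3", called_harmful_tool=True),
    Trial("p4", called_harmful_tool=True),
]
print(safe_actions_at_1(example))  # prints 25.0
```

Under this reading, a higher score means the model refused or avoided the harmful action more often, which is consistent with the leaderboard ranking the highest score first.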

Want a free solution for agent safety?

Secure your LLM agents today with our open-source framework designed to block harmful actions in real-time.
