HarmActionsEval: AI Agent Safety Benchmark

Measuring the propensity of autonomous LLM agents to execute harmful actions when provided with powerful toolsets.

Benchmark Average

4.54%

SafeActions@1 across all tested models

Performance Rankings

HarmActionsEval model leaderboard with SafeActions@1 scores.
Rank	Model Name	Provider	SafeActions@1
01	Qwen3.5-397b-a17b	Alibaba Cloud	23.40%
02	GPT-5.3	OpenAI	12.77%
03	Claude Sonnet 4.6	Anthropic	2.84%
04	Phi 4 Mini Reasoning	Microsoft	2.84%
05	Ministral 3 (3B)	Mistral AI	2.13%
06	GPT-5.4 Mini	OpenAI	0.71%
07	Gemini 3.1 Flash Lite	Google	0.71%
08	Claude Haiku 4.5	Anthropic	0.00%
09	Phi 4 Mini Instruct	Microsoft	0.00%
10	Granite 4-H-Tiny	IBM	0.00%
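
The benchmark average reported at the top of the page can be reproduced from the leaderboard. The following is a minimal sketch, assuming the average is an unweighted mean of the ten listed SafeActions@1 scores; the page does not state how it is computed, and the variable names are illustrative.

```python
from statistics import mean

# SafeActions@1 scores (percent), copied from the leaderboard in rank order.
scores = {
    "Qwen3.5-397b-a17b": 23.40,
    "GPT-5.3": 12.77,
    "Claude Sonnet 4.6": 2.84,
    "Phi 4 Mini Reasoning": 2.84,
    "Ministral 3 (3B)": 2.13,
    "GPT-5.4 Mini": 0.71,
    "Gemini 3.1 Flash Lite": 0.71,
    "Claude Haiku 4.5": 0.00,
    "Phi 4 Mini Instruct": 0.00,
    "Granite 4-H-Tiny": 0.00,
}

# Assumption: every model counts equally (unweighted arithmetic mean).
benchmark_average = round(mean(list(scores.values())), 2)
print(f"{benchmark_average:.2f}%")  # prints 4.54%
```

Under that assumption the result matches the 4.54% figure shown as the Benchmark Average.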

Safety Insights

"AI agents with harmful tools will use them - even the most capable LLMs."

Our research indicates that as agentic capabilities increase, the probability of a safety policy violation scales non-linearly when agents are given actionable tool interfaces.

Updated: March 2026

Methodology Preview

SafeActions@1

Single-turn tool execution safety under adversarial pressure.
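
The page does not define SafeActions@1 beyond the line above. As a hedged illustration only, the sketch below assumes the metric is the percentage of harmful prompts on which the agent's single attempt does not invoke the harmful tool. `Trial`, `called_harmful_tool`, and `safe_actions_at_1` are hypothetical names, not part of HarmActionsEval, and the data is a toy example rather than benchmark output.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    prompt_id: str
    # Hypothetical field: True if the agent's first attempt called the harmful tool.
    called_harmful_tool: bool


def safe_actions_at_1(trials: list[Trial]) -> float:
    """Percentage of trials whose single attempt avoided the harmful tool call."""
    if not trials:
        raise ValueError("at least one trial is required")
    safe = sum(1 for t in trials if not t.called_harmful_tool)
    return 100.0 * safe / len(trials)


# Toy data for illustration; these are not benchmark results.
trials = [
    Trial("p1", True),
    Trial("p2", False),
    Trial("p3", True),
    Trial("p4", True),
]
score = safe_actions_at_1(trials)
print(f"{score:.2f}%")  # prints 25.00%
```

Under this reading, a higher score means the agent refused or avoided the harmful action more often, which is consistent with the leaderboard ranking the highest score first.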

Want a free solution for agent safety?

Secure your LLM agents today with our open-source framework designed to block harmful actions in real-time.

Try Agent Action Guard