05 HARMS
AILuminate v1.0 layout · 10 harm categories × 12 models · lower is better.
RED = HARMFUL
AILUMINATE 10×N MATRIX BELOW IS A SYNTHETIC PREVIEW — REAL HARM-CATEGORY DIMENSIONS ARE NOT YET INSTRUMENTED.
REAL SAFETY-CLUSTER SIGNAL FROM v2 SMOKE RUNS APPEARS DIRECTLY ABOVE THE HEATMAP WHEN AVAILABLE.
Real safety-cluster scores · v2 smoke
6 models · Per-model safety cluster (higher is better: passing safe behaviour)
- Gemini 2 0 Flash 001 · 100.0 · ci [100.0, 100.0] · n=2
- Gpt 5 · 100.0 · ci [100.0, 100.0] · n=2
- Gpt 5 4 Nano · 83.8 · ci [70.0, 93.8] · n=15
- Deepseek V4 Flash · 78.5 · ci [62.9, 90.9] · n=15
- Gpt 5 Nano · 68.8 · ci [37.5, 100.0] · n=4
- Mistral Nemo · 54.4 · ci [36.1, 71.4] · n=15
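The page does not say how the intervals above are computed. As a rough sketch only, assuming a percentile bootstrap over per-item scores on a 0-100 scale (the function name `bootstrap_ci`, the resample count, and the 95% level are all illustrative assumptions, not the dashboard's actual method), an interval of this shape could be produced like so:

```python
import random

def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the mean of `scores`.

    Hypothetical helper: the dashboard's real CI method is not documented.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    # Resample with replacement, record the mean of each resample, then sort.
    means = sorted(
        sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]
```

With only n=2 items that both score 100.0, every resample has mean 100.0, so the interval collapses to a single point. That is consistent with the degenerate [100.0, 100.0] rows, and it is a reason to read those rows as under-sampled rather than as certain.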
10 × 12 matrix
Hover for exact rate · color = rate clamped to 35%
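A minimal sketch of what "rate clamped to 35%" could mean for the colour scale, assuming a linear mapping from harm rate to colour intensity (the function name `color_intensity` and the linear ramp are assumptions; the page only states the 35% clamp):

```python
def color_intensity(rate_pct: float, cap_pct: float = 35.0) -> float:
    """Map a harm rate (in percent) to a colour intensity in [0, 1].

    Rates at or above `cap_pct` saturate at full intensity; the linear
    ramp below the cap is an assumption, not taken from the dashboard.
    """
    clamped = max(0.0, min(rate_pct, cap_pct))
    return clamped / cap_pct
```

Under this reading, a cell at 35% and a cell at 60% render identically, which is why the exact rate is exposed separately on hover.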
Safest models
- 1 · 6.9% · Claude Sonnet 4.7 · worst: Indiscriminate Weapons
- 2 · 7.1% · Llama 4 405B · worst: Defamation
- 3 · 7.6% · Gemini 3 Flash · worst: Child Exploitation
- 4 · 9.3% · GPT-5.1 · worst: Sex-Related
- 5 · 9.4% · DeepSeek V4 · worst: Intellectual Property
Hottest harms
- 1 · 12.0% · Indiscriminate Weapons
- 2 · 11.2% · Intellectual Property
- 3 · 10.8% · Violent Crimes
- 4 · 10.3% · Specialized Advice
- 5 · 10.2% · Defamation
Q 0.976 · LAT p50 1.2s · p95 2.7s · p99 5.1s · JUDGE gpt-4.1-mini · Q-DEPTH 0 · $/EVAL $0.00014