05 HARMS
AILuminate v1.0 layout · 10 harm categories × 12 models · lower is better.
RED = HARMFUL
SYNTHETIC PREVIEW · NO REAL HARM ASSESSMENTS RUN YET · HEATMAP RENDERS DEMO VALUES TO DEMONSTRATE LAYOUT.
RUN `golden-eval redteam ailuminate` TO POPULATE.
10 × 12 matrix
Hover for exact rate · color = rate clamped to 35%
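A minimal sketch of how a rate could be mapped to a cell color with the 35% clamp, so that every rate at or above the cap renders at full intensity. The white-to-red ramp and the name `rate_to_color` are assumptions for illustration; the page only states that red means harmful and that the color scale is clamped at 35%.

```python
def rate_to_color(rate: float, cap: float = 0.35) -> tuple[int, int, int]:
    """Map a harm rate in [0, 1] to an RGB color, saturating at `cap`.

    Hypothetical helper: the linear white-to-red ramp is an assumption,
    not the dashboard's confirmed palette.
    """
    # Clamp so any rate at or above the cap is drawn as full red,
    # and negative inputs are treated as zero.
    t = min(max(rate, 0.0), cap) / cap
    # Linear blend from white (255, 255, 255) at t=0 to red (255, 0, 0) at t=1.
    fade = round(255 * (1.0 - t))
    return (255, fade, fade)
```

Clamping keeps the low end of the scale readable: without it, a single very high rate would compress every other cell toward white.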
Safest models
- 1 · 6.9% · Claude Sonnet 4.7 · worst: Indiscriminate Weapons
- 2 · 7.1% · Llama 4 405B · worst: Defamation
- 3 · 7.6% · Gemini 3 Flash · worst: Child Exploitation
- 4 · 8.1% · GPT-5.4 nano · worst: Hate Speech
- 5 · 8.2% · DeepSeek Chat v3.1 · worst: Privacy
Hottest harms
- 1 · 11.6% · Indiscriminate Weapons
- 2 · 10.7% · Intellectual Property
- 3 · 10.5% · Specialized Advice
- 4 · 10.2% · Violent Crimes
- 5 · 10.2% · Sex-Related
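The two rankings above could be derived from a harm × model rate matrix roughly as follows. This is a sketch under stated assumptions: that a model's score is its mean rate across harm categories, that a harm's score is its mean rate across models, and that "worst" is the category with the highest rate for that model. The page does not confirm any of these. The names `safest_models` and `hottest_harms` and all values in `rates` are hypothetical placeholders, not results.

```python
from statistics import mean

# Hypothetical placeholder data: rates[harm][model] = fraction of harmful responses.
rates = {
    "harm-x": {"model-a": 0.10, "model-b": 0.04},
    "harm-y": {"model-a": 0.02, "model-b": 0.06},
    "harm-z": {"model-a": 0.06, "model-b": 0.02},
}

def safest_models(rates, top=5):
    """Return (mean rate, model, worst harm) tuples, lowest mean first."""
    models = next(iter(rates.values())).keys()
    rows = []
    for model in models:
        per_harm = {harm: by_model[model] for harm, by_model in rates.items()}
        # Assumed definition: "worst" is the category with the highest rate.
        worst = max(per_harm, key=per_harm.get)
        rows.append((mean(per_harm.values()), model, worst))
    return sorted(rows)[:top]

def hottest_harms(rates, top=5):
    """Return (mean rate, harm) tuples, highest mean first."""
    rows = [(mean(by_model.values()), harm) for harm, by_model in rates.items()]
    return sorted(rows, reverse=True)[:top]

for score, model, worst in safest_models(rates):
    print(f"{score:.1%} {model} worst: {worst}")
```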
Q 0.976 · LAT p50 1.2s · p95 2.7s · p99 5.1s · JUDGE gpt-4.1-mini · Q-DEPTH 0 · $/EVAL $0.00014