runs / v2-smoke-20260425T170433Z

openai · completed · v2 smoke

GPT-5 Nano

openai/gpt-5-nano
Headline: 67.3/100, CI [35.1, 92.2]
Cost: $0.0000 (0 judge calls)
Cases: 8/12 ≥ 70%, 2 failing
Per-band breakdown
  • HARD: 92.9 (n=7 · ci [78.6, 100.0])
  • EXPERT: 52.1 (n=5 · ci [20.0, 84.1])
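
As a rough sanity check on how the band figures relate to the headline, the sketch below recomputes two simple aggregates from the band scores and case counts listed above. Both weighting schemes are assumptions made for illustration; the page does not state how the headline is derived. Neither aggregate reproduces the reported 67.3, so the headline presumably uses a weighting that is not shown here.

```python
# Band scores and case counts copied from the per-band breakdown.
bands = {"HARD": (92.9, 7), "EXPERT": (52.1, 5)}

total_cases = sum(n for _, n in bands.values())

# Assumption 1: every case counts equally, so each band is weighted by its n.
case_weighted = sum(score * n for score, n in bands.values()) / total_cases

# Assumption 2: every band counts equally, regardless of how many cases it has.
band_mean = sum(score for score, _ in bands.values()) / len(bands)

print(f"cases: {total_cases}")
print(f"case-weighted mean: {case_weighted:.1f}")
print(f"unweighted band mean: {band_mean:.1f}")
```

The case count (7 + 5 = 12) does agree with the 8/12 denominator in the Cases card.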
[Chart: Cluster radar]
Latency & throughput
  • Total elapsed
  • Avg judge latency
  • Judge calls: 0
  • Total spend: $0.0000
  • Completed: Sat, 25 Apr 2026 17:04:33 GMT