
runs / v2-eval-20260502T190525Z

openai · completed · v2 smoke

GPT-5.4 Nano

openai/gpt-5.4-nano
Headline
67.0/100
CI [50.5, 76.9]
Cost
$4.6291
742 judge calls
Cases
27/42 ≥ 70%
5 failing
Per-band breakdown
  • MEDIUM: 85.2 (n=10 · CI [75.9, 93.3])
  • HARD: 74.3 (n=19 · CI [60.4, 86.9])
  • EXPERT: 42.4 (n=6 · CI [20.7, 62.1])
  • TRIVIAL: 100.0 (n=1 · CI [100.0, 100.0])
  • EASY: 97.7 (n=6 · CI [94.6, 99.7])
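
As a quick cross-check of the per-band figures, the sketch below recomputes the total case count and a plain case-weighted mean of the band scores. The aggregation shown is an assumption for illustration only: the report does not state how the headline score is computed, so this mean need not match it. The names `BANDS` and `case_weighted_mean` are hypothetical.

```python
# Per-band figures copied from the breakdown above: band -> (n cases, mean score).
BANDS = {
    "MEDIUM": (10, 85.2),
    "HARD": (19, 74.3),
    "EXPERT": (6, 42.4),
    "TRIVIAL": (1, 100.0),
    "EASY": (6, 97.7),
}


def case_weighted_mean(bands):
    """Return (total cases, mean of band scores weighted by case count).

    Assumption: each band score is a simple mean over that band's cases.
    The report does not confirm this, nor how the headline is aggregated.
    """
    total_cases = sum(n for n, _ in bands.values())
    weighted_sum = sum(n * score for n, score in bands.values())
    return total_cases, weighted_sum / total_cases


total_cases, weighted = case_weighted_mean(BANDS)
print(f"cases: {total_cases}, case-weighted mean: {weighted:.1f}")
```

The band counts sum to the 42 cases shown in the Cases card. With one case in TRIVIAL and six each in EASY and EXPERT, those bands' scores and intervals rest on very small samples.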
Cluster radar (chart not reproduced)
Latency & throughput
  • Total elapsed: 2002.52s
  • Avg judge latency: 6.28s
  • Judge calls: 742
  • Total spend: $4.6291
  • Completed: Sat, 02 May 2026 19:05:25 GMT
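
The totals above can be turned into per-call figures with simple arithmetic. The sketch below is illustrative only: it assumes the total spend is spread evenly over judge calls (the report does not say whether spend also covers calls to the model under test), and it treats the ratio of summed judge latency to wall-clock time as a rough concurrency estimate. The function name `derived_metrics` is hypothetical.

```python
# Figures copied from the run summary above.
TOTAL_ELAPSED_S = 2002.52
AVG_JUDGE_LATENCY_S = 6.28
JUDGE_CALLS = 742
TOTAL_SPEND_USD = 4.6291


def derived_metrics(elapsed_s, avg_latency_s, calls, spend_usd):
    """Derive per-call cost, throughput, and a rough concurrency estimate."""
    return {
        # Assumption: all spend is attributed to judge calls.
        "usd_per_judge_call": spend_usd / calls,
        # Judge calls completed per second of wall-clock time.
        "judge_calls_per_s": calls / elapsed_s,
        # Summed judge latency divided by wall-clock time. A value above 1
        # suggests judge calls overlapped rather than running back to back.
        "approx_concurrency": (avg_latency_s * calls) / elapsed_s,
    }


metrics = derived_metrics(
    TOTAL_ELAPSED_S, AVG_JUDGE_LATENCY_S, JUDGE_CALLS, TOTAL_SPEND_USD
)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Since 742 calls at an average of 6.28s add up to more than the 2002.52s elapsed, the judge calls must have run at least partly in parallel.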