Run v2-eval-20260502T190525Z
openai · completed · v2 smoke
Model: GPT-5.4 Nano (openai/gpt-5.4-nano)

- Headline: 67.0/100 (CI [50.5, 76.9])
- Cost: $4.6291 (742 judge calls)
- Cases: 27/42 at ≥ 70% (5 failing)

Per-band breakdown
- TRIVIAL: 100.0 (n=1 · CI [100.0, 100.0])
- EASY: 97.7 (n=6 · CI [94.6, 99.7])
- MEDIUM: 85.2 (n=10 · CI [75.9, 93.3])
- HARD: 74.3 (n=19 · CI [60.4, 86.9])
- EXPERT: 42.4 (n=6 · CI [20.7, 62.1])
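The page reports a confidence interval for the headline and for every band but does not say how those intervals are computed. One common choice is a percentile bootstrap over per-case scores, and the minimal Python sketch below assumes that method; it is an assumption, not the tool's documented behavior, and `case_weighted_mean` and `bootstrap_ci` are hypothetical helpers. Two checks follow from the reported figures alone: the band sizes sum to the 42 cases, and the case-weighted mean of the band scores (about 76.3) does not equal the 67.0 headline, so the headline presumably uses an aggregation the page does not state. The degenerate TRIVIAL interval [100.0, 100.0] at n=1 is what a percentile bootstrap yields for a single case, which is consistent with, but does not prove, a resampling method.

```python
import random
import statistics

# Per-band (score, n) pairs as reported on this run page.
BANDS = {
    "TRIVIAL": (100.0, 1),
    "EASY": (97.7, 6),
    "MEDIUM": (85.2, 10),
    "HARD": (74.3, 19),
    "EXPERT": (42.4, 6),
}


def case_weighted_mean(bands):
    """Mean of band scores, weighting each band by its case count."""
    total_n = sum(n for _, n in bands.values())
    return sum(score * n for score, n in bands.values()) / total_n


def bootstrap_ci(scores, level=0.95, resamples=2000, seed=0):
    """Percentile bootstrap CI for the mean of per-case scores.

    Hypothetical helper: assumes the dashboard resamples cases with
    replacement, which the page itself does not confirm.
    """
    rng = random.Random(seed)  # seeded so the interval is reproducible
    k = len(scores)
    means = sorted(
        statistics.fmean(rng.choices(scores, k=k)) for _ in range(resamples)
    )
    alpha = (1.0 - level) / 2.0
    lo = means[int(alpha * (resamples - 1))]
    hi = means[int((1.0 - alpha) * (resamples - 1))]
    return lo, hi


if __name__ == "__main__":
    print(sum(n for _, n in BANDS.values()))    # total cases across bands
    print(round(case_weighted_mean(BANDS), 1))  # case-weighted band mean
    print(bootstrap_ci([100.0]))                # single-case interval, like TRIVIAL
```

With real per-case scores, `bootstrap_ci(scores)` would be called once per band and once over all 42 cases; the seed only fixes the resampling so repeated runs agree.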
[Figure: Cluster radar]
Latency & throughput
- Total elapsed: 2002.52 s
- Avg judge latency: 6.28 s
- Judge calls: 742
- Total spend: $4.6291
- Completed: Sat, 02 May 2026 19:05:25 GMT
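A few throughput figures can be derived from the numbers above by plain arithmetic; the sketch below is illustrative, and `derived_metrics` is a hypothetical name. Summed judge time (742 calls × 6.28 s ≈ 4659.8 s) is more than twice the 2002.52 s wall-clock time, so on average about 2.3 judge calls were in flight at once. That reading assumes "Avg judge latency" is the mean per-call duration and that every call falls inside the elapsed window, which the page does not state. Neither cost ratio computed here equals the footer's $/EVAL figure, whose basis the page does not give.

```python
# Figures as reported on this run page.
TOTAL_SPEND_USD = 4.6291
JUDGE_CALLS = 742
CASES = 42
TOTAL_ELAPSED_S = 2002.52
AVG_JUDGE_LATENCY_S = 6.28


def derived_metrics():
    """Ratios derived from the reported totals (hypothetical helper)."""
    # Summed judge-call time, assuming AVG_JUDGE_LATENCY_S is a per-call mean.
    judge_busy_s = JUDGE_CALLS * AVG_JUDGE_LATENCY_S
    return {
        "usd_per_judge_call": TOTAL_SPEND_USD / JUDGE_CALLS,
        "usd_per_case": TOTAL_SPEND_USD / CASES,
        "judge_calls_per_case": JUDGE_CALLS / CASES,
        # Average judge calls in flight: busy time divided by wall-clock time.
        "mean_judge_concurrency": judge_busy_s / TOTAL_ELAPSED_S,
        "judge_calls_per_minute": JUDGE_CALLS / (TOTAL_ELAPSED_S / 60.0),
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.4f}")
```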
Q: 0.976 · LAT: p50 1.2 s, p95 2.7 s, p99 5.1 s · Judge: gpt-4.1-mini · Q-DEPTH: 0 · $/EVAL: $0.00014
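The LAT line reports p50, p95 and p99 latencies, but the page does not say which requests they time or which percentile definition is used. The sketch below assumes the nearest-rank definition, one of several in common use (interpolating definitions can give slightly different values on small samples). `nearest_rank_percentile` and `latency_summary` are hypothetical helpers, and the demo samples are invented for illustration, not this run's data.

```python
import math


def nearest_rank_percentile(samples, p):
    """Smallest sample such that at least p% of samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples):
    """p50/p95/p99 summary in the same shape as the LAT line."""
    return {f"p{p}": nearest_rank_percentile(samples, p) for p in (50, 95, 99)}


if __name__ == "__main__":
    # Hypothetical latencies in seconds, for illustration only.
    demo = [0.9, 1.0, 1.1, 1.3, 1.4, 1.6, 2.0, 2.2, 3.0, 4.5]
    print(latency_summary(demo))
```

With only ten samples, p95 and p99 both land on the slowest request; tail percentiles such as p99 only become informative with hundreds of samples.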