Run v2-eval-20260502T210345Z
deepseek · completed · v2 smoke
Deepseek V4 Flash
deepseek/deepseek-v4-flash
Headline
- Score: 76.7/100
- CI: [59.4, 88.3]
Cost
- Total: $5.0250
- 742 judge calls
Cases
- 33/42 ≥ 70%
- 2 failing
Per-band breakdown
- MEDIUM: 80.3 · n=10 · CI [65.3, 92.6]
- HARD: 88.4 · n=19 · CI [82.4, 93.6]
- EXPERT: 57.0 · n=6 · CI [30.9, 82.4]
- TRIVIAL: 100.0 · n=1 · CI [100.0, 100.0]
- EASY: 99.4 · n=6 · CI [98.8, 100.0]
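As a quick consistency check on the breakdown above, the band sizes can be summed and a case-weighted mean of the band scores compared with the headline. This is only a sketch: the report does not say how the headline score is aggregated, so the case-weighted mean here is an assumption for comparison, and the names `BANDS`, `HEADLINE`, and `case_weighted_mean` are hypothetical.

```python
# Per-band (score, case count) pairs copied from the breakdown above.
BANDS = {
    "TRIVIAL": (100.0, 1),
    "EASY": (99.4, 6),
    "MEDIUM": (80.3, 10),
    "HARD": (88.4, 19),
    "EXPERT": (57.0, 6),
}
HEADLINE = 76.7  # headline score reported for the run


def case_weighted_mean(bands):
    """Return (mean of band scores weighted by case count, total case count)."""
    total_n = sum(n for _, n in bands.values())
    weighted_sum = sum(score * n for score, n in bands.values())
    return weighted_sum / total_n, total_n


mean, total_n = case_weighted_mean(BANDS)
print(f"cases across bands: {total_n}")
print(f"case-weighted mean of band scores: {mean:.1f}")
print(f"reported headline: {HEADLINE}")
```

The band sizes add up to the 42 cases reported, while the case-weighted mean comes out near 83.8 rather than 76.7, which suggests the headline uses some other aggregation that this report does not specify.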
Cluster radar (chart not reproduced)
Latency & throughput
- Total elapsed: 2413.72s
- Avg judge latency: 7.06s
- Judge calls: 742
- Total spend: $5.0250
- Completed: Sat, 02 May 2026 21:03:45 GMT
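A few per-unit figures follow from the totals above by simple arithmetic. The sketch below derives them under stated assumptions: that total elapsed is wall-clock time, that the average judge latency is a plain per-call mean, and that the case count is the 42 from the Cases card. The function name `derive_metrics` and its keys are hypothetical, not part of the dashboard.

```python
def derive_metrics(elapsed_s, avg_judge_latency_s, judge_calls, spend_usd, cases):
    """Derive per-unit figures from the run totals (illustrative only)."""
    summed_judge_latency_s = judge_calls * avg_judge_latency_s
    return {
        "cost_per_judge_call_usd": spend_usd / judge_calls,
        "cost_per_case_usd": spend_usd / cases,
        "judge_calls_per_case": judge_calls / cases,
        "summed_judge_latency_s": summed_judge_latency_s,
        # Average number of judge calls in flight (Little's law),
        # assuming elapsed_s is wall-clock time for the whole run.
        "avg_judge_calls_in_flight": summed_judge_latency_s / elapsed_s,
    }


metrics = derive_metrics(
    elapsed_s=2413.72,
    avg_judge_latency_s=7.06,
    judge_calls=742,
    spend_usd=5.0250,
    cases=42,
)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the summed judge latency (about 5238.5s) exceeds the total elapsed time, so judge calls must have overlapped, averaging roughly 2.2 in flight at once; the spend works out to about $0.0068 per judge call and $0.12 per case.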
Q 0.976 · LAT p50 1.2s · p95 2.7s · p99 5.1s · JUDGE gpt-4.1-mini · Q-DEPTH 0 · $/EVAL $0.00014