Run: v2-eval-20260502T175124Z
mistral · completed · v2 smoke
Mistral Nemo
mistralai/mistral-nemo
Headline
49.8/100
CI [36.0, 59.6]
Cost
$4.6795
742 judge calls
Cases
17/42 ≥ 70%
15 failing
Per-band breakdown
- MEDIUM: 56.4 · n=10 · CI [37.4, 75.2]
- HARD: 53.1 · n=19 · CI [39.2, 67.0]
- EXPERT: 31.9 · n=6 · CI [15.6, 48.5]
- TRIVIAL: 100.0 · n=1 · CI [100.0, 100.0]
- EASY: 78.1 · n=6 · CI [54.3, 95.4]
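As a quick consistency check on the breakdown above, the sketch below sums the band sizes and computes a case-weighted mean of the band scores. The `bands` dictionary and the variable names are illustrative, not part of the eval tooling, and the page does not state how the headline score is aggregated, so this is only a cross-check under the assumption that each case counts equally.

```python
# Per-band (score, n) pairs copied from the per-band breakdown.
bands = {
    "TRIVIAL": (100.0, 1),
    "EASY": (78.1, 6),
    "MEDIUM": (56.4, 10),
    "HARD": (53.1, 19),
    "EXPERT": (31.9, 6),
}

# Total number of cases covered by the bands.
total_cases = sum(n for _, n in bands.values())

# Mean of band scores weighted by case count (assumes equal weight per case).
weighted_mean = sum(score * n for score, n in bands.values()) / total_cases

print(f"cases covered: {total_cases}")
print(f"case-weighted mean: {weighted_mean:.1f}")
```

The band sizes add up to the 42 cases reported, and the case-weighted mean comes out near 55.5. That lies inside the headline CI but does not equal the 49.8 headline, so the headline presumably uses a weighting that this page does not show.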
[Figure: Cluster radar]
Latency & throughput
- Total elapsed: 1651.39s
- Avg judge latency: 5.81s
- Judge calls: 742
- Total spend: $4.6795
- Completed: Sat, 02 May 2026 17:51:24 GMT
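A few per-call and per-case figures follow directly from the totals above. The sketch below is plain arithmetic on the reported numbers; the variable names are illustrative, and the concurrency estimate rests on the assumption that summed judge latency divided by wall-clock time approximates the average number of judge calls in flight (elapsed time may also include non-judge work, so treat it as rough).

```python
# Figures copied from the run summary.
total_spend_usd = 4.6795
judge_calls = 742
cases = 42
cases_at_or_above_70 = 17
total_elapsed_s = 1651.39
avg_judge_latency_s = 5.81

# Derived values: simple ratios of the reported totals.
cost_per_call = total_spend_usd / judge_calls
cost_per_case = total_spend_usd / cases
calls_per_case = judge_calls / cases
share_at_or_above_70 = cases_at_or_above_70 / cases

# Assumption: summed judge time over wall-clock time approximates the
# average number of judge calls running concurrently.
est_concurrency = judge_calls * avg_judge_latency_s / total_elapsed_s

print(f"cost per judge call: ${cost_per_call:.4f}")
print(f"cost per case: ${cost_per_case:.4f}")
print(f"judge calls per case: {calls_per_case:.2f}")
print(f"share of cases >= 70%: {share_at_or_above_70:.1%}")
print(f"estimated judge concurrency: {est_concurrency:.2f}")
```

On these numbers, each judge call costs roughly $0.0063, each case uses about 17.7 judge calls, and about 40.5% of cases reach the 70% mark.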
Q 0.976 · LAT p50 1.2s · p95 2.7s · p99 5.1s · JUDGE gpt-4.1-mini · Q-DEPTH 0 · $/EVAL $0.00014