03 RUNS
2 persisted evaluation runs across all models.
| Model | Golden set | Status | Overall | Pass | p95 latency | Cost | Completed |
|---|---|---|---|---|---|---|---|
| deepseek/deepseek-chat-v3.1 | starter-v1 | completed | 85.1% | 28/30 | 60.65s | $0.006990 | 4/24/2026, 5:15:54 PM |
| openai/gpt-5.4-nano | starter-v1 | completed | 97.6% | 29/30 | 2.70s | $0.001287 | 4/24/2026, 5:04:03 PM |

Q 0.976 · LAT p50 1.2s · p95 2.7s · p99 5.1s · JUDGE gpt-4.1-mini · Q-DEPTH 0 · $/EVAL $0.00014