Evaluation Metrics
Opt@1: Estimated fraction of tasks where a single attempt achieves at least 95% of the human speedup and passes the correctness tests.
Opt@K: Estimated fraction of tasks where at least one of K attempts achieves at least 95% of the human speedup and passes the correctness tests.
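The definitions above do not spell out the estimator itself. As a rough sketch, assuming Opt@K is computed like the familiar unbiased pass@k estimator (an assumption, not something stated here), a per-task estimate from n sampled attempts with c successes could look like the following. The names `is_success`, `opt_at_k`, and `benchmark_opt_at_k` are hypothetical, and the reading of the 95% threshold as a ratio of model speedup to human speedup is also an assumption.

```python
from math import comb


def is_success(model_speedup: float, human_speedup: float,
               passed_tests: bool, threshold: float = 0.95) -> bool:
    """Hypothetical success check: the attempt passes the correctness tests
    and reaches at least `threshold` of the human speedup (assumed reading)."""
    return passed_tests and model_speedup >= threshold * human_speedup


def opt_at_k(n: int, c: int, k: int) -> float:
    """Per-task estimate of P(at least one of k attempts succeeds),
    given n sampled attempts of which c succeeded.
    Assumes the pass@k-style form: 1 - C(n - c, k) / C(n, k)."""
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k attempts failed,
    # so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_opt_at_k(per_task: list[tuple[int, int]], k: int) -> float:
    """Average the per-task estimates over (n, c) pairs, one pair per task."""
    if not per_task:
        raise ValueError("per_task must not be empty")
    return sum(opt_at_k(n, c, k) for n, c in per_task) / len(per_task)
```

With k = 1 the per-task estimate reduces to c / n, so under this assumption Opt@1 is simply the mean per-task success rate.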
| Rank | Model | Scaffold | Setting | Score | Date | 
|---|---|---|---|---|---|