SWE-Bench Verified: How AI Coding Agents Are Measured
SWE-Bench Verified is a benchmark that grades AI coding agents on whether their patches resolve real GitHub issues. Here's what it tests, what it misses, and how to read the scores.
- #benchmarks
- #swe-bench
- #coding-agents