Posts tagged #coding-agents

benchmarks

SWE-Bench Verified: How AI Coding Agents Are Measured

SWE-Bench Verified is a benchmark that grades AI coding agents on whether they can resolve real GitHub issues. Here's what it tests, what it misses, and how to read the scores.

  • #benchmarks
  • #swe-bench
  • #coding-agents