benchmarks
HLE Explained: Humanity's Last Exam for AI Models
Humanity's Last Exam is a 3,000-question benchmark designed to outlast frontier AI models. Here's what HLE actually tests and how to read the score.
- #benchmarks
- #hle
- #reasoning