About AI Models Benchmark

AI Models Benchmark is a free, open resource that helps developers, researchers, and decision-makers compare leading AI and large language models side by side. We aggregate publicly available benchmark scores, pricing, and capability data into a single, searchable leaderboard so you can find the right model for your use case in seconds.

What We Track

Our leaderboard covers models from providers such as OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, xAI, and many more. For each model we display:

GPQA — Graduate-level science question accuracy (biology, physics, and chemistry)
AIME 2025 — American Invitational Mathematics Examination score
SWE-bench Verified — Real-world software engineering task performance
HLE — Humanity's Last Exam score
Pricing — Input and output cost per million tokens
Context window, parameter count, license type, and knowledge cutoff
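
As an illustration of how per-million-token pricing translates into the cost of a single request, here is a minimal sketch. The function name and the prices used are hypothetical and are not taken from any provider's price list.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the cost of one request, in the same currency as the prices.

    Prices are quoted per million tokens, so the token-weighted sum is
    divided by 1,000,000.
    """
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical prices: 3.00 per million input tokens, 15.00 per million output tokens.
cost = request_cost(input_tokens=2_000, output_tokens=500,
                    input_price_per_m=3.00, output_price_per_m=15.00)
print(f"{cost:.4f}")  # prints 0.0135
```

Because output tokens are often priced higher than input tokens, the same model can rank differently on cost depending on how much text your workload generates.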

Where the Data Comes From

Benchmark data is sourced from public leaderboards and official model provider publications. Our automated scraper collects and normalizes the data on a regular schedule, so the leaderboard stays current as new models are released and scores are updated.
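
To give a sense of what normalization can involve, the sketch below converts scores reported in different styles (such as "87.5%", 0.875, or 87.5) to a common 0 to 100 scale. It is an assumption-laden illustration, not the scraper's actual code: the function name and the rule that bare values between 0 and 1 are fractions are both hypothetical.

```python
from typing import Optional, Union


def normalize_score(raw: Union[str, float, int, None]) -> Optional[float]:
    """Convert a raw score to a percentage in [0, 100], or None if unusable."""
    if raw is None:
        return None

    had_percent = False
    if isinstance(raw, str):
        text = raw.strip()
        had_percent = text.endswith("%")
        text = text.rstrip("%").strip()
        try:
            value = float(text)
        except ValueError:
            return None  # e.g. "n/a" or an empty cell
    else:
        value = float(raw)

    # Assumption: a bare value in [0, 1] is a fraction, so scale it up.
    # This is ambiguous for a genuine score of 1 percent or lower.
    if not had_percent and 0.0 <= value <= 1.0:
        value *= 100.0

    # Reject anything outside the valid range (this also rejects NaN).
    if not 0.0 <= value <= 100.0:
        return None
    return round(value, 2)


print(normalize_score("87.5%"), normalize_score(0.875), normalize_score("n/a"))
```

Returning None for unusable values, rather than zero, keeps a missing score from being mistaken for a poor one.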

How to Use the Leaderboard

Visit the home page to explore the full leaderboard. You can sort by any column, search by model or organization name, and select models to compare them side by side. Whether you are picking a model for production, evaluating cost-efficiency, or researching state-of-the-art performance, the leaderboard gives you the data you need at a glance.
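
The search and sort behavior described above can be sketched in a few lines. This is only an illustration under stated assumptions: the ModelRow type, its fields, and the placeholder rows are hypothetical and do not reflect the site's real data or implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelRow:
    name: str
    organization: str
    gpqa: Optional[float]         # percentage, or None if not reported
    input_price: Optional[float]  # cost per million input tokens


def search(rows: List[ModelRow], query: str) -> List[ModelRow]:
    """Case-insensitive match on model name or organization name."""
    q = query.casefold()
    return [r for r in rows
            if q in r.name.casefold() or q in r.organization.casefold()]


def sort_by(rows: List[ModelRow], column: str, descending: bool = True) -> List[ModelRow]:
    """Sort by a column, always placing rows with a missing value last."""
    present = [r for r in rows if getattr(r, column) is not None]
    missing = [r for r in rows if getattr(r, column) is None]
    present.sort(key=lambda r: getattr(r, column), reverse=descending)
    return present + missing


# Placeholder rows with made-up values, for illustration only.
rows = [
    ModelRow("model-a", "Org One", 60.0, 2.0),
    ModelRow("model-b", "Org Two", None, 0.5),
    ModelRow("model-c", "Org One", 75.0, 5.0),
]
print([r.name for r in sort_by(rows, "gpqa")])   # ['model-c', 'model-a', 'model-b']
print([r.name for r in search(rows, "org one")])  # ['model-a', 'model-c']
```

Keeping rows with missing values at the end, regardless of sort direction, avoids ranking an unreported score above or below a real one.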

Open & Free

AI Models Benchmark is completely free to use. We believe transparent, accessible benchmarking benefits the entire AI community by enabling informed decisions and healthy competition among model providers.