<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>AI Models Benchmark Blog</title>
    <description>Insights, analysis, and guides on AI model benchmarks, pricing, and capabilities.</description>
    <link>https://aimodelsbenchmark.com/</link>
    <language>en-us</language>
    <item>
      <title>AIME 2025 Explained: The Math Benchmark for AI Reasoning</title>
      <link>https://aimodelsbenchmark.com/blog/aime-2025-explained/</link>
      <guid isPermaLink="true">https://aimodelsbenchmark.com/blog/aime-2025-explained/</guid>
      <description>AIME 2025 is the high school math competition that frontier AI models now use as a contamination-resistant reasoning benchmark. Here&apos;s how to read the scores.</description>
      <pubDate>Sun, 26 Apr 2026 00:00:00 GMT</pubDate>
      <category>benchmarks</category>
      <category>aime</category>
      <category>reasoning</category>
      <category>evaluation</category>
      <dc:creator>AI Models Benchmark</dc:creator>
    </item>
    <item>
      <title>GPQA Explained: The Graduate-Level Reasoning Benchmark</title>
      <link>https://aimodelsbenchmark.com/blog/gpqa-explained/</link>
      <guid isPermaLink="true">https://aimodelsbenchmark.com/blog/gpqa-explained/</guid>
      <description>GPQA is a graduate-level science benchmark designed to be unsolvable by Google search alone. Here&apos;s what the score actually means and how to use it.</description>
      <pubDate>Sun, 26 Apr 2026 00:00:00 GMT</pubDate>
      <category>benchmarks</category>
      <category>gpqa</category>
      <category>reasoning</category>
      <category>evaluation</category>
      <dc:creator>AI Models Benchmark</dc:creator>
    </item>
    <item>
      <title>HLE Explained: Humanity&apos;s Last Exam for AI Models</title>
      <link>https://aimodelsbenchmark.com/blog/hle-explained/</link>
      <guid isPermaLink="true">https://aimodelsbenchmark.com/blog/hle-explained/</guid>
      <description>Humanity&apos;s Last Exam is a 3,000-question benchmark designed to outlast frontier AI models. Here&apos;s what HLE actually tests and how to read the score.</description>
      <pubDate>Sun, 26 Apr 2026 00:00:00 GMT</pubDate>
      <category>benchmarks</category>
      <category>hle</category>
      <category>reasoning</category>
      <category>evaluation</category>
      <dc:creator>AI Models Benchmark</dc:creator>
    </item>
    <item>
      <title>SWE-Bench Verified: How AI Coding Agents Are Measured</title>
      <link>https://aimodelsbenchmark.com/blog/swe-bench-verified-explained/</link>
      <guid isPermaLink="true">https://aimodelsbenchmark.com/blog/swe-bench-verified-explained/</guid>
      <description>SWE-Bench Verified is the benchmark that grades AI coding agents on real GitHub issues. Here&apos;s what it tests, what it misses, and how to read the scores.</description>
      <pubDate>Sun, 26 Apr 2026 00:00:00 GMT</pubDate>
      <category>benchmarks</category>
      <category>swe-bench</category>
      <category>coding-agents</category>
      <category>evaluation</category>
      <dc:creator>AI Models Benchmark</dc:creator>
    </item>
    <item>
      <title>Welcome to the AI Models Benchmark Blog</title>
      <link>https://aimodelsbenchmark.com/blog/welcome-to-the-blog/</link>
      <guid isPermaLink="true">https://aimodelsbenchmark.com/blog/welcome-to-the-blog/</guid>
      <description>We&apos;re launching a blog to share deep dives, methodology notes, and practical guidance on choosing the right AI model for your use case.</description>
      <pubDate>Sun, 26 Apr 2026 00:00:00 GMT</pubDate>
      <category>news</category>
      <category>announcements</category>
      <category>benchmarks</category>
      <dc:creator>AI Models Benchmark</dc:creator>
    </item>
  </channel>
</rss>