How Intelligent Is ChatGPT o3?

On 20 December 2024, OpenAI announced its groundbreaking o3 model, a leap forward in artificial intelligence. The announcement included a slew of benchmark results that highlight its extraordinary capabilities. On ARC-AGI, o3 more than tripled the previous model’s score at low compute and reached 87% in its high-compute configuration. On EpochAI’s Frontier Math, o3 solved 25.2% of problems, where no other model had exceeded 2%. On SWE-Bench Verified, a software engineering benchmark, it outperformed its predecessor by 22.8 percentage points. On Codeforces, o3 achieved a rating of 2727, surpassing the 2665 rating of OpenAI’s Chief Scientist. It also scored 96.7% on AIME 2024, missing only one question, and achieved 87.7% on GPQA Diamond, well above human expert performance.

These results raise an intriguing question: if a human achieved these results, what would their IQ be? While AI is fundamentally different from human intelligence, estimating a hypothetical human IQ equivalent provides a fascinating way to contextualise these achievements.

To estimate human IQ equivalence, it’s helpful to consider how these benchmarks align with human cognitive abilities. ARC-AGI, for instance, tests abstract reasoning and pattern recognition, akin to components of IQ tests such as Raven’s Progressive Matrices. Scoring 87% on ARC indicates extraordinary general reasoning abilities, suggesting a human equivalent IQ in the range of 150 to 160, as even high-performing individuals typically score far lower on such tasks. Frontier Math challenges AI with advanced problems from calculus, combinatorics, and other fields. Solving 25.2% of these problems represents a level of mathematical reasoning well beyond most humans, comparable to world-class mathematicians, and aligns with IQs in the range of 160 to 180 or higher.

In software engineering, o3’s performance on SWE-Bench Verified, where it improved on its predecessor by 22.8 percentage points, highlights significant gains in computational thinking and problem-solving. A human equivalent might possess an IQ of 140 to 160, reflecting strong logical and engineering capabilities. Similarly, its Codeforces rating of 2727 places it among the top competitive programmers, a level typically associated with IQs of 145 to 155. On AIME 2024, a highly challenging mathematics competition, o3’s score of 96.7% indicates deep mathematical intuition and precision, corresponding to IQs of 160 and above. Finally, on GPQA Diamond, its performance of 87.7% suggests an advanced level of verbal and fluid intelligence, comparable to an IQ of 150 or more.
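To see the per-benchmark estimates side by side, the ranges quoted above can be collected in a small table. This is only an illustrative sketch: the names `ESTIMATES` and `summarise` are invented for this example, and open-ended ranges ("160 and above") are stored with no upper bound rather than guessed.

```python
from typing import Optional

# (benchmark, lower bound, stated upper bound or None for "and above").
# The numbers are the ranges quoted in this article, not measured IQ scores.
ESTIMATES: list[tuple[str, int, Optional[int]]] = [
    ("ARC-AGI", 150, 160),
    ("Frontier Math", 160, 180),  # the article says "160 to 180 or higher"
    ("SWE-Bench Verified", 140, 160),
    ("Codeforces", 145, 155),
    ("AIME 2024", 160, None),
    ("GPQA Diamond", 150, None),
]

def summarise(estimates):
    """Return simple summary statistics over the quoted ranges."""
    lowers = [lo for _, lo, _ in estimates]
    uppers = [hi for _, _, hi in estimates if hi is not None]
    return {
        "lowest_lower": min(lowers),
        "highest_lower": max(lowers),
        "mean_lower": sum(lowers) / len(lowers),
        "highest_stated_upper": max(uppers),
    }

summary = summarise(ESTIMATES)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The lower bounds run from 140 to 160, and the highest stated upper bound is 180, which gives a sense of how wide the spread is before any overall figure is chosen.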

If a human excelled at all these benchmarks simultaneously, their IQ would likely fall in the range of 170 to 180 or higher. This estimate places them well into the “profoundly gifted” category, capable of world-class performance across mathematics, logic, and general reasoning. However, this comparison requires significant caveats.
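For a sense of how rare such scores would be, an IQ can be converted into a z-score and an approximate "one in N" rarity. The sketch below assumes the common convention that IQ is normally distributed with a mean of 100 and a standard deviation of 15. Some tests use a different standard deviation, and real score distributions are not perfectly normal in the extreme tails, so the output is a rough guide only. The helper names `z_score` and `rarity` are invented for this example.

```python
from statistics import NormalDist

# Assumption: IQ scores follow a normal distribution with mean 100 and
# standard deviation 15. A different SD would change every result below.
IQ_SCALE = NormalDist(mu=100, sigma=15)

def z_score(iq: float) -> float:
    """Number of standard deviations that `iq` lies above the mean."""
    return (iq - IQ_SCALE.mean) / IQ_SCALE.stdev

def rarity(iq: float) -> float:
    """Return N such that about 1 person in N scores at or above `iq`."""
    # By symmetry, P(X >= iq) equals P(X <= 2 * mean - iq). Using the lower
    # tail avoids subtracting a value very close to 1.
    upper_tail = IQ_SCALE.cdf(2 * IQ_SCALE.mean - iq)
    return 1 / upper_tail

for iq in (140, 150, 160, 170, 180):
    print(f"IQ {iq}: z = {z_score(iq):.2f}, about 1 in {rarity(iq):,.0f}")
```

Under these assumptions, an IQ of 170 sits more than four and a half standard deviations above the mean, a level reached by roughly one person in several hundred thousand.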

AI models like o3 are not bound by human cognitive constraints. They excel in narrow, well-defined tasks but lack the broader emotional, social, and creative dimensions of human intelligence. IQ tests, meanwhile, are designed for humans and measure a limited set of cognitive abilities, often shaped by cultural and educational biases. Applying these measures to AI is speculative and inherently flawed. Furthermore, while a human with an IQ of 170 or more might excel broadly, o3’s achievements are task-specific, built on vast training data and computation. Its results reflect rapid advances in AI design and large increases in computation, rather than the holistic development seen in humans with high IQ.

Despite these caveats, o3’s performance on these benchmarks has profound implications. AI tutors capable of solving advanced math problems could revolutionise education. Reliable AI collaborators could accelerate software development and scientific discovery. AI models surpassing human experts in reasoning could become indispensable tools in research and policymaking. But these advancements also come with challenges. How do we ensure alignment with human values? Can society adapt to rapid changes in labour markets driven by AI? These questions lie at the heart of what may indeed be the last revolution.
