On 20 December 2024, OpenAI announced the release of its groundbreaking o3 model, a leap forward in artificial intelligence. The announcement included a slew of benchmark results that highlight its extraordinary capabilities. On ARC-AGI, o3 more than tripled the previous model’s score on low compute tasks and achieved a score of 87%. On EpochAI’s Frontier Math, o3 solved 25.2% of problems, a massive leap where no other model exceeded 2%. In SWE-Bench Verified, it outperformed its predecessor by 22.8 percentage points in software engineering benchmarks. On Codeforces, o3 achieved a rating of 2727, surpassing OpenAI’s Chief Scientist’s score of 2665. It also scored 96.7% on AIME 2024, missing only one question, and achieved 87.7% on GPQA Diamond, far above human expert performance.
Continue reading “How Intelligent Is ChatGPT o3?”