OpenAI announces o3 reasoning models
In short: On the final day of its ‘12 Days of OpenAI’, the company announced its most powerful reasoning model, o3.
What's happened: Following on from its o1 release, which only came out three months ago, OpenAI has previewed its newest and most powerful reasoning models, marking a further push towards creating smarter models that can tackle complex problems.
There are two new models: o3 and o3-mini, the latter being a smaller model that is fine-tuned for specific tasks.
While the consensus is that this model hasn’t achieved the aspirational “Artificial General Intelligence” (or AGI), it is seen as a significant step toward that milestone. The model has achieved a breakthrough high score on a notable AI reasoning test, the ARC Challenge, and exceeded previous scores for other programming and mathematics-related benchmarks. Specifically:
Exceptional Coding Performance: o3 surpasses o1 by 22.8 percentage points on SWE-Bench Verified and achieves a Codeforces rating of 2727, outperforming OpenAI’s Chief Scientist’s score of 2665.
Math and Science Mastery: o3 scores 96.7% on the AIME 2024 exam, missing only one question, and achieves 87.7% on GPQA Diamond, far exceeding human expert performance.
Frontier Benchmarks: The model sets new records on challenging tests like EpochAI’s Frontier Math, solving 25.2% of problems where no other model exceeds 2%. On the ARC-AGI test, o3 triples o1’s score and surpasses 85% (as verified live by the ARC Prize team), representing a milestone in conceptual reasoning.
The model hasn’t been released to end users; instead, OpenAI has invited external researchers to apply for testing. The company has stated that o3-mini will be publicly released at the end of January 2025, with the full version to follow.
Why you should care: Despite concerns that the progression of these models may stall due to a lack of data (WSJ, $), the new battleground appears to be reasoning - the ability of a model to “think” through alternative responses to a question and present what it considers the most accurate one.
While this release doesn’t appear to attain AGI, and while it has not yet been made publicly available, commentators assert it is a meaningful step - a further leap in AI performance, particularly in areas requiring advanced reasoning and problem-solving capabilities.
There is a performance aspect to the results it produces: “o3, like o1 before it, takes a little longer — usually seconds to minutes longer — to arrive at solutions compared to a typical non-reasoning model. The upside? It tends to be more reliable in domains such as physics, science, and mathematics.”
One consideration for reasoning models is cost: the amount of compute power required to work through options and provide the detailed summaries is expensive. According to The Decoder:
This thorough process explains why o3 needs so much computing power - it processes up to 33 million tokens for a single task.
This intensive token processing comes with significant costs compared to current AI systems. The high-efficiency version runs about $20 per task, which adds up quickly - $2,012 for 100 test tasks, or $6,677 for the full set of 400 public tasks (averaging about $17 per task).
The low-efficiency version demands even more resources - 172 times more computing power than the high-efficiency version. While OpenAI hasn't revealed the exact costs, testing shows this version processes between 33 and 111 million tokens and requires about 1.3 minutes of computing time per task.
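The per-task figures quoted above can be sanity-checked with simple division. The sketch below uses only the numbers from The Decoder’s report; the final low-efficiency estimate is a naive extrapolation of my own (assuming cost scales linearly with compute), not a figure OpenAI has published:

```python
# Sanity check of the per-task costs quoted above, using only
# the figures reported by The Decoder (high-efficiency runs).
cost_100_tasks = 2012   # USD for 100 test tasks
cost_400_tasks = 6677   # USD for the full 400 public tasks

per_task_100 = cost_100_tasks / 100   # ≈ $20.12 per task
per_task_400 = cost_400_tasks / 400   # ≈ $16.69 per task

print(f"High-efficiency: ~${per_task_100:.2f}/task (100-task set), "
      f"~${per_task_400:.2f}/task (400-task set)")

# The low-efficiency configuration reportedly uses ~172x the compute.
# If cost scaled linearly with compute (an assumption, not a reported
# figure), a naive per-task estimate would be:
low_eff_estimate = per_task_400 * 172
print(f"Naive low-efficiency estimate: ~${low_eff_estimate:,.0f}/task")
```

Even under that rough assumption, the low-efficiency configuration lands in the thousands of dollars per task, which underlines why these settings are research configurations rather than consumer-facing defaults.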
Such costs may be prohibitive for everyday ChatGPT prompts. But for organisations running complex tasks, particularly in technical domains, the speed, accuracy, and performance of these models could deliver real ROI.
What will be interesting to see is if and how the o3-mini model is incorporated into the popular ChatGPT client and what limitations, if any, there may be around its use.
As we saw with the release of ChatGPT Pro - which was also part of the 12 Days of OpenAI - the company is exploring ways to package and price its technologies. According to the company, ChatGPT Pro “includes unlimited access to our smartest model, OpenAI o1, as well as to o1-mini, GPT-4o, and Advanced Voice. It also includes o1 pro mode, a version of o1 that uses more compute to think harder and provide even better answers to the hardest problems. In the future, we expect to add more powerful, compute-intensive productivity features to this plan.”
Expect to see more variety - from all market players - in the pricing and packaging of AI-related solutions. As we move beyond comparisons based on “feeds and speeds,” the push for monetisation and ROI will require these technologies to be offered in ways that drive greater adoption.
For more: Watch the YouTube launch video for more context around the OpenAI o3 models, and stay tuned for an exciting 2025 as we get ever closer to even smarter systems.
The year ahead: What will happen in 2025?
In short: 2025 promises to be a year of continued AI innovation, as models continue to advance, but it may also be one of growing tension as we grapple with the benefits and risks of AI as these solutions integrate into our daily lives.
What's happened: Two years after OpenAI released ChatGPT, 2024 was a year dominated by AI.
This year saw record amounts of capex invested by hyperscalers into their data centres and platforms, with NVIDIA benefiting from this euphoric expansion and seeing its market valuation soar.
But while tens of billions of dollars were spent by Microsoft, Google and AWS, adoption of AI has yet to become widespread.
While ChatGPT now has 300 million weekly users, making it one of the fastest growing products in history, the degree to which this is based on curiosity versus genuine value remains unclear.
This also applies to other players: while reports indicate Microsoft’s Copilot has had mixed success, Google’s Gemini is still a new entrant, and Apple Intelligence has not yet impressed.
Why you should care: Most observers believe we are still in the early stages of what AI can and will deliver. 2025 will be an important year - one of further innovation, but also one where the drumbeat around value grows louder.
Foundation models will continue to evolve, and 2025 should see the release of GPT-5, which will purportedly deliver another tangible step toward AGI. Whether constraints around available data will impact future models remains unclear; if they do, progress may require innovations at the algorithmic level rather than throwing more compute and data at the challenge.
Expect to see OpenAI, Microsoft, AWS, Anthropic, and Google (in no particular order) continue to push boundaries - they all must, to remain competitive in the platform battleground of the future. Apple is the laggard here, currently dependent on third parties for broad AI support (it does use AI within many of its applications and services), although its distribution reach through its various platforms still means it is an important player. All in all, the big should get bigger in 2025.
We shall also see more talk about agents, which had greater prominence in the second half of 2024. Salesforce has been particularly bullish here with its Agentforce solutions, pushing hard to build momentum given that other generative AI platforms present a risk of disintermediation for the CRM company.
Whereas many of these larger players are having to “retrofit” AI into their existing product ranges, we can expect more “AI first” applications which have been deliberately designed to incorporate AI from the outset, and may also provide new ways to interact with applications and drive outcomes. Voice interaction will also play an increased role.
Agents and increased productisation of AI may result in new business models, and, potentially, higher licensing costs. Those investing in the models and platforms will need to get a return, so we need to get smarter at understanding what value means, and how to best calculate it.
And if you are a service provider or reseller of AI solutions, also pay attention: As these technologies get smarter, the role of implementation partners and resellers will continue to evolve.
While many implementation partners have customer relationships today, the value they add will need to evolve to align with the advancing capabilities of the products and services they sell.
Similarly, if you are simply reselling software that effectively implements and operates itself, what value are you adding - and will the margins you may previously have enjoyed be reduced or taken away?
For organisations watching from the sidelines, the time to start experimenting is now. AI developments show no sign of slowing down, and organisations of all kinds, large and small, in all industries, need to contemplate and act on the opportunities and threats that AI solutions may pose.
The world won’t stop and wait for you; you need to embrace AI now or risk being left behind.
For more: For now, take a breather: 2024 was a busy year, and 2025 shows no signs of slowing down.
Links
Google unveiled its Gemini 2.0 Flash Thinking model which, like o3, is a reasoning model capable of tackling complex problems with both speed and transparency. The latter point is interesting: “Gemini 2.0 enables users to access its step-by-step reasoning through a dropdown menu, offering clearer, more transparent insight into how the model arrives at its conclusions.” (VentureBeat)
Thanks for reading this newsletter, and thank you for your support in 2024. This is the final issue of the year. Happy holidays, and see you in 2025.
If you have any feedback on how it could be improved, or what you would like to see, please reach out.
Tim