
Professor Seok Joon Kwon of Sungkyunkwan University believes (via Jukan Choi) that the recent Apple research paper, which found fundamental reasoning limits in modern large reasoning models (LRMs) and large language models (LLMs), is flawed because Apple does not have enough high-performance hardware to test what high-end LRMs and LLMs are truly capable of. The professor argues that Apple lacks a large GPU-based cluster comparable to those operated by Google, Microsoft, or xAI, and that its own hardware is unsuitable for AI.
Better hardware needed
Apple's recently released research paper claimed that contemporary LLMs and LRMs fail to make sound judgments as the complexity of the controlled puzzle environments they are asked to solve increases, revealing fundamental limitations and debunking the common belief that these models can think like a human being. The researchers also observed that the models performed much better on well-known puzzles than on unfamiliar ones, indicating that their success likely stemmed from exposure to similar problems during training rather than from adaptable, transferable problem-solving ability.
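To give a sense of how complexity is dialed up in such controlled puzzle environments, one of the puzzle families the paper reportedly uses is the Tower of Hanoi, whose shortest solution grows exponentially with the number of disks, so each added disk roughly doubles the amount of reasoning a model has to get right. The Python sketch below is purely illustrative (the function name is not from the paper); it generates the optimal move sequence and counts its length.

```python
def hanoi_moves(n, src="A", aux="B", dst="C"):
    """Return the optimal move sequence for n disks (2**n - 1 moves)."""
    if n == 0:
        return []
    return (hanoi_moves(n - 1, src, dst, aux)   # park n-1 disks on the spare peg
            + [(src, dst)]                      # move the largest disk
            + hanoi_moves(n - 1, aux, src, dst))  # stack the n-1 disks on top

for n in (3, 5, 10, 15):
    print(f"{n:2d} disks -> {len(hanoi_moves(n)):6d} moves")  # 7, 31, 1023, 32767
```

The point is that problem difficulty can be increased smoothly and measurably, which is what let the researchers chart where model accuracy collapses.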
However, the professor claims that the key conclusion of the Apple research — that the accuracy of Claude 3.7 Sonnet Thinking and DeepSeek-R1 LRMs dropped to zero regardless of the available compute resources when complexity increases beyond a certain point — is flawed.
"This directly contradicts observations from actual language model scaling laws," Seok Joon Kwon argues. "Hundreds of scaling-related studies to date have consistently shown that performance improves in a power-law manner as the number of parameters increases, and beyond a certain size, performance is observed to move towards saturation. At the very least, performance might reach saturation, but it does not decrease. […] This might be because Apple does not have a GPU-based AI data center large enough to test a parameter space big enough to confirm scaling trends. […] Verifying the scaling law is similar to verifying the scaling law of large language models, and for this, Apple's researchers should have tested combinations of training data, parameters, and computational load and shown the performance curve."
The release of Apple's paper came just ahead of its annual WWDC conference, where Apple, as expected, revealed nothing significant related to its AI effort, prompting criticism that it may be falling behind in the global AI race. Seok Joon Kwon believes the timing was no accident: in his view, Apple intended to downplay the achievements of companies like Anthropic, Google, OpenAI, and xAI, since the company is clearly behind the market leaders.
Fundamental hardware limitations
When Apple introduced its Apple Intelligence initiative in 2024, it focused on on-device processing and relatively basic tasks. At WWDC, the company revealed no progress on its own data center-grade AI, again limiting Apple Intelligence to on-device processing with strict privacy and performance constraints. While this approach strengthens its position among privacy-conscious users, it means the company lacks the infrastructure to train LLMs and LRMs, which require substantial compute and user data to be competitive. At the same time, Apple now allows Siri and other AI tools to call out to external large language models (first OpenAI's GPT-4o via ChatGPT, with Google's Gemini expected to follow) when Siri cannot answer a query on its own. In that case, ChatGPT only receives content the user explicitly approves; Apple obscures the user's IP address and says no personal account data is shared with or retained by OpenAI.
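As a rough illustration of how such a hybrid arrangement can be structured, the sketch below routes a query to the on-device model first and only forwards the explicitly approved text to an external model when the local one cannot answer. This is a hypothetical Python sketch, not Apple's actual implementation; all names are made up.

```python
from dataclasses import dataclass

@dataclass
class Query:
    text: str
    approved_for_external: bool = False  # explicit per-request user consent

def on_device_answer(query: Query) -> str | None:
    """Stand-in for the local model; returns None when it cannot answer."""
    return f"[on-device] {query.text}" if len(query.text) < 60 else None

def external_answer(query: Query) -> str:
    """Stand-in for a call to an external LLM (e.g. GPT-4o). Only the
    approved query text is forwarded; no account data is attached."""
    return f"[external] {query.text}"

def route(query: Query) -> str:
    local = on_device_answer(query)
    if local is not None:
        return local
    if not query.approved_for_external:
        return "This request needs your approval before it can be sent to an external model."
    return external_answer(query)

print(route(Query("What's the weather like today?")))
```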
Such a hybrid approach is not typical for Apple, and Professor Seok Joon Kwon believes it stems from Apple's fundamental focus on its closed ecosystem, which prevented the company from developing the data center-grade hardware required for training LRMs and LLMs. After all, Apple's M-series processors are designed primarily for client PCs: the professor argues that their GPUs lack the FP16 training throughput of dedicated AI accelerators, and that their memory subsystems rely on LPDDR5 rather than on high-bandwidth HBM3E. He also points out that widely used machine learning frameworks like PyTorch do not treat Apple's chips as first-class training targets, requiring cumbersome conversions.
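On the framework point, PyTorch does reach Apple silicon GPUs, but only through its MPS (Metal Performance Shaders) backend rather than the CUDA paths most training code assumes, so models and tensors have to be placed on that device explicitly. A minimal sketch of what that adaptation looks like:

```python
import torch

# PyTorch on Apple silicon uses the MPS (Metal) backend instead of CUDA,
# so code written for Nvidia GPUs must be adapted to a different device.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

model = torch.nn.Linear(512, 512).to(device)   # toy model, illustration only
batch = torch.randn(8, 512, device=device)
out = model(batch)
print(out.device)  # "mps:0" on an M-series Mac, "cpu" elsewhere
```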
As a result, if Apple wants to catch up with its rivals, it will have to develop dedicated server-grade processors with advanced memory subsystems and serious AI training and inference capabilities, rather than relying on designs derived from the GPUs and NPUs of its client-oriented M-series systems-on-chip.