Cloud AI has felt limitless for years. But according to a Financial Times report, Google told Meta back in March that it couldn't supply all the Gemini computing capacity Meta wanted to buy.
Meta had been paying for access to Google's models through cloud and API services, leaning on Gemini for internal jobs like content moderation and scam detection, where it outperformed Meta's own Llama models. When Google couldn't meet the full request, the shortfall reportedly delayed several of Meta's internal AI projects, and Meta told employees to ration their token usage more carefully.
Think about that for a second. A company spending well over $100 billion on AI infrastructure was told by its cloud provider that there wasn't enough capacity to go around, and its employees ended up rationing tokens.
The 'yikes' factor here
Google Cloud pulled in roughly $20 billion in a single quarter, yet CEO Sundar Pichai has openly acknowledged that compute constraints are capping growth, and the division's order backlog has ballooned to more than $460 billion. The bottleneck isn't money or demand, as you might expect. Instead, it's the physical supply of chips, memory, and power.
Google is even paying SpaceX nearly a billion dollars a month to borrow GPU capacity as a stopgap.
So here's my honest read on the Meta news: it doesn't prove you personally need local AI. Meta's problem is an industrial-scale one, and its actual response was to build its own in-house model (Muse Spark) and pour well over $100 billion into its own data centers, not to switch to laptops. But the episode does prove something worth internalizing: cloud AI is not an infinite faucet, even for the best-capitalized companies on Earth.
The real reasons local AI matters
- Privacy. When a model runs locally, your prompts and data never leave the machine. For health information, financial details, legal drafts, or anything you'd rather not hand to a server, that's a meaningful difference, and in some regulated fields, increasingly a requirement.
- Speed for the small stuff. A cloud round-trip adds noticeable lag before you see a single word. For quick, repetitive tasks, an on-device model can start responding almost instantly.
- It works offline. On a plane, in a dead zone, or during an outage, a local model keeps going. A cloud one doesn't.
- Predictable cost at volume. If you're running the same kind of task thousands of times, owning the hardware can be cheaper over time than paying per token forever.
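To make the cost point concrete, here is a rough break-even sketch in Python. Every figure in it is a made-up placeholder, not a real price: the function name `break_even_tasks`, the $1,200 hardware cost, the 2 cents per cloud task, and the half cent of electricity per local task are all assumptions for illustration. Plug in your own numbers.

```python
import math
from decimal import Decimal


def break_even_tasks(hardware_cost, cloud_cost_per_task, local_cost_per_task=Decimal("0")):
    """Return how many tasks it takes before owning hardware beats paying per task.

    All arguments are Decimal dollar amounts. Returns None when running
    locally saves nothing per task, so the hardware never pays for itself.
    """
    saving_per_task = cloud_cost_per_task - local_cost_per_task
    if saving_per_task <= 0:
        return None
    # Round up: a fraction of a task still means one more task to run.
    return math.ceil(hardware_cost / saving_per_task)


# Hypothetical example: $1,200 of hardware, $0.02 per cloud task,
# $0.005 of electricity per local task (all assumed values).
tasks = break_even_tasks(Decimal("1200"), Decimal("0.02"), Decimal("0.005"))
print(f"Local hardware pays for itself after about {tasks:,} tasks")
```

With these placeholder numbers the hardware pays for itself after 80,000 tasks. If you only run a few hundred tasks a month, that could take years, which is why the "at volume" part of this point matters.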
Today's local models still can't match the biggest cloud systems at complex reasoning. But for summarizing documents, rewriting text, drafting code, and answering everyday questions, they're already good enough. And with the dedicated neural processing units (NPUs) now shipping in AI PCs, more of that work can happen right on your laptop.
The catch
The same shortage that's squeezing Meta is also making local AI hardware more expensive, not less.
Cloud and local AI draw from the same well: the same chips, the same high-bandwidth memory, and the same DRAM. As demand for AI has soared, manufacturers have shifted production toward data-center parts, and consumer prices have risen as a result.
That shift is a big reason laptops, memory upgrades, and even game consoles have crept up in price this year. So while local AI is a real way to sidestep cloud rationing, you may pay for the privilege upfront, and that trade-off deserves to be part of the decision.
Frontier reasoning is the other honest caveat. If you need the smartest possible model for a genuinely hard problem, the cloud still wins, and it isn't close. Local AI is a complement to that, not a replacement for it.
Follow Amanda Caswell and stay ahead of the AI curve