Future Technology

Nvidia's Rubin Platform Just Moved AI From Software Into Hardware

4 min read · By Future Technology

Key takeaways

  • Nvidia's Rubin platform bundles six new chips, including the Vera CPU, a next-gen NVLink interconnect, an upgraded Transformer Engine, and a confidential computing and RAS engine
  • Nvidia claims Rubin cuts the cost per token of running reasoning and mixture-of-experts models by up to 10x versus the current Blackwell platform
  • AWS, Google Cloud, Microsoft, and Oracle Cloud are lined up to offer Vera Rubin instances later this year
  • The launch landed the same week AMD showed its 320-billion-transistor MI455X and OpenAI confirmed its own custom chip, Jalapeno, is in development

Nvidia has formally launched Rubin, its next AI hardware platform, and the headline number is a claimed 10x drop in the cost per token for running reasoning and mixture-of-experts models compared to the current Blackwell platform. If that holds up outside Nvidia's own benchmarks, it's the kind of shift that eventually shows up in what you pay for an AI subscription.


What's actually in the Rubin platform

Rubin isn't one chip, it's six. The centrepiece is Vera, Nvidia's new CPU built to sit alongside the Rubin GPU rather than rely on a generic server processor. Around it sit a next-generation NVLink interconnect for moving data between GPUs faster, an upgraded Transformer Engine tuned for the attention math behind large language models, and a new confidential computing and RAS (reliability, availability, serviceability) engine aimed at enterprise and government customers who need workloads isolated and auditable.

That last piece matters more than it sounds. As AI moves from chatbots into things like agentic workflows handling financial data or health records, the ability to prove a workload ran in isolation, and to catch hardware faults before they corrupt a training run, becomes a selling point in its own right, not an afterthought.

AWS, Google Cloud, Microsoft, and Oracle Cloud are all named as launch partners for Vera Rubin instances later this year, which means the platform will show up as a line item in cloud pricing rather than something only Nvidia's direct customers touch.

Why the timing isn't a coincidence

Rubin launched the same week AMD showed off its MI455X GPU, a 320-billion-transistor chip built with Samsung, and OpenAI confirmed its first custom AI chip, codenamed Jalapeno, is now in development specifically to cut its dependence on Nvidia. Three companies, three different chip strategies, in the same seven days.

That's not a coincidence; it's a supply chain reacting to concentration risk. We covered Samsung's $648 billion bet on AI chip manufacturing last week, and AMD's MI455X is a direct product of that same Samsung partnership. Meanwhile, OpenAI's plans for its custom Jalapeno chip and Qualcomm's acquisition of Tenstorrent are both bets that the industry's biggest AI buyers don't want to depend entirely on one supplier for the hardware their products run on.

Why it matters if you just use AI tools

You don't need to care about NVLink specs to feel the effect of this. Cost per token is the number that eventually decides how cheap or expensive AI subscriptions get, how generous free tiers are, and how much reasoning a model is allowed to do before a company caps it to control costs. A genuine 10x reduction, even partially realised once Rubin ships in cloud instances later this year, is the kind of change that trickles down into product decisions long before most users ever hear the word Rubin.
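To make that concrete, here is a minimal Python sketch of how a per-token price turns into a per-request cost, and what a partially realised 10x cut might look like. Every number in it is hypothetical: the $10 per million tokens baseline, the 20,000-token request, and the assumption that "partially realised" means a linear share of the claimed saving reaches the price are all there for illustration, not taken from Nvidia or any cloud provider.

```python
def cost_per_request(tokens: int, price_per_million: float) -> float:
    """Dollar cost of generating `tokens` tokens at a price quoted per million tokens."""
    return tokens * price_per_million / 1_000_000


def price_after_reduction(baseline_price: float,
                          claimed_factor: float,
                          realised_share: float) -> float:
    """Price once a claimed reduction factor is only partly passed on.

    Assumption (ours, not the article's): the saving scales linearly.
    realised_share = 1.0 gives the full claimed cut, 0.0 gives no cut.
    """
    full_saving = baseline_price - baseline_price / claimed_factor
    return baseline_price - realised_share * full_saving


# Hypothetical inputs, chosen only to keep the arithmetic easy to follow.
BASELINE_PRICE = 10.0    # dollars per million tokens (made up)
REQUEST_TOKENS = 20_000  # tokens in one long reasoning answer (made up)
CLAIMED_FACTOR = 10.0    # the "up to 10x" claim from the article

if __name__ == "__main__":
    for share in (0.0, 0.5, 1.0):
        price = price_after_reduction(BASELINE_PRICE, CLAIMED_FACTOR, share)
        cost = cost_per_request(REQUEST_TOKENS, price)
        print(f"{share:.0%} of the saving realised: ${cost:.2f} per request")
```

On these made-up inputs, one request costs 20 cents at the baseline, 2 cents if the full 10x arrives, and 11 cents if only half the saving is passed on. The point is that even a partial cut moves the per-request number enough to change how much reasoning a product can afford to offer.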

The more interesting story might be what happens if Nvidia's 10x claim doesn't survive contact with real-world workloads. AMD and OpenAI are both betting it won't, or at least that it won't matter enough to justify staying locked into one vendor. Whether that bet pays off is the thing worth watching for the rest of 2026.