OpenAI and Broadcom just unveiled Jalapeno, their first custom AI chip
Key takeaways
- On June 24, OpenAI and Broadcom unveiled Jalapeno, a custom inference chip taken from initial design to tape-out in just nine months, which Broadcom says may be the fastest ASIC development cycle in high-performance semiconductor history
- The chip is a reticle-sized ASIC built for LLM inference, not training, and early testing suggests substantially better performance per watt than current best-in-class hardware, though no benchmark figures have been published
- OpenAI used its own AI models to help accelerate chip design, making this a concrete example of AI helping to design AI infrastructure
- Prototype deployment starts in late 2026, scaling through 2027, with full capacity targeted for H1 2028
Nine months. That is how long it took OpenAI and Broadcom to go from initial design to tape-out on Jalapeno, the first custom inference chip OpenAI has built from scratch. Broadcom says that may be the fastest development cycle ever achieved for a high-performance semiconductor. For context, custom chip projects of this complexity typically take three to five years.
Jalapeno is a reticle-sized ASIC, meaning it uses the maximum die size that a lithography machine can physically expose in a single shot. The chip is designed specifically for LLM inference: the part of AI that responds when you type a prompt, not the multi-month training runs that build the model in the first place. Inference and training have different hardware demands, and a chip optimised purely for inference can be substantially more efficient than a general-purpose GPU running the same workload.
What makes Jalapeno different
General-purpose GPUs like Nvidia's H100 are extremely good at the kind of parallelised matrix multiplication that AI workloads need. They are also expensive to buy and expensive to run, partly because they are not purpose-built for inference. A reticle-sized ASIC built around a single task can ease memory-bandwidth constraints, improve throughput per watt, and lower the cost per token generated.
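To make the link between throughput per watt and cost per token concrete, here is a rough back-of-the-envelope sketch. Every figure in it is a made-up placeholder, not a measurement of Jalapeno or of any Nvidia part, and the function name is purely illustrative. It covers electricity only and ignores hardware, cooling, and networking costs.

```python
# Illustrative only: every number below is a hypothetical placeholder,
# not a measurement of Jalapeno, the H100, or any other real chip.

def energy_cost_per_million_tokens(tokens_per_second: float,
                                   power_watts: float,
                                   usd_per_kwh: float) -> float:
    """Electricity cost in USD to generate one million tokens."""
    # Watts are joules per second, so this is energy spent per token.
    joules_per_token = power_watts / tokens_per_second
    # One kilowatt-hour is 3.6 million joules.
    kwh_per_million_tokens = joules_per_token * 1_000_000 / 3_600_000
    return kwh_per_million_tokens * usd_per_kwh


# A hypothetical baseline accelerator, and a hypothetical specialised chip
# with the same power draw but twice the throughput (2x tokens per joule).
baseline = energy_cost_per_million_tokens(1_000, 700, 0.10)
specialised = energy_cost_per_million_tokens(2_000, 700, 0.10)

print(f"baseline:    ${baseline:.4f} per million tokens")
print(f"specialised: ${specialised:.4f} per million tokens")
```

Under these assumptions, doubling tokens per joule halves the electricity bill per token. The per-token amounts are tiny, which is why the difference only becomes material at very large query volumes.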
Early testing suggests Jalapeno delivers substantially better performance per watt than current best-in-class hardware, though OpenAI has not published specific benchmark figures. Prototype deployment starts in late 2026, scaling up through 2027, with full production capacity targeted for the first half of 2028.
The detail that elevates this above a standard chip launch: OpenAI used its own AI models to accelerate parts of the chip design and optimisation process. That is not a marketing line. AI-assisted chip design is an active research area, and for OpenAI to use its own models on its own hardware design closes a loop that has been purely theoretical for most companies.
Why every major AI lab is building its own silicon
Google has its TPUs, now in at least their seventh generation. Apple builds Neural Engines into every device it ships. Meta has MTIA, its in-house inference accelerator. Amazon runs its Trainium chips in AWS. Now OpenAI joins that list.
The reason is margin. Nvidia's GPUs are the dominant supply-constrained resource in AI. Every inference query run on Nvidia silicon costs money that flows to Nvidia rather than back into the lab. At OpenAI's scale, running ChatGPT for hundreds of millions of users daily, that cost is significant. A chip that delivers the same throughput at lower operating cost changes the economics of the whole business.
There is also a strategic argument that goes beyond cost. Owning your own silicon means you can optimise the hardware for your exact model architecture rather than working around a general-purpose design. As AI models get more specialised, the advantage of a chip tuned specifically for your workload compounds.
What this means for Nvidia
The narrative of Nvidia being displaced by custom silicon is getting ahead of the reality. Nvidia's H100 and H200 GPUs are still the default training hardware for virtually every frontier lab. Custom inference chips like Jalapeno replace Nvidia at the inference layer, not at training. For now, those are different markets with different demands.
But the direction of travel is clear. If inference economics matter, and at ChatGPT scale they absolutely do, then the long-term share of AI compute running on Nvidia silicon will fall as more labs build and scale custom chips. South Korea's $880 billion semiconductor investment plan, anchored by Samsung and SK Hynix, is partly a bet on exactly that transition in the memory market.
Prototype deployment in late 2026. Full capacity in the first half of 2028. Keep an eye on the first real-world performance numbers when they arrive.