OpenAI and Broadcom Unveil Jalapeno AI Chip for LLM Inference
OpenAI and Broadcom have introduced Jalapeno, OpenAI's first custom Intelligence Processor designed specifically for large language model (LLM) inference. The chip completed a nine-month design-to-tape-out cycle and was delivered to OpenAI's leadership. Deployment at gigawatt scale with data center partners across multiple generations is expected to begin by the end of 2026.
According to a press release by Broadcom, early lab tests demonstrate Jalapeno running machine learning workloads at production target frequency and power, with performance per watt substantially better than current state-of-the-art accelerators.
Multi-Generation Compute Platform
Broadcom stated that Jalapeno marks the start of a multi-generation compute platform built with OpenAI. Hock Tan, President and CEO of Broadcom, said, "Our collaboration with OpenAI represents a fundamental commitment to scaling the physical infrastructure required for the next decade of AI." Broadcom added, "This is just the beginning of a multi-generation roadmap. By co-developing our industry-leading silicon directly with OpenAI, we are enabling the deployment of gigawatt-scale data centres with Microsoft and other partners beginning in 2026."
Nine-Month Co-Development Cycle
The chip was co-developed from initial design to manufacturing tape-out in nine months. Broadcom contributed silicon implementation expertise, board and rack integration, high-performance networking, and scalable production systems. Broadcom describes the accelerator as a "blank-slate design for modern LLM inference, not a general-purpose accelerator adapted from earlier AI workloads."
Architecture and Performance
The architecture reduces data movement and balances compute, memory, and networking resources to bring utilization much closer to theoretical peak performance. Broadcom's silicon implementation and networking technologies, including Tomahawk networking silicon, help bring the platform to large-scale production. Engineering samples are already running ML workloads, including GPT-5.3-Codex-Spark, in the lab at production target frequency and power.
Broadcom emphasized that OpenAI designed the chip around its understanding of LLM fundamentals, kernels, serving systems, and product needs, while Broadcom and Celestica industrialized the platform. Broadcom said the chip was "designed to be the best inference platform for LLMs."
Latency and Use Cases
Jalapeno combines the power and throughput of today's leading AI accelerators with latency closer to the fastest specialized inference systems, making it suitable for interactive LLM products at scale across ChatGPT, Codex, the API, and future agentic products.
OpenAI is still measuring final performance, but Broadcom noted that early testing shows Jalapeno delivering performance per watt substantially better than the current state of the art. A detailed technical report on performance is expected in the coming months.
One of the Fastest Development Cycles in Semiconductors
The companies said the custom ASIC program reflects one of the fastest development cycles ever in advanced semiconductors. Broadcom attributed the speed to deep software-hardware co-development with OpenAI's engineering teams and the use of OpenAI models to accelerate parts of the design and optimization process.
Jalapeno is the first step in a platform that combines OpenAI-designed accelerators with Broadcom's silicon and connectivity technologies and Celestica's board, rack, and system expertise. Initial deployment is expected by the end of 2026.



