Key Highlights
- Alphabet introduces its eighth-generation tensor processing units: TPU 8t designed for training and TPU 8i optimized for inference tasks
- The inference-focused TPU 8i delivers 80% better performance-per-dollar than its predecessor, Ironwood
- Broadcom partnered with Google on chip development, while Google DeepMind contributed to the design process
- TPU 8t training systems can scale up to 9,600 chips with double the interchip bandwidth of previous generations
- Google Cloud will make both processors available to enterprise customers before year-end
Alphabet revealed a pair of purpose-built AI processors on Wednesday, representing the first time the tech giant has divided its tensor processing unit architecture into dedicated chips for distinct workloads.
The eighth-generation lineup includes the TPU 8t for training artificial intelligence models and the TPU 8i engineered specifically for inference operations — deploying trained models in real-world applications. Broadcom collaborated with Google on both processors, extending a technical partnership spanning more than ten years.
This represents a strategic departure from previous approaches. Earlier TPU iterations combined both capabilities within a unified chip architecture. Google attributes this change to the emergence of agentic AI systems — autonomous models that execute continuous decision-making cycles with minimal human oversight — which require more tailored silicon.
“With the rise of AI agents, we determined the community would benefit from chips individually specialized to the needs of training and serving,” said Amin Vahdat, Google’s SVP and chief technologist for AI and infrastructure.
The inference-oriented TPU 8i integrates 384 megabytes of SRAM per processor, a threefold increase over Ironwood’s capacity. Google says this expansion eliminates the latency bottlenecks that emerge when many concurrent users query a model at once, a phenomenon the company describes as the “waiting room” effect.
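To build intuition for that effect (a toy textbook queueing model, not a description of Google's actual serving stack), the sketch below shows how average latency balloons as concurrent demand approaches a fixed serving capacity; the rates are illustrative placeholders.

```python
# Toy illustration of the "waiting room" effect using an M/M/1 queue:
# as the request arrival rate approaches serving capacity, average
# latency grows sharply. Not a model of TPU 8i internals.

def mm1_avg_latency(arrival_rate: float, service_rate: float) -> float:
    """Average time a request spends queued plus served: 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        return float("inf")  # the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

SERVICE_RATE = 100.0  # requests/sec one accelerator can serve (made-up number)
for arrival in (50.0, 80.0, 95.0, 99.0):
    ms = mm1_avg_latency(arrival, SERVICE_RATE) * 1000
    print(f"{arrival:>5.0f} req/s -> avg latency {ms:7.1f} ms")
```

In this framing, the extra on-chip memory works on the service-rate side of the equation, raising the capacity ceiling so latency stays flat under heavier concurrency.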
Inference Capabilities Receive Substantial Boost
The TPU 8i provides 80% better performance-per-dollar than Ironwood. In practical terms, organizations can run nearly double the workload, about 1.8x, at the same spend.
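As a back-of-the-envelope check using only the figures above, an 80% gain in performance-per-dollar works out to 1.8x the throughput at any fixed budget:

```python
# Back-of-the-envelope: what an 80% performance-per-dollar gain means
# at a fixed budget. The baseline value is a normalized placeholder.
ironwood_perf_per_dollar = 1.0                          # normalized baseline
tpu8i_perf_per_dollar = ironwood_perf_per_dollar * 1.8  # 80% improvement

budget = 100_000  # dollars; any fixed budget yields the same ratio
ratio = (tpu8i_perf_per_dollar * budget) / (ironwood_perf_per_dollar * budget)
print(f"Workload at equal spend: {ratio:.1f}x")  # -> 1.8x, i.e. nearly double
```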
The chip also achieves up to 2x better performance-per-watt through integrated power management that dynamically adjusts energy consumption to real-time demand.
Both processors run on Google’s Axion CPU host infrastructure for the first time, enabling system-wide optimization beyond individual chip-level improvements.
For training workloads, the TPU 8t superpod configuration scales to 9,600 chips sharing 2 petabytes of high-bandwidth memory. The architecture doubles Ironwood’s interchip bandwidth, and Google claims it can compress frontier-model development timelines from months to weeks.
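The announcement gives pod-level totals rather than per-chip figures; assuming the memory pool is spread evenly across all chips (an assumption, not something Google states), the arithmetic implies roughly 208 GB of HBM per chip:

```python
# Rough per-chip HBM share implied by the announced superpod totals.
# Assumes an even split across chips, which the announcement does not state.
TOTAL_HBM_BYTES = 2e15  # 2 petabytes (decimal)
CHIPS = 9_600
per_chip_gb = TOTAL_HBM_BYTES / CHIPS / 1e9
print(f"~{per_chip_gb:.0f} GB of HBM per chip")  # ~208 GB
```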
The training processor delivers 2.8x the computational performance of seventh-generation Ironwood at identical pricing.
Early Adopters and Enterprise Deployment
Adoption continues to expand across sectors. Citadel Securities has built quantitative research platforms on Google’s TPU infrastructure. All 17 U.S. Department of Energy national laboratories run AI co-scientist applications on the processors. Anthropic has pledged to use multiple gigawatts of Google TPU computing capacity.
DA Davidson analysts projected in September that the combined valuation of Google’s TPU operations and DeepMind division could reach approximately $900 billion.
Google maintains an exclusive distribution model for TPUs: the processors are available only through Google Cloud services, not direct sales. Nvidia continues supplying GPUs to Google, and Google confirmed it will be among the first cloud platforms to offer Nvidia’s forthcoming Vera Rubin architecture later this year.
Google DeepMind participated directly in the chip design process and has deployed TPUs to train its Gemini language models and to power algorithms behind Search and YouTube.
Google announced that both the TPU 8t and TPU 8i will reach general availability for cloud platform customers before the end of this year.