Accelerated Computing
FIELD GUIDE

Google reveals TPU 8t and TPU 8i, splitting training and inference at 2nm

Google's eighth-generation TPUs ship as two chips: a training part that scales to 9,600 in a superpod, and an inference part with 3x the on-chip SRAM.

2026-04-22 · Source: Google Cloud · 5 min read

What's new

At Google Cloud Next 2026 in Las Vegas on April 22, Google announced its eighth-generation Tensor Processing Unit as two purpose-built chips, both co-designed with Google DeepMind and built on TSMC's 2 nm node.

TPU 8t is the training part. It uses a new Inter-Chip Interconnect to scale up to 9,600 TPUs and 2 PB of shared high-bandwidth memory in a single superpod, delivering roughly 3x the peak compute of the prior-generation Ironwood and up to 2x better performance per watt.

TPU 8i is the inference part. It connects 1,152 TPUs in a single pod through what Google calls the Boardfly topology, carries 3x more on-chip SRAM than the previous generation, and includes a dedicated Collectives Acceleration Engine. Google claims 80% better performance per dollar for inference relative to the prior generation.
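The headline superpod figures imply a per-chip HBM capacity that Google did not state directly. A quick back-of-envelope check (our arithmetic, not a disclosed spec):

```python
# Back-of-envelope: divide the announced 2 PB of shared HBM in a
# TPU 8t superpod by the announced 9,600 chips. The per-chip figure
# is an inference from the two announced numbers, not a spec Google
# published.

PB = 10**15   # petabyte (decimal)
GiB = 2**30   # gibibyte (binary)

chips = 9_600
shared_hbm_bytes = 2 * PB

per_chip_gib = shared_hbm_bytes / chips / GiB
print(f"~{per_chip_gib:.0f} GiB of HBM per chip")  # ~194 GiB
```

About 194 GiB per chip, in the same ballpark as Ironwood's publicly stated 192 GB of HBM per chip, suggesting the superpod gains come from fabric scale rather than a large per-chip memory jump.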

Alongside the eighth-generation announcement, the seventh-generation Ironwood TPU reached general availability after a year in preview, and the Axion N4A virtual machine, built on Google's Arm Neoverse Axion CPU, also became generally available with a claimed 2x price-performance advantage over comparable x86 instances. Source: Google Cloud Blog, "Ironwood TPUs and new Axion-based VMs for your AI workloads."

Why it matters

Splitting training and inference into separate silicon is a meaningful architectural choice. The training part doubles down on raw compute and shared memory bandwidth across thousands of chips, the portion of the workload that benefits most from a large coherent fabric. The inference part trades that reach for on-chip SRAM, a dedicated collectives engine, and a tighter pod radius, which suit the smaller, latency-sensitive kernels that dominate agentic serving. With Axion now generally available, Google can offer CPU, accelerator, interconnect, and storage as a single end-to-end stack inside the AI Hypercomputer, the pattern NVIDIA and AMD are converging on at the rack level but that only Google operates as a single cloud provider.
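Google has not published what the Collectives Acceleration Engine actually offloads, but the canonical collective in tensor-parallel serving is all-reduce: every chip contributes a partial result and every chip needs the sum. A minimal pure-Python simulation of the standard ring algorithm (illustrative only; real hardware runs both phases concurrently over the interconnect rather than step by step):

```python
# Sequential simulation of a ring all-reduce, the kind of collective a
# hardware collectives engine exists to accelerate. Each "device" holds
# one vector; after the call, every device holds the elementwise sum.
# This is an illustrative sketch of the algorithm, not Google's design.

def ring_all_reduce(shards: list[list[float]]) -> list[list[float]]:
    n = len(shards)
    chunk = len(shards[0]) // n  # assumes n divides the vector length
    # Split each device's vector into n chunks (mutable copies).
    bufs = [[list(v[i * chunk:(i + 1) * chunk]) for i in range(n)]
            for v in shards]

    # Phase 1: reduce-scatter. At each step, device d passes chunk
    # (d - step) % n to its ring neighbor, which accumulates it. After
    # n-1 steps, device d fully owns the reduced chunk (d + 1) % n.
    for step in range(n - 1):
        sends = [bufs[d][(d - step) % n] for d in range(n)]  # snapshot
        for d in range(n):
            idx = (d - step - 1) % n
            incoming = sends[(d - 1) % n]
            bufs[d][idx] = [a + b for a, b in zip(bufs[d][idx], incoming)]

    # Phase 2: all-gather. Finished chunks circulate the ring so every
    # device ends up with every fully reduced chunk.
    for step in range(n - 1):
        sends = [bufs[d][(d + 1 - step) % n] for d in range(n)]
        for d in range(n):
            bufs[d][(d - step) % n] = list(sends[(d - 1) % n])

    # Reassemble each device's full vector.
    return [[x for c in b for x in c] for b in bufs]
```

Each device sends 2(n-1)/n of its vector in total regardless of ring size, which is why the ring variant is bandwidth-optimal but latency-bound for small tensors, exactly the regime where dedicated collective hardware pays off.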

Caveats

Google's performance numbers are first-party; no independent reproduction has been published. Specific availability dates and pricing for TPU 8t and TPU 8i were not disclosed beyond "later this year." Boardfly topology is a Google-coined name, and the wire-level details have not been published. Ironwood's general availability has been associated publicly with Anthropic as the marquee customer, but Google has not disclosed broader allocation. The 2 nm node is stated, but yield, wafer pricing, and ramp curve are not public. Source: Google Cloud Blog, April 22, 2026.