Accelerated Computing
FIELD GUIDE
§ 04
Architecture

Anatomy of an accelerator.

The CPU dispatches work over PCIe or NVLink; the device's stream scheduler hands it to compute lanes (streaming multiprocessors, or SMs) that read from and write to high-bandwidth memory. Cache capacity and memory bandwidth are the levers that decide whether a workload sings or stalls.

Host CPU: orchestrates kernels
PCIe / NVLink: bus
Stream scheduler: on-device
Compute lanes (SMs): ~144 streaming multiprocessors
HBM3 memory: ~80 GB · 3 TB/s
L2 cache: 50 MB
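
To make the "cache and bandwidth" levers concrete, here is a minimal back-of-envelope sketch in Python. It uses only the figures quoted in the diagram (about 3 TB/s of HBM3 bandwidth, 50 MB of L2) and assumes decimal units. The function names are hypothetical, and `peak_flops` is a value the caller must supply, since the guide does not state a peak compute rate. Treat the results as rough bounds, not measurements.

```python
# Figures taken from the diagram labels. Decimal units (1 MB = 10**6 bytes)
# are an assumption; the guide does not say which convention it uses.
HBM_BANDWIDTH_BYTES_PER_S = 3e12  # ~3 TB/s
L2_CACHE_BYTES = 50e6             # 50 MB


def min_stream_time_s(bytes_moved: float) -> float:
    """Lower bound on runtime if every byte crosses the HBM interface once."""
    return bytes_moved / HBM_BANDWIDTH_BYTES_PER_S


def fits_in_l2(working_set_bytes: float) -> bool:
    """True if the working set could, in principle, stay resident in L2."""
    return working_set_bytes <= L2_CACHE_BYTES


def attainable_flops(flops_per_byte: float, peak_flops: float) -> float:
    """Roofline-style bound: the lesser of peak compute and what memory can feed.

    peak_flops is caller-supplied (hypothetical); it is not given in the guide.
    """
    return min(peak_flops, flops_per_byte * HBM_BANDWIDTH_BYTES_PER_S)


if __name__ == "__main__":
    # Element-wise add of two float32 vectors with 10**9 elements each:
    # two reads and one write per element, 4 bytes apiece.
    n = 1_000_000_000
    bytes_moved = 3 * n * 4
    print(f"bytes moved: {bytes_moved:.3g}")
    print(f"bandwidth-bound minimum: {min_stream_time_s(bytes_moved) * 1e3:.1f} ms")
    print(f"fits in L2: {fits_in_l2(bytes_moved)}")
```

The vector add moves 12 GB for only 10^9 additions, so memory traffic alone sets a floor of roughly 4 ms no matter how many compute lanes are idle. That is the "stall" case. A workload whose working set fits in L2, or that does many operations per byte fetched, moves toward the compute-bound side of the `min` in `attainable_flops`.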