Glossary — accelerated computing terms

§ 10

Glossary

The terms.

The vocabulary an engineer uses to talk about accelerated systems, kept short and operational.

CUDA

NVIDIA's parallel programming model and toolchain. The historical reason GPUs broke into general computing.

FLOPS

Floating-point operations per second. The base unit of throughput on numeric workloads.
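As a rough illustration of how the unit is used, the sketch below counts the operations in a dense matrix multiply and divides by elapsed time. The function names and the 1 ms timing are hypothetical, chosen only for the example.

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for an (m x k) @ (k x n) product: one multiply and one add per term."""
    return 2 * m * n * k


def achieved_flops(flop_count: int, seconds: float) -> float:
    """Throughput in FLOPS: operations completed divided by elapsed time."""
    return flop_count / seconds


# Hypothetical example: a 4096 x 4096 x 4096 matmul that took 1 ms.
flops = matmul_flops(4096, 4096, 4096)
rate = achieved_flops(flops, 1e-3)
print(f"{flops:.3e} FLOPs, {rate / 1e12:.1f} TFLOPS")
```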

HBM

High-Bandwidth Memory. DRAM stacks bonded next to the accelerator die for very high bandwidth.

Kernel

A unit of code launched on the accelerator. Thousands of threads run the same kernel concurrently.
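The idea that every thread runs the same code on a different element can be sketched in plain Python. This is a sequential emulation, not real accelerator code: `launch`, `saxpy_kernel` and the grid sizes are hypothetical names for illustration, and real hardware would run the threads concurrently.

```python
def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    """Body executed by one thread: handle the single element it owns."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(x):                          # guard: the grid may overshoot the data
        out[i] = a * x[i] + y[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Sequential stand-in for a kernel launch over grid_dim blocks of block_dim threads."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


n = 10
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n
block_dim = 4
grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover all n elements
launch(saxpy_kernel, grid_dim, block_dim, 2.0, x, y, out)
```

The bounds check matters because the grid is rounded up to whole blocks, so some threads have no element to process.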

MFU

Model FLOPs Utilisation. The fraction of peak compute a training job actually achieves; 30–60% is typical.
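A minimal sketch of the calculation, assuming you already know how many FLOPs one training step requires and the vendor's quoted peak per device. All names and numbers below are hypothetical placeholders, not measurements.

```python
def mfu(flops_per_step: float, step_seconds: float,
        peak_flops_per_device: float, num_devices: int = 1) -> float:
    """Achieved FLOPS divided by the aggregate theoretical peak."""
    achieved = flops_per_step / step_seconds
    peak = peak_flops_per_device * num_devices
    return achieved / peak


# Hypothetical job: 3.2e15 FLOPs per step, 1 s per step,
# 8 devices each rated at 1e15 FLOPS peak.
example = mfu(flops_per_step=3.2e15, step_seconds=1.0,
              peak_flops_per_device=1.0e15, num_devices=8)
print(f"MFU = {example:.0%}")
```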

PCIe

PCI Express. The expansion bus that connects accelerators to the host CPU. Latency here often gates small workloads.

Quantisation

Reducing numerical precision (for example FP32 → INT8 or FP4) to fit larger models in memory and run them faster.
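One common scheme, shown here only as a sketch, is symmetric per-tensor INT8 quantisation: a single scale factor maps the largest magnitude onto 127. The function names are hypothetical, and production toolchains typically use finer-grained scales and calibration.

```python
def quantise_int8(values):
    """Symmetric per-tensor quantisation: map floats onto integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantise(q, scale):
    """Recover approximate floats; the gap to the originals is the quantisation error."""
    return [x * scale for x in q]


weights = [0.02, -1.27, 0.635, 0.0]  # illustrative values only
q, scale = quantise_int8(weights)
restored = dequantise(q, scale)
worst_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, worst_error)
```

For values inside the representable range, the round-trip error per element is at most half the scale.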

SM

Streaming Multiprocessor. The basic compute unit of a GPU; modern accelerators carry over a hundred.

Sparsity

Skipping computation on zero-valued weights. A 2:4 pattern (two of every four consecutive weights are zero) can roughly double effective throughput on hardware that supports it.
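The 2:4 pattern can be sketched as magnitude pruning within groups of four: keep the two largest weights and zero the rest. This shows only the pattern itself. The function name is hypothetical, and any speed-up depends on hardware that can exploit the structure.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude weights in every group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
print(prune_2_of_4(row))  # → [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```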

Systolic array

A grid of processing elements that pumps data rhythmically through neighbours. The heart of TPUs.

Tensor core

A hardware unit that performs a small matrix multiply-accumulate as a single operation rather than element by element. Backbone of modern training.

Throughput

Work completed per unit time. Accelerators optimise this; CPUs optimise latency.