Computer Architecture

The memory hierarchy, CPU caches, and why data locality is the secret to high-performance code.

To write truly fast code, you must understand that not all memory is created equal. While we often think of “RAM” as a single pool of storage, the physical reality is a layered hierarchy designed to hide the massive speed gap between the CPU and your main memory.

Interactive demo: the CPU requests a stream of addresses against a small L1 cache (4 lines, LRU). Each access is a hit or a miss; when the cache fills, the least-recently-used line is evicted.

Access sequence: 10, 11, 12, 10, 13, 14, 11, 10, 15, 13

Relative latency (approx. cycles): RAM 200, Disk 1M+

A miss to RAM costs roughly 50x an L1 hit; a disk miss is millions of cycles. Keeping the working set in cache is what makes code fast.
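The replacement policy in the demo can be reproduced in a few lines. The sketch below is a simplified model, not a description of real hardware: it assumes the access sequence reads as the two-digit addresses 10, 11, 12, 10, 13, 14, 11, 10, 15, 13, that each address occupies its own cache line, and that the cache is fully associative. The name `simulate_lru` is chosen for illustration.

```python
from collections import OrderedDict

def simulate_lru(accesses, num_lines=4):
    """Replay an address stream against a fully associative LRU cache.

    Returns (hits, misses, final_contents), where final_contents is ordered
    from least to most recently used.
    """
    cache = OrderedDict()  # keys are addresses, ordered oldest -> newest
    hits = misses = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            misses += 1
            if len(cache) == num_lines:
                cache.popitem(last=False)  # evict the least recently used line
            cache[addr] = True
    return hits, misses, list(cache)

# Assumed reading of the demo's access sequence as two-digit addresses.
sequence = [10, 11, 12, 10, 13, 14, 11, 10, 15, 13]
hits, misses, contents = simulate_lru(sequence)
print(f"hits={hits} misses={misses} hit rate={hits / len(sequence):.0%}")
print("final lines (LRU -> MRU):", contents)
```

Under these assumptions the run ends with 2 hits and 8 misses (a 20% hit rate). Only the two repeat visits to address 10 hit; addresses 11 and 13 are evicted before they are requested again.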

The Memory Hierarchy

The CPU is incredibly fast, but fetching data from main RAM is slow by comparison: a single access takes on the order of 200 clock cycles. To compensate, CPUs use small, very fast on-chip memories called caches.

Level	Typical size	Latency (approx.)	Location
Registers	< 1 KB	1 cycle	Inside CPU core
L1 Cache	64 KB	4 cycles	Inside CPU core
L2 Cache	256 KB	12 cycles	On chip, usually private to each core
L3 Cache	16 MB	40 cycles	Shared across all cores
Main RAM	16 GB	200+ cycles	Separate DIMMs on the motherboard
SSD/Disk	1 TB	Millions of cycles	PCIe / SATA
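To see how these latencies combine, the average cost of a memory access can be estimated by weighting each level's latency by the fraction of accesses it serves. The sketch below uses the approximate cycle counts from the table. The two workloads and their fractions are hypothetical, chosen only to illustrate the arithmetic, and `average_latency` is an illustrative name.

```python
# Approximate latencies in cycles, taken from the table above.
LATENCY = {"L1": 4, "L2": 12, "L3": 40, "RAM": 200}

def average_latency(served_by):
    """Expected cycles per access, given the fraction of accesses served by each level."""
    total = sum(served_by.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(LATENCY[level] * fraction for level, fraction in served_by.items())

# Hypothetical workloads; the fractions are illustrative, not measurements.
cache_friendly = {"L1": 0.95, "L2": 0.03, "L3": 0.01, "RAM": 0.01}
cache_hostile = {"L1": 0.50, "L2": 0.10, "L3": 0.10, "RAM": 0.30}

print(f"cache-friendly: {average_latency(cache_friendly):.1f} cycles per access")
print(f"cache-hostile:  {average_latency(cache_hostile):.1f} cycles per access")
```

With these made-up fractions the friendly workload averages about 6.6 cycles per access and the hostile one about 67. Even when only 1% of accesses reach RAM, that 1% contributes 2 of the 6.6 cycles.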

Two Types of Locality

Caches work so well because programs tend to access memory in predictable patterns.

Spatial Locality: If you access address $X$, you’ll probably access $X+1$ soon. Caches exploit this by fetching data in Cache Lines (usually 64 bytes), so one miss brings in the neighboring bytes as well. This is why traversing an array (contiguous) is usually much faster than traversing a linked list (nodes scattered across memory).
Temporal Locality: If you access address $X$ now, you’ll probably access it again soon (e.g., in a loop). Caches keep recently used data close to the CPU.
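Spatial locality can be made concrete by counting how many distinct 64-byte cache lines a traversal touches. The sketch below is a model, not a benchmark. It assumes 8-byte elements and represents the linked list pessimistically, with each node in its own randomly chosen line of a hypothetical 16 MiB heap. Real allocators often place nodes closer together. `lines_touched` is an illustrative helper.

```python
import random

LINE_SIZE = 64  # bytes per cache line, as stated above

def lines_touched(addresses, line_size=LINE_SIZE):
    """Number of distinct cache lines covered by a list of byte addresses."""
    return len({addr // line_size for addr in addresses})

N = 1024   # elements to visit
ELEM = 8   # assumed element size in bytes (e.g. a 64-bit integer)

# Array: elements are contiguous, so 8 of them share each 64-byte line.
array_addrs = [i * ELEM for i in range(N)]

# Linked list (modelled): every node sits in its own randomly chosen line of a
# hypothetical 16 MiB heap, so no two nodes share a cache line.
rng = random.Random(0)
heap_lines = 16 * 1024 * 1024 // LINE_SIZE
node_addrs = [slot * LINE_SIZE for slot in rng.sample(range(heap_lines), N)]

print("array lines touched:", lines_touched(array_addrs))  # 128
print("list lines touched: ", lines_touched(node_addrs))   # 1024
```

In this model the array traversal pays for 128 line fetches and the list traversal for 1024, an 8x difference that comes purely from layout.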

The Performance Secret

In modern computing, memory latency is often the biggest bottleneck. High-performance code is therefore usually “cache-aware”: it organizes data so that the working set stays in L1/L2 as much as possible, avoiding the “long walk” to main RAM. This is why “Data-Oriented Design”, which lays out data according to how it is accessed, is so popular in game engines and high-performance databases.
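One common data-oriented technique is to store each field in its own array (struct of arrays) instead of storing whole records back to back (array of structs). The sketch below counts the cache lines touched when a loop reads a single field under each layout. The particle record, its field sizes, and the name `lines_spanned` are hypothetical, and the count models memory traffic only, not actual timings.

```python
LINE_SIZE = 64  # bytes per cache line
FIELD = 4       # assumed size of one float field in bytes
FIELDS = 8      # hypothetical particle: x, y, z, vx, vy, vz, mass, age
N = 10_000      # number of particles

def lines_spanned(addresses, width, line_size=LINE_SIZE):
    """Distinct cache lines covered by reads of `width` bytes at each address."""
    lines = set()
    for addr in addresses:
        first = addr // line_size
        last = (addr + width - 1) // line_size
        lines.update(range(first, last + 1))
    return len(lines)

# Array of structs: reading only `x` still strides over whole 32-byte records.
aos_x = [i * FIELDS * FIELD for i in range(N)]
# Struct of arrays: all `x` values are packed back to back.
soa_x = [i * FIELD for i in range(N)]

aos_lines = lines_spanned(aos_x, FIELD)
soa_lines = lines_spanned(soa_x, FIELD)
print(f"AoS: {aos_lines} lines, SoA: {soa_lines} lines")  # AoS: 5000, SoA: 625
```

Under these assumptions the packed layout pulls in one eighth as many cache lines for the same loop, because every byte fetched is a byte the loop actually uses.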

Takeaways

CPUs are much faster than RAM; caches (L1, L2, L3) bridge the gap.
Spatial locality (arrays) and temporal locality (loops) are what make caches effective.
Cache misses are the “silent killer” of performance: they are invisible in the source code, yet each miss to RAM costs hundreds of cycles.