cs.thefarshad

Computer Architecture

The memory hierarchy, CPU caches, and why data locality is the secret to high-performance code.

To write truly fast code, you must understand that not all memory is created equal. While we often think of “RAM” as a single pool of storage, the physical reality is a layered hierarchy designed to hide the massive speed gap between the CPU and your main memory.

Figure: the CPU requests the address sequence 10, 11, 12, 10, 13, 14, 11, 10, 15, 13 from a 4-line L1 cache. Each access is either a hit or a miss. Once the cache is full, the least-recently-used (LRU) line is evicted. A companion chart compares the approximate latency of each level on a log scale: L1 ≈ 4 cycles, L2 ≈ 12, L3 ≈ 40, RAM ≈ 200, disk 1M+.

A miss to RAM costs roughly 50x an L1 hit (about 200 cycles vs. 4), and a disk access costs millions of cycles. Keeping the working set in cache is what makes code fast.
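The hit/miss replay described above can be sketched in a few lines of Python. This is a minimal sketch that assumes a fully associative cache in which each address occupies exactly one line. `simulate_lru` is a hypothetical helper name, and `collections.OrderedDict` serves as the recency list.

```python
from collections import OrderedDict


def simulate_lru(accesses, num_lines=4):
    """Replay an address stream against a fully associative LRU cache.

    Each address is treated as one cache line. Returns (hits, misses,
    final cache contents ordered from least to most recently used).
    """
    cache = OrderedDict()  # first key = least recently used
    hits = misses = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            misses += 1
            if len(cache) == num_lines:
                cache.popitem(last=False)  # evict the least recently used line
            cache[addr] = True
    return hits, misses, list(cache)


sequence = [10, 11, 12, 10, 13, 14, 11, 10, 15, 13]
hits, misses, contents = simulate_lru(sequence)
print(f"hits={hits} misses={misses} hit rate={hits / len(sequence):.0%}")
print("final cache (LRU -> MRU):", contents)
```

Only the two repeat visits to address 10 hit. By the time 11 and 13 come back, each has already been evicted, so the run ends with 2 hits and 8 misses (a 20% hit rate).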

The Memory Hierarchy

The CPU is incredibly fast, but fetching data from main RAM is relatively slow (it takes hundreds of clock cycles). To compensate, CPUs use small, lightning-fast on-chip memories called caches.

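| Level | Size (typical) | Latency (approx) | Location |
|---|---|---|---|
| Registers | < 1 KB | 1 cycle | Inside each CPU core |
| L1 cache | 64 KB | 4 cycles | Inside each CPU core |
| L2 cache | 256 KB | 12 cycles | Usually private to each core |
| L3 cache | 16 MB | 40 cycles | Shared across all cores |
| Main RAM | 16 GB | 200+ cycles | Off-chip memory modules (DIMMs) |
| SSD/Disk | 1 TB | Hundreds of thousands (SSD) to millions (HDD) of cycles | PCIe / SATA |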
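The table's numbers can be combined into a rough average cost per memory access by weighting each level's latency by the fraction of accesses that level serves. This is a simplified back-of-the-envelope model. It assumes the listed latencies are end-to-end load times and ignores the overlap that out-of-order execution and prefetching provide. The workload mixes are made-up illustrations.

```python
import math

# Approximate load-to-use latencies in cycles, taken from the table above.
LATENCY = {"L1": 4, "L2": 12, "L3": 40, "RAM": 200}


def expected_latency(served_fraction):
    """Average cycles per access, weighting each level by the share of accesses it serves."""
    if not math.isclose(sum(served_fraction.values()), 1.0):
        raise ValueError("fractions must sum to 1")
    return sum(frac * LATENCY[level] for level, frac in served_fraction.items())


# Illustrative workloads: these fractions are assumptions, not measurements.
cache_friendly = {"L1": 0.90, "L2": 0.06, "L3": 0.03, "RAM": 0.01}
cache_hostile = {"L1": 0.80, "L2": 0.10, "L3": 0.05, "RAM": 0.05}

for name, mix in [("cache-friendly", cache_friendly), ("cache-hostile", cache_hostile)]:
    print(f"{name}: {expected_latency(mix):.2f} cycles per access")
```

The cache-friendly mix averages about 7.5 cycles, and its 1% of RAM accesses alone contribute 2 of those cycles. The cache-hostile mix still serves 95% of accesses from cache, yet it averages 16.4 cycles, more than double.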

Two Types of Locality

Caches work so well because most programs access memory in predictable patterns.

  1. Spatial Locality: If you access address X, you'll probably access X + 1 soon. Caches exploit this by fetching data in whole cache lines (usually 64 bytes). This is why traversing an array (contiguous) is much faster than traversing a linked list (scattered).
  2. Temporal Locality: If you access address X now, you'll probably access it again soon (e.g., in a loop). Caches keep recently used data close to the CPU.
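Both kinds of locality can be made concrete by counting cache-line fetches instead of timing code, since timing in Python is dominated by interpreter overhead. This sketch assumes a tiny, fully associative 32-line LRU cache (2 KB), 64-byte lines, and 8-byte elements. `count_misses` is a hypothetical helper, not a real profiler.

```python
from collections import OrderedDict

LINE_SIZE = 64    # bytes per cache line
ELEM_SIZE = 8     # bytes per element, e.g. a 64-bit integer (assumption)
CACHE_LINES = 32  # a tiny 2 KB cache (assumption, chosen to make effects visible)


def count_misses(byte_addresses, num_lines=CACHE_LINES):
    """Count line fetches for a stream of byte addresses on a fully associative LRU cache."""
    cache = OrderedDict()  # line number -> True, ordered least to most recently used
    misses = 0
    for addr in byte_addresses:
        line = addr // LINE_SIZE  # the 64-byte line that holds this byte
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) == num_lines:
                cache.popitem(last=False)  # evict the least recently used line
            cache[line] = True
    return misses


# Spatial locality: a 64x64 grid of 8-byte elements stored row by row (32 KB).
ROWS = COLS = 64
row_major = [(r * COLS + c) * ELEM_SIZE for r in range(ROWS) for c in range(COLS)]
col_major = [(r * COLS + c) * ELEM_SIZE for c in range(COLS) for r in range(ROWS)]
print("row-major misses:   ", count_misses(row_major))
print("column-major misses:", count_misses(col_major))

# Temporal locality: sweep an array 10 times; reuse only pays off if it fits.
small = [i * ELEM_SIZE for _ in range(10) for i in range(128)]  # 1 KB = 16 lines
large = [i * ELEM_SIZE for _ in range(10) for i in range(512)]  # 4 KB = 64 lines
print("1 KB array, 10 passes:", count_misses(small))
print("4 KB array, 10 passes:", count_misses(large))
```

Row-major order misses once per 64-byte line (512 misses for 4,096 reads). Column-major order misses on every read (4,096), because each column spans 64 lines and the 32-line cache evicts them before the next column revisits them. The temporal case behaves the same way: a 1 KB array reused ten times costs 16 misses in total, whereas a 4 KB array that does not fit costs 640.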

The Performance Secret

In modern computing, memory latency is frequently the biggest bottleneck. High-performance code is therefore often “cache-aware”: it organizes data so that the working set stays in L1/L2 as much as possible, avoiding the “long walk” to main RAM. This is why “Data-Oriented Design”, which lays out data according to how it is accessed, is so popular in game engines and high-performance databases.
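One common Data-Oriented Design technique, switching from an array of structs (AoS) to a struct of arrays (SoA), can be illustrated by counting cache lines. This sketch assumes a hypothetical 32-byte record with eight 4-byte fields and a loop that reads only one of them. The addresses are computed by hand rather than measured, so the example shows data movement, not timing.

```python
LINE_SIZE = 64  # bytes per cache line
N = 1024        # number of records (illustrative)

# Hypothetical record: 8 float32 fields (x, y, z, vx, vy, vz, mass, health) = 32 bytes.
FIELD_SIZE = 4
FIELDS_PER_RECORD = 8
RECORD_SIZE = FIELD_SIZE * FIELDS_PER_RECORD
HEALTH = 7  # index of the one field the loop reads, e.g. summing every record's health


def lines_touched(addresses):
    """Number of distinct 64-byte cache lines an access pattern touches."""
    return len({addr // LINE_SIZE for addr in addresses})


# Array of structs: records sit back to back, so each read strides 32 bytes.
aos = [i * RECORD_SIZE + HEALTH * FIELD_SIZE for i in range(N)]

# Struct of arrays: all health values live in one contiguous array.
soa_base = HEALTH * N * FIELD_SIZE  # start offset of the health array
soa = [soa_base + i * FIELD_SIZE for i in range(N)]

print("AoS lines touched:", lines_touched(aos))
print("SoA lines touched:", lines_touched(soa))
```

Under AoS, each 64-byte line holds two records, so every line fetched carries 56 bytes of fields the loop never reads (512 lines in total). SoA packs 16 useful values into each line, so the same loop touches only 64 lines, one eighth of the data.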

Takeaways

  • CPUs are much faster than RAM; caches (L1, L2, L3) bridge the gap.
  • Spatial locality (arrays) and temporal locality (loops) are what make caches effective.
  • Cache misses are the “silent killer” of performance.