Computer Architecture
The memory hierarchy, CPU caches, and why data locality is the secret to high-performance code.
To write truly fast code, you must understand that not all memory is created equal. While we often think of “RAM” as a single pool of storage, the physical reality is a layered hierarchy designed to hide the massive speed gap between the CPU and your main memory.
Picture the CPU requesting a stream of addresses against a small cache. Each access is either a hit or a miss; when the cache is full, the least-recently-used line is evicted to make room. Each level of the hierarchy is markedly slower than the one above it.
A miss to RAM costs roughly 50x an L1 hit; a disk miss is millions of cycles. Keeping the working set in cache is what makes code fast.
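The hit, miss, and eviction behaviour described above can be sketched as a tiny simulator. This is a minimal illustration, not a model of real hardware: it assumes a fully associative cache with strict least-recently-used replacement, and the function name `simulate_lru` and the address streams are made up for the example.

```python
from collections import OrderedDict

def simulate_lru(addresses, capacity):
    """Replay an address stream against a fully associative LRU cache.

    Each address stands for one cache line. Returns (hits, misses).
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = misses = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[addr] = True
    return hits, misses

# A working set of 4 lines fits in a 4-line cache: only the first pass misses.
fits = simulate_lru([0, 1, 2, 3] * 10, capacity=4)

# A working set of 5 lines cycled through a 4-line cache: every access misses,
# because LRU always evicts the line that is needed next.
thrashes = simulate_lru([0, 1, 2, 3, 4] * 10, capacity=4)

print(fits)      # (36, 4)
print(thrashes)  # (0, 50)
```

Growing the working set by a single line takes the hit rate from 90% to zero in this toy setup, which is why keeping the working set within cache capacity matters so much.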
The Memory Hierarchy
The CPU is incredibly fast, but fetching data from main RAM is relatively slow (it takes hundreds of clock cycles). To compensate, CPUs use small, lightning-fast on-chip memories called caches.
| Level | Size | Latency (approx) | Location |
|---|---|---|---|
| Registers | < 1 KB | 1 cycle | Inside CPU core |
| L1 Cache | 64 KB | 4 cycles | Inside CPU core |
| L2 Cache | 256 KB | 12 cycles | Per core (usually private) |
| L3 Cache | 16 MB | 40 cycles | Shared across all cores |
| Main RAM | 16 GB | 200+ cycles | Separate sticks |
| SSD/Disk | 1 TB | Millions of cycles | PCIe / SATA |
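
To get a feel for what these latencies mean in practice, the sketch below computes a weighted average cost per memory access. It is a simplified model: it assumes each access costs exactly the latency of the level that serves it, and the two access mixes are hypothetical numbers chosen for illustration, not measurements.

```python
# Approximate latencies in cycles, taken from the table above.
LATENCY_CYCLES = {"L1": 4, "L2": 12, "L3": 40, "RAM": 200}

def average_access_cycles(served_by):
    """Weighted average latency, given the fraction of accesses served by each level."""
    total = sum(served_by.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(LATENCY_CYCLES[level] * share for level, share in served_by.items())

# Hypothetical access mixes (assumptions for this example only).
cache_friendly = {"L1": 0.90, "L2": 0.06, "L3": 0.03, "RAM": 0.01}
cache_hostile = {"L1": 0.50, "L2": 0.20, "L3": 0.10, "RAM": 0.20}

print(round(average_access_cycles(cache_friendly), 2))
print(round(average_access_cycles(cache_hostile), 2))
```

Under these assumptions the cache-friendly mix averages about 7.5 cycles per access and the cache-hostile mix about 48, even though half of the latter's accesses still hit L1. The rare trips to RAM dominate the average.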
Two Types of Locality
Caches work so well because programs tend to access memory in predictable patterns.
- Spatial Locality: If you access address X, you’ll probably access X + 1 soon. Caches exploit this by fetching data in cache lines (usually 64 bytes) rather than one byte at a time. This is why traversing an array (contiguous) is usually much faster than traversing a linked list (scattered).
- Temporal Locality: If you access address X now, you’ll probably access it again soon (e.g., in a loop). Caches keep recently used data close to the CPU.
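
The effect of spatial locality can be estimated by counting how many distinct cache lines a traversal touches. The sketch below assumes 64-byte lines and 4-byte elements, counts only first-time fetches (it ignores capacity and prefetching), and models the linked list pessimistically by placing every node on its own line; `lines_fetched` and the address layouts are illustrative, not taken from a real allocator.

```python
LINE_SIZE = 64  # bytes per cache line, the common size

def lines_fetched(addresses, line_size=LINE_SIZE):
    """Count the distinct cache lines touched by a sequence of byte addresses."""
    return len({addr // line_size for addr in addresses})

N = 1024       # number of elements visited
ELEM_SIZE = 4  # assumed 4-byte integers

# Contiguous array: element i lives at byte offset i * 4.
array_addrs = [i * ELEM_SIZE for i in range(N)]

# Scattered nodes: assume each node was allocated 4096 bytes from the last,
# so no two nodes share a cache line.
node_addrs = [i * 4096 for i in range(N)]

print(lines_fetched(array_addrs))  # 64: each line serves 16 consecutive elements
print(lines_fetched(node_addrs))   # 1024: every access needs a new line
```

With these assumptions the array traversal needs 16 times fewer line fetches for the same number of elements, because every fetched line delivers 16 useful values instead of one.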
The Performance Secret
In modern computing, memory latency is often the biggest bottleneck. High-performance code is therefore usually “cache-aware”: it organizes data so that the working set stays in L1/L2 as much as possible, avoiding the “long walk” to main RAM. This is the idea behind “Data-Oriented Design”, which is popular in game engines and high-performance databases.
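One common data-oriented technique is switching from an "array of structs" to a "struct of arrays" layout. The sketch below estimates how much of each fetched cache line is actually useful when a loop reads a single field from every record. The record layout (eight 4-byte fields), the record count, and the helper `lines_touched` are all assumptions made for this example.

```python
LINE_SIZE = 64   # bytes per cache line
FIELD_SIZE = 4   # assumed 4-byte float fields
FIELDS = 8       # hypothetical record: x, y, z, vx, vy, vz, mass, charge
N = 1024         # number of records

def lines_touched(addresses):
    """Count the distinct cache lines covered by a list of byte addresses."""
    return len({a // LINE_SIZE for a in addresses})

# Array of structs: records are stored whole, so consecutive x values
# are FIELDS * FIELD_SIZE = 32 bytes apart.
aos_x = [i * FIELDS * FIELD_SIZE for i in range(N)]

# Struct of arrays: all x values are packed together, 4 bytes apart.
soa_x = [i * FIELD_SIZE for i in range(N)]

aos_lines = lines_touched(aos_x)
soa_lines = lines_touched(soa_x)
useful_bytes = N * FIELD_SIZE  # bytes the loop actually needs

# Fraction of fetched bytes that the loop uses in each layout.
aos_efficiency = useful_bytes / (aos_lines * LINE_SIZE)
soa_efficiency = useful_bytes / (soa_lines * LINE_SIZE)

print(aos_lines, aos_efficiency)  # 512 0.125
print(soa_lines, soa_efficiency)  # 64 1.0
```

In this model the array-of-structs loop pulls in eight times as many cache lines and uses only one byte in eight of what it fetches. The struct-of-arrays layout wins only when loops touch a few fields at a time; code that reads every field of a record together gains little from the change.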
Takeaways
- CPUs are much faster than RAM; caches (L1, L2, L3) bridge the gap.
- Spatial locality (arrays) and temporal locality (loops) are what make caches effective.
- Cache misses are the “silent killer” of performance.