Markov Chains
Systems that hop between states with fixed probabilities — and where they settle in the long run.
A Markov chain is a system that moves between states with fixed transition probabilities, where the next state depends only on the current one — not on how you got there. This “memoryless” property (the Markov property) makes them simple to analyze and surprisingly powerful.
Below is a 3-state weather chain. Press Multiply by P to advance the distribution one step (); watch the bars settle onto the stationary distribution (dashed marks) no matter where you start. The diagram highlights a single random walker hopping between states — shuffle it for a fresh path.
Transition matrix
The chain is captured by a transition matrix , where is the probability of going from state to state . Each row sums to 1 (you must go somewhere): . If today’s distribution over states is a row vector , then tomorrow’s is — one matrix multiply advances time by one step, and steps is .
The stationary distribution
Apply over and over and many chains settle into a stationary distribution — one that no longer changes: . It tells you the long-run fraction of time spent in each state, regardless of where you started (for well-behaved chains). Finding is a left-eigenvector problem (eigenvalue 1), or you can just iterate until it stops moving — exactly what the visualizer does above.
Why they matter in CS
- PageRank models a random web surfer as a Markov chain; the stationary distribution is the page ranking.
- MCMC (Markov Chain Monte Carlo) samples hard distributions by building a chain whose stationary distribution is the target.
- MDPs (previous lesson) are Markov chains with actions and rewards added.
- They model queues, weather, text generation, and reliability.
Connection to the gridworld
In the RL lesson, fixing a policy turns the MDP into a plain Markov chain — the agent now hops between cells with set probabilities, and you could ask where it spends its time. Adding choice and reward on top is what makes it RL.
Takeaways
- A Markov chain hops between states with fixed probabilities; the future depends only on the present.
- The transition matrix advances a distribution by one step via .
- Many chains converge to a stationary distribution with — the basis of PageRank and MCMC.