cs.thefarshad
medium

Markov Chains

Systems that hop between states with fixed probabilities — and where they settle in the long run.

A Markov chain is a system that moves between states with fixed transition probabilities, where the next state depends only on the current one — not on how you got there. This “memoryless” property (the Markov property) makes them simple to analyze and surprisingly powerful.

Below is a 3-state weather chain. Press Multiply by P to advance the distribution one step (vk+1=vkPv_{k+1} = v_k\,P); watch the bars settle onto the stationary distribution π\pi (dashed marks) no matter where you start. The diagram highlights a single random walker hopping between states — shuffle it for a fresh path.

Transition diagram (walker in solid)
0.20.10.30.30.20.5self 0.7self 0.4self 0.3SunnyCloudyRainy
Distribution vₖ after 0 steps (dashed = stationary π)
Sunny1.00Cloudy0.00Rainy0.00
step 0/16
Walker now in Sunny · π ≈ [0.47, 0.33, 0.21] · distance to π = 0.660

Transition matrix

The chain is captured by a transition matrix PP, where PijP_{ij} is the probability of going from state ii to state jj. Each row sums to 1 (you must go somewhere): jPij=1\sum_j P_{ij} = 1. If today’s distribution over states is a row vector vv, then tomorrow’s is vPv\,P — one matrix multiply advances time by one step, and kk steps is vPkv\,P^k.

The stationary distribution

Apply PP over and over and many chains settle into a stationary distribution π\pi — one that no longer changes: πP=π\pi\,P = \pi. It tells you the long-run fraction of time spent in each state, regardless of where you started (for well-behaved chains). Finding π\pi is a left-eigenvector problem (eigenvalue 1), or you can just iterate vPv\,P until it stops moving — exactly what the visualizer does above.

Why they matter in CS

  • PageRank models a random web surfer as a Markov chain; the stationary distribution is the page ranking.
  • MCMC (Markov Chain Monte Carlo) samples hard distributions by building a chain whose stationary distribution is the target.
  • MDPs (previous lesson) are Markov chains with actions and rewards added.
  • They model queues, weather, text generation, and reliability.

Connection to the gridworld

In the RL lesson, fixing a policy turns the MDP into a plain Markov chain — the agent now hops between cells with set probabilities, and you could ask where it spends its time. Adding choice and reward on top is what makes it RL.

Takeaways

  • A Markov chain hops between states with fixed probabilities; the future depends only on the present.
  • The transition matrix PP advances a distribution by one step via vPv\,P.
  • Many chains converge to a stationary distribution π\pi with πP=π\pi\,P = \pi — the basis of PageRank and MCMC.