Reinforcement Learning

Reinforcement learning, optimization, and the math behind it.

Reinforcement Learning (hard): An agent learns by trial and reward. Covers MDPs, value, policy, and value iteration on a gridworld.
Markov Chains (medium): Systems that hop between states with fixed probabilities, and where they settle in the long run.
Optimization (medium): Finding the best solution under constraints. Covers convexity, gradients, and why optimization underlies ML.
Q-Learning (hard): Model-free control. Learn a Q-table from experience with the temporal-difference update and ε-greedy exploration.
Multi-Armed Bandits (medium): The purest exploration-vs-exploitation problem. Covers ε-greedy, UCB, estimated action values, and regret.
Temporal-Difference Learning (medium): Learn value estimates from incomplete episodes. Compares Monte Carlo returns with TD(0) bootstrapping and introduces the TD error.
Policy Gradients (hard): Optimize a parameterized policy directly. Covers the REINFORCE idea, the log-likelihood trick, and how it contrasts with value-based RL.
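The first lesson mentions value iteration on a gridworld. The sketch below shows the core idea on the smallest possible "gridworld": a hypothetical 1-D corridor of five cells where reaching the rightmost cell ends the episode with a reward of 1. The corridor layout, the discount of 0.9, and every function name here are illustrative assumptions, not taken from the lesson itself. Each sweep replaces V(s) with the best one-step lookahead, max over a of r + γ·V(s'), until the values stop changing.

```python
# Minimal value-iteration sketch on a HYPOTHETICAL 1-D corridor (assumed setup):
# states 0..4, state 4 is the terminal goal, actions move one cell left or right.
N_STATES = 5
GOAL = 4
ACTIONS = (-1, +1)  # -1 = move left, +1 = move right
GAMMA = 0.9         # assumed discount factor


def step(state, action):
    """Deterministic transition: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)  # walls clamp movement
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def lookahead(v, state, action):
    """One-step Bellman backup for a single action: r + gamma * V(s')."""
    next_state, reward = step(state, action)
    return reward + GAMMA * v[next_state]


def value_iteration(tol=1e-9):
    """Sweep all non-terminal states until the largest change is below tol."""
    v = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue  # terminal state keeps value 0
            best = max(lookahead(v, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - v[s]))
            v[s] = best  # in-place update, reused later in the same sweep
        if delta < tol:
            return v


def greedy_policy(v):
    """Pick the action with the highest one-step lookahead in each state."""
    return {
        s: max(ACTIONS, key=lambda a: lookahead(v, s, a))
        for s in range(N_STATES)
        if s != GOAL
    }


values = value_iteration()
print([round(x, 3) for x in values])  # [0.729, 0.81, 0.9, 1.0, 0.0]
print(greedy_policy(values))          # {0: 1, 1: 1, 2: 1, 3: 1}
```

The values decay by a factor of γ per step away from the goal, and the greedy policy moves right everywhere. Updating V in place (reusing values already refreshed in the current sweep) reaches the same fixed point as keeping a separate copy per sweep, and often needs fewer sweeps.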