Reinforcement Learning
Reinforcement learning, optimization, and the math behind them.
- Reinforcement Learning (hard): An agent learns by trial and reward. MDPs, value, policy, and value iteration on a gridworld.
- Markov Chains (medium): Systems that hop between states with fixed probabilities, and where they settle in the long run.
- Optimization (medium): Finding the best solution under constraints. Convexity, gradients, and why it underlies ML.
- Q-Learning (hard): Model-free control. Learn a Q-table from experience with the temporal-difference update and ε-greedy exploration.
- Multi-Armed Bandits (medium): The purest exploration-vs-exploitation problem. ε-greedy, UCB, estimated action values, and regret.
- Temporal-Difference Learning (medium): Learn value estimates from incomplete episodes. Monte Carlo returns vs TD(0) bootstrapping and the TD error.
- Policy Gradients (hard): Optimize a parameterized policy directly. The REINFORCE idea, the log-likelihood trick, and how it contrasts with value-based RL.