Mathematical Foundations of Reinforcement Learning
Zhao, S. (2024). Mathematical foundations of reinforcement learning (1st ed.). Springer.
Read the following chapter sections.
Chapter 3.1 – Motivating example: How to improve policies?
This section shows you why policy improvement matters: given a policy that is clearly suboptimal, how can you systematically find a better one? It gets you thinking about smarter decision-making in real-world problems.
Chapter 3.2 – Optimal state values and optimal policies
Here, you'll learn how optimal policies are defined through optimal state values and the Bellman optimality equation, essential knowledge for building solid RL strategies.
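The centerpiece of this section is the Bellman optimality equation. As a rough summary in standard notation (the book's own derivation is more careful), the optimal state value satisfies

```latex
v^{*}(s) = \max_{a} \left( \sum_{r} p(r \mid s, a)\, r \;+\; \gamma \sum_{s'} p(s' \mid s, a)\, v^{*}(s') \right)
```

and a policy that picks the maximizing action in every state is optimal.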
Chapter 4.1 – Value iteration
This section teaches you value iteration, a step-by-step method that repeatedly refines value estimates until they converge to the optimal state values, from which a greedy optimal policy can be read off.
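For a concrete feel, here is a minimal tabular sketch. The model format `P[s][a]` (a list of `(prob, next_state, reward)` triples) is an assumption of mine, not the book's notation:

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    """Tabular value iteration. P[s][a] is a list of
    (prob, next_state, reward) triples -- an assumed model format."""
    v = np.zeros(n_states)
    while True:
        # One sweep: q(s,a) = sum p * (r + gamma * v(s')), then v(s) = max_a q(s,a)
        q = np.array([[sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])
                       for a in range(n_actions)]
                      for s in range(n_states)])
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v_new, q.argmax(axis=1)  # optimal values and a greedy optimal policy
```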
Chapter 4.2 – Policy iteration
You'll explore a powerful technique that alternates between evaluating the current policy and improving it greedily, great for understanding how RL algorithms can become more efficient.
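A minimal sketch of the same idea, using the same assumed `P[s][a]` model format as above:

```python
import numpy as np

def policy_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    """Tabular policy iteration. P[s][a] is a list of
    (prob, next_state, reward) triples -- the same assumed format."""
    policy = np.zeros(n_states, dtype=int)
    v = np.zeros(n_states)
    while True:
        # 1) Policy evaluation: iterate toward v_pi under the current policy.
        while True:
            v_new = np.array([sum(p * (r + gamma * v[s2])
                                  for p, s2, r in P[s][policy[s]])
                              for s in range(n_states)])
            converged = np.max(np.abs(v_new - v)) < tol
            v = v_new
            if converged:
                break
        # 2) Policy improvement: act greedily with respect to v_pi.
        q = np.array([[sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])
                       for a in range(n_actions)]
                      for s in range(n_states)])
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return v, policy  # policy is stable, hence optimal
        policy = new_policy
```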
Chapter 4.3 – Truncated policy iteration
Learn how to speed things up by cutting off policy evaluation after a fixed number of sweeps. Value iteration and policy iteration turn out to be the two extremes of this hybrid approach, which helps you balance accuracy and performance.
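Sketched below under the same assumed model format; note how `sweeps=1` essentially recovers value iteration, while a very large `sweeps` approximates policy iteration:

```python
import numpy as np

def truncated_policy_iteration(P, n_states, n_actions, gamma=0.9,
                               sweeps=5, iterations=200):
    """Policy iteration with evaluation cut off after a fixed number of
    sweeps. P[s][a] is the same assumed (prob, next_state, reward) format."""
    policy = np.zeros(n_states, dtype=int)
    v = np.zeros(n_states)
    for _ in range(iterations):
        for _ in range(sweeps):  # truncated policy evaluation
            v = np.array([sum(p * (r + gamma * v[s2])
                              for p, s2, r in P[s][policy[s]])
                          for s in range(n_states)])
        q = np.array([[sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])
                       for a in range(n_actions)]
                      for s in range(n_states)])
        policy = q.argmax(axis=1)  # greedy improvement as usual
    return v, policy
```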
Chapter 5.1 – Motivating example: Mean estimation
This chapter connects RL to basic statistics, showing how sampling can help you estimate an expected value even when you don't know the underlying distribution.
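The workhorse here is the incremental form of the sample mean, which updates the estimate one sample at a time instead of storing everything; a minimal sketch:

```python
def incremental_mean(samples):
    """Running sample mean: w_{k+1} = w_k - (1/k) * (w_k - x_k).
    After k samples, w equals the ordinary average of x_1..x_k."""
    w = 0.0
    for k, x in enumerate(samples, start=1):
        w -= (w - x) / k
    return w

print(incremental_mean([2.0, 4.0, 6.0]))  # 4.0
```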
Chapter 5.2 – MC Basic: The simplest MC-based algorithm
You'll dive into Monte Carlo learning and see how to estimate action values from complete episodes, perfect for when you can't rely on a model.
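A sketch of the idea, assuming a hypothetical sampling interface `env_sample(s, a) -> (next_state, reward)` and truncating returns at a fixed horizon for simplicity:

```python
import numpy as np

def mc_basic(env_sample, states, actions, gamma=0.9,
             episodes_per_pair=50, horizon=50, iterations=20):
    """MC Basic sketch: estimate q(s, a) by averaging sampled returns,
    then improve the policy greedily. env_sample(s, a) -> (next_state,
    reward) is a hypothetical sampling interface, not the book's API."""
    policy = {s: actions[0] for s in states}
    for _ in range(iterations):
        q = {}
        for s in states:
            for a in actions:
                returns = []
                for _ in range(episodes_per_pair):
                    g, discount = 0.0, 1.0
                    s2, r = env_sample(s, a)        # force the first pair (s, a)
                    g += r
                    for _ in range(horizon - 1):    # then follow the current policy
                        discount *= gamma
                        s2, r = env_sample(s2, policy[s2])
                        g += discount * r
                    returns.append(g)
                q[(s, a)] = np.mean(returns)        # MC estimate of q_pi(s, a)
        policy = {s: max(actions, key=lambda a: q[(s, a)]) for s in states}
    return q, policy
```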
Chapter 5.3 – MC Exploring Starts
This section helps you understand the exploring-starts condition, which ensures learning visits every state-action pair, important for avoiding bias and blind spots.
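A sketch of an every-visit variant (the book discusses first-visit as well), again assuming the hypothetical `env_sample` interface:

```python
import random

def mc_exploring_starts(env_sample, states, actions, gamma=0.9,
                        episodes=10000, horizon=100):
    """Every-visit MC with exploring starts. env_sample(s, a) ->
    (next_state, reward) is the same assumed sampling interface."""
    q = {(s, a): 0.0 for s in states for a in actions}
    n = {pair: 0 for pair in q}
    policy = {s: random.choice(actions) for s in states}
    for _ in range(episodes):
        # Exploring start: every (state, action) pair can begin an episode.
        s, a = random.choice(states), random.choice(actions)
        episode = []
        for _ in range(horizon):
            s2, r = env_sample(s, a)
            episode.append((s, a, r))
            s, a = s2, policy[s2]
        g = 0.0
        for s, a, r in reversed(episode):  # compute returns backward
            g = r + gamma * g
            n[(s, a)] += 1
            q[(s, a)] += (g - q[(s, a)]) / n[(s, a)]  # incremental average
            policy[s] = max(actions, key=lambda a2: q[(s, a2)])
    return q, policy
```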
Chapter 7.2 – TD learning of action values: Sarsa
Here, you'll get hands-on with Sarsa, an on-policy TD method that updates action values using the actions the agent actually takes.
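A minimal episodic sketch; `env_reset()` and `env_step(s, a) -> (next_state, reward, done)` are assumed interfaces, not the book's API:

```python
import random

def sarsa(env_reset, env_step, states, actions, gamma=0.9,
          alpha=0.1, epsilon=0.1, episodes=1000):
    """Sarsa sketch with an epsilon-greedy behavior policy."""
    q = {(s, a): 0.0 for s in states for a in actions}

    def eps_greedy(s):
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: q[(s, a)])

    for _ in range(episodes):
        s = env_reset()
        a = eps_greedy(s)
        done = False
        while not done:
            s2, r, done = env_step(s, a)
            if done:
                target = r
            else:
                a2 = eps_greedy(s2)           # the action we will actually take
                target = r + gamma * q[(s2, a2)]
            q[(s, a)] += alpha * (target - q[(s, a)])  # move toward the TD target
            if not done:
                s, a = s2, a2
    return q
```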
Chapter 7.3 – TD learning of action values: n-step Sarsa
You'll learn how to look multiple steps ahead in each update, which gives you a knob for balancing learning speed and stability; Sarsa and Monte Carlo learning sit at the two ends of this spectrum.
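The only change from Sarsa is the target. Roughly, in the update style the book uses, the n-step target is

```latex
\bar{q}_t = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n} + \gamma^{n}\, q_t(s_{t+n}, a_{t+n})
```

so n = 1 recovers Sarsa, and letting n grow recovers the Monte Carlo return.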
Chapter 7.4 – TD learning of optimal action values: Q-learning
This chapter gives you a tool to learn the optimal strategy even while following a different one: Q-learning, an off-policy method that is a cornerstone of modern RL.
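The single update at its core, sketched with an assumed dictionary-based Q-table:

```python
def q_learning_update(q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step on a dict-based table q[(state, action)].
    The max over next actions makes this off-policy: the target tracks
    the greedy policy regardless of how (s, a, r, s2) was generated."""
    target = r + gamma * max(q[(s2, a2)] for a2 in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])
```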
Chapter 7.5 – A unified viewpoint
Finally, you'll see how Sarsa, n-step Sarsa, Q-learning, and Monte Carlo learning all fit one update template, helping you make sense of the big picture in RL learning methods.
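Roughly, the unifying template is a stochastic update of the form

```latex
q_{t+1}(s_t, a_t) = q_t(s_t, a_t) - \alpha_t \bigl[\, q_t(s_t, a_t) - \bar{q}_t \,\bigr]
```

where the algorithms differ only in the target: r_{t+1} + γ q_t(s_{t+1}, a_{t+1}) for Sarsa, the n-step return for n-step Sarsa, r_{t+1} + γ max_a q_t(s_{t+1}, a) for Q-learning, and the full discounted return for Monte Carlo.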