Mathematical Foundations of Reinforcement Learning
Zhao, S. (2024). Mathematical foundations of reinforcement learning (1st ed.). Springer.
Read the following chapter sections.
Chapter 3.1 – Motivating example: How to improve policies?
This section shows you why policy improvement matters: given a policy that is clearly suboptimal, how can you systematically find a better one? It gets you thinking about smarter decision-making in real-world problems.
Chapter 3.2 – Optimal state values and optimal policies
Here, you'll learn how optimal policies are defined through optimal state values and the Bellman optimality equation, essential knowledge for building solid RL strategies.
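The centerpiece of this section is the Bellman optimality equation. As a rough summary in standard notation (the book's own derivation is more careful), the optimal state value satisfies

```latex
v^{*}(s) = \max_{a} \left( \sum_{r} p(r \mid s, a)\, r \;+\; \gamma \sum_{s'} p(s' \mid s, a)\, v^{*}(s') \right)
```

and a policy that picks the maximizing action in every state is optimal.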
Chapter 4.1 – Value iteration
This section teaches you value iteration, a step-by-step method that repeatedly refines value estimates until they converge to the optimal state values, from which a greedy optimal policy can be read off.
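For a concrete feel, here is a minimal tabular sketch. The model format `P[s][a]` (a list of `(prob, next_state, reward)` triples) is an assumption of mine, not the book's notation:

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    """Tabular value iteration. P[s][a] is a list of
    (prob, next_state, reward) triples -- an assumed model format."""
    v = np.zeros(n_states)
    while True:
        # One sweep: q(s,a) = sum p * (r + gamma * v(s')), then v(s) = max_a q(s,a)
        q = np.array([[sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])
                       for a in range(n_actions)]
                      for s in range(n_states)])
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v_new, q.argmax(axis=1)  # optimal values and a greedy optimal policy
```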
Chapter 4.2 – Policy iteration
You'll explore a powerful technique that alternates between evaluating the current policy and improving it greedily, great for understanding how RL algorithms can become more efficient.
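A minimal sketch of the same idea, using the same assumed `P[s][a]` model format as above:

```python
import numpy as np

def policy_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    """Tabular policy iteration. P[s][a] is a list of
    (prob, next_state, reward) triples -- the same assumed format."""
    policy = np.zeros(n_states, dtype=int)
    v = np.zeros(n_states)
    while True:
        # 1) Policy evaluation: iterate toward v_pi under the current policy.
        while True:
            v_new = np.array([sum(p * (r + gamma * v[s2])
                                  for p, s2, r in P[s][policy[s]])
                              for s in range(n_states)])
            converged = np.max(np.abs(v_new - v)) < tol
            v = v_new
            if converged:
                break
        # 2) Policy improvement: act greedily with respect to v_pi.
        q = np.array([[sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])
                       for a in range(n_actions)]
                      for s in range(n_states)])
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return v, policy  # policy is stable, hence optimal
        policy = new_policy
```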
Chapter 4.3 – Truncated policy iteration
Learn how to speed things up by cutting off policy evaluation after a fixed number of sweeps. Value iteration and policy iteration turn out to be the two extremes of this hybrid approach, which helps you balance accuracy and performance.
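Sketched below under the same assumed model format; note how `sweeps=1` essentially recovers value iteration, while a very large `sweeps` approximates policy iteration:

```python
import numpy as np

def truncated_policy_iteration(P, n_states, n_actions, gamma=0.9,
                               sweeps=5, iterations=200):
    """Policy iteration with evaluation cut off after a fixed number of
    sweeps. P[s][a] is the same assumed (prob, next_state, reward) format."""
    policy = np.zeros(n_states, dtype=int)
    v = np.zeros(n_states)
    for _ in range(iterations):
        for _ in range(sweeps):  # truncated policy evaluation
            v = np.array([sum(p * (r + gamma * v[s2])
                              for p, s2, r in P[s][policy[s]])
                          for s in range(n_states)])
        q = np.array([[sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])
                       for a in range(n_actions)]
                      for s in range(n_states)])
        policy = q.argmax(axis=1)  # greedy improvement as usual
    return v, policy
```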
Chapter 5.1 – Motivating example: Mean estimation
This chapter connects RL to basic statistics, showing how sampling can help you estimate an expected value even when you don't know the underlying distribution.
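The workhorse here is the incremental form of the sample mean, which updates the estimate one sample at a time instead of storing everything; a minimal sketch:

```python
def incremental_mean(samples):
    """Running sample mean: w_{k+1} = w_k - (1/k) * (w_k - x_k).
    After k samples, w equals the ordinary average of x_1..x_k."""
    w = 0.0
    for k, x in enumerate(samples, start=1):
        w -= (w - x) / k
    return w

print(incremental_mean([2.0, 4.0, 6.0]))  # 4.0
```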
Chapter 5.2 – MC Basic: The simplest MC-based algorithm
You'll dive into Monte Carlo learning and see how to estimate action values from complete episodes, perfect for when you can't rely on a model.
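A sketch of the idea, assuming a hypothetical sampling interface `env_sample(s, a) -> (next_state, reward)` and truncating returns at a fixed horizon for simplicity:

```python
import numpy as np

def mc_basic(env_sample, states, actions, gamma=0.9,
             episodes_per_pair=50, horizon=50, iterations=20):
    """MC Basic sketch: estimate q(s, a) by averaging sampled returns,
    then improve the policy greedily. env_sample(s, a) -> (next_state,
    reward) is a hypothetical sampling interface, not the book's API."""
    policy = {s: actions[0] for s in states}
    for _ in range(iterations):
        q = {}
        for s in states:
            for a in actions:
                returns = []
                for _ in range(episodes_per_pair):
                    g, discount = 0.0, 1.0
                    s2, r = env_sample(s, a)        # force the first pair (s, a)
                    g += r
                    for _ in range(horizon - 1):    # then follow the current policy
                        discount *= gamma
                        s2, r = env_sample(s2, policy[s2])
                        g += discount * r
                    returns.append(g)
                q[(s, a)] = np.mean(returns)        # MC estimate of q_pi(s, a)
        policy = {s: max(actions, key=lambda a: q[(s, a)]) for s in states}
    return q, policy
```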
Chapter 5.3 – MC Exploring Starts
This section helps you understand the exploring-starts condition, which ensures learning visits every state-action pair, important for avoiding bias and blind spots.
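A sketch of an every-visit variant (the book discusses first-visit as well), again assuming the hypothetical `env_sample` interface:

```python
import random

def mc_exploring_starts(env_sample, states, actions, gamma=0.9,
                        episodes=10000, horizon=100):
    """Every-visit MC with exploring starts. env_sample(s, a) ->
    (next_state, reward) is the same assumed sampling interface."""
    q = {(s, a): 0.0 for s in states for a in actions}
    n = {pair: 0 for pair in q}
    policy = {s: random.choice(actions) for s in states}
    for _ in range(episodes):
        # Exploring start: every (state, action) pair can begin an episode.
        s, a = random.choice(states), random.choice(actions)
        episode = []
        for _ in range(horizon):
            s2, r = env_sample(s, a)
            episode.append((s, a, r))
            s, a = s2, policy[s2]
        g = 0.0
        for s, a, r in reversed(episode):  # compute returns backward
            g = r + gamma * g
            n[(s, a)] += 1
            q[(s, a)] += (g - q[(s, a)]) / n[(s, a)]  # incremental average
            policy[s] = max(actions, key=lambda a2: q[(s, a2)])
    return q, policy
```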
Chapter 7.2 – TD learning of action values: Sarsa
Here, you'll get hands-on with Sarsa, an on-policy TD method that updates action values using the actions the agent actually takes.
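A minimal episodic sketch; `env_reset()` and `env_step(s, a) -> (next_state, reward, done)` are assumed interfaces, not the book's API:

```python
import random

def sarsa(env_reset, env_step, states, actions, gamma=0.9,
          alpha=0.1, epsilon=0.1, episodes=1000):
    """Sarsa sketch with an epsilon-greedy behavior policy."""
    q = {(s, a): 0.0 for s in states for a in actions}

    def eps_greedy(s):
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: q[(s, a)])

    for _ in range(episodes):
        s = env_reset()
        a = eps_greedy(s)
        done = False
        while not done:
            s2, r, done = env_step(s, a)
            if done:
                target = r
            else:
                a2 = eps_greedy(s2)           # the action we will actually take
                target = r + gamma * q[(s2, a2)]
            q[(s, a)] += alpha * (target - q[(s, a)])  # move toward the TD target
            if not done:
                s, a = s2, a2
    return q
```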
Chapter 7.3 – TD learning of action values: n-step Sarsa
You'll learn how to look multiple steps ahead in each update, which gives you a knob for balancing learning speed and stability; Sarsa and Monte Carlo learning sit at the two ends of this spectrum.
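The only change from Sarsa is the target. Roughly, in the update style the book uses, the n-step target is

```latex
\bar{q}_t = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n} + \gamma^{n}\, q_t(s_{t+n}, a_{t+n})
```

so n = 1 recovers Sarsa, and letting n grow recovers the Monte Carlo return.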
Chapter 7.4 – TD learning of optimal action values: Q-learning
This chapter gives you a tool to learn the optimal strategy even while following a different one: Q-learning, an off-policy method that is a cornerstone of modern RL.
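The single update at its core, sketched with an assumed dictionary-based Q-table:

```python
def q_learning_update(q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step on a dict-based table q[(state, action)].
    The max over next actions makes this off-policy: the target tracks
    the greedy policy regardless of how (s, a, r, s2) was generated."""
    target = r + gamma * max(q[(s2, a2)] for a2 in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])
```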
Chapter 7.5 – A unified viewpoint
Finally, you'll see how Sarsa, n-step Sarsa, Q-learning, and Monte Carlo learning all fit one update template, helping you make sense of the big picture in RL learning methods.
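Roughly, the unifying template is a stochastic update of the form

```latex
q_{t+1}(s_t, a_t) = q_t(s_t, a_t) - \alpha_t \bigl[\, q_t(s_t, a_t) - \bar{q}_t \,\bigr]
```

where the algorithms differ only in the target: r_{t+1} + γ q_t(s_{t+1}, a_{t+1}) for Sarsa, the n-step return for n-step Sarsa, r_{t+1} + γ max_a q_t(s_{t+1}, a) for Q-learning, and the full discounted return for Monte Carlo.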