Chapter 5 Monte Carlo Learning

In this chapter we introduce a model-free approach for deriving the optimal policy.

Here, model-free means that we do not rely on a specific mathematical model to obtain state values or action values. For example, in policy evaluation we solve the Bellman equation to obtain state values, which is model-based. A model-free method does not use that equation; instead, it leverages mean estimation.
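As a minimal sketch of what mean estimation looks like (my own toy code; `sample_return` is a hypothetical stand-in for running one episode under the current policy and computing its discounted return), the expected return is approximated by a sample average:

```python
import random

def mc_estimate(sample_return, num_episodes=1000):
    """Approximate an expected return by averaging sampled returns.

    sample_return: hypothetical stand-in for one episode rollout that
    returns the episode's discounted return.
    """
    total = 0.0
    for _ in range(num_episodes):
        total += sample_return()
    return total / num_episodes  # sample mean -> E[G] as episodes grow

# Toy return source with true mean 1.0; the estimate converges to it.
print(mc_estimate(lambda: 1.0 + random.gauss(0.0, 0.5)))
```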


Value Iteration and Policy Iteration

In the last chapter we studied the Bellman Optimality Equation and introduced the iterative algorithm for solving it. In this chapter we introduce three model-based approaches for deriving the optimal policy. I recommend reading the PDF tutorial yourself; in this blog I will mainly focus on the differences between value iteration, policy iteration, and truncated policy iteration.
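As a rough sketch of the first of these (my own toy code, not the tutorial's; the shapes of `P` and `R` are assumptions), value iteration repeatedly applies the Bellman optimality update until the values converge, then reads off a greedy policy:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """Toy value iteration. P[a] is an (S, S) transition matrix for
    action a; R[a] is a length-S vector of expected rewards."""
    n_actions, n_states = len(P), P[0].shape[0]
    v = np.zeros(n_states)
    while True:
        # Bellman optimality update: q(s,a) = r(s,a) + gamma * sum_s' p(s'|s,a) v(s')
        q = np.array([R[a] + gamma * P[a] @ v for a in range(n_actions)])
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=0)  # optimal values, greedy policy
        v = v_new
```

By contrast, policy iteration solves the Bellman equation exactly for the current policy before each improvement step, and truncated policy iteration evaluates only approximately, interpolating between the two.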


Chapter 3 Optimal Policy and Bellman Optimality Equation

We know that RL's ultimate goal is to find the optimal policy. In this chapter we will show how to obtain the optimal policy through the Bellman Optimality Equation.

Optimal Policy

The state value can be used to evaluate whether a policy is good or not: if

$$
v_{\pi_1}(s) \ge v_{\pi_2}(s), \quad \forall s \in \mathcal{S}
$$
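If this holds, we say $\pi_1$ is better than $\pi_2$. This comparison leads to the definition used in the book: a policy $\pi^*$ is optimal if its state value is no smaller than that of every other policy at every state,

$$
v_{\pi^*}(s) \ge v_{\pi}(s), \quad \forall s \in \mathcal{S}, \ \forall \pi.
$$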


Chapter 2 Bellman Equation

In this chapter we introduce two key concepts and one important formula.

Revision

I recommend reading the motivating examples in the tutorial. Here I will skip that part and directly introduce the concepts.

Before delving into the content, we need to review the key concepts introduced previously.


Before reading

This blog is mainly a set of notes on Mathematical Foundations of Reinforcement Learning by Shiyu Zhao of WindyLab at Westlake University.

You can find more about the book and related tutorial videos at this link.


Chapter 1 Basic Concepts of Reinforcement Learning

Reinforcement Learning (RL) can be illustrated by the grid world example.

We place an agent in an environment; the goal of the agent is to find a good route to the target. Every cell of the grid the agent occupies can be seen as a state. The agent can take one action at each state according to a certain policy. The goal of RL is to find a good policy that guides the agent to take a sequence of actions, travelling from the start, moving from one state to another, and finally reaching the target.
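As a toy illustration (the layout and names here are mine, not the book's), states, actions, and a policy map directly onto code:

```python
# A 2x2 grid world: states are (row, col) cells, target at (1, 1).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

# A deterministic policy: one action per state (chosen by hand here).
policy = {(0, 0): "right", (0, 1): "down", (1, 0): "right"}

state, target = (0, 0), (1, 1)
route = [state]
while state != target:
    dr, dc = ACTIONS[policy[state]]          # action chosen by the policy
    state = (state[0] + dr, state[1] + dc)   # transition to the next state
    route.append(state)
print(route)  # [(0, 0), (0, 1), (1, 1)]
```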


Attention Mechanism

This article introduces a powerful technique in machine learning called the attention mechanism.

The core idea of the attention mechanism is to pay more attention to what matters. It allows the model to weigh the importance of different parts of the input dynamically rather than treating them equally: the model learns to assign higher weights to the most relevant elements.
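As a minimal sketch (assuming the common scaled dot-product formulation; this is not code from the article itself), the weights are a softmax over query–key similarity scores:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d), K: (n_k, d), V: (n_k, d_v)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])         # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax -> attention weights
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 4)
```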


Control Variate


Introduction to Control Variate

Target

Reduce the variance of a random variable $X$.
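As a minimal sketch of the idea (a toy example of mine, not from the slides): replace $X$ with $X - c\,(Y - \mathbb{E}[Y])$ for some correlated $Y$ with known mean; the mean is unchanged, and the choice $c = \mathrm{Cov}(X, Y)/\mathrm{Var}(Y)$ minimizes the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.uniform(0, 1, 100_000)   # control variate with known mean E[Y] = 0.5
x = np.exp(y)                    # target: estimate E[e^Y] (true value e - 1)

c = np.cov(x, y)[0, 1] / np.var(y)   # optimal coefficient Cov(X, Y) / Var(Y)
x_cv = x - c * (y - 0.5)             # same mean as X, smaller variance

print(x.mean(), x_cv.mean())     # both close to 1.718
print(x.var(), x_cv.var())       # variance drops sharply
```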

