Chapter 10 Actor-Critic Methods

In Chapter 8, we introduced value function approximation, that is, replacing the tabular representation of state/action values with a function. Similarly, in Chapter 9 we used a function to represent the policy instead of a table, turning to policy-based methods. In this chapter we combine the two: representing both the value and the policy with functions, and incorporating both value-based and policy-based methods.
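The combination can be sketched in a few lines. Below is a minimal one-step actor-critic update on made-up numbers: the critic is a value table learned by TD, and the actor is a softmax policy whose preferences are nudged by the TD error. All states, actions, rewards, and step sizes here are illustrative assumptions, not from the tutorial.

```python
import numpy as np

gamma, alpha_v, alpha_pi = 0.9, 0.1, 0.1
v = np.zeros(2)                 # critic: value estimate per state
theta = np.zeros((2, 2))        # actor: action preferences per state

def pi(s):
    """Softmax policy pi(a|s, theta)."""
    e = np.exp(theta[s] - theta[s].max())
    return e / e.sum()

def step(s, a, r, s_next):
    delta = r + gamma * v[s_next] - v[s]    # TD error from the critic
    v[s] += alpha_v * delta                 # critic update (TD)
    grad = -pi(s); grad[a] += 1.0           # grad of log pi(a|s) for softmax
    theta[s] += alpha_pi * delta * grad     # actor update (policy gradient)

step(0, 1, 1.0, 1)   # one sampled transition: s=0, a=1, r=1, s'=1
```

After this single step, the critic's estimate of state 0 rises, and the actor shifts probability toward the action that produced the positive TD error.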


RyanLee_ljx · about 10 min · RL
Chapter 9 Policy Gradient Methods

In all previous chapters, the methods were value-based. The difference between value-based and policy-based methods lies in their approach. Value-based methods generate policies implicitly and indirectly: the algorithm does not directly maintain a policy function. Instead, it solves for the value (state or action value) with model-free or model-based methods, then determines the action greedily (or epsilon-greedily) by maximizing the value function at each step (e.g., $\arg\max_a Q(s, a)$), thereby deriving the policy from the value. In contrast, policy-based methods directly represent the policy as a parameterized function $\pi(a|s, \theta)$, where $\theta$ is a parameter vector (instead of the previous tabular representation). The probability distribution of the policy is obtained by directly optimizing the parameter $\theta$.
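The contrast can be sketched on a toy problem with 3 states and 2 actions. All numbers and the feature-free softmax parameterization below are made up for illustration.

```python
import numpy as np

# Value-based: maintain a Q-table and derive the policy by greedification.
Q = np.array([[1.0, 2.0],
              [0.5, 0.1],
              [3.0, 3.5]])          # Q[s, a]
greedy_actions = Q.argmax(axis=1)   # a(s) = argmax_a Q(s, a)

# Policy-based: the policy itself is a parameterized function pi(a|s, theta).
# Here, a simple softmax over per-(s, a) preferences (an illustrative choice).
theta = np.zeros((3, 2))            # parameters to be optimized directly

def pi(s, theta):
    prefs = theta[s]
    e = np.exp(prefs - prefs.max()) # numerically stable softmax
    return e / e.sum()
```

In the value-based sketch the policy only exists through `argmax` over `Q`; in the policy-based sketch `pi` is a proper probability distribution whose parameters `theta` are what gets optimized.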


RyanLee_ljx · about 8 min · RL
Chapter 8 Value Function Methods

In this chapter we move from the previous tabular representation of state/action values to a function representation. That is to say, we use a function to fit the true state/action value function. Such a function can be predefined, e.g., a linear function, or a neural network.
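As a minimal sketch of the linear case: approximate $\hat v(s; w) = \phi(s)^\top w$ and fit $w$ by least squares to a few (state, target-value) pairs. The feature map and target values below are illustrative assumptions.

```python
import numpy as np

def phi(s):
    """Polynomial features of a scalar state: [1, s, s^2]."""
    return np.array([1.0, s, s**2])

states  = np.array([0.0, 1.0, 2.0, 3.0])
targets = np.array([1.0, 2.0, 5.0, 10.0])   # v(s) = s^2 + 1, exactly representable

Phi = np.stack([phi(s) for s in states])    # design matrix, one row per state
w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

def v_hat(s, w):
    return phi(s) @ w                        # approximated state value
```

Unlike a table, the fitted function generalizes: `v_hat` returns a value even for states never seen during fitting.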

The reasons why we move from a tabular representation to a function-based one are:


RyanLee_ljx · about 10 min · RL
Before reading

This post and the ones that follow will mainly cover diffusion models: some background knowledge first, then the diffusion model itself, and finally the application of Diffusion Policy to robotic-arm motion planning.


RyanLee_ljx · less than 1 min · RL
Chapter 7 Temporal-Difference learning

In this section we will first introduce TD learning, which refers to a wide range of algorithms that can solve the Bellman equation of a given policy $\pi$ without a model. In this first part, we use "TD learning" specifically to mean the classic algorithm for estimating state values; the other algorithms belonging to the broader TD family are introduced in the next section.
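The classic state-value algorithm can be sketched in a few lines. The 2-state chain below is a made-up example (state 0 always moves to state 1 with reward 0; state 1 self-loops with reward 1), used only to show the update rule.

```python
import numpy as np

gamma, alpha = 0.9, 0.1
v = np.zeros(2)                       # state-value estimates

rng = np.random.default_rng(0)
for _ in range(2000):
    # sample a transition (s, r, s') from the toy chain -- no model needed
    s = rng.integers(2)
    s_next, r = (1, 0.0) if s == 0 else (1, 1.0)
    # TD(0) update: v(s) <- v(s) + alpha * [r + gamma * v(s') - v(s)]
    v[s] += alpha * (r + gamma * v[s_next] - v[s])
```

For this chain the Bellman equation gives $v(1) = 1/(1-\gamma) = 10$ and $v(0) = \gamma\, v(1) = 9$, and the sampled updates converge to those values without ever using the transition model.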


RyanLee_ljx · about 9 min · RL
Chapter 6 Stochastic Approximation

Stochastic Approximation (SA) refers to a broad class of stochastic iterative algorithms for solving root-finding or optimization problems. Compared to many other root-finding algorithms such as gradient-based methods, SA is powerful in the sense that it requires neither the expression of the objective function nor its derivative.
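A minimal sketch of the Robbins-Monro algorithm, the prototypical SA method: find the root of $g(w) = w - \mathbb{E}[X]$ (i.e., the mean of $X$) using only noisy measurements $\tilde g(w) = w - x_k$, with no expression for $g$ available. The distribution of $X$ below is an assumption for the demo.

```python
import numpy as np

rng = rng = np.random.default_rng(42)
samples = rng.normal(loc=5.0, scale=1.0, size=5000)  # E[X] = 5, treated as unknown

w = 0.0
for k, x in enumerate(samples, start=1):
    # Robbins-Monro: w_{k+1} = w_k - a_k * g_tilde(w_k), step sizes a_k = 1/k
    w -= (1.0 / k) * (w - x)
```

With $a_k = 1/k$ this iteration reduces exactly to the incremental sample mean, which is why mean estimation is the standard motivating example for SA.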


RyanLee_ljx · about 3 min · RL
Chapter 5 Monte Carlo Learning

In this chapter we will introduce a model-free approach to deriving the optimal policy.

Here, model-free means that we do not rely on a specific mathematical model to obtain state or action values. For example, in policy evaluation we solved the Bellman equation to obtain state values, which is model-based. In the model-free setting, we no longer use that equation; instead, we leverage mean estimation methods.
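The mean-estimation idea can be sketched directly: estimate a state value as the average of sampled discounted returns rather than by solving a Bellman equation. The episode rewards below are made-up samples from some policy.

```python
gamma = 0.9

def discounted_return(rewards, gamma):
    """Compute G = r_1 + gamma*r_2 + gamma^2*r_3 + ... by backward accumulation."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

episodes = [[1.0, 0.0, 2.0],   # three sampled episodes starting from state s
            [0.0, 1.0],
            [2.0]]
returns = [discounted_return(ep, gamma) for ep in episodes]
v_estimate = sum(returns) / len(returns)   # Monte Carlo mean estimate of v(s)
```

With more sampled episodes, the law of large numbers drives this average toward the true state value, with no transition model involved.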


RyanLee_ljx · about 4 min · RL
Chapter 4 Value Iteration and Policy Iteration

In the last chapter, we studied the Bellman Optimality Equation (BOE). This chapter introduces three model-based iterative algorithms for solving the BOE to derive the optimal policy: value iteration, policy iteration, and truncated policy iteration.
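As a minimal sketch of the first of these, value iteration repeatedly applies $v_{k+1}(s) = \max_a \big[ r(s,a) + \gamma \sum_{s'} p(s'|s,a)\, v_k(s') \big]$. The 2-state, 2-action MDP below uses illustrative numbers, not an example from the tutorial.

```python
import numpy as np

gamma = 0.9
r = np.array([[0.0, 1.0],                      # r[s, a]
              [2.0, 0.0]])
p = np.array([[[1.0, 0.0], [0.0, 1.0]],        # p[s, a, s']
              [[0.5, 0.5], [1.0, 0.0]]])

v = np.zeros(2)
for _ in range(1000):
    q = r + gamma * (p @ v)                    # q[s, a] = r + gamma * E[v(s')]
    v_new = q.max(axis=1)                      # greedy improvement step
    if np.max(np.abs(v_new - v)) < 1e-10:      # stop once values stabilize
        break
    v = v_new

policy = q.argmax(axis=1)                      # greedy policy from converged values
```

Because the Bellman optimality operator is a $\gamma$-contraction, the iteration converges geometrically to the optimal values, and the greedy policy read off at the end is optimal.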

Value Iteration


RyanLee_ljx · about 3 min · RL
Chapter 3 Optimal Policy and Bellman Optimality Equation

We know that the ultimate goal of RL is to find the optimal policy. In this chapter we will show how to obtain the optimal policy through the Bellman Optimality Equation.

Optimal Policy

The state value can be used to evaluate whether a policy is good or not: if

v_{\pi_{1}}(s) \ge v_{\pi_{2}}(s), \ \ \forall s \in \mathcal S

then $\pi_1$ is better than $\pi_2$.


RyanLee_ljx · about 3 min · RL
Chapter 2 Bellman Equation

In this chapter we will introduce two key concepts and one important formula.

Revision

I recommend reading the motivating examples in the tutorial. Here I will skip that part and directly introduce the concepts.

Before delving into the content, we need to revise some previous key concepts.


RyanLee_ljx · about 5 min · RL