Ryan Lee

I'm just a boy trying to find a place in this world.

Traffic Simulation
Simulating traffic flow and pedestrian flow based on Cellular Automata (CA).

Principles of Transportation Planning
A GUI for forecasting trip distribution with four growth-factor methods.

Path Planning
Global path planning with improved A*, Dijkstra, Floyd, and 0-1 programming models.

Machine Learning
Common machine learning code (BP, ANN in PyTorch, MNIST handwritten digit recognition, etc.).

Mathematical Modeling
A collection of common mathematical modeling algorithms.

About Me
A few words about myself.
Chapter 8 Value Function Methods

In this chapter we move from the previous tabular representation of state/action values to a function representation. That is, we use a function to approximate the true state/action value function. Such a function can be predefined, e.g., a linear function, or it can be a neural network.
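As a minimal sketch of the idea (the 1-D state space, feature map, and target values below are all illustrative assumptions, not from the chapter), a linear function can be fit to sampled state values instead of storing one table entry per state:

```python
import numpy as np

def phi(s):
    """Polynomial features of a scalar state s (an illustrative choice)."""
    return np.array([1.0, s, s**2])

# Suppose the (unknown) true value function is v(s) = 2 + 3s - s^2,
# observed at a few sample states.
states = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
targets = 2 + 3 * states - states**2

# Fit the weight vector w of v_hat(s; w) = phi(s)^T w by least squares.
Phi = np.vstack([phi(s) for s in states])
w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

print(w)  # close to [2, 3, -1]
```

With three weights we can then evaluate `phi(s) @ w` at any state, including states never visited, which a table cannot do.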

The reasons why we move from a tabular representation to a function-based one are:


RyanLee_ljx · about 9 min · RL
Modern Robotics

Chapter 1: Preliminaries

A robot is, in essence, a system of rigid bodies.

  • Links: the rigid bodies that make up a robot.
  • Joints: the components that connect adjacent links and allow relative motion between them.

Chapter 2: Configuration Space

2.1 Basic Concepts

  • Configuration: a set of parameters specifying the position and orientation of every point of the robot.
  • Configuration space (C-space): the set of all possible configurations.
  • Degrees of freedom (dof): the dimension of the C-space, i.e., the minimum number of real-valued parameters needed to represent the robot's configuration. For example, a rigid body in the plane has 3 dof: (x, y, θ).

RyanLee_ljx · about 29 min · robot
Preliminaries

Understanding data

Machine learning is the process of modeling data drawn from an unknown distribution. Whatever school of machine learning one follows, it holds that observed data do not arise out of nowhere: they are produced by an underlying, objectively existing data-generating process, and this process can be described by a probability distribution.

For example, when tossing a coin the outcome is heads or tails. If we toss it $k$ times we obtain $k$ data points, and this result can be viewed as generated (sampled) from a Bernoulli distribution.
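The coin example can be sketched as follows (the bias `p = 0.5`, i.e., a fair coin, is an assumption for illustration):

```python
import random

# View k coin tosses as k samples from a Bernoulli(p) distribution.
random.seed(0)
p = 0.5      # assumed bias of the coin (fair)
k = 10000
tosses = [1 if random.random() < p else 0 for _ in range(k)]  # 1 = heads

# The empirical frequency of heads estimates the unknown parameter p.
p_hat = sum(tosses) / k
print(p_hat)  # close to 0.5
```

Modeling then runs in the other direction: given only `tosses`, we infer the parameter of the distribution that generated them.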


RyanLee_ljx · about 11 min · ML
Variational Inference and VAE

Latent Variables

An example (from 【隐变量(潜在变量)模型】硬核介绍):

Look at the figure below. On the surface, the observed data are a set of points $x = \{x_1, x_2, \dots, x_n\}$, but intuitively these points are sampled, with some probability, from four different distributions (assume all are Gaussian). A latent variable $z_i$ controls which distribution $x_i$ is sampled from: $x_i \mid z_i = k \sim N(\mu_k, \sigma_k^2)$, where $k = 1, 2, 3, 4$ and each $\sigma_k$ is assumed known. The latent variable $z_i$ thus represents the index of the component that the observation $x_i$ belongs to.
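This generative story can be sketched directly (the component means, the known standard deviations, and the uniform prior over components below are illustrative assumptions):

```python
import random

# A latent variable z_i picks one of four Gaussian components,
# then the observation x_i is drawn from that component.
random.seed(0)
means = [-6.0, -2.0, 2.0, 6.0]
sigmas = [0.5, 0.5, 0.5, 0.5]   # assumed known, as in the text

xs, zs = [], []
for _ in range(1000):
    z = random.randrange(4)                # latent component index (uniform prior)
    x = random.gauss(means[z], sigmas[z])  # observed point from component z
    zs.append(z)
    xs.append(x)

print(xs[:3], zs[:3])
```

In practice only `xs` is observed; inference (e.g., variational inference) tries to recover the hidden `zs` and the component parameters.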


RyanLee_ljx · about 6 min · ML
Before reading

This post and those that follow will mainly cover diffusion models: some background knowledge first, then the diffusion model itself, and finally, hopefully, the application of Diffusion Policy to robot-arm motion planning.


RyanLee_ljx · less than 1 min · RL
Chapter 7 Temporal-Difference learning

In this section we will first introduce TD learning, which refers to a wide range of algorithms. TD learning can solve the Bellman equation of a given policy $\pi$ without a model. In this first part, we use the term TD learning specifically for a classic algorithm that estimates state values; the other algorithms belonging to the broader TD family are introduced in the next sections.
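As a minimal sketch of that classic algorithm, tabular TD(0) estimating state values on a toy 5-state random walk (the MDP, step size, and episode count are illustrative assumptions): episodes start in the middle, the policy moves left or right at random, and reward 1 is received only on exiting to the right.

```python
import random

random.seed(0)
n, alpha, gamma = 5, 0.1, 1.0
v = [0.0] * n  # state-value estimates for states 0..4

for _ in range(5000):
    s = 2
    while True:
        s_next = s + random.choice([-1, 1])  # the given policy: move randomly
        if s_next < 0 or s_next >= n:        # stepped off an end: terminal
            r = 1.0 if s_next >= n else 0.0
            v[s] += alpha * (r - v[s])       # terminal target is just r
            break
        r = 0.0
        # TD(0): move v(s) toward the bootstrapped target r + gamma * v(s')
        v[s] += alpha * (r + gamma * v[s_next] - v[s])
        s = s_next

print([round(x, 2) for x in v])  # true values are [1/6, 2/6, 3/6, 4/6, 5/6]
```

Note that each update uses only one observed transition, not the transition model, which is what makes it model-free.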


RyanLee_ljx · about 9 min · RL
Chapter 6 Stochastic Approximation

Stochastic Approximation (SA) refers to a broad class of stochastic iterative algorithms for solving root-finding or optimization problems. Compared with many other root-finding methods, such as gradient-based ones, SA is powerful in that it requires neither the expression of the objective function nor its derivative.
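The prototypical SA algorithm is Robbins-Monro. A minimal sketch (the underlying function $g(w) = w - 5$ and the noise model are illustrative assumptions; the algorithm itself sees only noisy samples, never the expression):

```python
import random

random.seed(0)

def noisy_g(w):
    """A noisy observation of g(w) = w - 5, whose root is w* = 5."""
    return (w - 5.0) + random.gauss(0.0, 1.0)

w = 0.0
for k in range(1, 10001):
    a_k = 1.0 / k                # step sizes satisfying the RM conditions
    w = w - a_k * noisy_g(w)     # w_{k+1} = w_k - a_k * g~(w_k)

print(round(w, 2))  # converges toward 5
```

The step sizes $a_k = 1/k$ satisfy the classic conditions $\sum_k a_k = \infty$ and $\sum_k a_k^2 < \infty$, which is what lets the iterates average out the noise while still reaching the root.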


RyanLee_ljx · about 3 min · RL
Chapter 5 Monte Carlo Learning

In this chapter we will introduce a model-free approach to deriving the optimal policy.

Here, model-free means that we do not rely on an explicit mathematical model to obtain state or action values. For instance, in policy evaluation we used the Bellman equation to obtain state values, which is model-based. In the model-free setting we no longer use that equation; instead, we leverage mean estimation methods.
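The core of mean estimation is simple: replace an expectation computed from a known model with an average of samples. A minimal sketch (the return distribution below is an illustrative assumption):

```python
import random

random.seed(0)

# Pretend these are returns observed from many episodes; the true mean
# (3.0 here) is unknown to the estimator.
samples = [random.gauss(3.0, 1.0) for _ in range(10000)]

# Monte Carlo estimate: the sample mean converges to the expectation
# by the law of large numbers.
estimate = sum(samples) / len(samples)
print(round(estimate, 2))  # close to 3.0
```

Monte Carlo learning applies exactly this idea to action values: average the returns observed after taking an action, instead of computing the expectation from transition probabilities.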


RyanLee_ljx · about 4 min · RL
Chapter 4 Value Iteration and Policy Iteration

In the last chapter, we studied the Bellman optimality equation (BOE). This chapter introduces three model-based iterative algorithms for solving the BOE to derive the optimal policy: value iteration, policy iteration, and truncated policy iteration.

Value Iteration
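As a minimal sketch of value iteration (the tiny two-state MDP, its rewards, and the discount factor below are illustrative assumptions), each sweep applies the Bellman optimality operator $v(s) \leftarrow \max_a [r(s,a) + \gamma \sum_{s'} p(s'|s,a) v(s')]$:

```python
gamma = 0.9
# mdp[s][a] = (reward, next_state); deterministic transitions for simplicity
mdp = {
    0: {"stay": (0.0, 0), "go": (1.0, 1)},
    1: {"stay": (2.0, 1), "go": (0.0, 0)},
}

v = {0: 0.0, 1: 0.0}
for _ in range(200):  # sweep until (numerically) converged
    v = {s: max(r + gamma * v[s2] for r, s2 in acts.values())
         for s, acts in mdp.items()}

# Extract the greedy policy from the converged values.
policy = {s: max(acts, key=lambda a: acts[a][0] + gamma * v[acts[a][1]])
          for s, acts in mdp.items()}
print(v, policy)
```

Here the fixed point is $v(1) = 2/(1-\gamma) = 20$ and $v(0) = 1 + \gamma \cdot 20 = 19$, with the greedy policy going to state 1 and staying there.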


RyanLee_ljx · about 3 min · RL