Chapter 7 Temporal-Difference learning

RyanLee_ljx... · Less than 1 minute · RL

Temporal-difference (TD) learning refers to a broad family of reinforcement learning algorithms.

A TD algorithm can solve the Bellman equation of a given policy $\pi$ without a model of the environment: it estimates the state values from sampled experience alone.
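To make the model-free claim concrete, here is a minimal sketch of the basic TD(0) update for state values, $v(s_t) \leftarrow v(s_t) - \alpha\,[v(s_t) - (r_{t+1} + \gamma\, v(s_{t+1}))]$. The function names, the 3-state cyclic environment, and the values of `gamma` and `alpha` are assumptions chosen for illustration and are not taken from the chapter. The learner only ever sees sampled transitions $(s, r, s')$ and never reads the transition probabilities or the reward function.

```python
def td0_policy_evaluation(step, num_states, start_state,
                          gamma=0.9, alpha=0.1, num_steps=5000):
    """Estimate the state values of a fixed policy from sampled transitions only."""
    v = [0.0] * num_states               # initial guess for every state value
    s = start_state
    for _ in range(num_steps):
        r, s_next = step(s)              # one sample (s, r, s') obtained by following the policy
        td_target = r + gamma * v[s_next]
        td_error = v[s] - td_target
        v[s] -= alpha * td_error         # move v(s) a small step toward the TD target
        s = s_next
    return v


def cyclic_chain_step(s):
    """Hypothetical environment under a fixed policy: 0 -> 1 -> 2 -> 0.

    The only nonzero reward (1.0) is received when leaving state 2.
    """
    reward = 1.0 if s == 2 else 0.0
    return reward, (s + 1) % 3


values = td0_policy_evaluation(cyclic_chain_step, num_states=3, start_state=0)
print([round(x, 3) for x in values])
```

For this toy chain the Bellman equation can be solved by hand: $v(2) = 1/(1-\gamma^3)$, $v(1) = \gamma\, v(2)$, and $v(0) = \gamma^2 v(2)$. The TD estimates should approach these values as the number of steps grows. Because the chain is deterministic, a constant step size is enough here; with random transitions or rewards a decaying step size is usually needed for the estimates to settle.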
