TRPO-tensorflow: Trust Region Policy Optimization (TRPO) in pure TensorFlow
Article link: https://blog.csdn.net/shoppingend/article/details/124297444?spm=1001.2014.3001.5502
2022-04-21 17:06:44 4KB Algorithms
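For context on the TRPO entry above: a TRPO update is commonly computed by approximately solving F x = g with conjugate gradient, where F is the Fisher information matrix (accessed only through Fisher-vector products) and g is the policy-objective gradient, and then scaling x so that the quadratic KL approximation 0.5 · sᵀFs equals the trust-region size. The sketch below is plain Python rather than TensorFlow, is not taken from the listed resource, and omits the backtracking line search that full TRPO performs afterwards. The names `conjugate_gradient`, `trpo_step`, `fvp`, and `max_kl` are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(fvp: Callable[[Vector], Vector], g: Vector,
                       iters: int = 10, tol: float = 1e-10) -> Vector:
    """Approximately solve F x = g using only products fvp(v) = F v."""
    x = [0.0] * len(g)
    r = list(g)   # residual g - F x (x starts at zero)
    p = list(g)   # current search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:
            break
        fp = fvp(p)
        alpha = rr / dot(p, fp)                      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * fi for ri, fi in zip(r, fp)]
        rr_new = dot(r, r)
        beta = rr_new / rr                           # keeps directions F-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def trpo_step(fvp: Callable[[Vector], Vector], g: Vector,
              max_kl: float = 0.01) -> Vector:
    """Natural-gradient direction scaled so that 0.5 * s^T F s == max_kl."""
    x = conjugate_gradient(fvp, g)
    shs = dot(x, fvp(x))
    scale = math.sqrt(2.0 * max_kl / shs)
    return [scale * xi for xi in x]
```

For example, with F = [[4, 1], [1, 3]] supplied through `fvp` and g = [1, 2], `conjugate_gradient` returns approximately [1/11, 7/11], the exact solution of F x = g (conjugate gradient converges in at most n steps for an n×n symmetric positive-definite F, up to rounding).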
#6.2_DDPG_(Deep_Deterministic_Policy_Gradient)_(Reinforcement_Learning_Reinforceme
2021-09-01 21:00:29 44.8MB Learning resources
The attachment contains basic code for policy gradient and actor-critic. It runs as-is and helps in understanding three algorithms: policy gradient, actor-critic, and advantage actor-critic.
2021-08-22 21:11:26 3KB policygradient actorcritic
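The three algorithms named in this entry differ mainly in the weight that multiplies ∇ log π(a|s) in the actor's update: policy gradient (REINFORCE) uses the Monte Carlo return G_t, while actor-critic methods use a critic-based estimate; advantage actor-critic typically uses an advantage estimate such as the one-step TD error r + γV(s') − V(s). Below is a minimal plain-Python sketch of those weights and the resulting actor loss, assuming one episode stored as Python lists. The function names are hypothetical and not taken from the attachment.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1} (REINFORCE weight)."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def td_advantages(rewards: List[float], values: List[float],
                  next_values: List[float], dones: List[float],
                  gamma: float) -> List[float]:
    """One-step advantage A_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)."""
    return [r + gamma * nv * (1.0 - d) - v
            for r, v, nv, d in zip(rewards, values, next_values, dones)]


def actor_loss(log_probs: List[float], weights: List[float]) -> float:
    """Mean of -log pi(a_t|s_t) * weight_t, to be minimized.

    In an autodiff framework the weights would be treated as constants
    (no gradient flows into the returns or the critic through this term).
    """
    return -sum(lp * w for lp, w in zip(log_probs, weights)) / len(log_probs)
```

Swapping `discounted_returns` for `td_advantages` as the `weights` argument is, in this sketch, the step from REINFORCE to an advantage actor-critic; the critic V itself would be trained separately, for example by regressing it toward r + γV(s').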
Classic policy-gradient algorithms in reinforcement learning include PG, TRPO, PPO, and DPPO.
2021-08-17 09:13:44 536KB Reinforcement learning
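Of the algorithms listed in this entry, PPO replaces TRPO's explicit KL constraint with a clipped probability ratio r = π_new(a|s) / π_old(a|s), and DPPO distributes the PPO data collection across workers. Below is a minimal plain-Python sketch of the clipped surrogate objective, assuming per-sample log-probabilities under the new and old policies and precomputed advantages. `ppo_clip_objective` and `epsilon` are hypothetical names, and the code is not taken from the listed resource.

```python
import math
from typing import List


def ppo_clip_objective(new_log_probs: List[float], old_log_probs: List[float],
                       advantages: List[float], epsilon: float = 0.2) -> float:
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A); maximize it (or minimize its negative)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)                       # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)  # clip to [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)                 # pessimistic bound
    return total / len(advantages)
```

With ratio 1.5 and ε = 0.2, a positive advantage is capped at 1.2·A, while a negative advantage keeps the unclipped, more pessimistic 1.5·A; the `min` is what removes the incentive to move the ratio far outside [1 − ε, 1 + ε].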