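Reinforcement learning algorithms: this repository contains most of the classic deep reinforcement learning algorithms implemented in PyTorch, including DQN, DDQN, Dueling Network, DDPG, SAC, A2C, PPO, and TRPO. (More algorithms are still in progress) - source code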

Uploader: 42117150 | Upload time: 2021-08-29 18:54:48 | File size: 3.92MB | File type: ZIP
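Deep Reinforcement Learning Algorithms

This repository implements classic deep reinforcement learning algorithms in PyTorch. Its aim is to give people clear code for learning deep reinforcement learning algorithms. More algorithms will be added in the future, and the existing code will continue to be maintained.

Current implementations:
    Deep Q-Learning Network (DQN)
        Basic DQN
        Double Q-network
        Dueling network architecture
    Deep Deterministic Policy Gradient (DDPG)
    Advantage Actor-Critic (A2C)
    Trust Region Policy Optimization (TRPO)
    Proximal Policy Optimization (PPO)
    Actor-Critic using Kronecker-Factored Trust Region (ACKTR)
    Soft Actor-Critic (SAC)

Update information:
    2018-10-17: Most algorithms were improved in this update, and more experiments with plots were added (except for DDPG). PPO now supports Atari games and MuJoCo environments. TRPO is now very stable and gets better results.
    2019-07-15: Installing OpenAI Baselines is no longer required. I have integrated the useful functions into the rl_utils module. DDPG was also re-implemented and supports more results. The README has been revised, and the code structure has been adjusted slightly.
    201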
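The description lists Basic DQN and Double Q-network as separate variants. As a rough illustration of how the two differ, here is a minimal, dependency-free sketch of their bootstrap targets. It is not taken from the repository (which uses PyTorch); the function names `dqn_target` and `double_dqn_target` and all numbers are hypothetical and chosen only for illustration.

```python
def dqn_target(reward, done, gamma, next_q_target):
    """Basic DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, done, gamma, next_q_online, next_q_target):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-value estimates for a next state with 3 actions (made-up numbers).
next_q_online = [1.0, 2.5, 2.0]
next_q_target = [1.2, 1.8, 3.0]

y_dqn = dqn_target(1.0, False, 0.5, next_q_target)
y_ddqn = double_dqn_target(1.0, False, 0.5, next_q_online, next_q_target)
print(y_dqn, y_ddqn)
```

With these made-up values, the Double DQN target is smaller than the basic DQN target because the action is chosen by one network and scored by the other, which is the usual motivation given for the variant (reducing overestimation).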


Resource details

(72 files, 3.92MB)
reinforcement-learning-algorithms-master
    figures
        hopper.gif 1.79MB
        06_sac.png 135.70KB
        04_trpo.png 141.83KB
        01_dqn.png 233.10KB
        05_ppo.png 130.17KB
        03_a2c.png 164.86KB
        breakout.gif 451.78KB
        logo.png 12.62KB
        bipedal.gif 815.07KB
        02_ddpg.png 135.88KB
    rl_utils
        mpi_utils
            utils.py 1.39KB
            __init__.py 0B
            normalizer.py 2.71KB
        running_filter
            __init__.py 0B
            running_filter.py 1.67KB
        logger
            bench.py 5.57KB
            __init__.py 0B
            logger.py 14.46KB
            plot.py 3.87KB
        experience_replay
            experience_replay.py 1.35KB
        __init__.py 0B
        env_wrapper
            create_env.py 2.19KB
            atari_wrapper.py 10.09KB
            multi_envs_wrapper.py 3.98KB
            __init__.py 5.74KB
            frame_stack.py 1.13KB
        seeds
            seeds.py 407B
    rl_algorithms
        ddpg
            ddpg_agent.py 8.63KB
            train.py 717B
            arguments.py 2.06KB
            utils.py 686B
            models.py 950B
            demo.py 1.48KB
            README.md 354B
        dqn_algos
            train.py 589B
            dqn_agent.py 5.51KB
            arguments.py 2.19KB
            utils.py 1.58KB
            models.py 2.54KB
            demo.py 1.11KB
            README.md 437B
        ppo
            train.py 757B
            arguments.py 2.22KB
            utils.py 1.34KB
            models.py 3.82KB
            demo.py 2.58KB
            README.md 754B
            ppo_agent.py 10.88KB
        a2c
            a2c_agent.py 6.22KB
            train.py 612B
            arguments.py 1.84KB
            utils.py 749B
            models.py 1.91KB
            demo.py 1.17KB
            README.md 269B
        sac
            train.py 450B
            arguments.py 3.01KB
            utils.py 2.77KB
            models.py 1.70KB
            sac_agent.py 10.62KB
            demo.py 1.40KB
            README.md 268B
        trpo
            train.py 461B
            arguments.py 1.49KB
            utils.py 1.98KB
            models.py 1.34KB
            demo.py 1.35KB
            README.md 261B
            trpo_agent.py 9.08KB
    setup.py 275B
    README.md 5.96KB
    .gitignore 1.25KB
