iw-transfer-rl: Code for the paper "Importance Weighted Transfer of Samples in Reinforcement Learning" (ICML 2018)

Uploader: 42137028 | Uploaded: 2025-04-02 21:44:35 | File size: 4.92MB | File type: ZIP
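This repository contains the code for our paper "Importance Weighted Transfer of Samples in Reinforcement Learning", which was accepted at ICML 2018. We provide a small library for sample transfer in RL (named TRLIB), which includes an implementation of the Importance Weighted Fitted Q-Iteration (IWFQI) algorithm [1], together with instructions on how to reproduce the experiments presented in the paper.

Abstract

We consider the transfer of experience samples (i.e., tuples) collected from a set of source tasks in reinforcement learning (RL) to improve the learning process in a given target task. Most related approaches focus on selecting the most relevant source samples for solving the target task, but then use all the transferred samples without further considering the discrepancies between the task models. In this paper, we propose a model-based technique that automatically estimates the relevance (importance weight) of each source sample for solving the target task. In the proposed approach, all the samples are transferred and used by a batch RL algorithm to solve the target task, but their contribution to the learning process is proportional to their importance weight. By extending the importance weighting provided in the supervised learning literature…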
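The idea that every transferred sample is used, but counts in proportion to its importance weight, can be illustrated with a minimal sketch. This is not the TRLIB code or the paper's IWFQI implementation: it assumes a tabular Q-function (so the weighted regression step reduces to a weighted mean per state-action pair) and, purely for illustration, Gaussian reward models with known means for both tasks. The names `gaussian_pdf`, `importance_weight`, and `weighted_fqi` are hypothetical.

```python
import math
from collections import defaultdict


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weight(reward, target_mean, source_mean, std=1.0):
    """Target/source density ratio for an observed reward.

    Assumption: both reward models are Gaussian with known means.
    """
    return gaussian_pdf(reward, target_mean, std) / gaussian_pdf(reward, source_mean, std)


def weighted_fqi(samples, weights, n_actions, gamma=0.9, n_iterations=50):
    """Tabular fitted Q-iteration with per-sample weights.

    samples: list of (state, action, reward, next_state, done) tuples
    weights: one non-negative weight per sample
    """
    if len(samples) != len(weights):
        raise ValueError("need exactly one weight per sample")
    q = defaultdict(float)  # unseen (state, action) pairs default to 0
    for _ in range(n_iterations):
        num = defaultdict(float)  # weighted sum of regression targets
        den = defaultdict(float)  # sum of weights
        for (s, a, r, s_next, done), w in zip(samples, weights):
            best_next = 0.0 if done else max(q[(s_next, b)] for b in range(n_actions))
            num[(s, a)] += w * (r + gamma * best_next)
            den[(s, a)] += w
        new_q = defaultdict(float)
        for key, total in den.items():
            if total > 0.0:
                new_q[key] = num[key] / total  # weighted least squares, one-hot features
        q = new_q
    return q


# Toy data: one target sample and one source sample for the same (state, action).
target_samples = [(0, 0, 1.0, 1, True)]
source_samples = [(0, 0, 0.0, 1, True)]

# Target samples keep weight 1; source samples get a density-ratio weight.
source_weights = [
    importance_weight(r, target_mean=1.0, source_mean=0.0)
    for (_, _, r, _, _) in source_samples
]
samples = target_samples + source_samples
weights = [1.0] * len(target_samples) + source_weights

q = weighted_fqi(samples, weights, n_actions=1)
print(round(q[(0, 0)], 4))
```

With uniform weights the estimate for the pair `(0, 0)` would be the plain average of the two rewards, 0.5. Down-weighting the source sample, whose reward is unlikely under the assumed target model, moves the estimate toward the target sample's reward.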

Resource details

iw-transfer-rl (127 files, 4.92MB)

.gitignore (1.22KB)
fqi-long500.json (293.31KB)
wfqi-mean-neff.json (183.19KB)
fqi-long.json (176.58KB)
wfqi-ideal-s2.json (135.85KB)
wfqi-mean-s2.json (135.74KB)
wfqi-ideal.json (135.70KB)
wfqi-mean.json (135.59KB)
lazaric2008-neff.json (129.11KB)
lazaric2008.json (116.94KB)
lazaric2008-s2.json (102.97KB)
lazaric2008.json (97.92KB)
wfqi-ideal.json (91.10KB)
wfqi-mean.json (90.92KB)
fqi.json (74.64KB)
fqi.json (59.56KB)
wfqi-ideal.json (44.68KB)
wfqi-heuristic.json (44.03KB)
wfqi-mean.json (44.03KB)
wfqi-ideal.json (38.99KB)
wfqi-mean.json (38.38KB)
lazaric2008.json (34.59KB)
lazaric2008.json (34.59KB)
laroche2017.json (34.37KB)
fqi.json (27.91KB)
laroche2017.json (25.65KB)
fqi.json (19.15KB)
kernel_params (604B)
kernel_params (483B)
README.md (4.85KB)
acrobot_ess.pdf (32.57KB)
figure.pdf (20.41KB)
dam.pdf (19.76KB)
acrobot.pdf (18.51KB)
ess.pdf (17.81KB)
puddleworld_sd.pdf (17.75KB)
puddleworld.pdf (17.75KB)
acrobot_s2.pdf (17.58KB)
acrobot_s2_steps.pdf (16.93KB)
acrobot_steps.pdf (16.90KB)
disc_rew.pdf (16.48KB)
disc_cost.pdf (16.35KB)
avg_rew.pdf (16.20KB)
avg_cost.pdf (16.10KB)
perf_greedy.pdf (16.10KB)
perf_greedy.pdf (15.83KB)
perf_greedy.pdf (15.82KB)
perf_greedy_s2.pdf (15.38KB)
disc_rew_all.pdf (15.24KB)
perf_greedy_s2_steps.pdf (14.27KB)
perf_greedy_steps.pdf (14.24KB)
dam_fqi.pdf (14.08KB)
fqi_disc_rew.pdf (13.43KB)
fqi_disc_cost.pdf (12.98KB)
fqi_avg_rew.pdf (12.63KB)
fqi_avg_cost.pdf (12.41KB)
source_data_5.pkl (1012.87KB)
source_data_6.pkl (1012.87KB)
source_data_4.pkl (1012.87KB)
source_data_3.pkl (1012.87KB)
source_data_1.pkl (1012.87KB)
source_data_2.pkl (1012.87KB)
source_data_2.pkl (859.93KB)
source_data_1.pkl (584.92KB)
source_data_2.pkl (50.03KB)
source_data_2.pkl (46.64KB)
source_data_3.pkl (35.26KB)
source_data_3.pkl (34.39KB)
source_data_1.pkl (33.84KB)
source_data_1.pkl (32.20KB)
acrobot_multitask.py (10.00KB)
wfqi.py (9.81KB)
dam.py (7.25KB)
test_policies.py (7.20KB)
qfunction.py (6.59KB)
lazaric2008.py (6.50KB)
wfqi_utils.py (4.86KB)
test_lazaric.py (4.86KB)
puddleworld.py (4.43KB)
test_evaluation.py (4.13KB)
callbacks.py (4.01KB)
run_wfqi_ideal.py (3.71KB)
run_wfqi_ideal.py (3.66KB)
run_algorithms.py (3.59KB)
run_algorithms.py (3.59KB)
test_qfunction.py (3.29KB)
run_wfqi.py (3.16KB)
run_algorithms.py (3.15KB)
valuebased.py (3.12KB)
run_wfqi.py (3.10KB)
run_wfqi.py (2.93KB)
visualization.py (2.84KB)
learn_source_policy.py (2.77KB)
learn_source_policy.py (2.77KB)
fqi.py (2.76KB)
parametric.py (2.68KB)
run_wfqi.py (2.66KB)
algorithm.py (2.59KB)
results.py (2.50KB)
run_algorithms.py (2.50KB)
......
(Too many files; not all are shown.)
