RAdam: On the Variance of the Adaptive Learning Rate and Beyond

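Uploader: 42168555 | Upload time: 2022-09-26 17:47:33 | File size: 650KB | File type: ZIP

RAdam
On the Variance of the Adaptive Learning Rate and Beyond

We are in an early-release beta. Expect some adventures and rough edges.

Introduction
If warmup is the answer, what is the question?
Learning rate warmup for Adam is a must-have trick for stable training in certain situations (or with eps tuning). However, the underlying mechanism is largely unknown. In our study, we suggest that one fundamental cause is the large variance of the adaptive learning rate, and we provide both theoretical and empirical evidence to support this. Besides explaining why warmup should be used, we also propose RAdam, a theoretically sound variant of Adam.

Motivation
As shown in Figure 1, we assume that gradients follow a normal distribution (mean: \mu, variance: 1). The variance of the adaptive learning rate is simulated and plotted in Figure 1 (blue curve). We observe that the adaptive learning rate has a large variance in the early stage of training.
When using a Transformer for NMT, a warmup stage is usually required to avoid convergence problems (for example, in Figure 2 Adam-vanilla converges to around 500 PPL, while Adam-warmup successfully converges to under 10 PPL). ...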
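The variance simulation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: it assumes the adaptive learning rate is Adam's 1 / (sqrt(v_hat) + eps) with a bias-corrected second moment, and the function name `simulate_adaptive_lr_variance`, the values `beta2=0.999` and `eps=1e-8`, and the trial counts are all choices made for this example.

```python
import math
import random
import statistics


def simulate_adaptive_lr_variance(steps=100, trials=1000, mu=0.0,
                                  beta2=0.999, eps=1e-8, seed=0):
    """Sample variance of the adaptive learning rate at steps 1..steps.

    Gradients are drawn from N(mu, 1), as assumed in the text. The name,
    defaults, and eps handling are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    # samples[t - 1] collects the adaptive learning rate at step t over all trials
    samples = [[] for _ in range(steps)]
    for _ in range(trials):
        v = 0.0  # exponential moving average of squared gradients
        for t in range(1, steps + 1):
            g = rng.gauss(mu, 1.0)
            v = beta2 * v + (1.0 - beta2) * g * g
            v_hat = v / (1.0 - beta2 ** t)  # Adam-style bias correction
            samples[t - 1].append(1.0 / (math.sqrt(v_hat) + eps))
    return [statistics.pvariance(s) for s in samples]


if __name__ == "__main__":
    variances = simulate_adaptive_lr_variance()
    print("step 1 variance:  ", variances[0])
    print("step 100 variance:", variances[-1])
```

At step 1 the adaptive learning rate depends on a single squared gradient, so a gradient close to zero produces an enormous value and the sample variance is large and noisy. By step 100 the second-moment estimate averages many gradients and the variance across trials is far smaller, which matches the early-stage behaviour the text describes.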

Resource details

RAdam: On the Variance of the Adaptive Learning Rate and Beyond (61 files, 650KB)

RAdam-master/
  .travis.yml  (120B)
  img/
    variance.png  (225.84KB)
  LICENSE  (11.08KB)
  radam/
    radam.py  (10.11KB)
    __init__.py  (44B)
  nmt/
    my_module/
      poly_schedule.py  (2.45KB)
      radam.py  (5.82KB)
      novograd.py  (4.91KB)
      __init__.py  (126B)
      linear_schedule.py  (3.18KB)
      adam2.py  (6.48KB)
    recipes.md  (3.96KB)
    README.md  (244B)
    average_checkpoints.py  (5.28KB)
    eval.sh  (841B)
  setup.py  (857B)
  .gitignore  (3.96KB)
  README.md  (10.80KB)
  language-model/
    pre_word_ada/
      gene_map.py  (1.01KB)
      encode_data2folder.py  (2.93KB)
    recipes.md  (901B)
    eval_1bw.py  (3.65KB)
    README.md  (305B)
    train_1bw.py  (7.52KB)
    model_word_ada/
      resnet.py  (1.51KB)
      densenet.py  (1.82KB)
      ldnet.py  (2.34KB)
      utils.py  (1.97KB)
      LM.py  (2.39KB)
      adaptive.py  (2.65KB)
      dataset.py  (5.00KB)
      radam.py  (5.20KB)
      bnlstm.py  (8.49KB)
      ddnet.py  (2.28KB)
      basic.py  (1.66KB)
  cifar_imagenet/
    models/
      cifar/
        resnet.py  (4.97KB)
        densenet.py  (4.61KB)
        resnext.py  (5.47KB)
        wrn.py  (3.80KB)
        __init__.py  (2.20KB)
        alexnet.py  (1.33KB)
        vgg.py  (3.99KB)
        preresnet.py  (4.93KB)
      imagenet/
        resnext.py  (5.56KB)
        __init__.py  (63B)
      __init__.py  (0B)
    cifar.py  (15.22KB)
    LICENSE  (1.04KB)
    imagenet.py  (14.06KB)
    recipes.md  (7.35KB)
    .gitignore  (20B)
    fourstep.sh  (874B)
    README.md  (331B)
    utils/
      eval.py  (523B)
      visualize.py  (3.71KB)
      images/
        cifar.png  (336.74KB)
        imagenet.png  (45.38KB)
      logger.py  (4.34KB)
      misc.py  (2.17KB)
      radam.py  (9.98KB)
      __init__.py  (242B)
