DistributedDeepLearning: Tutorials on running distributed deep learning on Batch AI (source code)

Uploader: 42127748 | Uploaded: 2021-02-01 14:36:09 | File size: 437KB | File type: ZIP
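Distributed training on Batch AI

This repo is a tutorial on how to train a CNN model in a distributed fashion using Batch AI. The scenario covered is image classification, but the solution can be generalized to other deep learning scenarios such as segmentation and object detection.

Image classification is a common task in computer vision applications and is often tackled by training a convolutional neural network (CNN). For large models with large datasets, training on a single GPU can take weeks or months. In some cases the model is so large that a reasonable batch size does not fit on the GPU. Distributed training helps shorten the training time in these situations.

In this specific scenario, a ResNet50 CNN model is trained with Horovod on the ImageNet dataset as well as on synthetic data. The tutorial demonstrates how to accomplish this with three of the most popular deep learning frameworks: TensorFlow, Keras, and PyTorch.

There are a number of ways to train a deep learning model in a distributed fashion, including data-parallel and model-parallel approaches based on synchronous and asynchronous updates. Currently the most common scenario is data parallel with synchronous updates: it is the easiest to implement and is sufficient for the majority of use cases. In data-parallel distributed training with synchronous updates, the model is replicated across N hardware devices and a mini-batch of training samples is divided into N micro-batches (see Figure 2). Each device …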
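The data-parallel scheme with synchronous updates described above can be sketched in plain Python. This is only an illustration under stated assumptions, not code from the repository: the one-parameter model `y_hat = w * x`, the toy data, and the helper names `split_into_micro_batches`, `local_gradient`, `all_reduce_mean`, and `synchronous_step` are all hypothetical. The "devices" are simulated in a single process, and `all_reduce_mean` is a stand-in for the all-reduce that a library such as Horovod performs across real devices.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair; hypothetical toy data format


def split_into_micro_batches(batch: Sequence[Sample], n_devices: int) -> List[List[Sample]]:
    """Divide one mini-batch into n_devices equally sized micro-batches."""
    if len(batch) % n_devices != 0:
        raise ValueError("batch size must be divisible by the number of devices")
    size = len(batch) // n_devices
    return [list(batch[i * size:(i + 1) * size]) for i in range(n_devices)]


def local_gradient(w: float, micro_batch: Sequence[Sample]) -> float:
    """Gradient of the mean squared error of y_hat = w * x on one micro-batch."""
    return sum(2.0 * (w * x - y) * x for x, y in micro_batch) / len(micro_batch)


def all_reduce_mean(values: Sequence[float]) -> float:
    """Simulated all-reduce: every replica would receive this same mean."""
    return sum(values) / len(values)


def synchronous_step(w: float, batch: Sequence[Sample], n_devices: int, lr: float) -> float:
    """One synchronous update: local gradients, averaging, identical weight update."""
    micro_batches = split_into_micro_batches(batch, n_devices)
    # On real hardware each replica computes its gradient in parallel.
    grads = [local_gradient(w, mb) for mb in micro_batches]
    avg_grad = all_reduce_mean(grads)
    # Every replica applies the same averaged gradient, so the copies stay in sync.
    return w - lr * avg_grad


# Toy data generated from y = 2 * x, so the weight should approach 2.
batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.0
for _ in range(50):
    w = synchronous_step(w, batch, n_devices=2, lr=0.05)
print(w)
```

Because the micro-batches have equal size, the averaged gradient equals the gradient over the whole mini-batch, so in this sketch the synchronous update matches what a single device would compute on the full mini-batch.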


Resource details

DistributedDeepLearning: Tutorials on running distributed deep learning on Batch AI (source code) (31 files, 437KB)

DistributedDeepLearning-master/
  HorovodTF/
    01_TrainTensorflowModel.ipynb (12.32KB)
    src/
      imagenet_estimator_tf_horovod.py (13.40KB)
      resnet_model.py (13.18KB)
    Docker/
      Dockerfile (2.26KB)
    00_CreateImageAndTest.ipynb (5.60KB)
  .gitignore (1.17KB)
  images/
    dist_training_diag2.png (65.44KB)
  00_DataProcessing.ipynb (4.20KB)
  Makefile (1.18KB)
  HorovodKeras/
    src/
      imagenet_keras_horovod.py (11.71KB)
      data_generator.py (1.80KB)
    01_TrainKerasModel.ipynb (12.28KB)
    Docker/
      Dockerfile (2.40KB)
    00_CreateImageAndTest.ipynb (5.58KB)
  LICENSE (1.13KB)
  HorovodPytorch/
    src/
      imagenet_pytorch_horovod.py (10.54KB)
    01_TrainPyTorchModel.ipynb (12.22KB)
    Docker/
      Dockerfile (2.99KB)
    cluster_config/
      nodeprep.sh (159B)
      docker.service (1.23KB)
      cluster.json (295B)
    00_CreateImageAndTest.ipynb (5.59KB)
  Docker/
    dockerfile (2.16KB)
    environment.yml (269B)
    jupyter_notebook_config.py (166B)
  01_CreateResources.ipynb (17.28KB)
  README.md (4.94KB)
  include/
    build.mk (325B)
  common/
    timer.py (2.93KB)
    utils.py (871B)
  valprep.sh (2.12MB)
