Using NCCL for multi-GPU deep learning training, covering both multi-node multi-GPU and single-node multi-GPU setups.
- Optimized inter-GPU communication for DL and HPC
- Optimized for all NVIDIA platforms, most OEMs and Cloud
- Scales to 100s of GPUs, targeting 10,000s in the near future
- Aims at covering all communication needs for multi-GPU computing
- Only relies on CUDA; no dependency on MPI or any parallel environment
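To make the single-node, multi-GPU path concrete, here is a minimal sketch of an all-reduce across every visible GPU in one process, using NCCL's ncclCommInitAll, ncclAllReduce, and group calls. The buffer size and reduction op are illustrative, and error checking is omitted for brevity.

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <nccl.h>

int main(void) {
  int nDev = 0;
  cudaGetDeviceCount(&nDev);
  if (nDev < 1) { fprintf(stderr, "no CUDA devices found\n"); return 1; }

  ncclComm_t comms[nDev];
  int devs[nDev];
  cudaStream_t streams[nDev];
  float *sendbuf[nDev], *recvbuf[nDev];
  const size_t count = 1 << 20;  /* illustrative element count per GPU */

  /* Allocate per-device buffers and streams. */
  for (int i = 0; i < nDev; ++i) {
    devs[i] = i;
    cudaSetDevice(i);
    cudaMalloc((void **)&sendbuf[i], count * sizeof(float));
    cudaMalloc((void **)&recvbuf[i], count * sizeof(float));
    cudaStreamCreate(&streams[i]);
  }

  /* One call creates a communicator per device within this process;
     no MPI or other parallel launcher is involved. */
  ncclCommInitAll(comms, nDev, devs);

  /* Group the per-device collective calls so NCCL launches them together. */
  ncclGroupStart();
  for (int i = 0; i < nDev; ++i)
    ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                  comms[i], streams[i]);
  ncclGroupEnd();

  /* Wait for the collectives to finish on every device. */
  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(i);
    cudaStreamSynchronize(streams[i]);
  }

  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(i);
    cudaFree(sendbuf[i]);
    cudaFree(recvbuf[i]);
    cudaStreamDestroy(streams[i]);
    ncclCommDestroy(comms[i]);
  }
  return 0;
}
```

For the multi-node multi-GPU case, the same collective calls apply, but each rank instead creates its communicator with ncclGetUniqueId and ncclCommInitRank, exchanging the unique id through whatever bootstrap channel the application already has (a socket, a shared file, or MPI if available), consistent with NCCL itself not depending on MPI.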