An In-Network Parameter Aggregation using DPDK for Multi-GPU Deep Learning

Furukawa Masaki, Itsubo Tomoya, Matsutani Hiroki

doi:10.15803/ijnc.11.2_516

In distributed deep neural network training using remote GPU nodes, communication between the remote nodes occurs at every iteration for gradient aggregation. This communication latency limits the benefit of distributed training with faster GPUs. In this paper, we therefore propose offloading the gradient aggregation to a DPDK (Data Plane Development Kit) based network switch placed between a host machine and the remote GPUs. In this approach, the aggregation is completed in the network using extra computation resources in the switch, and it is efficiently overlapped without increasing the workload on the remote nodes. The proposed DPDK-based switch supports reliable communication protocols for exchanging gradient data and can handle part of MPI communication over TCP. We evaluate the proposed switch in three settings, in which the GPUs and the host communicate via standard IP communication over 40Gbit Ethernet (40GbE), via a PCI Express (PCIe) over 40GbE product, and via MPI communication over 10GbE. With standard IP communication, the aggregation is 2.2-2.5x faster than aggregation executed by the host machine. With the PCIe over 40GbE product, the proposed switch outperforms aggregation by the host machine by 1.16x. In the evaluation using MPI communication on a cluster of Jetson Xavier modules, the proposed switch performs reduction operations up to 5.5-5.8x faster than the conventional method.
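The abstract describes the in-network aggregation step only at a high level. As an illustration, here is a minimal sketch, assuming each worker splits its gradient into numbered chunks (one chunk per packet) and the switch sums matching chunks element-wise, returning the result once every worker has contributed. The names `AggregationSwitch` and `receive` are hypothetical; this is not the authors' DPDK implementation and it models no packet I/O, DPDK API, or TCP/MPI handling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class _ChunkState:
    """Running element-wise sum for one gradient chunk, plus workers seen so far."""
    total: List[float]
    seen: Set[int] = field(default_factory=set)


class AggregationSwitch:
    """Hypothetical model of in-switch gradient aggregation (no real networking)."""

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers
        self._pending: Dict[int, _ChunkState] = {}  # chunk_id -> partial sum

    def receive(self, worker_id: int, chunk_id: int,
                values: List[float]) -> Optional[List[float]]:
        """Add one worker's chunk; return the summed chunk once all workers arrived."""
        state = self._pending.get(chunk_id)
        if state is None:
            state = _ChunkState(total=[0.0] * len(values))
            self._pending[chunk_id] = state
        if worker_id in state.seen:
            # Duplicate (e.g. a retransmission): ignore it so it is not summed twice.
            return None
        if len(values) != len(state.total):
            raise ValueError("chunk length mismatch")
        state.seen.add(worker_id)
        for i, v in enumerate(values):
            state.total[i] += v
        if len(state.seen) == self.num_workers:
            # All contributions are in: free the buffer and return the result,
            # which the switch would then send back to every worker.
            del self._pending[chunk_id]
            return state.total
        return None


if __name__ == "__main__":
    switch = AggregationSwitch(num_workers=3)
    switch.receive(0, 7, [1.0, 2.0])
    switch.receive(0, 7, [1.0, 2.0])  # duplicate, ignored
    switch.receive(1, 7, [0.5, 0.5])
    print(switch.receive(2, 7, [0.5, 1.5]))  # [2.0, 4.0]
```

Ignoring duplicates by worker ID is one simple way to keep retransmitted packets from being counted twice; a real switch would also need to deal with duplicates that arrive after a chunk has been released, lost packets, and limited buffer memory.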
