No. GPUs | PyTorch-DDP DALI: AllReduce [%] (Communication) | PyTorch-DDP DALI: data [%] (I/O) | PyTorch-DDP DALI: cuDNN [%] (Computation) | PyTorch-DDP native: AllReduce [%] (Communication) | PyTorch-DDP native: data [%] (I/O) | PyTorch-DDP native: cuDNN [%] (Computation) |
---|---|---|---|---|---|---|
(a) Training of ResNet50 on ImageNet | ||||||
4 | 15.40 | 22.00 | 32.50 | 22.40 | 21.00 | 30.80 |
8 | 19.00 | 21.40 | 31.75 | 23.95 | 20.05 | 29.20 |
16 | 21.00 | 20.95 | 30.70 | 27.15 | 18.83 | 27.35 |
32 | 27.09 | 18.98 | 28.14 | 31.30 | 17.26 | 25.11 |
64 | 30.87 | 17.76 | 26.35 | 32.75 | 16.30 | 23.55 |
128 | 33.61 | 17.03 | 24.99 | 49.48 | 11.77 | 17.33 |
256 | 37.08 | 15.78 | 23.26 | 76.77 | 5.06 | 7.14 |
512 | 43.48 | 13.57 | 20.02 | 82.61 | 3.66 | 5.52 |
1024 | 46.18 | 11.56 | 17.31 | – | – | – |
(b) Training of ResNet101 on ImageNet | ||||||
4 | 13.30 | 23.00 | 46.00 | 28.65 | 22.50 | 38.12 |
8 | 20.55 | 21.25 | 41.45 | 30.15 | 18.28 | 35.52 |
16 | 24.08 | 20.37 | 39.65 | 35.67 | 16.76 | 32.71 |
32 | 25.36 | 18.71 | 36.99 | 35.46 | 14.59 | 28.43 |
64 | 37.17 | 16.69 | 33.39 | 37.69 | 15.31 | 29.88 |
128 | 36.29 | 16.74 | 34.02 | 42.32 | 13.39 | 26.38 |
256 | 39.31 | 15.54 | 31.56 | 56.43 | 11.38 | 22.83 |
512 | 37.73 | 15.40 | 31.59 | 59.18 | 11.87 | 24.45 |
1024 | 49.18 | 11.87 | 24.45 | – | – | – |
(c) Training of ResNet152 on ImageNet | ||||||
4 | 16.20 | 22.40 | 44.60 | 18.41 | 21.97 | 44.17 |
8 | 20.55 | 21.75 | 42.35 | 20.65 | 21.95 | 40.75 |
16 | 25.90 | 20.05 | 39.07 | 24.77 | 20.70 | 38.62 |
32 | 29.16 | 18.72 | 37.15 | 30.31 | 18.77 | 35.32 |
64 | 33.56 | 16.90 | 33.82 | 38.34 | 16.42 | 30.80 |
128 | 36.16 | 16.66 | 33.73 | 45.75 | 14.02 | 26.60 |
256 | 38.33 | 15.51 | 31.60 | 49.39 | 15.05 | 28.46 |
512 | 40.36 | 14.41 | 29.43 | 51.76 | 11.16 | 25.36 |
1024 | 43.21 | 13.08 | 26.99 | – | – | – |
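Each row reports only three profiled categories, so its shares need not add up to 100%. The reading aid below is not part of the original study. It copies the ResNet50 rows from panel (a) and computes two quantities for each GPU count: the share of time not attributed to AllReduce, data loading or cuDNN, and the ratio of AllReduce time to cuDNN time. This assumes each value is a percentage of total training time. The helper names `unattributed` and `comm_to_compute` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (AllReduce %, data %, cuDNN %) for one GPU count; assumed to be shares of total time.
Shares = Tuple[float, float, float]

# ResNet50 rows copied from panel (a). Native PyTorch-DDP has no 1024-GPU entry.
DALI: Dict[int, Shares] = {
    4: (15.40, 22.00, 32.50),
    8: (19.00, 21.40, 31.75),
    16: (21.00, 20.95, 30.70),
    32: (27.09, 18.98, 28.14),
    64: (30.87, 17.76, 26.35),
    128: (33.61, 17.03, 24.99),
    256: (37.08, 15.78, 23.26),
    512: (43.48, 13.57, 20.02),
    1024: (46.18, 11.56, 17.31),
}
NATIVE: Dict[int, Shares] = {
    4: (22.40, 21.00, 30.80),
    8: (23.95, 20.05, 29.20),
    16: (27.15, 18.83, 27.35),
    32: (31.30, 17.26, 25.11),
    64: (32.75, 16.30, 23.55),
    128: (49.48, 11.77, 17.33),
    256: (76.77, 5.06, 7.14),
    512: (82.61, 3.66, 5.52),
}


def unattributed(s: Shares) -> float:
    """Hypothetical helper: share (%) not covered by the three profiled categories."""
    return round(100.0 - sum(s), 2)


def comm_to_compute(s: Shares) -> float:
    """Hypothetical helper: ratio of AllReduce time to cuDNN time."""
    return round(s[0] / s[2], 2)


def row(gpus: int) -> str:
    """Format one line of the derived table; '-' marks a missing native entry."""
    d = DALI[gpus]
    n: Optional[Shares] = NATIVE.get(gpus)
    native = (
        f"{unattributed(n):>12.2f} {comm_to_compute(n):>17.2f}"
        if n is not None
        else f"{'-':>12} {'-':>17}"
    )
    return f"{gpus:>5} {unattributed(d):>10.2f} {comm_to_compute(d):>15.2f} {native}"


if __name__ == "__main__":
    print(f"{'GPUs':>5} {'DALI rest':>10} {'DALI comm/comp':>15} "
          f"{'native rest':>12} {'native comm/comp':>17}")
    for gpus in sorted(DALI):
        print(row(gpus))
```

At 512 GPUs the AllReduce-to-cuDNN ratio is about 2.17 with DALI and about 14.97 with native loading. This matches the table's much steeper growth in the AllReduce share for native PyTorch-DDP at large scale.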