Fig. 5

From: Large scale performance analysis of distributed deep learning frameworks for convolutional neural networks

Parallel efficiency of Horovod, PyTorch-DDP and DeepSpeed on up to 512 GPUs with the native PyTorch data loader and the raw ImageNet dataset for the ResNet50 case, averaged over three runs. The black line denotes the ideal case. The variance between runs is small (in general \(<5\%\)) and is therefore not shown.
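
The caption does not say how parallel efficiency is computed. A minimal sketch of one common definition, assuming efficiency is the measured throughput divided by ideal linear scaling from the smallest GPU count, and that throughput is averaged over the runs first. The function name `parallel_efficiency`, the baseline choice, and all throughput numbers below are hypothetical and are not taken from the article.

```python
from statistics import mean


def parallel_efficiency(throughput_by_gpus, baseline_gpus=None):
    """Return {gpus: efficiency} relative to linear scaling from the baseline.

    Assumption: efficiency(N) = T(N) / (N * T(N0) / N0), where N0 is the
    baseline GPU count (the smallest one unless given explicitly).
    """
    if baseline_gpus is None:
        baseline_gpus = min(throughput_by_gpus)
    # Per-GPU throughput at the baseline defines the ideal (linear) line.
    per_gpu_base = throughput_by_gpus[baseline_gpus] / baseline_gpus
    return {
        n: t / (n * per_gpu_base)
        for n, t in sorted(throughput_by_gpus.items())
    }


# Hypothetical images/s from three runs per GPU count (not data from the paper).
runs = {
    4: [4000.0, 4100.0, 3900.0],
    64: [57600.0, 58000.0, 57200.0],
    512: [400000.0, 409600.0, 419200.0],
}

# Average the three runs first, then compare against ideal scaling.
mean_throughput = {n: mean(values) for n, values in runs.items()}
efficiency = parallel_efficiency(mean_throughput)

for gpus, eff in efficiency.items():
    print(f"{gpus:4d} GPUs: efficiency {eff:.2f}")
```

Under this definition the ideal case is a constant efficiency of 1.0, so a curve falling below that line measures how much throughput is lost to communication and input-pipeline overhead as GPUs are added.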