Fig. 4
From: Large scale performance analysis of distributed deep learning frameworks for convolutional neural networks

Parallel efficiency of Horovod and PyTorch-DDP on up to 1024 GPUs with the DALI data loader, using CPU-based (a) and GPU-based (b) pre-processing of the compressed ImageNet dataset for the ResNet50 case, averaged over three runs. The black line denotes the ideal case. The variance between runs is small (in general \(<5\%\)) and is therefore not shown.
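The caption does not state how parallel efficiency is computed. As a minimal sketch, assuming the common throughput-based definition (measured throughput on N GPUs divided by N times the per-GPU throughput of the smallest run), the quantity could be derived as follows. The function name, the choice of baseline, and all numbers are hypothetical and are not taken from the article.

```python
from statistics import mean


def parallel_efficiency(throughput_runs, baseline_gpus=None):
    """Return {gpu_count: efficiency} from repeated throughput measurements.

    throughput_runs maps a GPU count to a list of measured throughputs
    (e.g. images/s), one entry per repeated run. Efficiency is the mean
    throughput divided by the ideal linearly scaled throughput of the
    baseline configuration (assumed definition, not from the article).
    """
    if baseline_gpus is None:
        # Assumption: the smallest GPU count serves as the reference point.
        baseline_gpus = min(throughput_runs)
    per_gpu_baseline = mean(throughput_runs[baseline_gpus]) / baseline_gpus
    return {
        gpus: mean(runs) / (gpus * per_gpu_baseline)
        for gpus, runs in sorted(throughput_runs.items())
    }


# Hypothetical measurements: three runs per GPU count.
example_runs = {
    1: [100.0, 100.0, 100.0],
    4: [380.0, 390.0, 370.0],
}
example_efficiency = parallel_efficiency(example_runs)
```

Under this definition the ideal case is a constant efficiency of 1, which would correspond to the black line in the figure.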