Table 2 Parameter study results. B is the number of forward-backward passes between global synchronizations and W is the number of batches to wait for the global synchronization data

From: Accelerating neural network training with distributed asynchronous and selective optimization (DASO)

                32 GPUs (8 nodes)                 128 GPUs (32 nodes)
B    W    Runtime, h   Validation Top-1, %   Runtime, h   Validation Top-1, %
1    0    4.5606       76.7715               1.2064       76.5416
1    1    4.2545       76.0859               1.1556       74.9233
2    0    4.0365       76.8828               1.0769       76.3027
2    1    3.8943       75.8086               1.0427       74.8936
2    2    3.8919       75.9238               1.0450       75.0854
4    0    3.6984       76.4258               0.9775       74.3478
4    1    3.6560       75.8262               1.0142       73.2962
4    2    3.7191       75.5020               0.9843       71.9570
4    4    3.7064       75.7070               0.9784       73.8560
8    0    3.4922       75.2598               0.9078       69.2732
8    4    3.5259       74.6113               0.9170       65.4733
8    8    3.5770       75.2637               0.9302       69.6655
16   0    3.3235       73.1348               0.8585       58.5397
16   4    3.3417       73.1758               0.8590       56.8865
16   8    3.3934       73.2148               0.8724       54.5323
16   16   3.4828       74.2129               0.8933       62.3692
32   0    3.2224       70.7480               0.8231       43.6855
32   4    3.2302       70.2773               0.8247       44.0639
32   16   3.2969       69.5781               0.8430       41.2458
32   32   3.4083       72.5488               0.8656       50.9539
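To make the two parameters concrete, the caption's schedule can be sketched as a loop that launches a global synchronization every B local forward-backward passes and incorporates its result W batches later. This is a minimal illustrative sketch, not the DASO implementation; the function name `schedule` and its structure are assumptions for illustration only.

```python
def schedule(num_batches, B, W):
    """Return the batch indices at which a global synchronization is
    launched and at which its result is applied, for given B and W.

    Hypothetical sketch: B is the number of forward-backward passes
    between global synchronizations, W is the number of batches to
    wait before using the synchronization data (W = 0 means blocking).
    """
    started, applied = [], []
    pending = None  # batch index of the currently in-flight sync, if any
    for t in range(num_batches):
        # one local forward-backward pass happens at every batch t
        if t % B == 0 and pending is None:
            started.append(t)   # launch a non-blocking global average
            pending = t
        if pending is not None and t - pending == W:
            applied.append(t)   # waited W batches: merge global parameters
            pending = None
    return started, applied
```

For example, with B = 2 and W = 1 over 8 batches, syncs are launched at batches 0, 2, 4, 6 and their results are merged one batch later, at 1, 3, 5, 7; the table above varies B and W in exactly this way.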