From: Accelerating neural network training with distributed asynchronous and selective optimization (DASO)

B | S | Runtime, h (32 GPUs, 8 nodes) | Validation Top-1, % (32 GPUs, 8 nodes) | Runtime, h (128 GPUs, 32 nodes) | Validation Top-1, % (128 GPUs, 32 nodes) |
---|---|---|---|---|---|
1 | 0 | 4.5606 | 76.7715 | 1.2064 | 76.5416 |
1 | 1 | 4.2545 | 76.0859 | 1.1556 | 74.9233 |
2 | 0 | 4.0365 | 76.8828 | 1.0769 | 76.3027 |
2 | 1 | 3.8943 | 75.8086 | 1.0427 | 74.8936 |
2 | 2 | 3.8919 | 75.9238 | 1.0450 | 75.0854 |
4 | 0 | 3.6984 | 76.4258 | 0.9775 | 74.3478 |
4 | 1 | 3.6560 | 75.8262 | 1.0142 | 73.2962 |
4 | 2 | 3.7191 | 75.5020 | 0.9843 | 71.9570 |
4 | 4 | 3.7064 | 75.7070 | 0.9784 | 73.8560 |
8 | 0 | 3.4922 | 75.2598 | 0.9078 | 69.2732 |
8 | 4 | 3.5259 | 74.6113 | 0.9170 | 65.4733 |
8 | 8 | 3.5770 | 75.2637 | 0.9302 | 69.6655 |
16 | 0 | 3.3235 | 73.1348 | 0.8585 | 58.5397 |
16 | 4 | 3.3417 | 73.1758 | 0.8590 | 56.8865 |
16 | 8 | 3.3934 | 73.2148 | 0.8724 | 54.5323 |
16 | 16 | 3.4828 | 74.2129 | 0.8933 | 62.3692 |
32 | 0 | 3.2224 | 70.7480 | 0.8231 | 43.6855 |
32 | 4 | 3.2302 | 70.2773 | 0.8247 | 44.0639 |
32 | 16 | 3.2969 | 69.5781 | 0.8430 | 41.2458 |
32 | 32 | 3.4083 | 72.5488 | 0.8656 | 50.9539 |