
Table 2 Parameter study results. B is the number of forward-backward passes between global synchronizations and W is the number of batches to wait for the global synchronization data

From: Accelerating neural network training with distributed asynchronous and selective optimization (DASO)

  

| B | W | Runtime, h (32 GPUs, 8 nodes) | Validation Top-1, % (32 GPUs, 8 nodes) | Runtime, h (128 GPUs, 32 nodes) | Validation Top-1, % (128 GPUs, 32 nodes) |
|---|---|---|---|---|---|
| 1 | 0 | 4.5606 | 76.7715 | 1.2064 | 76.5416 |
| 1 | 1 | 4.2545 | 76.0859 | 1.1556 | 74.9233 |
| 2 | 0 | 4.0365 | 76.8828 | 1.0769 | 76.3027 |
| 2 | 1 | 3.8943 | 75.8086 | 1.0427 | 74.8936 |
| 2 | 2 | 3.8919 | 75.9238 | 1.0450 | 75.0854 |
| 4 | 0 | 3.6984 | 76.4258 | 0.9775 | 74.3478 |
| 4 | 1 | 3.6560 | 75.8262 | 1.0142 | 73.2962 |
| 4 | 2 | 3.7191 | 75.5020 | 0.9843 | 71.9570 |
| 4 | 4 | 3.7064 | 75.7070 | 0.9784 | 73.8560 |
| 8 | 0 | 3.4922 | 75.2598 | 0.9078 | 69.2732 |
| 8 | 4 | 3.5259 | 74.6113 | 0.9170 | 65.4733 |
| 8 | 8 | 3.5770 | 75.2637 | 0.9302 | 69.6655 |
| 16 | 0 | 3.3235 | 73.1348 | 0.8585 | 58.5397 |
| 16 | 4 | 3.3417 | 73.1758 | 0.8590 | 56.8865 |
| 16 | 8 | 3.3934 | 73.2148 | 0.8724 | 54.5323 |
| 16 | 16 | 3.4828 | 74.2129 | 0.8933 | 62.3692 |
| 32 | 0 | 3.2224 | 70.7480 | 0.8231 | 43.6855 |
| 32 | 4 | 3.2302 | 70.2773 | 0.8247 | 44.0639 |
| 32 | 16 | 3.2969 | 69.5781 | 0.8430 | 41.2458 |
| 32 | 32 | 3.4083 | 72.5488 | 0.8656 | 50.9539 |
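Read together, the columns show the trade-off the two parameters control: raising B amortizes synchronization cost over more local steps, so runtime falls as B grows, while the increasingly stale global state costs accuracy — mildly at 32 GPUs, drastically at 128 GPUs (B = 32, W = 0 drops Top-1 from roughly 76.5% to 43.7%). The sketch below is a minimal illustration of how B and W interact in a training loop. It is written against plain `torch.distributed`, not the actual DASO implementation in Heat (which also performs hierarchical node-local synchronization); the function names `train_with_b_w` and `apply_global_average` are hypothetical.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def apply_global_average(model, bufs, works):
    # Wait on the outstanding asynchronous all-reduce, then overwrite
    # the local parameters with the global average.
    for w in works:
        w.wait()
    world = dist.get_world_size()
    with torch.no_grad():
        for p, b in zip(model.parameters(), bufs):
            p.copy_(b / world)

def train_with_b_w(model, optimizer, loader, B, W):
    # B: forward-backward passes between global synchronizations.
    # W: batches to wait before consuming the synchronization data,
    #    i.e. the applied global average is up to W batches stale.
    pending = None  # (bufs, works, batch index at which to apply)
    for i, (x, y) in enumerate(loader):
        loss = F.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()  # purely local update on every batch

        if (i + 1) % B == 0:  # global synchronization point
            if pending is not None:  # never drop an in-flight sync
                apply_global_average(model, pending[0], pending[1])
            bufs = [p.detach().clone() for p in model.parameters()]
            works = [dist.all_reduce(b, async_op=True) for b in bufs]
            pending = (bufs, works, i + W)

        if pending is not None and i >= pending[2]:
            apply_global_average(model, pending[0], pending[1])
            pending = None
```

Under this reading, W = 0 means the averaged parameters are applied in the same batch in which the all-reduce is issued, while W = B means the synchronization started in one cycle is consumed just before the next one begins, hiding the communication behind W batches of local computation at the price of staler global data.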