Table 3 Hyperparameters used to train the hierarchical multi-scale attention network using the Cityscapes dataset

From: Accelerating neural network training with distributed asynchronous and selective optimization (DASO)

Data Loader: PyTorch
Local Optimizer: SGD
Local Optimizer Parameters: momentum 0.9, weight decay 0.0001
Epochs: 175
Learning Rate (LR) Decay: reduce on stable
LR Decay Parameters: stable epochs before change 5, decay factor 0.75
LR Warmup Phase: 5 epochs, see Goyal et al. [38]
Maximum LR: 0.4
Loss Function: Region Mutual Information [42]
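The warmup and "reduce on stable" decay listed above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes linear warmup to the maximum LR over the warmup epochs (in the spirit of Goyal et al. [38]) and assumes "reduce on stable" means multiplying the LR by the decay factor after the stated number of epochs without improvement in the monitored loss.

```python
MAX_LR = 0.4          # Maximum LR from the table
WARMUP_EPOCHS = 5     # LR warmup phase
PATIENCE = 5          # "stable epochs before change"
DECAY_FACTOR = 0.75   # decay factor

def warmup_lr(epoch, max_lr=MAX_LR, warmup_epochs=WARMUP_EPOCHS):
    """Linear warmup: ramp the LR from max_lr/warmup_epochs up to max_lr."""
    return max_lr * min(1.0, (epoch + 1) / warmup_epochs)

class ReduceOnStable:
    """Plateau-style decay (assumed semantics): multiply the LR by
    `factor` once the monitored loss has not improved for `patience`
    consecutive epochs."""

    def __init__(self, lr, factor=DECAY_FACTOR, patience=PATIENCE):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.stale = 0  # epochs since last improvement

    def step(self, monitored_loss):
        if monitored_loss < self.best:
            self.best = monitored_loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0
        return self.lr
```

In a PyTorch training loop, the same behavior is available via `torch.optim.lr_scheduler.ReduceLROnPlateau` with `factor=0.75` and `patience=5`, combined with an SGD optimizer configured with `momentum=0.9` and `weight_decay=0.0001`.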