From: Accelerating neural network training with distributed asynchronous and selective optimization (DASO)
| Parameter | Value |
|---|---|
| Data Loader | PyTorch |
| Local Optimizer | SGD |
| Local Optimizer Parameters | Momentum: 0.9; Weight Decay: 0.0001 |
| Epochs | 175 |
| Learning Rate (LR) Decay | Reduce on Stable |
| LR Parameters | Stable Epochs Before Change: 5; Decay Factor: 0.75 |
| LR Warmup Phase | 5 epochs, see Goyal et al. [38] |
| Maximum LR | 0.4 |
| Loss Function | Region Mutual Information [42] |
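
The settings above map onto standard PyTorch objects. The sketch below is a minimal, hedged rendering of that configuration, not the paper's code: `model`, `criterion`, and `val_loss` are placeholders (the Region Mutual Information loss [42] is an external implementation, stood in for here by `CrossEntropyLoss`), and "Reduce on Stable" is interpreted, as an assumption, as PyTorch's `ReduceLROnPlateau` with patience 5 and factor 0.75.

```python
import torch
from torch import nn, optim

# Placeholders: the paper uses a segmentation network and the
# Region Mutual Information loss [42]; both are stand-ins here.
model = nn.Linear(128, 10)
criterion = nn.CrossEntropyLoss()

max_lr = 0.4        # maximum LR from the table
warmup_epochs = 5   # LR warmup phase, following Goyal et al. [38]
epochs = 175

# Local optimizer: SGD with momentum 0.9 and weight decay 1e-4.
optimizer = optim.SGD(model.parameters(), lr=max_lr,
                      momentum=0.9, weight_decay=1e-4)

# Linear warmup over the first 5 epochs, then "Reduce on Stable",
# assumed here to be ReduceLROnPlateau: decay the LR by a factor
# of 0.75 after 5 epochs without improvement.
warmup = optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=warmup_epochs)
plateau = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.75, patience=5)

for epoch in range(epochs):
    # ... training and validation passes over PyTorch DataLoaders ...
    val_loss = 0.0  # placeholder validation metric
    if epoch < warmup_epochs:
        warmup.step()
    else:
        plateau.step(val_loss)
```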