Table: comparison with related work, from *Neural network training with limited precision and asymmetric exponent*
Paper | Variable type | Technique | Category | Dataset | Topology | 32-bit baseline accuracy | Accuracy after limitation | AI framework |
---|---|---|---|---|---|---|---|---|
Gupta et al. [31] | 12-bit fixed point; 14-bit fixed point | Stochastic rounding (sketch below) | Software limitation; hardware design | MNIST | Custom LeNet | 99.23% | 99.17% (14-bit fixed point); 99.11% (12-bit fixed point) | Not defined |
| | | | | CIFAR10 | 3-layer CNN | 75.4% | 74.6% (14-bit fixed point); 71.2% (12-bit fixed point) | |
Ortiz et al. [34] | 12-bit floating point; 12-bit fixed point | Stochastic rounding; context representation | Software limitation | CIFAR10 | 3-layer CNN | 75.6% | 63.03% (12-bit fixed point); 74.20% (12-bit floating point); 78.02% (12-bit context-float); 76.32% (12-bit context-fixed) | Caffe |
Na and Mukhopadhyay [35] | 16-bit fixed point; 32-bit fixed point | Dynamic precision scaling (DPS, sketch below); flexible multiplier-accumulator (MAC) | Hardware design | MNIST | LeNet | Not given (only loss charts presented) | 32-bit fixed-point accuracy achieved with 16-bit fixed point and DPS | Caffe |
| | | | | Flickr images | AlexNet (pre-trained) | Not given (only loss charts presented) | 64-bit fixed-point accuracy achieved with 32-bit fixed point and DPS | |
Taras and Stuart [36] | 14-bit fixed point (weights); 16-bit fixed point (activations) | Stochastic rounding; dynamic precision scaling (DPS) | Software limitation | MNIST | LeNet | 98.80% | 98.80% | Caffe |
Park et al. [37] | Combination of 8-bit and 16-bit integers | Stochastic gradient descent with Kahan summation (sketch below); lazy update | Software limitation | MNIST | LeNet-like CNN | 99.10% | 99.24% | Caffe; TensorFlow |
| | | | | SVHN | 4-layer CNN | 97.06% | 96.99% | |
| | | | | CIFAR10 | 3-layer CNN | 81.56% | 81.17% | |
| | | | | CIFAR10 | ResNet-20 | 90.16% | 90.23% | |
| | | | | ImageNet | AlexNet | 80.81% | 80.62% | |
Fuketa et al. [39] | 9-bit floating-point format with hidden most significant bit and sign bit | Custom float representation; custom MAC unit | Software limitation; hardware design | ILSVRC | AlexNet | 48.27% | 46.18% | Not defined |
| | | | | ILSVRC | ResNet-50 | 68.84% | 67.55% | |
Onishi et al. [42] | No strict parameter limitation; LUT-based factorization is used to limit memory consumption and multiply-add operations | Lookup-table (LUT) based quantization (sketch below); cluster swap | Software limitation; hardware design | MNIST | LeNet | 99.28% | 97.87%; memory consumption reduced by 22.2% (forward pass) and 60% (backward pass) | PyTorch |
Lee et al. [43] | Fully variable weight bit precision from 1 to 16 bits | Dedicated hardware accelerator for CNN-RNN networks | Hardware design | Not applicable | AlexNet; VGG-16 | Not applicable | Operation-based power savings presented | Not applicable |
Our proposal | 8-bit floating point; 12-bit floating point; 14-bit floating point | Asymmetric exponent (sketch below); no additional rounding | Software limitation | MNIST | LeNet | 96.04% | 75.89% (8-bit floating point); 95.01% (12-bit floating point); 97.13% (14-bit floating point) | PyTorch |
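
Several techniques in the table can be illustrated with short sketches. The first is stochastic rounding, used by Gupta et al. [31], Ortiz et al. [34], and Taras and Stuart [36]: a value is rounded up with probability equal to its fractional remainder, so the quantization is unbiased in expectation. A minimal PyTorch sketch, assuming a plain fixed-point grid (the `frac_bits` parameter is illustrative, not the exact formats used in those papers):

```python
import torch

def stochastic_round_fixed(x: torch.Tensor, frac_bits: int = 10) -> torch.Tensor:
    """Quantize x onto a fixed-point grid with `frac_bits` fractional bits,
    rounding up with probability equal to the fractional remainder."""
    scale = 2.0 ** frac_bits
    scaled = x * scale
    floor = torch.floor(scaled)
    # P(round up) = remainder, so E[quantized x] = x (unbiased rounding).
    round_up = (torch.rand_like(scaled) < (scaled - floor)).to(x.dtype)
    return (floor + round_up) / scale
```

Unbiasedness is what lets tiny gradient updates survive: a single update smaller than the grid step is usually lost under round-to-nearest, but under stochastic rounding its expected contribution over many steps is preserved.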
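Dynamic precision scaling (Na and Mukhopadhyay [35], Taras and Stuart [36]) moves the fixed-point radix point during training instead of fixing it up front. A rough sketch under assumed heuristics; the overflow/headroom thresholds and the function name are placeholders, not the published algorithms:

```python
import torch

def adjust_frac_bits(x: torch.Tensor, frac_bits: int, total_bits: int = 16) -> int:
    """Pick the number of fractional bits for the next iteration: fewer when
    values overflow the current range, more when there is headroom."""
    int_bits = total_bits - 1 - frac_bits          # one bit reserved for the sign
    max_val = 2.0 ** int_bits
    peak = x.abs().max().item()
    if peak >= max_val:                            # overflow: widen the integer part
        return max(0, frac_bits - 1)
    if peak < max_val / 4:                         # headroom: reclaim fractional precision
        return min(total_bits - 1, frac_bits + 1)
    return frac_bits
```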
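Park et al. [37] keep low-precision weights trainable by accumulating updates with Kahan (compensated) summation, so low-order bits lost in one step are re-injected in the next. A self-contained sketch of the idea; the class name and the plain-SGD setting are assumptions, since the paper combines this with 8/16-bit integer storage and lazy updates:

```python
import torch

class KahanSGD:
    """Plain SGD whose weight additions use Kahan summation: a per-parameter
    compensation buffer carries the rounding error of each update forward."""

    def __init__(self, params, lr: float = 0.01):
        self.params = [p for p in params]
        self.lr = lr
        self.comp = [torch.zeros_like(p) for p in self.params]

    @torch.no_grad()
    def step(self):
        for p, c in zip(self.params, self.comp):
            if p.grad is None:
                continue
            y = -self.lr * p.grad - c      # re-apply the previously lost bits
            t = p + y                      # low-order bits of y may be lost here
            c.copy_((t - p) - y)           # recover exactly what was lost
            p.copy_(t)
```

It is used like a regular optimizer (construct with `model.parameters()`, call `step()` after `backward()`); the compensation only becomes significant when the parameters are stored more coarsely than the updates.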
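Onishi et al. [42] replace weights with indices into a small lookup table. As a generic illustration of LUT-based quantization (the k-means clustering here is a stand-in, not the authors' factorization or cluster-swap procedure):

```python
import torch

def lut_quantize(w: torch.Tensor, k: int = 16, iters: int = 10):
    """Cluster weights into k centroids (the LUT) and return per-weight
    indices, so each weight is stored as a small integer index."""
    flat = w.flatten()
    lut = torch.quantile(flat, torch.linspace(0, 1, k))   # spread initial centroids
    for _ in range(iters):
        idx = (flat[:, None] - lut[None, :]).abs().argmin(dim=1)
        for j in range(k):
            members = flat[idx == j]
            if members.numel() > 0:        # leave empty clusters in place
                lut[j] = members.mean()
    idx = (flat[:, None] - lut[None, :]).abs().argmin(dim=1)
    return lut, idx.reshape(w.shape).to(torch.uint8)
```

Reconstruction is a table lookup, `lut[idx.long()]`, so only the k-entry table is kept in full precision while each weight costs log2(k) bits.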
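The last row is the paper's own approach: low-bit floating-point formats whose mantissa is truncated with no additional rounding and whose exponent range is asymmetric, which we read here as skewed toward negative exponents, where training-time weights, activations, and gradients mostly live. A minimal simulation sketch; `mant_bits`, `exp_min`, and `exp_max` are illustrative placeholders, not the exact 8/12/14-bit layouts:

```python
import torch

def limit_float(x: torch.Tensor, mant_bits: int = 5,
                exp_min: int = -12, exp_max: int = 3) -> torch.Tensor:
    """Simulate a reduced-precision float: truncate the mantissa to
    `mant_bits` bits (no rounding) and confine the exponent to the
    asymmetric range [exp_min, exp_max]."""
    mant, exp = torch.frexp(x)                 # x = mant * 2**exp, |mant| in [0.5, 1)
    step = 2.0 ** (-mant_bits)
    mant = torch.trunc(mant / step) * step     # drop low-order mantissa bits
    y = torch.ldexp(mant, exp)
    y = torch.where(exp < exp_min, torch.zeros_like(y), y)   # underflow flushes to zero
    max_mag = (1.0 - step) * 2.0 ** exp_max                  # largest representable value
    return y.clamp(-max_mag, max_mag)                        # overflow saturates
```

Applying such a function to tensors after each optimizer step is one way to emulate the limited format purely in software, which matches the "software limitation" category of the proposal.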