Journal of Big Data

Table 7 Results of ablation study conducted on MSCOCO dataset

From: Image caption generation using Visual Attention Prediction and Contextual Spatial Relation Extraction

Configuration	Cross-Entropy loss: B@4	Cross-Entropy loss: CD	Self-Critical loss: B@4	Self-Critical loss: CD
WCNN+atr+LSTM	33.1	109.2	34.4	116.5
WCNN+atr+SA+LSTM	33.9	110.8	35.7	117.9
WCNN+atr+SA+CA+LSTM	35.2	112.7	36.3	119.0
WCNN+atr+CA+SA+LSTM	35.9	113.4	37.1	120.4
WCNN+atr+CA+SA+CSE+LSTM	37.5	116.9	38.2	124.2

Here, atr denotes atrous convolution, SA spatial attention, and CA channel attention; B@4 is the BLEU-4 score and CD the CIDEr score.
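The ablation is easier to read as per-component gains over the baseline row. The sketch below is a minimal Python illustration. It assumes the four score columns are transcribed exactly as printed in Table 7, and its helper names (`gains_over_baseline`, `sc_minus_ce`) are hypothetical, not taken from the article. It also reports how much self-critical training adds over cross-entropy training for each configuration.

```python
# Scores transcribed from Table 7 (MSCOCO ablation).
# Column order: (CE B@4, CE CD, SC B@4, SC CD)
ROWS = {
    "WCNN+atr+LSTM": (33.1, 109.2, 34.4, 116.5),
    "WCNN+atr+SA+LSTM": (33.9, 110.8, 35.7, 117.9),
    "WCNN+atr+SA+CA+LSTM": (35.2, 112.7, 36.3, 119.0),
    "WCNN+atr+CA+SA+LSTM": (35.9, 113.4, 37.1, 120.4),
    "WCNN+atr+CA+SA+CSE+LSTM": (37.5, 116.9, 38.2, 124.2),
}
METRICS = ("CE B@4", "CE CD", "SC B@4", "SC CD")


def gains_over_baseline(rows, baseline="WCNN+atr+LSTM"):
    """Difference of every row from the baseline row, rounded to 1 decimal."""
    base = rows[baseline]
    return {
        name: tuple(round(v - b, 1) for v, b in zip(scores, base))
        for name, scores in rows.items()
    }


def sc_minus_ce(rows):
    """Gain of self-critical over cross-entropy training: (B@4, CD) per row."""
    return {
        name: (round(s[2] - s[0], 1), round(s[3] - s[1], 1))
        for name, s in rows.items()
    }


if __name__ == "__main__":
    for name, delta in gains_over_baseline(ROWS).items():
        cells = "  ".join(f"{m} {d:+.1f}" for m, d in zip(METRICS, delta))
        print(f"{name:28s}{cells}")
```

Read this way, the full model (CA+SA+CSE) improves on the baseline by 4.4 B@4 and 7.7 CD under cross-entropy loss, and by 3.8 B@4 and 7.7 CD under self-critical loss. The CA+SA ordering also scores higher than SA+CA on all four columns.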
