Table 7 Results of ablation study conducted on MSCOCO dataset

From: Image caption generation using Visual Attention Prediction and Contextual Spatial Relation Extraction

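| Configuration | Cross-Entropy loss: B@4 | Cross-Entropy loss: CD | Self-Critical loss: B@4 | Self-Critical loss: CD |
|---|---|---|---|---|
| WCNN+atr+LSTM | 33.1 | 109.2 | 34.4 | 116.5 |
| WCNN+atr+SA+LSTM | 33.9 | 110.8 | 35.7 | 117.9 |
| WCNN+atr+SA+CA+LSTM | 35.2 | 112.7 | 36.3 | 119.0 |
| WCNN+atr+CA+SA+LSTM | 35.9 | 113.4 | 37.1 | 120.4 |
| WCNN+atr+CA+SA+CSE+LSTM | 37.5 | 116.9 | 38.2 | 124.2 |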

  1. Here, atr denotes atrous convolution, SA spatial attention, and CA channel attention
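
To read the ablation as per-component gains, the scores in Table 7 can be compared against the first row. The following is a minimal sketch, not part of the original paper: the names `ABLATION` and `gains_over_baseline` are illustrative, and the only data used are the values transcribed from the table.

```python
# Values transcribed from Table 7 (MSCOCO ablation).
# Tuple order: (B@4, CD) under cross-entropy loss, then (B@4, CD) under self-critical loss.
ABLATION = [
    ("WCNN+atr+LSTM", (33.1, 109.2, 34.4, 116.5)),
    ("WCNN+atr+SA+LSTM", (33.9, 110.8, 35.7, 117.9)),
    ("WCNN+atr+SA+CA+LSTM", (35.2, 112.7, 36.3, 119.0)),
    ("WCNN+atr+CA+SA+LSTM", (35.9, 113.4, 37.1, 120.4)),
    ("WCNN+atr+CA+SA+CSE+LSTM", (37.5, 116.9, 38.2, 124.2)),
]


def gains_over_baseline(rows):
    """Return each configuration's score difference from the first row.

    Differences are rounded to one decimal place, matching the
    precision reported in the table.
    """
    baseline = rows[0][1]
    return {
        name: tuple(round(v - b, 1) for v, b in zip(values, baseline))
        for name, values in rows
    }


if __name__ == "__main__":
    for name, delta in gains_over_baseline(ABLATION).items():
        print(f"{name:26s} {delta}")
```

Taking the first row as the baseline is a choice made for this sketch; comparing each row with the one directly above it would instead isolate the effect of each added component.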