Fig. 5
From: Image caption generation using Visual Attention Prediction and Contextual Spatial Relation Extraction
Samples of image captions generated by the DWCNN-based image captioning method, with the five ground-truth sentences denoted as GT1, GT2, GT3, GT4 and GT5. D1, D2 and D3 denote the descriptions generated by the three configurations WCNN+LSTM, WCNN+VAPN+LSTM and WCNN+VAPN+CSE+LSTM, respectively.