Fig. 3
From: Image caption generation using Visual Attention Prediction and Contextual Spatial Relation Extraction
Comparison of the captions generated by the baseline approach and the proposed method for a few sample images. Here, GT represents the ground-truth sentence.