Fig. 2: Structure of VAPN
From: Image caption generation using Visual Attention Prediction and Contextual Spatial Relation Extraction