Fig. 1
From: Image caption generation using Visual Attention Prediction and Contextual Spatial Relation Extraction
Proposed model architecture