From: Image captioning model using attention and object features to mimic human image understanding
Dataset
Training split
Validation split
Testing split
Total images
Flickr30k
28k
1k
30k
MS COCO
83k
41k
144k