Table 1 A comparison of the used datasets

From: Image captioning model using attention and object features to mimic human image understanding

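| Dataset   | Training split | Validation split | Testing split | Total images |
|-----------|----------------|------------------|---------------|--------------|
| Flickr30k | 28k            | 1k               | 1k            | 30k          |
| MS COCO   | 83k            | 41k              | 41k           | 144k         |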