Image captioning model using attention and object features to mimic human image understanding

Journal of Big Data

Table 3 Comparison with the results of Yin and Ordonez [9] on the MS COCO Karpathy test split (NA = not reported)

Method	BLEU-1	BLEU-2	BLEU-3	BLEU-4	METEOR	CIDEr	ROUGE-L	SPICE
Yin and Ordonez [9] baseline model	NA	NA	NA	0.21	0.215	0.759	0.464	NA
Yin and Ordonez [9] results with object features	NA	NA	NA	0.253	0.238	0.922	0.507	NA
Yin and Ordonez [9] increase (%)	NA	NA	NA	20.47	10.69	21.47	9.26	NA
Our increase (%)	6.26	8.42	11.53	16.09	3.82	15.04	3.76	5.88
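The "increase (%)" rows read as relative changes over the corresponding baseline, i.e. (score with object features − baseline score) / baseline score × 100. The table does not state this formula, so the sketch below treats it as an assumption and checks it by recomputing the Yin and Ordonez [9] row from the two rows above it. The dictionary names (`BASELINE`, `WITH_OBJECTS`, `REPORTED`) and helper functions are illustrative only. The row "Our increase (%)" cannot be checked the same way, because the underlying baseline and final scores are not listed in this table.

```python
"""Recompute the Yin and Ordonez [9] relative increases shown in Table 3.

Assumption (not stated in the table): "increase (%)" means
(score_with_object_features - baseline_score) / baseline_score * 100.
"""

# Scores copied from Table 3; only the metrics that [9] reports are included.
BASELINE = {"BLEU-4": 0.21, "METEOR": 0.215, "CIDEr": 0.759, "ROUGE-L": 0.464}
WITH_OBJECTS = {"BLEU-4": 0.253, "METEOR": 0.238, "CIDEr": 0.922, "ROUGE-L": 0.507}
REPORTED = {"BLEU-4": 20.47, "METEOR": 10.69, "CIDEr": 21.47, "ROUGE-L": 9.26}


def relative_increase(baseline: float, improved: float) -> float:
    """Return the percentage change of `improved` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (improved - baseline) / baseline * 100.0


def recompute() -> dict[str, float]:
    """Relative increase (%) for every metric present in BASELINE."""
    return {m: relative_increase(BASELINE[m], WITH_OBJECTS[m]) for m in BASELINE}


if __name__ == "__main__":
    for metric, value in recompute().items():
        print(f"{metric}: computed {value:.3f}%, reported {REPORTED[metric]:.2f}%")
```

Under this assumption the recomputed values agree with the reported ones to within 0.01 percentage points. For example, BLEU-4 gives (0.253 − 0.21) / 0.21 × 100 ≈ 20.476.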