
Table 2 Results of adding object features to the baseline model on MS COCO Karpathy split

From: Image captioning model using attention and object features to mimic human image understanding

| Model | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | CIDEr | ROUGE-L | SPICE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline model | 0.463 | 0.273 | 0.156 | 0.087 | 0.157 | 0.339 | 0.345 | 0.102 |
| Ours (with YOLO bounding boxes, without the importance factor) | 0.486 | 0.293 | 0.173 | 0.099 | 0.164 | 0.390 | 0.358 | 0.108 |
| Ours (with YOLO bounding boxes and the importance factor) | 0.492 | 0.296 | 0.174 | 0.101 | 0.163 | 0.390 | 0.358 | 0.108 |
| Increase due to the importance factor (%) | 1.23 | 1.02 | 0.57 | 2.02 | −0.99 | 0 | 0 | 0 |
| Increase over the baseline model (%) | 6.26 | 8.42 | 11.53 | 16.09 | 3.82 | 15.04 | 3.76 | 5.88 |
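
The two percentage rows appear to be relative changes between the score rows. The following is a minimal sketch of that arithmetic, assuming each percentage is computed as (new − old) / old × 100 from the rounded scores listed here. The function name `relative_increase` and the variable names are illustrative and do not come from the paper.

```python
# Sketch: recompute the percentage rows from the rounded scores in Table 2.
# Assumption: "increase (%)" means (new - old) / old * 100. The paper does not
# state the formula, and its figures may come from unrounded scores.

METRICS = ["BLEU-1", "BLEU-2", "BLEU-3", "BLEU-4",
           "METEOR", "CIDEr", "ROUGE-L", "SPICE"]

baseline = [0.463, 0.273, 0.156, 0.087, 0.157, 0.339, 0.345, 0.102]
boxes_only = [0.486, 0.293, 0.173, 0.099, 0.164, 0.390, 0.358, 0.108]
boxes_importance = [0.492, 0.296, 0.174, 0.101, 0.163, 0.390, 0.358, 0.108]


def relative_increase(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


# Full model compared with the baseline model.
over_baseline = {
    m: relative_increase(b, f)
    for m, b, f in zip(METRICS, baseline, boxes_importance)
}

# Full model compared with the variant that lacks the importance factor.
from_importance = {
    m: relative_increase(w, f)
    for m, w, f in zip(METRICS, boxes_only, boxes_importance)
}

if __name__ == "__main__":
    for m in METRICS:
        print(f"{m:8s} over baseline: {over_baseline[m]:6.2f}%   "
              f"from importance factor: {from_importance[m]:6.2f}%")
```

Recomputing from the rounded scores reproduces most of the published percentages. A few entries differ slightly, most visibly METEOR in the importance-factor row, which comes out smaller in magnitude than the published value, presumably because the published figures were derived from unrounded scores.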