Table 4 A comparison with the results of Sharif et al. [19] on Flickr30k testing split

From: Image captioning model using attention and object features to mimic human image understanding

| Method | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | CIDEr | ROUGE-L | SPICE |
|---|---|---|---|---|---|---|---|---|
| Sharif et al.’s baseline model | 0.4368 | NA | NA | NA | 0.1297 | 0.2517 | 0.2997 | 0.0700 |
| Sharif et al.’s suggested model | 0.4462 | NA | NA | NA | 0.1350 | 0.2835 | 0.3116 | 0.0741 |
| Our baseline model | 0.3990 | 0.2200 | 0.1170 | 0.0620 | 0.1230 | 0.1480 | 0.2930 | 0.0740 |
| Our model (with the importance factor) | 0.3980 | 0.2210 | 0.1160 | 0.0610 | 0.1290 | 0.1500 | 0.2980 | 0.0740 |
| Sharif et al.’s increase (%) | 2.15 | NA | NA | NA | 4.08 | 12.63 | 3.97 | 5.85 |
| Our increase (%) | −0.25 | 0.45 | −0.86 | −1.63 | 4.87 | 1.35 | 1.7 | 0 |

  1. Values in boldface represent the higher increase in the column
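
The "increase (%)" rows appear to be the relative change of each improved model over its own baseline, i.e. (improved − baseline) / baseline × 100. This formula is an assumption, since the table does not state it, but it matches the reported figures to within roughly 0.02 percentage points when applied to the rounded scores shown above. The following Python sketch recomputes the Sharif et al. row under that assumption; the function and variable names are illustrative only.

```python
def relative_increase(baseline: float, improved: float) -> float:
    """Percentage change of `improved` relative to `baseline` (assumed formula)."""
    return (improved - baseline) / baseline * 100.0


# Scores copied from the Sharif et al. rows of Table 4 (metrics with reported values only).
metrics = ["BLEU-1", "METEOR", "CIDEr", "ROUGE-L", "SPICE"]
sharif_baseline = [0.4368, 0.1297, 0.2517, 0.2997, 0.0700]
sharif_suggested = [0.4462, 0.1350, 0.2835, 0.3116, 0.0741]
sharif_reported = [2.15, 4.08, 12.63, 3.97, 5.85]  # "Sharif et al.'s increase (%)" row

for metric, base, new, reported in zip(
    metrics, sharif_baseline, sharif_suggested, sharif_reported
):
    computed = relative_increase(base, new)
    # Small gaps are expected because the table shows rounded scores.
    print(f"{metric:8s} computed={computed:6.2f}  reported={reported:6.2f}")
```

The same function applied to the "Our baseline model" and "Our model (with the importance factor)" rows gives values close to the "Our increase (%)" row, for example about −0.25 for BLEU-1 (0.3990 to 0.3980).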