Table 3 Model performance on SemEval2018-task9 dataset

From: Hypert: hypernymy-aware BERT with Hearst pattern exploitation for hypernym discovery

| Subtask | Evaluation measure | Proposed method | RMM | SPON | Prompting BERT (is a) | Prompting BERT (such as) |
|---|---|---|---|---|---|---|
| 1A (English) | MRR | **38.68 ± 2.00** | 27.21 ± 3.50 | 24.94 ± 4.10 | 19.74 ± 0.44 | 19.77 ± 0.41 |
| | MAP | **24.17 ± 1.26** | 18.25 ± 1.16 | 15.72 ± 1.75 | 11.43 ± 0.16 | 10.93 ± 0.19 |
| | P@1 | **29.57 ± 1.98** | 18.71 ± 4.40 | 17.02 ± 3.95 | 12.10 ± 0.58 | 13.54 ± 0.50 |
| | P@3 | **21.56 ± 1.38** | 15.57 ± 1.76 | 13.53 ± 2.33 | 10.19 ± 0.22 | 9.64 ± 0.25 |
| | P@5 | **21.27 ± 1.25** | 16.14 ± 1.20 | 13.77 ± 1.87 | 10.20 ± 0.19 | 9.31 ± 0.14 |
| | P@15 | **27.52 ± 1.35** | 21.47 ± 2.12 | 18.42 ± 1.16 | 13.02 ± 0.23 | 12.73 ± 0.22 |
| 2A (Medical) | MRR | **64.83 ± 3.32** | 42.25 ± 2.01 | 46.54 ± 2.89 | 49.28 ± 1.60 | 41.40 ± 1.53 |
| | MAP | **50.24 ± 2.29** | 32.44 ± 2.72 | 34.19 ± 1.87 | 21.99 ± 0.73 | 18.75 ± 0.67 |
| | P@1 | **53.28 ± 3.86** | 30.12 ± 2.44 | 35.26 ± 3.31 | 38.86 ± 1.65 | 30.62 ± 1.53 |
| | P@3 | **46.69 ± 3.52** | 28.68 ± 2.82 | 32.63 ± 2.63 | 25.07 ± 1.09 | 19.83 ± 0.94 |
| | P@5 | **46.60 ± 2.91** | 29.18 ± 2.73 | 32.02 ± 2.11 | 21.18 ± 0.92 | 17.30 ± 0.74 |
| | P@15 | **54.69 ± 1.74** | 37.72 ± 3.40 | 37.35 ± 1.55 | 19.28 ± 0.63 | 17.81 ± 0.56 |
| 2B (Music) | MRR | **67.43 ± 2.37** | 54.37 ± 3.06 | 60.47 ± 3.91 | 19.84 ± 0.98 | 19.54 ± 0.97 |
| | MAP | **55.03 ± 1.98** | 47.52 ± 2.75 | 48.38 ± 2.07 | 8.91 ± 0.49 | 8.42 ± 0.44 |
| | P@1 | **56.68 ± 2.98** | 42.52 ± 3.45 | 48.34 ± 5.34 | 12.32 ± 0.65 | 12.42 ± 0.76 |
| | P@3 | **52.94 ± 2.05** | 44.53 ± 2.51 | 45.29 ± 3.25 | 9.08 ± 0.61 | 8.64 ± 0.37 |
| | P@5 | **52.92 ± 2.37** | 45.24 ± 2.92 | 46.00 ± 2.23 | 8.55 ± 0.47 | 7.72 ± 0.48 |
| | P@15 | **58.59 ± 2.05** | 52.72 ± 2.90 | 52.81 ± 1.00 | 8.86 ± 0.62 | 8.54 ± 0.53 |

  1. Bold indicates the best performance across the comparison models
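
The table reports MRR, MAP, and P@k without defining them. As a rough illustration, the sketch below computes the textbook versions of these measures for a single query, given a ranked list of candidate hypernyms and a gold set. The function names and the toy data are hypothetical, and the official SemEval-2018 Task 9 scorer may truncate or normalize differently, so this is not a reproduction of how the scores above were obtained. The table values appear to be percentages, which would correspond to these scores averaged over queries and multiplied by 100.

```python
from typing import Sequence, Set


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """Return 1/rank of the first correct candidate, or 0.0 if none is correct."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in gold:
            return 1.0 / rank
    return 0.0


def precision_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Textbook P@k: fraction of the top-k candidates that are correct.

    Assumption: divides by k; a task-specific scorer may normalize otherwise.
    """
    if k <= 0:
        return 0.0
    hits = sum(1 for candidate in ranked[:k] if candidate in gold)
    return hits / k


def average_precision(ranked: Sequence[str], gold: Set[str]) -> float:
    """Textbook AP: sum of precision at each correct rank, divided by |gold|.

    Assumes the ranked list contains no duplicate candidates.
    """
    if not gold:
        return 0.0
    hits = 0
    total = 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in gold:
            hits += 1
            total += hits / rank
    return total / len(gold)


def mean(values: Sequence[float]) -> float:
    """Average per-query scores, e.g. reciprocal ranks into MRR or APs into MAP."""
    return sum(values) / len(values) if values else 0.0


# Toy query (hypothetical data): gold hypernyms of "dog" and a system ranking.
gold_hypernyms = {"mammal", "pet", "canine"}
system_ranking = ["animal", "car", "mammal", "pet"]

rr = reciprocal_rank(system_ranking, gold_hypernyms)
ap = average_precision(system_ranking, gold_hypernyms)
p_at_3 = precision_at_k(system_ranking, gold_hypernyms, 3)
```

Averaging `rr` and `ap` over all test queries with `mean` would give MRR and MAP respectively; P@k is averaged over queries in the same way.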