Table 3 Performance comparison of different methods for item recommendation task on five datasets

From: RTiSR: a review-driven time interval-aware sequential recommendation method

| Dataset | Metric | BPRMF | CORE | SINE | LightSANs | SLRC | DeepCoNN | CFKG | RTiSR | Impv. | p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MIs | HR@5 | 0.4033 | 0.5976 | 0.5153 | 0.5477 | 0.5213 | 0.4136 | 0.5250 | 0.5951 | | |
| | NDCG@5 | 0.2921 | 0.3942 | 0.3957 | 0.4078 | 0.3890 | 0.2923 | 0.3625 | 0.4207 | 3.16% | 8.9e–2 |
| | HR@10 | 0.4551 | 0.5869 | 0.5612 | 0.5698 | 0.5890 | 0.4741 | 0.5548 | 0.6874 | 16.71% | 1.1e–4\(^\star \) |
| | NDCG@10 | 0.3007 | 0.4056 | 0.4034 | 0.4248 | 0.4038 | 0.3136 | 0.3890 | 0.4569 | 7.56% | 4.9e–2\(^\star \) |
| | HR@20 | 0.4568 | 0.5935 | 0.5655 | 0.5786 | 0.5992 | 0.4947 | 0.5660 | 0.6957 | 16.10% | 8.2e–4\(^\star \) |
| | NDCG@20 | 0.3215 | 0.4104 | 0.4099 | 0.4292 | 0.4127 | 0.3531 | 0.3977 | 0.4599 | 7.15% | 4.2e–2\(^\star \) |
| Auto | HR@5 | 0.4417 | 0.6097 | 0.5559 | 0.6068 | 0.5023 | 0.4319 | 0.5501 | 0.6207 | 1.80% | 1.3e–2\(^\star \) |
| | NDCG@5 | 0.2721 | 0.4660 | 0.4146 | 0.4703 | 0.3140 | 0.2812 | 0.3007 | 0.3861 | | |
| | HR@10 | 0.4668 | 0.6107 | 0.5663 | 0.6456 | 0.5662 | 0.4638 | 0.5945 | 0.7039 | 9.03% | 4.9e–2\(^\star \) |
| | NDCG@10 | 0.3102 | 0.4448 | 0.4200 | 0.4874 | 0.3871 | 0.3008 | 0.3915 | 0.4833 | | |
| | HR@20 | 0.4725 | 0.6187 | 0.5925 | 0.6591 | 0.5810 | 0.4850 | 0.6110 | 0.7099 | 7.71% | 7.2e–2 |
| | NDCG@20 | 0.3320 | 0.4471 | 0.3274 | 0.3842 | 0.4020 | 0.3122 | 0.4201 | 0.4929 | 10.24% | 4.8e–2\(^\star \) |
| LB | HR@5 | 0.4359 | 0.5269 | 0.5311 | 0.4973 | 0.4903 | 0.4759 | 0.4779 | 0.5531 | 4.14% | 7.5e–3\(^\star \) |
| | NDCG@5 | 0.2805 | 0.3639 | 0.3295 | 0.3538 | 0.3656 | 0.3032 | 0.3137 | 0.3992 | 9.19% | 6.1e–4\(^\star \) |
| | HR@10 | 0.4620 | 0.5279 | 0.5347 | 0.5016 | 0.5243 | 0.4904 | 0.5021 | 0.6075 | 13.62% | 2.0e–3\(^\star \) |
| | NDCG@10 | 0.2917 | 0.3791 | 0.3362 | 0.3581 | 0.3907 | 0.3426 | 0.3383 | 0.4251 | 8.80% | 2.0e–2\(^\star \) |
| | HR@20 | 0.4801 | 0.5323 | 0.5404 | 0.5189 | 0.5309 | 0.5011 | 0.5201 | 0.6089 | 12.68% | 3.1e–3\(^\star \) |
| | NDCG@20 | 0.3382 | 0.4421 | 0.3434 | 0.3618 | 0.4102 | 0.3213 | 0.3566 | 0.4315 | | |
| Beer | HR@5 | 0.3625 | 0.4938 | 0.4611 | 0.5173 | 0.4526 | 0.3820 | 0.3661 | 0.5617 | 8.58% | 4.1e–2\(^\star \) |
| | NDCG@5 | 0.2496 | 0.3624 | 0.3633 | 0.3558 | 0.2926 | 0.2817 | 0.2457 | 0.3715 | 2.26% | 2.7e–3 |
| | HR@10 | 0.3998 | 0.5793 | 0.5673 | 0.5769 | 0.4725 | 0.3947 | 0.3870 | 0.5921 | 2.21% | 7.3e–3\(^\star \) |
| | NDCG@10 | 0.2651 | 0.3963 | 0.3706 | 0.3616 | 0.3322 | 0.2920 | 0.2681 | 0.4107 | 3.63% | 0.17 |
| | HR@20 | 0.4195 | 0.5868 | 0.5908 | 0.5838 | 0.5026 | 0.4218 | 0.4102 | 0.5026 | | |
| | NDCG@20 | 0.3005 | 0.3556 | 0.3987 | 0.3348 | 0.3539 | 0.3198 | 0.3108 | 0.4211 | 5.62% | 0.18 |
| Yelp | HR@5 | 0.4331 | 0.4868 | 0.5166 | 0.5326 | 0.4903 | 0.4741 | 0.4726 | 0.5637 | 5.84% | 3.9e–2\(^\star \) |
| | NDCG@5 | 0.2521 | 0.3601 | 0.3492 | 0.3439 | 0.3107 | 0.3112 | 0.2635 | 0.3826 | 6.25% | 3.6e–2\(^\star \) |
| | HR@10 | 0.4537 | 0.5159 | 0.5227 | 0.5411 | 0.5130 | 0.5001 | 0.4807 | 0.6139 | 13.45% | 1.2e–2\(^\star \) |
| | NDCG@10 | 0.2813 | 0.3956 | 0.3682 | 0.3457 | 0.3261 | 0.3138 | 0.2840 | 0.4217 | 6.60% | 0.10 |
| | HR@20 | 0.4807 | 0.5338 | 0.5556 | 0.5469 | 0.5337 | 0.5238 | 0.5010 | 0.6219 | 11.93% | 3.8e–3\(^\star \) |
| | NDCG@20 | 0.3101 | 0.3964 | 0.3685 | 0.3514 | 0.3740 | 0.3329 | 0.3124 | 0.4361 | 10.02% | 4.5e–2\(^\star \) |
| Statistic | Win/Loss | (18/0) | (17/1) | (17/1) | (16/2) | (18/0) | (18/0) | (18/0) | | | |
| | F-rank\(^{a} \) | 1.33 | 6.23 | 5.13 | 5.67 | 4.65 | 2.30 | 3.00 | 7.68 | | |
  
  1. We use bold and underlined fonts to denote the best and second-best method for each metric, respectively. Impv. is the relative improvement of RTiSR over the best-performing baseline, and p-value measures the statistical significance of that improvement
  2. \(^{a}\) A higher F-rank value indicates better recommendation performance
  3. \(^\star \) denotes that the corresponding improvement passed the significance test at the 0.05 level
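
The Impv. values in Table 3 appear to be consistent with a relative gain over the strongest baseline in each row. The sketch below illustrates that reading using the MIs HR@10 and NDCG@5 rows; the function name `relative_improvement` is a hypothetical helper introduced here, and the formula is an assumption inferred from the reported numbers rather than a definition stated in the source.

```python
def relative_improvement(proposed: float, baselines: list[float]) -> float:
    """Percentage gain of `proposed` over the best (largest) baseline score.

    Assumed formula: (proposed - best_baseline) / best_baseline * 100.
    """
    best = max(baselines)
    return (proposed - best) / best * 100.0


# MIs, HR@10 row of Table 3: the seven baseline scores, then RTiSR.
mis_hr10_baselines = [0.4551, 0.5869, 0.5612, 0.5698, 0.5890, 0.4741, 0.5548]
mis_hr10_rtisr = 0.6874

# MIs, NDCG@5 row of Table 3.
mis_ndcg5_baselines = [0.2921, 0.3942, 0.3957, 0.4078, 0.3890, 0.2923, 0.3625]
mis_ndcg5_rtisr = 0.4207

impv_hr10 = relative_improvement(mis_hr10_rtisr, mis_hr10_baselines)
impv_ndcg5 = relative_improvement(mis_ndcg5_rtisr, mis_ndcg5_baselines)

print(f"{impv_hr10:.2f}%")   # 16.71%, matching the Impv. cell for MIs HR@10
print(f"{impv_ndcg5:.2f}%")  # 3.16%, matching the Impv. cell for MIs NDCG@5
```

Rows where RTiSR is not the top method (for example MIs HR@5) would give a negative value under this formula, which may be why the table leaves those Impv. and p-value cells empty.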