Table 1 Dataset statistics after pre-processing

From: Readers’ affect: predicting and understanding readers’ emotions with deep learning

| Statistics | REN-10k | RENh-4k | SemEval-2007 |
| --- | --- | --- | --- |
| Source | Rappler | Rappler | The New York Times, CNN, BBC, Google News |
| Year span | 2014 to 2019 | 2015 to 2018 | |
| Length | Short-text (after pre-processing) | Short-text | Short-text |
| Number of news documents | 10,272 | 4000 | 1246 (valid documents after pre-processing) |
| Total number of words | 305,160 | 124,172 | 6364 |
| Number of unique words | 27,749 | 13,260 | 3286 |
| Average words per document | 29.70 | 31.043 | 5.09 |
| Average sentences per document | 1.18 | 1.1875 | 1.00 |
| Number of annotations | 528,327 | 242,680 | 6 (annotators) |
| Mean percentage of votes for each emotion class | Anger: 0.2124 | Anger: 0.3388 | Anger: 0.1013 |
| | Fear: 0.0658 | Fear: 0.1475 | Fear: 0.1639 |
| | Joy: 0.4215 | Joy: 0.3137 | Joy: 0.2860 |
| | Sadness: 0.1399 | Sadness: 0.0781 | Sadness: 0.2069 |
| | Surprise: 0.1606 | Surprise: 0.1218 | Surprise: 0.2416 |
| Number of articles associated with each emotion class | Anger: 6904 | Anger: 3068 | Anger: 652 |
| | Fear: 4233 | Fear: 1850 | Fear: 820 |
| | Joy: 8917 | Joy: 3267 | Joy: 786 |
| | Sadness: 5972 | Sadness: 2489 | Sadness: 863 |
| | Surprise: 6431 | Surprise: 2312 | Surprise: 1102 |
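
The word-level rows of the table (total words, unique words, average words per document) are simple corpus counts. As a rough illustration only, the sketch below shows one way such counts could be computed for a list of pre-processed texts. The function name `corpus_statistics`, the lower-casing, and the whitespace tokenization are assumptions made for this example; the table does not state which tokenizer or pre-processing steps were used, so the sketch should not be expected to reproduce the reported figures.

```python
from typing import Dict, List


def corpus_statistics(documents: List[str]) -> Dict[str, float]:
    """Compute simple corpus-level counts for a list of pre-processed texts."""
    # Assumption: tokens are lower-cased and separated by whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    total_words = sum(len(tokens) for tokens in tokenized)
    vocabulary = {token for tokens in tokenized for token in tokens}
    n_docs = len(documents)
    return {
        "documents": n_docs,
        "total_words": total_words,
        "unique_words": len(vocabulary),
        # Guard against an empty corpus to avoid division by zero.
        "avg_words_per_document": total_words / n_docs if n_docs else 0.0,
    }


# Hypothetical toy corpus, not taken from any of the datasets in the table.
toy_corpus = ["Storm hits coastal town", "Town rebuilds after storm"]
stats = corpus_statistics(toy_corpus)
print(stats)
```

The average is derived from the totals, so it can be cross-checked against the table in the same way: dividing the reported total number of words by the reported number of documents gives a value close to the reported average for each dataset.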