Table 3 Twitter dataset distribution

From: Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging

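Dataset: Twitter (manually collected)

| Label             | Train: No | Train: Yes | Test: No | Test: Yes | Validation: No | Validation: Yes |
|-------------------|-----------|------------|----------|-----------|----------------|-----------------|
| Openness          | 14600     | 17511      | 3128     | 3753      | 3129           | 3753            |
| Conscientiousness | 23666     | 8445       | 5072     | 1809      | 5072           | 1810            |
| Extraversion      | 9210      | 22901      | 1974     | 4907      | 1974           | 4908            |
| Agreeableness     | 14348     | 17763      | 3074     | 3807      | 3075           | 3807            |
| Neuroticism       | 17712     | 14399      | 3796     | 3085      | 3796           | 3086            |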