
Table 1 A survey of related works that use and/or provide occupation-related datasets

From: Title2Vec: a contextual job title embedding for occupational named entity recognition and other applications

| Publication | Data source | Size | Publicly available |
| --- | --- | --- | --- |
| **IPOD (our proposed dataset) [10]** | Professional network | 475K | Yes |
| Mimno et al. [5] | Resumes | 54K | No |
| Lou et al. [11] | LinkedIn | 67K | No |
| Paparrizos et al. [12] | Web | 5M | No |
| Zhang et al. [13] | Job site | 7K | No |
| Liu et al. [4] | Social network | 30K | No |
| Li et al. [14] | LinkedIn | – | No |
| Li et al. [15] | High tech company | – | No |
| Yang et al. [16] | Resumes | 823K | No |
| Zhu et al. [17] | Job portals | 2M | No |
| James et al. [1] | APS | 60K | Yes |
| Yang et al. [2] | Various channels | – | No |
| Xu et al. [18] | Professional network | 20M | No |
| Qin et al. [19] | High tech company | 1M | No |
| Lim et al. [20] | LinkedIn | 10K | No |
| Shen et al. [21] | High tech company | 14K | No |
| Zhang et al. [22] | Resumes | 2.1M | No |
| Nigam et al. [23] | Job portals | 4K | No |
| Meng et al. [24] | Professional network | 414K | No |
| Van Huynh et al. [25] | Job portals | 10K | Yes |
| Gugnani et al. [26] | Job portals | 1.1M | No |
| Alanoca et al. [27] | Resumes | 5K | No |
| Zhang et al. [28] | LinkedIn | 459K | No |

  1. Apart from two datasets, one comprising publications and authors [1] and the other comprising job titles and descriptions [25], our survey found no publicly available occupation-related datasets. The first dataset [1] contains publications and authors from the American Physical Society (APS), but it gives only the names and affiliations of physicists, without their job titles or appointments. The second dataset [25] contains job titles and job descriptions from a job portal, but it covers only IT-related jobs. Our proposed dataset, IPOD, is shown in bold in the first row.