
Table 1 A survey of related works that use and/or provide occupation-related datasets

From: Title2Vec: a contextual job title embedding for occupational named entity recognition and other applications

| Publication | Data source | Size | Publicly available |
|---|---|---|---|
| **IPOD (our proposed dataset) [10]** | **Professional network** | **475K** | **Yes** |
| Mimno et al. [5] | Resumes | 54K | No |
| Lou et al. [11] | LinkedIn | 67K | No |
| Paparrizos et al. [12] | Web | 5M | No |
| Zhang et al. [13] | Job site | 7K | No |
| Liu et al. [4] | Social network | 30K | No |
| Li et al. [14] | LinkedIn | | No |
| Li et al. [15] | High tech company | | No |
| Yang et al. [16] | Resumes | 823K | No |
| Zhu et al. [17] | Job portals | 2M | No |
| James et al. [1] | APS | 60K | Yes |
| Yang et al. [2] | Various channels | | No |
| Xu et al. [18] | Professional network | 20M | No |
| Qin et al. [19] | High tech company | 1M | No |
| Lim et al. [20] | LinkedIn | 10K | No |
| Shen et al. [21] | High tech company | 14K | No |
| Zhang et al. [22] | Resumes | 2.1M | No |
| Nigam et al. [23] | Job portals | 4K | No |
| Meng et al. [24] | Professional network | 414K | No |
| Van Huynh et al. [25] | Job portals | 10K | Yes |
| Gugnani et al. [26] | Job portals | 1.1M | No |
| Alanoca et al. [27] | Resumes | 5K | No |
| Zhang et al. [28] | LinkedIn | 459K | No |
  1. Apart from two datasets, one comprising publications and authors [1] and the other job titles and descriptions [25], our survey found no publicly available occupation-related datasets. The first dataset [1] contains publications and authors from the American Physical Society (APS) but gives only the names and affiliations of physicists, without their job titles or appointments. The second dataset [25] contains job titles and job descriptions from a job portal but covers only IT-related jobs. Our proposed dataset, IPOD, is shown in bold in the first row.