
Table 4 Sample of PD “Big Data” Studies

From: The evolution of Big Data in neuroscience and neurology

| Refs | Year | Author | Volume | Variety | Velocity | Veracity | Value |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [140] | 2010 | Dinov | PD (263 de novo, 40 SWEDD), 127 HC | PPMI imaging, genetics, clinical and demographic | O | A | C |
| [265, 266, 267] | 2012 | PDBP Cons | >2000 Parkinsonian, >250 Lewy body | Biospecimen (e.g., blood), imaging (e.g., fMRI), clinical | O | A | C |
| [141] | 2014 | Nalls | “7,893,274 variants across 13,708 cases and 95,282 controls” | Demographics, genetic, clinical | O | Mix | |
| [116] | 2018 | Prince | 312 PD subjects & 236 HC | Demographics, clinical, walking, voice, finger tapping | F(ApD) | A | C |
| [268] | 2016 | Cohen | NA (but includes 700,000 h smartwatch data from hundreds of PD) | Clinical, kinematics | F(ApD) | A | C |
| [144, 269, 270, 271, 272] | 2017 | Age Plat. EU | >4500 elderly adults | Behavioral (activity), location, typing, voice | O(ApD) | A | C |
| [273] | 2017 | Suo | 153 PD, 81 HC | Clinical, imaging (e.g., rs-fMRI) | F | Mix | C |
| [180] | 2017 | Horn | 95 PD patients with STN DBS [2 centers] | Imaging (e.g., rs-fMRI), clinical | F | A | C |
| [274] | 2018 | Senthilarumugam | 1479 patients (418 PD, 172 HC, 62 prodromal, 827 genetic cohorts) | PPMI imaging, genetics, clinical and demographic | O | A | C |
| [120] | 2018 | Peter | 170 million health care-covered → 144,018 IBD & 720,090 HC claim info | Incidence rates, anti-TNF Rx rates, ICD-9 & 10 codes | F | - | C |
| [275] | 2019 | Sreenivasan | 20 early-stage drug-naïve PD, 16 HC | Clinical, imaging (e.g., MRI, fMRI) | F | A | C |
| [123] | 2020 | Yu | 93 PD, 95 HC | Clinical, serum VK2 levels, genetic | F | A | C |
| [276] | 2021 | Wu | 5,998 PD or ET DBS patients [283 centers] | Medicare Claims Files (e.g., reoperation rate) | O | A | C |
| [277] | 2021 | Zhang | 60,000 dialogues (40,000 patients & 3000 practitioners), 2895 Demographics | Demographics, patient descriptions of symptoms | O(SM)* | A | C |
| [278] | 2021 | De Micco | 147 drug-naïve PD, 38 HC | Clinical, imaging (e.g., rs-fMRI), demographics | F | A | C |
| [191] | 2022 | Monte-Rubio | 216 PD & 87 HC [4 centers] | Imaging (MRI from multiple sites) | F | Mix | C |
| [279] | 2022 | Loh | 75 PD DBS candidates | Demographic, clinical, imaging (e.g., MRI, rs-fMRI) | F | A | C |

We classified the example citations [116, 120, 123, 140, 141, 144, 180, 191, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279] according to the classic 5 V’s definition, although these attributes are not always clearly defined in the prospective studies, retrospective studies, or review articles.

For Volume: We focused on the size of the patient cohorts.

For Variety: We indicate the different data and specimen types derived from the cohorts (note that Variety can also be seen in the patient types tabulated under Volume and, vice versa, the data types listed under Variety are indicative of volume).

For Velocity: We report the data Velocity as either ‘F’ for Fixed studies (analyzing data from databases or studies that are no longer acquiring data) or ‘O’ for Ongoing studies (analyzing data from databases or clinical studies that are still acquiring data, although the reported results are based on analysis of a fixed data set with the noted volume at the time of publication). We also indicate whether any “real-time” data were or will be gathered as part of the study (ApD, Mobile App Realtime Dependent; SM, Social Media Dependent). Where velocity information is given and a velocity calculation can be made, it is provided in Additional file 1: Table S4 (and noted herein with a *).

For Veracity: M, Manual verification; A, Data verified through an automated analytical process (AI, statistical methods); Mix, Automated Analytical and Manual (or semi-automated). However, the veracity of all experimental data depends on the methodological limitations of the core studies; thus, we also provide examples of variability or error in Additional file 1: Table S4.

For Value: Because study costs are not disclosed, health economics assessments were not completed, and no monetary cost was assigned in the sale or purchase of any of the above data sets, we report “P” for Preclinical or “Cl” for Clinical value (shown as “C” in the table), depending on the study species and data use.

The limitations of these definitions and of study information availability are described in the text (e.g., see “Proposed Solutions”). Additional file 1: Table S4 also includes information on the tools used, database source(s), and methodological limitations. For Year, we indicate the year of the earliest publication.
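For readers who want to tabulate the Velocity column programmatically, the legend above can be expressed as a small lookup. The following is a minimal sketch in Python, not part of the original article: the function name `decode_velocity` and the dictionary names are hypothetical, and the accepted pattern (a status letter, an optional source in parentheses, an optional asterisk) is an assumption inferred from the codes that appear in the table.

```python
import re

# Codes as defined in the table footnote; the names below are illustrative only.
VELOCITY = {"F": "Fixed", "O": "Ongoing"}
REALTIME = {
    "ApD": "Mobile App Realtime Dependent",
    "SM": "Social Media Dependent",
}

def decode_velocity(code: str) -> dict:
    """Split a Velocity cell such as 'O(SM)*' into its three parts.

    Assumed cell layout: status letter, optional real-time source in
    parentheses, optional '*' marking a velocity calculation in Table S4.
    """
    match = re.fullmatch(r"([FO])(?:\((ApD|SM)\))?(\*)?", code.strip())
    if match is None:
        raise ValueError(f"unrecognized velocity code: {code!r}")
    status, realtime, starred = match.groups()
    return {
        "status": VELOCITY[status],
        # None when the study notes no real-time data source
        "realtime": REALTIME.get(realtime),
        # True when the cell carries the '*' pointing to Table S4
        "rate_in_table_s4": starred is not None,
    }

print(decode_velocity("O(SM)*"))
```

A regular expression is used here instead of plain string slicing so that any cell not matching the assumed layout raises an error rather than being silently misread.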