Table 5 Sample of SUD and OUD “Big Data” Studies

From: The evolution of Big Data in neuroscience and neurology

| Refs | Year | Author | Volume (Vol) | Variety (Var) | Velocity (Vel) | Veracity (Ver) | Value (Val) |
|---|---|---|---|---|---|---|---|
| [280] | 2016 | Kohno | 39 methamphetamine (MA)-dependent subjects and 44 HC | Clinical, imaging (e.g., rs-fMRI, PET) | F | A | C |
| [203] | 2016 | Mackey | >10,000 subjects (review) | Imaging (e.g., MRI), genetic | O | A | C |
| [16] | 2017 | Kim | NA | Social media-based metrics (e.g., number of likes on Facebook groups) | NA | NA | C |
| [160] | 2017 | Sanchez-Roige | >120,000 patients | Alcohol Use Disorders Identification Test (AUDIT), genetics | F | A | C |
| [281] | 2018 | Ipser | 46 MA-dependent subjects and 26 HC | Clinical, imaging (e.g., rs-fMRI) | F | A | C |
| [282] | 2018 | Lisdahl | 12,000 youth (21 US sites) [283] | Cognitive, clinical (SUD focus), culture & environment, imaging (e.g., MRI), and bioassays | O | A | C |
| [284] | 2018 | Sun | 78 heroin abusers and 79 HC | Imaging (e.g., DTI), clinical, and genetic | F | A | C |
| [159] | 2019 | Mackey | 23 labs, 2,140 SUD, 1,100 HC | Imaging (e.g., MRI), clinical for alcohol-, nicotine-, cocaine-, methamphetamine-, or cannabis-dependent patients | O | A | C |
| [285] | 2019 | Yip | 74 methadone-maintained, cocaine-dependent subjects | Imaging (e.g., fMRI), data from Monetary Incentive Delay task, clinical | F | A | C |
| [286] | 2019 | Young | NA (viewpoint paper) | Social media posts, location, cannabis outcomes | NA | NA | C |
| [161] | 2020 | Cuomo | 10 M tweets -> 257 tweets about opioids, IV Drug Use or HIV hospitalizations and HIV cases | Twitter data, hospitalizations, and new HIV cases | F(SM) | Mix | C |
| [287] | 2020 | Segal | “10 M medical insurance claims” “from 550,000 patient records” | Diagnosis & procedures, medications, episode counts | O | A | C |
| [122] | 2020 | Slade | 11,778,912 records, 118,063 with adolescent ADHD medication | Longitudinal clinical and medication hx, demographics | F | A | PC |
| [288] | 2020 | Zhou | >10,000 European ancestry OUD, >70,000 opioid-exposed controls; >5,000 African ancestry OUD, >25,000 opioid-exposed controls | Genetic, clinical | O | A | |
| [37] | 2020 | Thompson | 33 sites, 12,347 individuals (including 2,277 adults with SUD (alcohol, nicotine, cocaine, MA, or cannabis)) | Imaging (e.g., MRI), clinical, genetic, and epigenetic | O | A | C |
| [289] | 2021 | Flores | 19,721 tweets identified with opioid keywords across 7 US cities | Tweets, geolocation | O(SM) | Mix | C |
| [290] | 2021 | Gelernter | NA | Clinical, genetics | NA | NA | C |
| [291] | 2021 | Liu | 31 heroin users | Clinical, imaging (e.g., fMRI during visual cues) | F | A | C |
| [292] | 2021 | Purushothaman | “56,464 Instagram posts and comments”, including 719 posts containing “suicide, substance use and/or mental health” | Instagram posts | O(SM)* | Mix | C |
| [293] | 2021 | Rosetti | 660 alcohol dependence, 326 controls | Imaging (e.g., DTI, MRI), clinical (e.g., drug use) | O | A | C |
| [294] | 2021 | Tretter | NA | NA | NA | NA | C |
| [158] | 2022 | Hayes | >9 M veterans | Clinical, insurance claims, imaging (e.g., fMRI), genetics | O | A | C |
| [295] | 2022 | Li | 46 MA-dependent subjects and 40 HC | Clinical, imaging (e.g., rs-fMRI) | F | A | C |
| [296] | 2022 | Ottino-Gonzalez | >700 subjects (cocaine (n = 147), MA (n = 132), nicotine (n = 189), and HC (n = 333)) | Imaging (DTI, MRI), clinical (e.g., drug use) | O | A | C |

We have classified the example citations [16, 122, 159–161, 203, 280–282, 284–297] using the classic 5 V’s definition. However, the V’s are not always clearly defined in the prospective studies, retrospective studies, or review articles.

For Volume (Vol): We focused on the size of the patient cohorts.

For Variety (Var): We indicate the different data and specimen types derived from the cohorts. Note that Variety can also be seen in the patient types tabulated under Volume; conversely, the data types listed under Variety are indicative of volume.

For Velocity (Vel): We report the data velocity as either ‘F’ for Fixed studies (analyzing data from databases or studies that are no longer acquiring data) or ‘O’ for Ongoing studies (analyzing data from databases or clinical studies that are still acquiring data, although the reported results are based on analysis of a fixed data set with the noted volume at the time of publication). We also indicate whether any “real-time” data was or will be gathered as part of the study (SM: social media dependent). Where any type of velocity information is given and a velocity calculation can be made, it is provided in Additional file 1: Table S5 (and noted herein with a *).

For Veracity (Ver): ‘M’ indicates manual verification, ‘A’ indicates data verified through an automated analytical process (AI, statistical methods), and ‘Mix’ indicates automated analytical and manual (or semi-automated) verification. However, the veracity of all experimental data depends on the methodological limitations of the core studies; thus, we also provide examples of variability or error in Additional file 1: Table S5.

For Value (Val): Because no study costs are disclosed, no health economics assessments were completed, and no monetary cost was assigned in the sale or purchase of any of the above data sets, we report “P” for Preclinical, “C” for Clinical, or “PC” for Preclinical and Clinical value, depending on the study species and data use.

The limitations of these definitions and of the available study information are described in the text (e.g., see “Proposed Solutions”). In Additional file 1: Table S5, we also include information on the tools used, database source(s), and methodological limitations. For Year, we indicate the year of the earliest publication. hx: history.
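For readers who want to work with the table programmatically, the Vel, Ver, and Val codes defined in the footnote can be decoded mechanically. The following is only a minimal sketch in Python, not part of the original article: the names `parse_velocity`, `decode`, and the `Velocity` record are hypothetical, and the handling of "NA" and blank cells as missing values is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Code tables taken from the footnote definitions.
VELOCITY = {"F": "Fixed", "O": "Ongoing"}
VERACITY = {"M": "Manual", "A": "Automated", "Mix": "Automated and manual"}
VALUE = {"P": "Preclinical", "C": "Clinical", "PC": "Preclinical and clinical"}


@dataclass(frozen=True)
class Velocity:
    """Hypothetical record for one decoded Vel cell."""
    kind: str            # "Fixed" or "Ongoing"
    social_media: bool   # True when the cell carries the (SM) suffix
    in_table_s5: bool    # True when the cell carries the * marker


def parse_velocity(code: str) -> Optional[Velocity]:
    """Decode a Vel cell such as 'F', 'O(SM)' or 'O(SM)*'."""
    code = code.strip()
    if code in ("", "NA"):  # assumption: treat NA and blanks as missing
        return None
    in_table_s5 = code.endswith("*")
    code = code.rstrip("*")
    social_media = code.endswith("(SM)")
    if social_media:
        code = code[: -len("(SM)")]
    if code not in VELOCITY:
        raise ValueError(f"unknown velocity code: {code!r}")
    return Velocity(VELOCITY[code], social_media, in_table_s5)


def decode(table: dict, code: str) -> Optional[str]:
    """Look up a Ver or Val cell in the given code table."""
    code = code.strip()
    return None if code in ("", "NA") else table[code]


if __name__ == "__main__":
    print(parse_velocity("O(SM)*"))
    print(decode(VERACITY, "Mix"), "/", decode(VALUE, "PC"))
```

Keeping the suffix handling separate from the base F/O code means a cell such as `F(SM)` and a cell such as `O(SM)*` go through the same lookup, and an unrecognized code fails loudly rather than being silently dropped.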