Healthcare knowledge graph construction: A systematic review of the state-of-the-art, open issues, and opportunities

Abu-Salih, Bilal; AL-Qurishi, Muhammad; Alweshah, Mohammed; AL-Smadi, Mohammad; Alfayez, Reem; Saadeh, Heba

doi:10.1186/s40537-023-00774-9

Journal of Big Data

Table 1 A Summary of KG construction approaches for drug discovery, drug repurposing, and adverse drug reaction

Ref.	KG Specific Functionality	Knowledge Extraction Techniques (Entity-Level)	Knowledge Extraction Techniques (Relation-Level)	Type of KB	KG Resource(s)	KG Stats	Evaluation Measure(s)	Shortcoming(s)
[46]	Drug discovery	Manual and fuzzy matching		Schema-based	Wikidata, DrugBank^{Footnote 13}, WebMD, and GoodRx	N/A	R, P	• Lack of statistics on the resultant KG. • Limited discussion of the ontology design. • The evaluation of the proposed model emphasized KG embedding rather than the resultant integrated KG.
[47]	Drug discovery for COVID-19	Manual construction based on six KGs obtained from the literature		Schema-based	Literature on COVID-19	#n: 100,00 #e: 670,000	AUC and AUPRC	• Insufficient discussion of the mechanism followed to integrate the incorporated KGs. • The evaluation of Att-GCN-DDI is limited and not detailed.
[48]	Drug discovery	Manual extraction based on Bio2RDF KG		Hybrid	Bio2RDF^{Footnote 14}	#n: 2,947,140 #e: 10,131,654	AUC, AUPR, F1	• Inadequate discussion on the construction of drug KG.
[55]	Drug repurposing	Algorithms developed at BenevolentAI^{Footnote 15} and part of their IP		Hybrid	Structured and unstructured resources, including literature on COVID-19	#n: millions #e: hundreds of millions	Case study	• There is no detailed discussion of the mechanism followed to construct the BenevolentAI graph. • The evaluation relied solely on a case study.
[56]	Drug repurposing	Coarse- and fine-grained entity extraction	Manually based on CTD and MeSH	Schema-based	Multimodal scientific literature (CTD^{Footnote 16})	#n: 67,217 #e: 77,844,574	Case study on Drug Repurposing Report Generation	• Although the proposed framework demonstrated success in tackling the quantity issue of relevant KG resources, the quality issue was not properly evaluated to demonstrate its effectiveness. • Observed bias in training and development data, source, and test queries.
[57]	Drug repurposing	Manually encoded in Biological Expression Language		Schema-free	PubMed, LitCovid^{Footnote 17}, EuropePMC, etc.	#n: 4,016 #e: 10,232	Case study (Gene Expression Analysis)	• The mechanism followed to construct the KG (manual-based) is poor in terms of scalability.
[58]	Drug repurposing	Cross-referencing		Schema-based	PharmGKB, TTD, KEGG DRUG, DrugBank, SIDER^{Footnote 18}, and DID	N/A	Case study (Finding drug–disease pairs)	• The proposed data model that was used for data integration can be improved by using formal domain ontology toward better conceptualizing the domain.
[67]	Prediction of adverse drug reactions	Direct construction from structural databases		Schema-free	DrugBank database and SIDER database	#n: 12,473 #e: 154,239	P, R, F1, AUC, and a case study on drug-induced liver injury	• The KG skips information on drugs and protein targets. • The scope of information perceived by entities can be enlarged by using longer paths in the KG as the input of the Word2Vec model.
[66]	Prediction of adverse drug reactions	Direct construction from structural databases		Schema-free	DrugBank, SIDER	#n: 5,828 #e: 70,382	AUC and case study (validation in EHRs and EudraVigilance)	• No clear discussion of the KG construction approach. • Insufficient discussion of the methodology followed in the ML benchmark comparison.
[68]	Discovery of adverse drug reactions	cTAKES^{Footnote 19}	Naive Bayesian model	Schema-based	MEDLINE	#n: 9,699 #e: 139,254	Co-occurrence analysis and case study (Osimertinib)	• The computed drug-biomarker groupings cannot differentiate between a drug-treatment relationship. • The study lacks attention to drug-drug interaction. • Lack of rationale for the chosen entity extraction method.
[75]	Drug action	Automatically using rule-based approach		Schema-free	Medical papers	#n: 40,963 #e: 57,865	R and accuracy	• Lack of verification of the text prior to KG construction. • Limited comparison with currently existing similar KGs.
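
Note on the Evaluation Measure(s) column: the table does not expand its abbreviations. Assuming P, R, and F1 denote precision, recall, and F1 score in their standard sense, the following minimal sketch shows how the three are derived from prediction counts. The function name and the example counts are hypothetical and are not taken from any of the reviewed studies.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive,
    and false-negative counts. Undefined ratios are reported as 0.0."""
    # Precision: share of predicted positives that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts, e.g. predicted drug-target links checked against a gold set.
    p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
    print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

AUC and AUPRC, by contrast, are computed from ranked prediction scores rather than from a single set of counts, so they are not covered by this sketch.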
