Healthcare knowledge graph construction: A systematic review of the state-of-the-art, open issues, and opportunities

Abu-Salih, Bilal; AL-Qurishi, Muhammad; Alweshah, Mohammed; AL-Smadi, Mohammad; Alfayez, Reem; Saadeh, Heba

doi:10.1186/s40537-023-00774-9

Journal of Big Data

Table 3 A summary of KG construction approaches for the biomedical domain

Ref.	KG-specific functionality	Knowledge extraction technique (entity level)	Knowledge extraction technique (relation level)	Type of KB	KG resource(s)	KG stats	Evaluation measure(s)	Shortcoming(s)
[100]	Generic biomedicine	Manual integration and mapping of entities and relationships		Schema-based	OMIM, DrugBank, PharmGKB, Therapeutic Target Database, SIDER, and HumanNet	#n: 7,603 #e: 500,958	Hits@N and downstream tasks	• The quality and integrity of the metadata cannot be fully assured. • The constructed graph contains fewer entities than state-of-the-art KGs. • The adopted ontology is not discussed.
[101]	Generic biomedicine	PubTator^{Footnote 36} and manual annotation (EBC)	Stanford Dependency Parser^{Footnote 37}	Schema-free	Biomedical literature (Medline abstracts^{Footnote 38})	#n: N/A #e: 2,236,307	Benchmark comparison	• Heavily dependent on the co-occurrence of paths to map scarcer paths to themes. • Complex relations are not handled. • Parser errors are possible.
[102]	Translational biomedicine	Manually and automatically using Snakemake^{Footnote 39}		Schema-based	70 knowledge sources including SemMedDB, ChEMBL, etc.	#n: 6.4 million #e: 39.3 million	Benchmark comparison	• The automated process used to construct the KG is not detailed. • The comparison with other KGs is neither well discussed nor well formulated.
[103]	Biomedical causal discovery	Manual and rule-based approach		Schema-free	PubMed	#n: N/A #e: N/A	Accuracy	• Implicit causality is not extracted. • The process for identifying concepts and the relationships between them is not detailed.
[82]	Marine Chinese medicine	Manual mapping between the ontology and the KG		Schema-based	Medical literature	#n: N/A #e: N/A	N/A	• The construction and evaluation of the proposed KG are inadequately described.
[104]	Generic biomedicine	BioDBLinker	Automatic mapping	Schema-free	UniProt^{Footnote 40}, REACTOME^{Footnote 41}, KEGG^{Footnote 42}, DrugBank, SIDER, and Human Protein Atlas (HPA)^{Footnote 43}	#n: N/A #e: N/A	Benchmark comparison	• Suffers from data sparsity. • Risk of train-test data leakage if used without careful review.
[105]	Intestinal cells	Manually based on the conceptual model		Schema-based	PubMed	#n: 2,443 #e: 160,253	Case study	• Poor entity and relation extraction approaches. • The data source is static and limited to medical literature, although further facts about intestinal cells may emerge from future experiments.
[112]	Microbiology	NER and NLP techniques		Schema-based	KG Hub – COVID19^{Footnote 44}	#n: 266,000 #e: 432,000	N/A	• Limited discussion of the mechanisms used to construct and validate the KG.
[113]	Gut microbiota	Manual annotation and mapping		Schema-based	Google Scholar and PubMed, UMLS, MeSH, SNOMED CT, and KEGG	#f: 31,268,998	Case studies	• Poor extraction of entities and relations. • The correctness and completeness of the extracted relations limit the precision and reliability of semantic search.
[114]	Microbe-disease associations	Kindred entity and relation classifier^{Footnote 45}		Schema-free	Wikidata, UMLS, NCBI	#n: 9,832 #e: 21,905	Hits@N	• The KG could be expanded by means of a bacterial attribute mining tool. • Interactions between bacteria and antibiotics or viruses are not discussed.
[115]	Coronavirus	Manual extraction and mapping		Schema-free	Analytical Graph (AG) and CORD-19^{Footnote 46}	#n: 588,820 #e: N/A	Case study	• Limited data sources. • Static KG.
[116]	Coronavirus	BioBERT		Schema-free	PubMed and CORD-19	#n: N/A #e: N/A	P, R, and F1-score	• The KG could be expanded to other biomedical datasets. • Further biomedical NLP models for NER, e.g., BlueBERT, could be tried to verify the validity of the extracted knowledge.