Table 2. A summary of KG construction approaches for diseases and disorders

From: Healthcare knowledge graph construction: A systematic review of the state-of-the-art, open issues, and opportunities

Ref.

KG Specific Functionality

Knowledge Extraction Techniques (Entity-level)

Knowledge Extraction Techniques (Relation-level)

Type of KB

KG Resource(s)

KG Stats

Evaluation Measure(s)

Shortcoming(s)

[15]

Cardiovascular domain

LSTM-CR

pattern-based and supervised learning methods

Hybrid

UMLS, EMRs, medical standards, and expert knowledge.

#n: 8,293,284

#e: 32,256,360

The evaluation is conducted in the embedded modules

• The overall framework requires a detailed case study to evaluate the effectiveness of integrating the proposed modules.

[45]

Subarachnoid hemorrhage

Semantic analysis (ontologies: LBO, IAO, etc.)

Automatic (Rule-based)

Schema-based

clinical notes and brain angiograms

N/A

P, R, F1, and AC

• Limited discussion on the KG statistics

• The overall framework requires a detailed case study to evaluate the effectiveness of integrating the proposed modules.

[76]

Hepatocellular carcinoma

SemRep (footnote 20), rule-based method, and BioIE (with Att-BiLSTM-CRF)

Schema-based

PubMed, SemMedDB, UpToDate, and Clinical Trials (footnote 21)

#n: 5,028

#e: 13,296

Accuracy

• The KG was not properly evaluated on a real-life case study addressing hepatocellular carcinoma.

• There is no detailed discussion of the mechanism followed to address the presented disagreements.

[80]

Stroke

DNorm (footnote 22), tmChem (footnote 23), GNormPlus (footnote 24), PWTEES (footnote 25)

NLTK, PKDE4J, and Bio-BERT

Schema-free

CID (footnote 26), TCMID (footnote 27), EU-ADR (footnote 28), ETCM (footnote 29)

#n: 46 k

#e: 157 k

P, R, F1

• The constructed KG is limited to the Chinese context and is hard to replicate or extend into a more comprehensive map of medical knowledge.

[77]

Diagnosis and treatment of viral hepatitis B

N/A

N/A

Schema-based

EMR (8,544 patients in China)

#n: 8,563

#e: 96,896

N/A

• No proper evaluation was conducted.

• No discussion of the mechanism followed to construct the KG.

[83]

Coronavirus pneumonia-related diseases

CRF

Bio-BERT

Schema-free

COVID-19 scientific literature

#n: 10,993

#e: 1,204,234

Specificity, P, R, F1, and AC

• The entity and relation extraction datasets are provided, but the mechanism followed to conduct the experiments on these datasets is not discussed.

[78]

Identifying disease-gene associations

N/A

N/A

Schema-free

CTD, BioGrid (footnote 30), MalaCards (footnote 31)

#n: 103,625

#e: 3,273,215

N/A

• No discussion on the mechanism followed to extract entities and relationships.

• The construction of the KG itself is not evaluated

[79]

Myopia prevention

Automatic, using a Python script

Schema-based

Baidu Encyclopedia, Chinese Wikipedia, and professional websites

#n: N/A

#e: N/A

N/A

• The mechanisms used to extract entities and relationships for the KG are not described.

• No proper evaluation is undertaken.

[93]

Depression disorder

XMedlan, semantic queries with regular expressions

Hybrid

PubMed, Clinical Trials5, DrugBank (footnote 32), DrugBook, Wikipedia, SIDER (footnote 33), and UMLS

#e: 8,892,722

Use cases

• Lack of proper evaluation.

• Insufficient use of other important medical repositories.

• Lack of discussion on both the methodology used for knowledge integration and the KG statistics.

[91]

Autism spectrum disorder

MinHash lookup/UMLS

Skip-gram and k-means++

Schema-free

PubMed (footnote 34): autism spectrum disorder-related article abstracts

#n: 6,827

#e: 16,192

Hit@k

• Extracted relations are coarse-grained.

• Semantically related relations are difficult to distinguish.

• Insufficient overall evaluation of the model.

[94]

Depression

SemRep (footnote 35), OpenIE, and rule-based method

Schema-based

SemMedDB, PubMed

#n: 3,055

#e: 30

Jaccard

• Poor data quality

• The utility of the KG was not well proven.

[95]

Metabolism-depression associations

Manual curation and extraction by domain expert (traditional logical rules)

Schema-based

KEGG and scientific literature

#n: 3,724,526

#e: 5,725,821

Case study

• Ineffective inferences due to the incorporated traditional logical rules.

• Automatic extraction methods are required to enrich the functional diversity of the depression KG.