
Table 3 A summary of KG construction approaches for the biomedical domain

From: Healthcare knowledge graph construction: A systematic review of the state-of-the-art, open issues, and opportunities

Columns: Ref.; KG specific functionality; Knowledge extraction techniques (entity-level, relation-level); Type of KB; KG resource(s); KG stats; Evaluation measure(s); Shortcoming(s)

[100]
KG specific functionality: Generic biomedicine
Knowledge extraction techniques: Manual integration and mapping of entities and relationships
Type of KB: Schema-based
KG resource(s): OMIM, DrugBank, PharmGKB, Therapeutic Target Database, SIDER, and HumanNet
KG stats: #n: 7,603; #e: 500,958
Evaluation measure(s): Hits@N and downstream tasks
Shortcoming(s):
• The quality and integrity of the metadata cannot be fully assured.
• The final graph contains relatively few entities compared with state-of-the-art KGs.
• The adopted ontology is not discussed.

[101]
KG specific functionality: Generic biomedicine
Knowledge extraction, entity-level: PubTator (footnote 36) and manual annotation (EBC)
Knowledge extraction, relation-level: Stanford Dependency Parser (footnote 37)
Type of KB: Schema-free
KG resource(s): Biomedical literature (Medline abstracts, footnote 38)
KG stats: #n: N/A; #e: 2,236,307
Evaluation measure(s): Benchmark comparison
Shortcoming(s):
• Relies heavily on the co-occurrence of paths to map rarer paths to themes.
• Does not handle complex relations.
• Susceptible to parser errors.

[102]
KG specific functionality: Translational biomedicine
Knowledge extraction techniques: Manual and automated (using Snakemake, footnote 39)
Type of KB: Schema-based
KG resource(s): 70 knowledge sources, including SemMedDB and ChEMBL
KG stats: #n: 6.4 M; #e: 39.3 M
Evaluation measure(s): Benchmark comparison
Shortcoming(s):
• The automated KG construction process is not described in detail.
• The comparison with other KGs is neither well discussed nor well formulated.

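[103]
KG specific functionality: Biomedical causal discovery
Knowledge extraction techniques: Manual and rule-based approach
Type of KB: Schema-free
KG resource(s): PubMed
KG stats: #n: N/A; #e: N/A
Evaluation measure(s): Accuracy
Shortcoming(s):
• Fails to extract implicit causality.
• The process for identifying concepts and the relationships between them is not detailed.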

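[82]
KG specific functionality: Marine Chinese medicine
Knowledge extraction techniques: Manual mapping between the ontology and the KG
Type of KB: Schema-based
KG resource(s): Medical literature
KG stats: #n: N/A; #e: N/A
Evaluation measure(s): N/A
Shortcoming(s):
• The construction and evaluation of the proposed KG are inadequately described.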

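[104]
KG specific functionality: Generic biomedicine
Knowledge extraction, entity-level: BioDBLinker
Knowledge extraction, relation-level: Automatic mapping
Type of KB: Schema-free
KG resource(s): UniProt (footnote 40), REACTOME (footnote 41), KEGG (footnote 42), DrugBank, SIDER, and the Human Protein Atlas (HPA, footnote 43)
KG stats: #n: N/A; #e: N/A
Evaluation measure(s): Benchmark comparison
Shortcoming(s):
• Suffers from data sparsity.
• Risk of train-test data leakage if used without careful review.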

[105]
KG specific functionality: Intestinal cells
Knowledge extraction techniques: Manual, based on the conceptual model
Type of KB: Schema-based
KG resource(s): PubMed
KG stats: #n: 2,443; #e: 160,253
Evaluation measure(s): Case study
Shortcoming(s):
• Weak entity and relation extraction approaches.
• The data source is static and limited to the medical literature, whereas new facts about intestinal cells may emerge from future experiments.

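[112]
KG specific functionality: Microbiology
Knowledge extraction techniques: NER and NLP techniques
Type of KB: Schema-based
KG resource(s): KG Hub – COVID19 (footnote 44)
KG stats: #n: 266,000; #e: 432,000
Evaluation measure(s): N/A
Shortcoming(s):
• Poor discussion of the mechanisms used to construct and validate the KG.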

[113]
KG specific functionality: Gut microbiota
Knowledge extraction techniques: Manual annotation and mapping
Type of KB: Schema-based
KG resource(s): Google Scholar, PubMed, UMLS, MeSH, SNOMED CT, and KEGG
KG stats: #f: 31,268,998
Evaluation measure(s): Case studies
Shortcoming(s):
• Weak extraction of entities and relations.
• Incorrect or incomplete extracted relations limit the precision and reliability of semantic search.

[114]
KG specific functionality: Microbe-disease associations
Knowledge extraction techniques: Kindred entity and relation classifier (footnote 45)
Type of KB: Schema-free
KG resource(s): Wikidata, UMLS, NCBI
KG stats: #n: 9,832; #e: 21,905
Evaluation measure(s): Hits@N
Shortcoming(s):
• The KG could be expanded using a bacterial attribute-mining tool.
• Interactions between bacteria and antibiotics or viruses are not discussed.

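[115]
KG specific functionality: Coronavirus
Knowledge extraction techniques: Manual extraction and mapping
Type of KB: Schema-free
KG resource(s): Analytical Graph (AG) and CORD-19 (footnote 46)
KG stats: #n: 588,820; #e: N/A
Evaluation measure(s): Case study
Shortcoming(s):
• Limited data sources.
• Static KG.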

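[116]
KG specific functionality: Coronavirus
Knowledge extraction techniques: BioBERT
Type of KB: Schema-free
KG resource(s): PubMed and CORD-19
KG stats: #n: N/A; #e: N/A
Evaluation measure(s): Precision (P), recall (R), and F1-score
Shortcoming(s):
• The KG could be extended to other biomedical datasets.
• Other biomedical NLP models for NER, e.g., BlueBERT, could be tried to verify the validity of the extracted knowledge.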
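The evaluation measures listed in this table are not defined within it. As an illustration only, the sketch below computes Hits@N (reported for [100] and [114]) and precision, recall, and F1 (reported for [116]) using their common textbook definitions. The reviewed papers may differ in details such as tie handling or filtered ranking. The function names, the pessimistic tie-breaking rule, and all entity names (drug_A, disease_X, and so on) are hypothetical assumptions, not taken from the cited works.

```python
from typing import Dict, Sequence, Set, Tuple

# A triple is (head, relation, tail); all names used below are hypothetical.
Triple = Tuple[str, str, str]


def rank_of_true(scores: Dict[str, float], true_entity: str) -> int:
    """1-based rank of the true entity among scored candidates (higher score is better).

    Assumption: ties are broken pessimistically, i.e. any other candidate with an
    equal or higher score is ranked above the true entity ("raw" ranking, no filtering).
    """
    true_score = scores[true_entity]
    return 1 + sum(
        1 for entity, score in scores.items()
        if entity != true_entity and score >= true_score
    )


def hits_at_n(ranks: Sequence[int], n: int) -> float:
    """Fraction of test queries whose true answer is ranked within the top n."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if r <= n) / len(ranks)


def precision_recall_f1(predicted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 of extracted triples against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical link-prediction query: which drug treats disease_X?
    scores = {"drug_A": 0.90, "drug_B": 0.70, "drug_C": 0.95}
    print("rank of drug_A:", rank_of_true(scores, "drug_A"))

    # Hypothetical ranks of the true answer for four test queries.
    ranks = [1, 3, 12, 2]
    for n in (1, 3, 10):
        print(f"Hits@{n} = {hits_at_n(ranks, n):.2f}")

    # Hypothetical extracted triples scored against a hypothetical gold standard.
    predicted = {("drug_A", "treats", "disease_X"),
                 ("gene_G", "associated_with", "disease_Y"),
                 ("drug_B", "treats", "disease_Z")}
    gold = {("drug_A", "treats", "disease_X"),
            ("gene_G", "associated_with", "disease_Y"),
            ("microbe_M", "associated_with", "disease_X"),
            ("drug_C", "treats", "disease_Z")}
    p, r, f1 = precision_recall_f1(predicted, gold)
    print(f"P = {p:.3f}, R = {r:.3f}, F1 = {f1:.3f}")
```

In the demo, drug_C outscores the true answer drug_A, so drug_A is ranked second. Many link-prediction evaluations instead use a "filtered" setting that removes other known true triples from the candidate list before ranking; that variant is not shown here.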