
Table 6 Entity and relation pair extraction

From: An analytical study of information extraction from unstructured and multidimensional big data

 

| Ref. | Technique | Purpose | Domain | Dataset | Results | P% | R% | F% |
|---|---|---|---|---|---|---|---|---|
| [25] | Transfer learning for domain-dependent clustering | To adapt world knowledge to domain-dependent tasks using semantic parsing and semantic filtering | News text | 20 Newsgroups, RCV1 | Case studies conducted to show that a conceptualization-based semantic filter can produce more accurate indirect supervision | | | |
| [26] | Distant supervised learning (deep learning) | To overcome the limitations of text mining methods (such as clustering or rule-based approaches) in keyword and information extraction, using a technology dependency graph | Scientific literature | 473,935 articles; 38 relation instances labeled from 20 articles and expanded to 573 instances by bootstrapping | Case study: technology-driven graph used to analyze the technology architecture of DSSC | | | |
| [22] | MapReduce + semantic methods (attribute-based, isA-based and class-based) + logistic regression | To overcome the long-tail challenge using a sparse IE approach; to deal with scalability and effectiveness | Web pages | 1.68 B web pages | Many entity pairs identified and classified as good or bad pairs. Recall, precision and F-measure are presented for the good and bad entity pairs | | | |
| [27] | Supervised kernel methods | To extract morpho-syntactic information from mined text; to deal with the challenges of data prioritization and curation | Biomedical | EU-ADR | The proposed method, using morpho-syntactic and dependency information, achieves better performance in identifying entity relationships | | | |
| [28] | Use of declarative rules in contextual exploration | Automatic detection and extraction of meaning from the unstructured web using RDF WordNet, DBpedia, etc.; to bypass the lack of annotated data that is semantically and automatically usable, using LOD | Free text | Large text corpora provided by the labex OBVIL and the BNF (National Library of France) | EC3 software is implemented and shown to contribute considerably to detecting the real meaning of text | | | |
| [29] | CRF and dictionary for NER, word clustering through unsupervised training; chemistry-aware NLP pipeline with tokenization, POS tagging, NER and phrase parsing | To populate chemical databases with minimal time, effort and expense | Scientific documents | 50 open access chemistry articles | | 89.1 | 86.6 | 87.8 |
| [24] | Hadoop (MapReduce) | To identify many-to-many relationships with less training data | Free text | 100 GB corpus; baike.baidu.com, a large encyclopedia with 700M entries | The proposed Snowball++ achieved more positive pairs than Snowball and PROSPERA | | | |
| [23] | CNN (weakly supervised) | To obtain high-precision data and automatically generate an annotated training sample set | Medical | Seven medical sites selected; a total of 20,000 labeled samples generated, with five categories of directional relations | | 91.87 | 91.58 | 89.08 |
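
The P%, R% and F% columns report precision, recall and F-measure. As a minimal sketch, assuming F denotes the balanced F1 score (the table does not state which variant or averaging scheme each study used), the relation between the three values can be checked in Python. The function name `f_measure` and its `beta` parameter are illustrative and are not taken from the cited works.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    Inputs may be fractions or percentages; the result is in the same unit.
    beta = 1.0 gives the balanced F1 score (an assumption about the table).
    """
    if precision <= 0 or recall <= 0:
        # The harmonic mean is defined as zero when either input is zero.
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Row [29] reports P = 89.1 and R = 86.6 (percent).
f29 = f_measure(89.1, 86.6)
print(round(f29, 1))
```

For row [29], the reported precision and recall give an F1 of about 87.8, which agrees with the tabulated value.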