
Table 6 Entity and relation pair extraction

From: An analytical study of information extraction from unstructured and multidimensional big data

| Ref. | Technique | Purpose | Domain | Dataset | Results (P / R / F, %) |
|---|---|---|---|---|---|
| [25] | Transfer learning for domain-dependent clustering | To adapt world knowledge to domain-dependent tasks by using semantic parsing and semantic filtering | News text | 20 Newsgroups, RCV1 | Case studies show that a conceptualization-based semantic filter can produce more accurate indirect supervision |
| [26] | Distant supervised learning (deep learning) | To overcome the limitations of text mining methods (e.g., clustering or rule-based methods) in keyword and information extraction, with a technology dependency graph | Scientific literature | 473,935 articles; 38 relation instances labeled from 20 articles and expanded to 573 instances by bootstrapping | Case study: a technology-driven graph used to analyze the technology architecture of DSSC |
| [22] | MapReduce + semantic methods (attribute-based, isA-based and class-based) + logistic regression | To overcome the long-tail challenge using a sparse IE approach; to deal with scalability and effectiveness | Web pages | 1.68 B web pages | Many entity pairs identified and classified as good or bad pairs; precision, recall and F-measure are presented for the good and bad pairs of each entity pair |
| [27] | Supervised kernel methods | To extract morpho-syntactic information from mined text; to deal with the challenges of data prioritization and curation | Biomedical | EU-ADR | The proposed method, using morpho-syntactic and dependency information, outperforms other approaches at identifying entity relationships |
| [28] | Declarative rules in contextual exploration | Automatic detection and extraction of meaning from the unstructured web using RDF, WordNet, DBpedia, etc.; to bypass the lack of annotated data that is semantically and automatically usable, using LOD | Free text | Large text corpora provided by the Labex OBVIL and the BNF (National Library of France) | The EC3 software was implemented and shown to contribute considerably to detecting the real meaning of text |
| [29] | CRF and dictionary for NER; word clustering through unsupervised training | Chemistry-aware NLP pipeline with tokenization, POS tagging, NER and phrase parsing; to populate chemical databases with minimal time, effort and expense | Scientific documents | 50 open-access chemistry articles | 89.1 / 86.6 / 87.8 |
| [24] | Hadoop (MapReduce) | To identify many-to-many relationships with less training data | Free text | 100 GB corpus; large encyclopedia with 700M entries | The proposed Snowball++ extracted more positive pairs than Snowball and PROSPERA |
| [23] | CNN (weakly supervised) | To obtain high-precision data and automatically generate an annotated training sample set | Medical | Seven selected medical sites; 20,000 labeled samples generated in total, covering five categories of directional relations | 91.87 / 91.58 / 89.08 |
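The Results column reports precision (P), recall (R) and F-measure (F). Assuming F denotes the balanced F1 score (the table does not state the weighting), it is the harmonic mean of P and R, or more generally $F_\beta = (1+\beta^2)PR / (\beta^2 P + R)$ with $\beta = 1$. The short sketch below, with an illustrative function name `f_measure`, recomputes F for row [29] from its reported P and R.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta); beta=1 gives F1."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing is retrieved correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported for [29] (in %).
print(round(f_measure(89.1, 86.6), 1))  # 87.8
```

The recomputed value matches the F reported for [29], which is consistent with (though does not prove) the use of F1 there.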
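Row [29] combines a CRF with a dictionary for NER. The CRF requires a trained model, but the dictionary component can be sketched as greedy longest-match lookup over tokens. The lexicon, the function name `dictionary_ner` and the example sentence below are hypothetical, and this is not the pipeline of [29].

```python
from typing import List, Set, Tuple

# Hypothetical chemical-name lexicon; entries are lower-cased token tuples.
CHEM_DICT: Set[Tuple[str, ...]] = {
    ("sodium", "chloride"),
    ("sodium",),
    ("ethanol",),
    ("acetic", "acid"),
}


def dictionary_ner(tokens: List[str], lexicon: Set[Tuple[str, ...]]) -> List[Tuple[int, int, str]]:
    """Greedy longest-match tagging: return (start, end, surface) spans found in the lexicon."""
    max_len = max((len(entry) for entry in lexicon), default=0)
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so "sodium chloride" wins over "sodium".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in lexicon:
                spans.append((i, i + n, " ".join(tokens[i:i + n])))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return spans


tokens = "Dissolve sodium chloride in ethanol and add acetic acid .".split()
print(dictionary_ner(tokens, CHEM_DICT))
# [(1, 3, 'sodium chloride'), (4, 5, 'ethanol'), (7, 9, 'acetic acid')]
```

In a hybrid system, such dictionary matches would typically be merged with, or used as features for, the statistical tagger's output.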
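Rows [26] and [23] rely on distant or weak supervision to generate labeled training data automatically. A common distant-supervision heuristic aligns known relation triples from a knowledge base with raw sentences: any sentence mentioning both entities of a known pair is labeled with that pair's relation. The sketch below illustrates only that heuristic, under stated assumptions: the knowledge base, entities and sentences are hypothetical, and plain substring matching stands in for proper entity linking.

```python
from typing import Dict, List, Tuple

# Hypothetical knowledge base: (head entity, tail entity) -> relation label.
KB: Dict[Tuple[str, str], str] = {
    ("aspirin", "headache"): "treats",
    ("ibuprofen", "inflammation"): "treats",
    ("smoking", "lung cancer"): "causes",
}


def distant_label(sentences: List[str], kb: Dict[Tuple[str, str], str]) -> List[Tuple[str, str, str, str]]:
    """Label every sentence that mentions both entities of a KB pair with that pair's relation.

    The labels are noisy: a sentence can mention both entities without expressing the relation.
    """
    labeled = []
    for sentence in sentences:
        text = sentence.lower()  # case-insensitive substring matching
        for (head, tail), relation in kb.items():
            if head in text and tail in text:
                labeled.append((sentence, head, tail, relation))
    return labeled


corpus = [
    "Aspirin is widely used to relieve a headache.",
    "Smoking remains the leading risk factor for lung cancer.",
    "Ibuprofen was not tested in this study.",
]
for row in distant_label(corpus, KB):
    print(row)
```

The third sentence yields no label because only one entity of the ibuprofen pair appears. The labeled output would then serve as training data for a relation classifier such as the CNN in [23].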
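Rows [24] (Snowball++) and [26] (expanding 38 labeled instances to 573) both rely on bootstrapping: a few seed pairs yield textual patterns, which in turn yield new pairs. The sketch below is a simplified DIPRE/Snowball-style loop, not Snowball++ itself. It assumes entities are single capitalized words, uses exact middle-context strings instead of Snowball's weighted context vectors, and omits confidence scoring. All names and the toy corpus, including the company names, are hypothetical.

```python
import re
from typing import Sequence, Set, Tuple

Pair = Tuple[str, str]
# Assumption: entities are single capitalised words (a stand-in for real NER).
ENTITY = r"([A-Z][A-Za-z]+)"


def learn_patterns(sentences: Sequence[str], pairs: Set[Pair]) -> Set[str]:
    """Collect the text found between the two entities of each known pair."""
    patterns: Set[str] = set()
    for sentence in sentences:
        for head, tail in pairs:
            match = re.search(re.escape(head) + r"(.+?)" + re.escape(tail), sentence)
            if match:
                patterns.add(match.group(1))
    return patterns


def apply_patterns(sentences: Sequence[str], patterns: Set[str]) -> Set[Pair]:
    """Extract (head, tail) pairs from sentences where a learned middle context occurs."""
    found: Set[Pair] = set()
    for sentence in sentences:
        for middle in patterns:
            for match in re.finditer(ENTITY + re.escape(middle) + ENTITY, sentence):
                found.add((match.group(1), match.group(2)))
    return found


def bootstrap(sentences: Sequence[str], seeds: Set[Pair], iterations: int = 2) -> Set[Pair]:
    """Alternate pattern learning and pair extraction, growing the pair set each round.

    Snowball also scores patterns and tuples by confidence to filter noisy extractions;
    that step is omitted here.
    """
    pairs = set(seeds)
    for _ in range(iterations):
        pairs |= apply_patterns(sentences, learn_patterns(sentences, pairs))
    return pairs


corpus = [
    "Acme is headquartered in Springfield.",
    "Globex is headquartered in Shelbyville.",
    "Globex, based in Shelbyville, makes widgets.",
    "Initech, based in Austin, builds software.",
]
print(sorted(bootstrap(corpus, {("Acme", "Springfield")})))
# [('Acme', 'Springfield'), ('Globex', 'Shelbyville'), ('Initech', 'Austin')]
```

The second round is what reaches (Initech, Austin): the pattern ", based in " is only learned once (Globex, Shelbyville) has been extracted in the first round. Without confidence filtering, a loop like this can drift when a generic pattern is learned.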
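Rows [22] and [24] use MapReduce (Hadoop) to process web-scale corpora. In that programming model a mapper emits key/value pairs, the framework groups values by key (the shuffle), and a reducer aggregates each group. The single-process Python simulation below counts co-occurring entity pairs in this way. It uses no Hadoop API, treats capitalized words as entities purely as a placeholder assumption, and is not the pipeline of [22] or [24].

```python
import re
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]


def map_phase(doc: str) -> Iterator[Tuple[Pair, int]]:
    """Mapper: emit ((entity_a, entity_b), 1) for each entity pair co-occurring in a sentence."""
    for sentence in re.split(r"(?<=[.!?])\s+", doc):
        # Assumption: capitalised words stand in for a real NER step.
        entities = sorted(set(re.findall(r"\b[A-Z][a-z]+\b", sentence)))
        for pair in combinations(entities, 2):
            yield pair, 1


def shuffle(mapped: Iterable[Tuple[Pair, int]]) -> Dict[Pair, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups: Dict[Pair, List[int]] = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(key: Pair, values: List[int]) -> Tuple[Pair, int]:
    """Reducer: sum the counts for one entity pair."""
    return key, sum(values)


def run_job(docs: List[str]) -> Dict[Pair, int]:
    """Run map, shuffle and reduce in sequence over all documents."""
    mapped = (kv for doc in docs for kv in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


docs = ["Alice met Bob in Paris. Bob flew to Rome.", "Alice and Bob live in Paris."]
print(run_job(docs))
```

On a real cluster the mappers and reducers run in parallel over partitions of the corpus; the per-key independence of the reducer is what lets the counting scale.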