From: An analytical study of information extraction from unstructured and multidimensional big data
Ref. | Technique | Purpose | Domain | Dataset | Results (P% where reported) | R% | F% |
---|---|---|---|---|---|---|---|
[25] | Transfer learning for domain-dependent clustering | To adapt world knowledge to domain-dependent tasks using semantic parsing and semantic filtering | News text | 20 Newsgroups, RCV1 | Case studies conducted to show that a conceptualization-based semantic filter can produce more accurate indirect supervision | ||
[26] | Distant supervised learning (deep learning) | To overcome the limitations of text mining methods, such as clustering or rule-based methods, in keyword and information extraction by using a technology dependency graph | Scientific literature | 473,935 articles; 38 relation instances labeled from 20 articles and expanded to 573 instances by bootstrapping | Case study: technology-driven graph used to analyze the technology architecture of DSSC | ||
[22] | MapReduce + semantic methods (attribute-based, isA-based and class-based) + logistic regression | To overcome the long-tail challenge using a Sparse IE approach; to deal with scalability and effectiveness | Web pages | 1.68 B web pages | Many entity pairs identified and classified as good or bad pairs. Recall, precision and F-measure are presented for the good and bad entity pairs | ||
[27] | Supervised kernel methods | To extract morpho-syntactic information from mined text; to deal with the challenges of data prioritization and curation | Biomedical | EU-ADR | The proposed method, which uses morpho-syntactic and dependency information, performs better at identifying entity relationships | ||
[28] | Use of declarative rules in contextual exploration | Automatic detection and extraction of meaning from the unstructured web using RDF WordNet, DBpedia, etc.; to bypass the lack of annotated data that is semantically and automatically usable, using LOD | Free text | Large text corpora provided by the labex OBVIL and the BNF (National Library of France) | EC3 software was implemented and shown to contribute considerably to detecting the real meaning of text | ||
[29] | CRF and dictionary for NER, word clustering through unsupervised training | Chemistry-aware NLP pipeline with tokenization, POS tagging, NER and phrase parsing; to populate chemical databases with minimal time, effort and expense | Scientific documents | 50 open access chemistry articles | 89.1 | 86.6 | 87.8 |
[24] | Hadoop (MapReduce) | To identify many-to-many relationships with less training data | Free text | 100 GB corpus; baike.baidu.com, a large encyclopedia with 700M entries | The proposed Snowball++ obtained more positive pairs than Snowball and PROSPERA | ||
[23] | CNN (weakly supervised) | To obtain high-precision data and automatically generate an annotated training sample set | Medical | Seven medical sites selected; 20,000 labeled samples generated in total, covering five categories of directional relations | 91.87 | 91.58 | 89.08 |
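
The P%, R% and F% columns are usually related by the balanced F-measure, the harmonic mean of precision and recall. The sketch below assumes that definition, which the table itself does not state; the function name `f_measure` is illustrative and does not come from any of the cited studies.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean) of precision and recall.

    Inputs and output are percentages, as in the table. Assumes the
    standard F1 definition; a study may have used another averaging scheme.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the P% and R% columns of the table.
f_29 = f_measure(89.1, 86.6)    # row [29]
f_23 = f_measure(91.87, 91.58)  # row [23]

print(f"[29] F = {f_29:.1f}")
print(f"[23] F = {f_23:.2f}")
```

Under this assumption, the value computed for [29] agrees with the reported 87.8. The value computed for [23] is about 91.72, which is higher than the reported 89.08. That study may have computed F differently, for example by averaging per relation category, but the table does not say.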