An analytical study of information extraction from unstructured and multidimensional big data

Adnan, Kiran; Akbar, Rehan

doi:10.1186/s40537-019-0254-8

Journal of Big Data

Table 6 Entity and relation pair extraction

Ref.	Technique	Purpose	Domain	Dataset	Results (P%)	Results (R%)	Results (F%)
[25]	Transfer learning for domain-dependent clustering	To adapt world knowledge to domain-dependent tasks by using semantic parsing and semantic filtering	News text	20 Newsgroups, RCV1	Case studies conducted to show that a conceptualization-based semantic filter can produce more accurate indirect supervision
[26]	Distant supervised learning (deep learning)	To overcome the limitations of text mining methods (such as clustering or rule-based methods) in keyword and information extraction by using a technology dependency graph	Scientific literature	473,935 articles; 38 relation instances labeled from 20 articles and expanded to 573 instances by bootstrapping	Case study: technology-driven graph to analyze the technology architecture of DSSC
[22]	MapReduce + semantic methods (attribute-based, isA-based and class-based) + logistic regression	To overcome the long-tail challenge using a Sparse IE approach; to deal with scalability and effectiveness	Web pages	1.68 B web pages	Many entity pairs identified and classified as good or bad pairs. Recall, precision and F-measure are presented for each good and bad entity pair
[27]	Supervised kernel methods	To extract morpho-syntactic information from mined text; to deal with the challenges of data prioritization and curation	Biomedical	EU-ADR	The proposed method, using morpho-syntactic and dependency information, performs better at identifying entity relationships
[28]	Use of declarative rules in contextual exploration	Automatic detection and extraction of meaning from the unstructured web using RDF, WordNet, DBpedia, etc.; to bypass the lack of semantically annotated, automatically usable data by using LOD	Free text	Large text corpora provided by the labex OBVIL and the BNF (National Library of France)	EC3 software was implemented and shown to contribute considerably to detecting the real meaning of text
[29]	CRF and dictionary for NER, word clustering through unsupervised training	Chemistry-aware NLP pipeline with tokenization, POS tagging, NER and phrase parsing; to populate chemical databases with minimal time, effort and expense	Scientific documents	50 open access chemistry articles	89.1	86.6	87.8
[24]	Hadoop (MapReduce)	To identify many-to-many relationships with less training data	Free text	100 GB corpus, baike.baidu.com: a large encyclopedia with 700M entries	The proposed Snowball++ extracted more positive pairs than Snowball and PROSPERA
[23]	CNN (weakly supervised)	To obtain high-precision data and automatically generate an annotated training sample set	Medical	Seven medical sites selected for the experiment; 20,000 labeled samples generated in total, with five categories of directional relations	91.87	91.58	89.08
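
The P%, R% and F% columns above report precision, recall and F-measure. As a minimal sketch, assuming F% denotes the balanced F-measure (the harmonic mean of precision and recall), the relation between the three columns can be written as follows. The function name `f_measure` is hypothetical and used only for illustration. The table does not state which F variant each study used, and scores averaged per class would give different values.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Inputs and output are on the same scale (here, percentages).
    Assumption: the F% column is the balanced F1; the table does not say so.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values taken from row [29] of the table (P% = 89.1, R% = 86.6).
f_row_29 = round(f_measure(89.1, 86.6), 1)
print(f_row_29)  # 87.8, matching the F% reported for [29]
```

Under this assumption the F-measure always lies between precision and recall, so it penalizes a method that scores well on only one of the two.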
