From: An analytical study of information extraction from unstructured and multidimensional big data
Ref. | Technique | Purpose | Domain | Dataset | P% | R% | F% | Result summary
---|---|---|---|---|---|---|---|---
[17] | CRF | To generate a relationship knowledge base and annotation; lexical, POS and semantic features used | Chinese encyclopedia | 52,975 web pages | | | | Model trained for 9 attributes; global training was more accurate than local training, but recall was low
[18] | Knowledge-oriented CNN with clustering using word filters (WordNet) | To overcome the limitations of RBM and LBM and to reduce the dimensionality | Text | 3 datasets: SemEval-2010 Task 8 with 10,717 annotated samples, Causal-TimeBank dataset, Event StoryLine dataset | | | | With max clustering, macro-averaged F1 of 91.34, 76.21 and 81.84% on SemEval, Causal-TB and Event-SL, respectively; with average clustering, F1 of 91.20, 75.43 and 81.96%, respectively
[19] | Pattern-based method to build an information network | To extract large-scale treatment drug-disease pairs and inducement drug-disease pairs | Medical literature for drug repurposing | 27M abstracts and titles from PubMed | | | | The algorithm showed high precision but low recall
[20] | Weakly supervised method without manual annotation; SVM used to train the model | To reduce the manual annotation effort and expand the relation types using semantic and syntactic features | News text | Baidu encyclopedia, 50,000 entry pages, 10 GB in size | 83.61 | 82.63 | 83.12 | Entity ambiguity and poor universality affected the results
[21] | Multi-class SVM and syntactic model development | To detect semantic relations; the architecture has a preprocessing phase that builds a feature vector from lexical, semantic and syntactic features, a training phase and an RE phase | News text | ReACE | 80.18 | 70.89 | 75.25 |
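
The F% values reported for [20] and [21] are consistent with the standard F-measure, the harmonic mean of precision and recall. A minimal sketch, assuming the balanced F1 definition, is shown here; the function names `f_measure` and `macro_f1` are illustrative and do not come from the surveyed papers. Macro averaging, as reported for [18], is taken here to be the unweighted mean of per-class F1 scores.

```python
from statistics import mean


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Inputs and output are on the same scale (here, percentages).
    """
    if precision + recall == 0:
        return 0.0  # convention when both precision and recall are zero
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class: list[tuple[float, float]]) -> float:
    """Unweighted mean of per-class F1, given (precision, recall) pairs."""
    return mean(f_measure(p, r) for p, r in per_class)


# Recompute the F% column from the P% and R% columns of the table.
print(round(f_measure(83.61, 82.63), 2))  # row [20] -> 83.12
print(round(f_measure(80.18, 70.89), 2))  # row [21] -> 75.25
```

Recomputing F from the reported P and R in this way is a quick sanity check when comparing results across papers, since some studies report micro-averaged rather than macro-averaged scores.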