From: An analytical study of information extraction from unstructured and multidimensional big data
Task | Approach | Dataset | Results | Remarks |
---|---|---|---|---|
Term context understanding to deal with homonyms [31] | Semi-automated approach combining automated content analysis and an artificial neural network (ANN) | 26,259 research articles from Web of Science | The proposed solution was evaluated with different sparsity parameters; results showed that different modeling terms affected the error rate differently | The proposed solution performed well, but some instances could not be classified automatically and required manual classification. Hence, further improvement is required to automate the sifting process for identifying homonym contexts |
IE from heterogeneous unstructured big data [32] | Unsupervised deep learning (multiple kernel) | 13 different datasets from the UCI Machine Learning Repository | The proposed system was faster than competing methods, with comparable accuracy | Unsupervised learning can improve accuracy on heterogeneous data, but the approach needs further advancement to handle the dynamic nature of such data |
Deep semantic IE for big data mining from geoscience data [33] | Convolutional neural networks (CNNs) for classification and TF-IDF for word statistics | Multivariate and heterogeneous data of 16,098 PDN, 130 LANs | Classification accuracy of 99.9% and 99.8% at the sentence and paragraph levels, respectively | Insufficient comprehensiveness, poor correlation, and inconsistent formats are problems of heterogeneous data |
Open-domain event extraction [34] | Schema discovery based on probabilistic generative models, i.e., LinkLDA | Set of events generated and extracted from Twitter | Unlike related work, the proposed approach can handle complex queries and structured data browsing | The sparsity of unstructured big data can decrease the performance and scalability of the solution; these factors are therefore important when investigating the effectiveness of the approach |
Biomedical event extraction [35] | Syntactic and semantic features to identify event triggers, plus a phrase structure tree | BioNLP-ST 2013 | The solution achieved 52.23% precision, 26.38% recall, and 35.06% F1-score | The approach relies on ML features and therefore inherits the limitations of feature-based ML techniques |
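The F1-score reported for the biomedical event extraction row [35] is the harmonic mean of its precision and recall, so the three figures can be cross-checked. The sketch below is illustrative only. The function name `f1_score` is a hypothetical helper and is not taken from [35] or from any particular library. Recomputing from the rounded percentages gives 35.05% rather than 35.06%. This gap is consistent with precision and recall having been rounded before reporting.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the F1-score: the harmonic mean of precision and recall.

    Hypothetical helper for illustration; inputs may be fractions or percentages
    (the result uses the same scale as the inputs).
    """
    if precision + recall == 0:
        # Avoid division by zero when both precision and recall are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the biomedical event extraction row [35], in percent.
precision, recall = 52.23, 26.38
print(f"F1 = {f1_score(precision, recall):.2f}%")  # prints: F1 = 35.05%
```

The low recall dominates the harmonic mean, which is why the F1-score sits much closer to 26.38% than to 52.23%.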