Table 7 Event extraction and salient fact extraction

From: An analytical study of information extraction from unstructured and multidimensional big data

| Task | Approach | Dataset | Results | Remarks |
| --- | --- | --- | --- | --- |
| Term context understanding to deal with homonyms [31] | Semi-automated approach combining automated content analysis and ANN | 26,259 research articles from Web of Science | The proposed solution was evaluated with different sparsity parameters. Results showed that different modeling terms had different effects on the error rate | With manual classification, the proposed solution performed better on some instances that could not be classified automatically. Hence, improvement is required to automate the sifting process of homonym context identification |
| IE from heterogeneous unstructured big data [32] | Unsupervised deep learning (multiple kernel) | 13 different datasets from the UCI Machine Learning Repository | The proposed system was faster than its competitors and equal in accuracy | Accuracy on heterogeneous data can be improved with unsupervised learning, but the approach needs to advance to handle the dynamicity of such data |
| Deep semantic IE for big data mining from geoscience data [33] | Convolutional neural networks (CNN) for classification and TF-IDF for word statistics | Multivariate and heterogeneous data of 16,098 PDN, 130 LAN’s | Classification accuracy of 99.9% and 99.8% at the sentence and paragraph levels, respectively | Insufficient comprehensiveness, poor correlation, and inconsistent formats are problems of heterogeneous data |
| Open domain event extraction [34] | Schema discovery based on probabilistic generative models, i.e. LinkLDA | Set of events generated and extracted from Twitter | Unlike related work, the proposed approach can handle complex queries and structured data browsing | The sparsity of unstructured big data can decrease the performance and scalability of the solution, so these factors are important when investigating the effectiveness of the approach |
| Biomedical event extraction [35] | Syntactic and semantic features to identify event triggers + phrase structure tree | BioNLP-ST 2013 | Evaluation showed 52.23% precision, 26.38% recall, and 35.06% F1-score | The proposed approach relies on ML features and so inherits the limitations of ML feature-based techniques |
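
The F1-score reported for biomedical event extraction [35] can be cross-checked against the reported precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch assuming the standard F1 definition; the function name `f1_score` is illustrative and does not come from the cited work.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the table for BioNLP-ST 2013 [35], written as fractions.
precision = 0.5223
recall = 0.2638

f1 = f1_score(precision, recall)
print(f"F1 = {f1 * 100:.2f}%")
```

The computed value comes out at roughly 35.05%, which agrees with the reported 35.06% up to rounding of the published precision and recall. The low F1 relative to precision reflects how strongly the harmonic mean is pulled toward the weaker of the two metrics, here recall.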