Authors | Methodologies | Features | Challenges |
---|---|---|---|
• A. L. Heureux and G. S. Member [7] | • Machine learning (ML) mechanism for Big data • Data analytics stages • Data manipulation techniques (PCA, dimensionality reduction) | • Manipulation for Big data • Processing manipulation • Data manipulation • Algorithm manipulation • Suitable for decision making | • Processing performance on large volumes of data • Dirty and noisy data due to the variety of Big data • Real-time processing of high-velocity data • Data uncertainty (veracity) |
• S. R. Sukumar, R. Natarajan, and R. K. Ferrell [19] | • Automation of data processing technologies • Healthcare analytical methods | • Sources of errors discussed • Data quality assurance | • Data quality issues • Automation in data handling, data processing, and data storage • Data quality rule engines • Customized software for data quality evaluation |
• García et al. [27] | • Data preprocessing techniques • TF-IDF (Term Frequency-Inverse Document Frequency) • Discretization and Normalization | • The connection between Big data and data processing • Big data framework | • New technologies • Scaling data preprocessing techniques (missing value imputation, noise treatment) • Big data learning paradigm (semi-supervised, data stream, real-time processing) |
• M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani [43] | • IoT Big data analytics • IoT streaming data analytics • Deep learning (DL) techniques for IoT data analytics | • IoT applications, DL approach • IoT characteristics • Summary of the DL model • Framework for designing deep learning (DL) | • Lack of precise deep learning (DL) method • Training data overload • Specific hardware required for a defined system |
• T. Steckel et al. [23] | • Data acquisition for different industries • Anomaly detection (PCA-based, distance-based approach) • Regression-based anomaly detection • Outliers • Self-organizing map | • Application cases for the chemical industry and process control for an agricultural harvester • Failure detection • Anomaly detection • Optimization process | • Data acquisition problems: • Data integration • Heterogeneous manufacturing process • Time synchronization |
• P. Matta and A. Tayal [55] | • AHP (analytical hierarchy process) and PCA (principal component analysis) based methodology • Correlation analysis • Clustering | • Big data for supplier selection problems in industries • Supplier evaluation for a manufacturing firm | • Lack of optimization model for industries • Highly unstructured • Time-consuming |
• S. Akter, S. F. Wamba, A. Gunasekaran, R. Dubey, and S. J. Childe [56] | • Big data analytical capability model (BDAC) • BDA talent capability (BDATLC) • BDA technology capability (BDATEC) • BDA management capability (BDAMAC) • Resource-Based Theory (RBT) | • BDAC-FPER (firm performance) relationship • BDAC and its three primary dimensions (technology, management, and talent capability) and 11 sub-dimensions • Data collection | • Lack of business process agility • Process-oriented dynamic capabilities • Analytics climate • Analytics privacy |
• Fernández, S. del Río, N. V. Chawla, and F. Herrera [57] | • Data preprocessing • Cost-sensitive learning • Big data classification using MapReduce | • Standard preprocessing techniques • Analysis of preprocessing techniques | • Imbalanced classification in Big data problems • Design of novel algorithms for different levels of data partitioning in classification • Imbalance ratio between classes |
• A. Waldherr, D. Maier, P. Miltner, and E. Günther [58] | • Filtering strategies • Classifying documents with a machine-learning algorithm • Extraction of the core network | • Web discourse in the era of Big data • Crawled webpages from the USA and Germany | • Cleaning and reducing data during online discourses • Noise problem |
• Giovanni Azzone [59] | • Completeness • Timeliness • Personalized policies • Efficiency and effectiveness | • Public policies | • Data access and arithmetic computing procedures |
• M. Habib, C. Sun, and L. Assad [31] | • Data generation and acquisition • Relationship between cloud computing and Big data • Relationship between IoT and Big data • Data center • Relationship between Hadoop and Big data | • Big data storage • Big data analysis • Big data applications | • Data representation • Redundancy reduction • Data compression • Data life cycle management • Analytical mechanism • Data confidentiality • Energy management • Expandability and scalability |
• X. Chu, I. F. Ilyas, S. Krishnan, and J. Wang [60] | • Rule-based data cleaning technique • Data cleaning from a statistical perspective • Missing values | • Error detection • Error repairing • Business intelligence • Automation with tools | • Scalability • User engagement • Semi-structured and unstructured data • New applications for streaming data • Privacy and security concerns |
• V. N. Gudivada, A. Apon, and J. Ding [32] | • Data quality life cycle • Data quality analytics • TIA Process (Transformation, Integration, Aggregation) | • Nature of data quality issues in the context of Big data • Data governance-driven framework • Data quality dimension | • Implementation of data quality lifecycle framework • A new algorithm is required to identify the original data element and source. |
• X. Deng, P. Jiang, X. Peng, and C. Mi [33] | • Support tensor data description • Standard support vector data description (SSVDD) • Kernel support tensor data description (KSTDD) • Outlier detection algorithm | • Reduces high-dimensional data | • Deals directly with tensor data only |
• D. Guan et al. [61] | • Novel noise filtering mechanism called enhanced soft majority voting by exploiting unlabeled data (ESMVU) • Multiple soft majority voting methods (MSMV) | • Effective use of unlabeled data • Improved noise filtering performance • Noise handling • Works for mislabeled data filtering | • Noise correction and comparison for Big data and its heterogeneous types |
• D. Henry [62] | • Data cleaning methodology based on hashtag context (time, artificial, and recent context) | • More general data cleaning tasks and preprocessing • Suitable for parallel computing | • Required for text mining tasks such as text classification, sentiment analysis, opinion mining, or text clustering • Requires work on a large number of tweets |
• K. Kenda and D. Mladenić [63] | • Data cleaning algorithm • Kalman filter • Streaming sensor data platform with data cleaning | • Meta classification method of prediction • Lower noise ratio | • Kalman filter parameter fine-tuning procedure needs improvement • Cleaning behavior • Usability of the algorithm • Fails to handle data from a large number of sensors |
• C. S. Kruse, R. Goswamy, Y. Raval, and S. Marawi [64] | • Big data medicine • Big data in healthcare • EHR (electronic health record) | • Data collection through the monitoring system • Clinical documentation | • Data aggregation • Unstructured data analysis • Priority utilization of data • Data protection |
• M. Yang, M. Kiang, and W. Shang [65] | • Automated adverse drug reaction (ADR) related posts filtering mechanism • Supervised classification approach | • Framework for filtering Big data from social media in general • Identification of consumer adverse drug reaction (ADR) messages as a specific application | • Not suitable for unsupervised data • Consumer ADR-related messages are usually sparse and highly distributed • Reduction of high dimensionality required |
• H. Asri, H., H. Al Moatassime, and T. Noel [66] | • Survey paper • Different product details including MCOT, HRS-I • e-HPA • ELCR • Reality mining | • Healthcare and Big data • Reality mining and healthcare • Big data and reality mining • Impact of Big data analytics in the healthcare industry (right living, proper care, right provider, promising innovation, the correct value, etc.) | • Sources of data acquisition are not synchronized • Data quality issues: unstructured, nonstandard, or improper data • Lack of data scientists, resource availability, and data analytics tools • Constraints in data accessibility |
• J. Wang, W. Zhang, Y. Shi, S. Duan, and J. Liu [67] | • Industrial data ingestion-integration • Repository • Data management • Industrial data analysis • Industrial data governance | • Highly distributed data source (large-scale devices data) • Production life cycle data • Business operation data • Manufacturing value chain • Collaboration data | • Production efficiency • Production quality • Minimize energy consumption • Cost minimization |
• Y. Hu, K. Duan, Y. Zhang, M. S. Hossain, S. M. Mizanur Rahman, and A. Alelaiwi [68] | • Simultaneously Aided Diagnosis Mode (SADM) framework • Data preprocessing (data extraction, data cleaning, eliminating redundancy) • Machine learning algorithm (SVM) | • Focused on healthcare databases for heart disease, diabetes, and cancer • Performance measurement with accuracy, precision, recall, and F1-measure | • Diagnosis efficiency improvement required • Deep learning (DL) required for disease risk assessment |
• P. Kaur, R. Kumar, and M. Kumar [34] | • IoT-based disease predictive system for heart, diabetes, and breast cancer patients • Random forest machine learning algorithm (RFML) used | • Datasets used: heart, breast cancer, diabetes, thyroid, liver disorder, etc. • Results compared with k-NN, linear SVM, decision tree, MLP, random forest | • Accuracy can be increased further on an extensive database • Data security is a big concern in IoT-based systems • Can be applied to other applications such as weather forecasting |
• S. Oueida, M. Aloqaily, and S. Ionescu [35] | • Maximum Reward Algorithm (MRA), an optimization-based algorithm | • Enhances healthcare resources • Multimedia technologies are a booster for healthcare services • Improves efficiency and reliability from 50.1–77.2% | • Integration of multimedia technologies with mobile healthcare services and facilities is complex in some contexts • Heterogeneous networks exist for multimedia technologies |
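
Among the preprocessing techniques listed above, TF-IDF (García et al. [27]) is simple enough to illustrate directly. The following is a minimal sketch, not the implementation used in any of the surveyed papers: it assumes the plain weighting tf × log(N/df) with no smoothing, and the function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (term count / document length) * log(N / df),
    where N is the number of documents and df is the number of
    documents that contain the term.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (hypothetical), already tokenized by whitespace.
docs = [
    "big data cleaning".split(),
    "big data analytics".split(),
    "sensor data cleaning".split(),
]
scores = tf_idf(docs)
```

In this toy corpus the term "data" occurs in every document, so its weight is zero, while a term that occurs in a single document, such as "analytics", receives the highest weight in its row. Production libraries typically add smoothing and normalization on top of this basic form.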