A systematic review on big data applications and scope for industrial processing and healthcare sectors

Rahul, Kumar; Banyal, Rohitash Kumar; Arora, Neeraj

doi:10.1186/s40537-023-00808-2

Journal of Big Data

Table 2 Review of Big data analytics in the industrial and healthcare sector


Authors	Methodologies	Features	Challenges
• A. L. Heureux and G. S. Member [7]	• Machine learning (ML) mechanism for Big data • Data analytics stages • Data manipulation techniques (PCA, dimensionality reduction)	• Manipulation for Big data • Processing manipulation • Data manipulation • Algorithm manipulation • Suitable for decision making	• Processing performance on large volumes of data • Dirty and noisy data arising from the variety of Big data • Real-time processing under high-velocity data generation • Data uncertainty related to the veracity of data
• S. R. Sukumar, R. Natarajan, and R. K. Ferrell [19]	• Automation of data processing technologies • Healthcare analytical methods	• Sources of errors discussed • Data quality assurance	• Data quality issues • Automation in data handling, data processing, and data storage • Data quality rule engines • Customized software for data quality evaluation
• García et al. [27]	• Data preprocessing techniques • TF-IDF (Term Frequency-Inverse Document Frequency) • Discretization and Normalization	• The connection between Big data and data processing • Big data framework	• New technologies • Scaling data preprocessing techniques (missing value imputation, noise treatment) • Big data learning paradigm (semi-supervised, data stream, real-time processing)
• M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani [43]	• IoT Big data analytics • IoT streaming data analytics • Deep learning (DL) techniques for IoT data analytics	• IoT applications, DL approach • IoT characteristics • Summary of the DL model • Framework for designing deep learning (DL)	• Lack of precise deep learning (DL) method • Training data overload • Specific hardware required for a defined system
• T. Steckel et al. [23]	• Data acquisition for different industries • Anomaly detection (PCA-based, distance-based approach) • Regression-based anomaly detection • Outliers • Self-organizing map	• Application cases for the chemical industry, process control for an agricultural harvester • Failure detection • Anomaly detection • Optimization process	• Data acquisition problems in: • Data integration • Heterogeneous manufacturing processes • Time synchronization
• P. Matta and A. Tayal [55]	• AHP (analytical hierarchy process) and PCA (principal component analysis) based methodology • Correlation analysis • Clustering	• Big data for supplier selection problems in industries • Supplier evaluation for a manufacturing firm	• Lack of optimization model for industries • Highly unstructured • Time-consuming
• S. Akter, S. F. Wamba, A. Gunasekaran, R. Dubey, and S. J. Childe [56]	• Big data analytical capability model (BDAC) • BDA talent capability (BDATLC) • BDA technology capability (BDATEC) • BDA management capability (BDAMAC) • Resource-Based Theory (RBT)	• BDAC-FPER (firm performance) relationship • BDAC and its three primary dimensions (technology, management, and talent capability) and 11 sub-dimensions • Data collection	• Lack of business process agility • Process-oriented dynamic capabilities • Analytics climate • Analytics privacy
• Fernández, S. del Río, N. V. Chawla, and F. Herrera [57]	• Data preprocessing • Cost-sensitive learning • Big data classification using MapReduce	• Standard preprocessing techniques • Analysis of preprocessing techniques	• Imbalanced classification in Big data problems • Design of novel algorithms for different levels of partitioning in classification • Imbalance ratio between classes
• A. Waldherr, D. Maier, P. Miltner, and E. Günther [58]	• Filtering strategies • Classifying documents with a machine-learning algorithm • Extraction of the core network	• Web discourse in the era of Big data • Crawled webpages from the USA and Germany	• Cleaning and reducing data during online discourses • Noise problem
• Giovanni Azzone [59]	• Completeness • Timeliness • Personalized policies • Efficiency and effectiveness	• Public policies	• Data access and arithmetic computing procedures
• M. Habib, C. Sun, and L. Assad [31]	• Data generation and acquisition • Relationship between Cloud Computing and Big data • Relationship between IoT and Big data • Datacenter • Relationship between Hadoop and Big data	• Big data storage • Big data analysis • Big data applications	• Data representations • Redundancy reduction • Data compression • Data life cycle management • Analytical mechanism • Data confidentiality • Energy management • Expandability and scalability, etc.
• X. Chu, I. F. Ilyas, S. Krishnan, and J. Wang [60]	• Rule-based data cleaning technique • Data cleaning from a statistical perspective • Missing values	• Error detection • Error repairing • Business intelligence • Automation with tools	• Scalability • User engagement • Semi-structured and unstructured data • New applications for streaming data • Privacy and security concerns
• V. N. Gudivada, A. Apon, and J. Ding [32]	• Data quality life cycle • Data quality analytics • TIA Process (Transformation, Integration, Aggregation)	• Nature of data quality issues in the context of Big data • Data governance-driven framework • Data quality dimension	• Implementation of data quality lifecycle framework • A new algorithm is required to identify the original data element and source.
• X. Deng, P. Jiang, X. Peng, and C. Mi [33]	• Support tensor data description • Standard support vector data description (SSVDD) • Kernel support tensor data description (KSTDD) • Outlier detection algorithm	• Reduces high-dimensional data	• Deals only with tensor data directly
• D. Guan et al. [61]	• Novel noise filtering mechanism called Enhanced soft majority voting by exploiting unlabeled data (ESMVU) • Multiple soft majority voting methods (MSMV)	• Effective use of unlabeled data • Improves noise filtering performance • Noise handling • Works for mislabeled data filtering	• Noise correction and comparison for Big data and its heterogeneous types
• D. Henry [62]	• Data cleaning methodology based on hashtag context (time, artificial, and recent context)	• More general data cleaning tasks and preprocessing • Suitable for parallel computing	• Needed for text mining tasks such as text classification, sentiment analysis, opinion mining, or text clustering • Requires work on a large number of tweets
• K. Kenda and D. Mladenić [63]	• Data cleaning algorithm • Kalman filter • Streaming sensor data platform with data cleaning	• Meta classification method of prediction • Lower noise ratio	• Kalman filter parameter fine-tuning procedure needs improvement • Cleaning behavior • Usability of the algorithm • Fails to handle data from a large number of sensors
• C. S. Kruse, R. Goswamy, Y. Raval, and S. Marawi [64]	• Big data medicine • Big data in healthcare • EHR (Electronic health record)	• Data collection through the monitoring system • Clinical documentation	• Data aggregation • Unstructured data analysis • Priority utilization of data • Data protection
• M. Yang, M. Kiang, and W. Shang [65]	• Automated adverse drug reaction (ADR) related posts filtering mechanism • Supervised classification approach	• Framework for filtering Big data from social media in general • Identification of consumer adverse drug reaction (ADR) messages as a specific application	• Not suitable for unsupervised data • Consumer ADR-related messages are usually sparse and highly distributed • Reduction of high dimensionality required
• H. Asri, H., H. Al Moatassime, and T. Noel [66]	• Survey paper • Different product details including MCOT, HRS-I • e-HPA • ELCR • Reality mining	• Healthcare and Big data • Reality mining and healthcare • Big data and reality mining • Impact of Big data analytics in the healthcare industry (right living, proper care, right provider, promising innovation, the correct value, etc.)	• Data acquisition sources are not synchronized • Data quality is an issue because data is unstructured, nonstandard, or improper • Lack of data scientists, resource availability, and data analytics tools • Constraints in data accessibility
• J. Wang, W. Zhang, Y. Shi, S. Duan, and J. Liu [67]	• Industrial data ingestion-integration • Repository • Data management • Industrial data analysis • Industrial data governance	• Highly distributed data source (large-scale devices data) • Production life cycle data • Business operation data • Manufacturing value chain • Collaboration data	• Production efficiency • Production quality • Minimize energy consumption • Cost minimization
• Y. Hu, K. Duan, Y. Zhang, M. S. Hossain, S. M. Mizanur Rahman, and A. Alelaiwi [68]	• Simultaneously Aided Diagnosis Mode (SADM) framework • Data preprocessing (data extraction, data cleaning, eliminating redundancy) • Machine learning algorithm (SVM)	• Focused on healthcare databases for diseases such as heart disease, diabetes, and cancer • Performance measurement with accuracy, precision, recall, and F1-measure	• Diagnosis efficiency improvement required • Deep learning (DL) required for disease risk assessment
• P. Kaur, R. Kumar, and M. Kumar [34]	• IoT-based disease predictive system for heart, diabetes, and breast cancer patients • Random forest machine learning (RFML) algorithm used	• Datasets used: heart, breast cancer, diabetes, thyroid, liver disorder, etc. • Results compared with k-NN, Linear SVM, Decision tree, MLP, random forest	• Accuracy can be increased further on an extensive database • Data security is a big concern in IoT-based systems • Can be applied to other applications such as weather forecasting
• S. Oueida, M. Aloqaily, and S. Ionescu [35]	• Maximum Reward Algorithm (MRA), an optimization-based algorithm	• Enhances healthcare resources • Multimedia technologies are a booster for healthcare services • Improves efficiency and reliability from 50.1–77.2%	• Integration of multimedia technologies with mobile healthcare services and facilities is complex in some contexts • Heterogeneous networks exist for multimedia technologies
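
The row for Hu et al. [68] reports performance in terms of accuracy, precision, recall, and F1-measure. As a minimal sketch of how these four measures are usually computed for a binary classifier, the example below uses their standard definitions. The function name `classification_metrics` and the label lists are hypothetical illustrations, not data or code from the reviewed paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels.

    Illustrative helper (hypothetical name); `positive` marks the class
    treated as the positive one, e.g. "disease present".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only: 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced data, which the table lists as a challenge for Fernández et al. [57], accuracy alone can look high while recall on the minority class stays low, so precision, recall, and F1 are typically reported together.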
