Table 2 Review of Big data analytics in the industrial and healthcare sector

From: A systematic review on big data applications and scope for industrial processing and healthcare sectors

Authors

Methodologies

Features

Challenges

• A. L. Heureux and G. S. Member [7]

• Machine learning (ML) mechanism for Big data

• Data analytics stages

• Data manipulation techniques (PCA, dimensionality reduction)

• Manipulation for Big data

• Processing manipulation

• Data manipulation

• Algorithm manipulation

• Suitable for decision making

• Processing performances in a large volume of data

• Dirty and noisy data arising from the variety of Big data

• Real-time processing of rapidly generated (high-velocity) data

• Data uncertainty related to the veracity of data

• S. R. Sukumar, R. Natarajan, and R. K. Ferrell [19]

• Automation of data processing technologies

• Healthcare analytical methods

• Sources of errors discussed

• Data quality assurance

• Data quality issues

• Automation in data handling, data processing, and data storage

• Data quality rule engines

• Customized software for data quality evaluation

• García et al. [27]

• Data preprocessing techniques

• TF-IDF (Term Frequency-Inverse Document Frequency)

• Discretization and Normalization

• The connection between Big data and data processing

• Big data framework

• New technologies

• Scaling data preprocessing techniques (missing value imputation, noise treatment)

• Big data learning paradigm (semi-supervised, data stream, real-time processing)

• M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani [43]

• IoT Big data analytics

• IoT streaming data analytics

• Deep learning (DL) techniques for IoT data analytics

• IoT applications, DL approach

• IoT characteristics

• Summary of the DL model

• Framework for designing deep learning (DL)

• Lack of precise deep learning (DL) method

• Training data overload

• Specific hardware required for a defined system

• T. Steckel et al. [23]

• Data acquisition for different industries

• Anomaly detection (PCA-based, distance-based approach)

• Regression-based anomaly detection

• Outliers

• Self-organizing map

• Application cases for the chemical industry, process control for an agricultural harvester

• Failure detection

• Anomaly detection

• Optimization process

• Data acquisition has problems with:

• Data integration

• Heterogeneous manufacturing process

• Time synchronization

• P. Matta and A. Tayal [55]

• AHP (Analytical hierarchy process) and PCA (principal component analysis) based methodology

• Correlation analysis

• Clustering

• Big data for supplier selection problems in industries

• Supplier evaluation for a manufacturing firm

• Lack of optimization model for industries

• Highly unstructured

• Time-consuming

• S. Akter, S. F. Wamba, A. Gunasekaran, R. Dubey, and S. J. Childe [56]

• Big data analytical capability model (BDAC)

• BDA talent capability (BDATLC)

• BDA technology capability (BDATEC)

• BDA management capability (BDAMAC)

• Resource-Based Theory (RBT)

• BDAC-FPER (firm performance) relationship

• BDAC and its three primary dimensions (technology, management, and talent capability) and 11 sub-dimensions

• Data collection

• Lack of business process agility

• Process-oriented dynamic capabilities

• Analytics climate

• Analytics privacy

• Fernández, S. del Río, N. V. Chawla, and F. Herrera [57]

• Data preprocessing

• Cost-sensitive learning

• Big data classification using MapReduce

• Standard preprocessing techniques

• Analysis of preprocessing techniques

• Imbalanced classification in big data problem

• Design of novel algorithms for different levels of partitioning in classification

• Imbalance ratio between classes

• A. Waldherr, D. Maier, P. Miltner, and E. Günther [58]

• Filtering strategies

• Classifying documents with a machine-learning algorithm

• Extraction of the core network

• Web discourse in the era of Big data

• Crawled webpages from the USA and Germany

• Cleaning and reducing data during online discourses

• Noise problem

• Giovanni Azzone [59]

• Completeness

• Timeliness

• Personalized policies

• Efficiency and effectiveness

• Public policies

• Data accessing and arithmetic computing procedures

• M. Habib, C. Sun, and L. Assad [31]

• Data generation and acquisition

• Relationship between Cloud Computing and Big data

• Relationship between IoT and Big data

• Datacenter

• Relationship between Hadoop and Big data

• Big data storage

• Big data analysis

• Big data applications

• Data representation

• Redundancy reduction

• Data compression

• Data life cycle management

• Analytical mechanism

• Data confidentiality

• Energy management

• Expandability and scalability

• X. Chu, I. F. Ilyas, S. Krishnan, and J. Wang [60]

• Rule-based data cleaning technique

• Data cleaning from a statistical perspective

• Missing values

• Error detection

• Error repairing

• Business intelligence

• Automation with tools

• Scalability

• User engagement

• Semi-structured and unstructured data

• New applications for streaming data

• Privacy and security concerns

• V. N. Gudivada, A. Apon, and J. Ding [32]

• Data quality life cycle

• Data quality analytics

• TIA Process (Transformation, Integration, Aggregation)

• Nature of data quality issues in the context of Big data

• Data governance-driven framework

• Data quality dimension

• Implementation of data quality lifecycle framework

• A new algorithm is required to identify the original data element and source.

• X. Deng, P. Jiang, X. Peng, and C. Mi [33]

• Support tensor data description

• Standard support vector data description (SSVDD)

• Kernel support tensor data description (KSTDD)

• Outlier detection algorithm

• Reduce high dimensional data

• Deals directly with tensor data only

• D. Guan et al. [61]

• Novel noise filtering mechanism called Enhanced soft majority voting by exploiting unlabeled data (ESMVU)

• Multiple soft majority voting methods (MSMV)

• Effective use of unlabeled data

• Improve noise filtering performance.

• Noise handling

• Worked for mislabeled data filtering

• Noise correction & comparison concerning Big data and its heterogeneous type

• D. Henry [62]

• Data cleaning methodology based on hashtag context (time, artificial, and recent context)

• More general data cleaning tasks and preprocessing

• Suitable for parallel computing

• It is required for text mining tasks such as text classification, sentiment analysis, opinion mining, or text clustering

• Requires further work on a large number of tweets

• K. Kenda and D. Mladenić [63]

• Data cleaning algorithm

• Kalman Filter

• Streaming sensor data platform with data cleaning

• Meta classification method of prediction

• Lower noise ratio

• The Kalman filter parameter fine-tuning procedure needs improvement

• Cleaning behavior

• Usability of the algorithm

• Fails to handle data from a large number of sensors

• C. S. Kruse, R. Goswamy, Y. Raval, and S. Marawi [64]

• Big data medicine

• Big data in healthcare

• EHR (Electronic health record)

• Data collection through the monitoring system

• Clinical documentation

• Data aggregation

• Unstructured data analysis

• Priority utilization of data

• Data protection

• M. Yang, M. Kiang, and W. Shang [65]

• Automated adverse drug reaction (ADR) related posts filtering mechanism

• Supervised classification approach

• Framework for filtering big data from social media in general, and for identifying consumer adverse drug reaction (ADR) messages as a specific application

• Not suitable for unsupervised data

• Consumer ADR-related messages are usually sparse and highly distributed

• Reduction of high dimensionality required

• H. Asri, H., H. Al Moatassime, and T. Noel [66]

• Survey paper

• Different product details including MCOT, HRS-I

• e-HPA

• ELCR

• Reality mining

• Healthcare and big data

• Reality mining and healthcare

• Big data and reality mining

• Impact of big data analytics in the healthcare industry (right living, proper care, right provider, promising innovation, the correct value, etc.)

• Data acquisition sources are not synchronized

• Data quality is an issue: data are unstructured, nonstandard, or improper

• Lack of data scientists, available resources, and data analytics tools

• Constraints in data accessibility

• J. Wang, W. Zhang, Y. Shi, S. Duan, and J. Liu [67]

• Industrial data ingestion-integration

• Repository

• Data management

• Industrial data analysis

• Industrial data governance

• Highly distributed data source (large-scale devices data)

• Production life cycle data

• Business operation data

• Manufacturing value chain

• Collaboration data

• Production efficiency

• Production quality

• Minimize energy consumption

• Cost minimization

• Y. Hu, K. Duan, Y. Zhang, M. S. Hossain, S. M. Mizanur Rahman, and A. Alelaiwi [68]

• Simultaneously Aided Diagnosis Mode (SADM) framework

• Data preprocessing (data extraction, data cleaning, eliminating redundancy)

• Machine learning algorithm (SVM)

• Focused on healthcare databases for diseases such as heart disease, diabetes, and cancer

• Performance measurement with accuracy, precision, recall, and F1-measure

• Diagnosis efficiency improvement required

• Deep learning (DL) required for disease risk assessment

• P. Kaur, R. Kumar, and M. Kumar [34]

• IoT-based disease predictive system for heart, diabetes, and breast cancer patients

• Random forest machine learning algorithm (RFML) technology used

• Datasets used: heart, breast cancer, diabetes, thyroid, liver disorder, etc.

• Results compared with k-NN, Linear SVM, Decision tree, MLP, random forest

• Accuracy can be increased further on an extensive database.

• Data security is a major concern in IoT-based systems

• Can be applied to other applications such as weather forecasting

• S. Oueida, M. Aloqaily, and S. Ionescu [35]

• Maximum Reward Algorithm (MRA) - An optimization-based algorithm

• Enhances healthcare resources

• Multimedia technologies are a booster for healthcare services

• It improves efficiency and reliability from 50.1–77.2%

• Integration of multimedia technologies with mobile healthcare services and facilities is complex in some contexts

• Heterogeneous networks exist for multimedia technologies