Optical electrocardiogram based heart disease prediction using hybrid deep learning

The diagnosis and categorization of cardiac disease using the electrocardiogram (ECG), a low-cost tool, is an intriguing research topic for intelligent healthcare applications. An ECG-based cardiac disease prediction system must be automated, accurate, and lightweight. Deep learning methods have recently achieved automation and accuracy across multiple domains. However, applying deep learning to automatic ECG-based heart disease classification remains a challenging research problem. Because approaches that rely solely on deep learning fail to detect all of the important beats in the input ECG signal, a hybrid strategy is necessary to improve detection efficiency. The main objective of the proposed model is to enhance ECG-based heart disease classification efficiency using a hybrid feature engineering approach. The proposed model consists of pre-processing, hybrid feature engineering, and classification. Pre-processing aims to eliminate powerline and baseline interference without distorting the heartbeat. To classify data efficiently, we design a hybrid approach that combines a conventional ECG beat extraction algorithm with Convolutional Neural Network (CNN)-based features. For heart disease prediction, the hybrid feature vector is fed sequentially into the deep learning classifier Long Short-Term Memory (LSTM). The simulation results show that the proposed model reduces both the number of diagnostic errors and the time spent on each diagnosis compared with existing methods.


Introduction
Golande and Pavankumar, Journal of Big Data (2023) 10:139
Identifying heart diseases from an electrocardiogram (ECG) increasingly necessitates the use of computer-aided diagnosis (CAD) software. Numerous methods for the prompt detection of cardiac abnormalities have been proposed [1]. ECG signals are analyzed using auto-correlation functions, frequency-domain features, time-frequency analysis, and wavelet transforms to spot these abnormalities. Despite successful patient classification, separating abnormal ECG signals and classifying them effectively is still an open problem [2][3][4]. As an ECG signal may comprise a wide variety of heartbeats or waves, feature extraction from the data is essential. After processing, the ECG signal comprises several different waves that, taken together, largely represent human cardiac issues. These include the P wave, the Q wave, the R wave, the S wave, the T wave, and so on. The Q, R, and S waves form the QRS complex, which reflects ventricular depolarization. The P wave reflects atrial depolarization, whereas the T wave reflects ventricular repolarization. ECG-based heart disease prediction relies on feature extraction and classification after pre-processing. In the first phase, features are drawn either from the time-frequency domain, which displays the frequency and time components of the ECG concurrently, or from the frequency domain, which compares QRS-complex power spectra between normal and arrhythmia waveforms. In the second phase, classifiers such as SVM, NB, ANN, and KNN are used. Feature extraction is harder than classification because QRS complexes are hard to obtain. Feature extraction techniques include waveform, Hermite, wavelet, and statistical features [5,6]. Decision trees, support vector machines, key-value networks, linear discriminants, and artificial neural networks may then classify the extracted features [7,8]. Most automated ECG identification techniques use pattern matching and treat the ECG signal as a set of random patterns [9,10]. They need advanced feature extraction and high sampling rates, making them laborious. To be adopted affordably for real-time use in hospitals, many operations must use fewer features and a lower sampling rate. Accurate cardiac disease classification still faces scalability, computational complexity, and efficiency challenges [11].
A QRS complex identification system based on estimating the time-dependent entropy of an ECG signal was proposed in [12]. To improve the accuracy of QRS detection, entropy was computed at various temporal resolutions. Another technique, which detects QRS complexes using deterministic finite automata, is given in [13]. The authors extracted QRS complexes and interpreted normalized ECG signals using a regular grammar. A hybrid filtering technique is used in [14] to improve the accuracy of QRS beat detection; the authors created hybrid filtering methods based on derivative and maximum mean minimum filters. In [15], raw ECG data were pre-processed with independent component analysis (ICA) before chaos analysis recovered the QRS complex. Some cardiovascular disease classification approaches use handcrafted features. The Discrete Wavelet Transform (DWT) and Principal Component Analysis (PCA) are used for automated cardiac disease diagnosis in [16]. In their dynamic segmentation strategy, which accounts for heart rate variability (HRV), DWT extracted beat characteristics and PCA reduced dimensionality. In [17], the authors reported another DWT-based ECG wave identification and feature extraction approach, with classification performed by a Probabilistic Neural Network (PNN). Adaptive transformations and rational functions may be used to extract individual heartbeats from an ECG [18]; T, P, and QRS waves were obtained from raw ECG data.
Image verification, object classification, object identification, voice recognition, and action recognition are all common applications of deep learning algorithms. Deep learning allows such processes to be fully automated and made more precise. Current research focuses on unsupervised learning and recognition systems, and deep learning techniques are being explored to construct multistage architectures. CNN-based heart disease diagnosis from ECG is one example of how deep learning techniques have improved accuracy. By performing automated feature learning and extraction on certain ECG datasets, these approaches have increased accuracy; nonetheless, they suffer from issues with scalability and robustness. Correct beat classification of the ECG signal is crucial for a cardiologist to categorize cardiac disease and guide treatment decisions. A CNN does not extract information using heartbeat-specific characteristics; it relies on multi-layer automatic features instead. Recently, various methods have been proposed for ECG-based disease classification using deep learning.
ECG arrhythmia classification using 2D convolutional neural networks (CNNs) is proposed in [19]. The ECG data were transformed with the short-time Fourier transform (STFT), and a 2D CNN classified heart disorders from the STFT spectrograms; the model is referred to as STFT-CNN. In [20], an effective deep learning model using time-frequency representations and convolutional units was proposed, in which the CNN learned to automatically classify ECG data into two categories. A CNN-based architecture for ECG-based cardiac disease classification was suggested in [21]. The Grasshopper Optimization Algorithm (GOA) was used to create a hybrid CNN model that eliminates artifacts and noise. Instead of CNN features, the GOA-CNN model uses pre-processing and DWT-based feature extraction. The CNN from [22] was used in a heart disease classification architecture in which a 5L-CNN automatically extracted features from raw ECG data and classified them. Recently, [23] designed a deep learning algorithm to automatically identify and categorize illnesses using ECG data; cardiovascular disease was predicted using an 18C-CNN and raw ECG data. A convolutional neural network approach for segmenting and identifying ECG heartbeats was proposed in [24], in which the rapid R-CNN model allowed simultaneous segmentation and classification.
Another study [25] employed Restricted Boltzmann Machine (RBM) deep learning to classify ECG-based arrhythmias. For ECG multi-class classification, a hybrid technique that integrates a deep neural network with linear and nonlinear features collected from ECG and heart rate variability (HRV) was proposed in [26]. In [27], morphological filtering was used to build a classification method based on an Extreme Learning Machine with Recurrent Neural Networks (RNN). In [28], a CAD system using an auto-encoder deep learning approach was developed to automatically classify various forms of arrhythmia from ECG data. A CNN was developed in [29] to identify depression based on ECG patterns.
Deep learning approaches have surpassed semi-automatic ones in popularity, although using electrocardiograms to identify heart disease is still difficult. With real-time health data, automatic feature extraction from raw ECG signals can yield erroneous conclusions. Using a CNN directly may also lose cardiac wave-specific and vital heartbeat features. Moreover, the existing methods [19][20][21][22][23][24][25][26][27][28][29] perform deep learning-based classification directly on the raw ECG signals, which leads to erroneous classification results. This motivates a new framework for ECG-based cardiac disease diagnosis that uses a hybrid feature extraction process. First, we devise a method for removing artifacts from the raw ECG data. We then propose a hybrid approach to feature extraction that combines the Stationary Wavelet Transform (SWT) with automated CNN features. Before the hybrid features are fed into the deep learning classifier LSTM, manifold learning is used to reduce the features, which are then normalized. Together, the pre-processing, hybrid feature extraction, and deep learning classifier address these problems for ECG-based heart disease identification. The contributions are summarised here.
• We designed a pre-processing technique that removes baseline drift, power line interference, and disturbances from ECG signals without losing heartbeat data.
The mechanism for ECG-based disease identification is presented in Section Proposed system. Section Simulation results contains the simulation results and discussion. Section Conclusion and future works presents the conclusion and future suggestions.

Proposed system
Figure 1 shows the proposed system architecture for ECG-based heart disease diagnosis. Baseline drift, powerline interference, and other forms of noise are eliminated from the raw ECG input before further processing. To achieve this, we build an adaptive and secure filtering system. Once the data has been cleaned, hybrid feature extraction is performed using a mixture of handcrafted and automated CNN-based features. The handcrafted and CNN features are combined, and manifold learning is employed to reduce the high-dimensional features prior to normalisation. The output of this block is the hybrid feature vector. With sequential input of the normalised feature vectors, the LSTM classifier can make early predictions about heart disease. The LSTM output layer assigns the input ECG signal to the heart disease class with the highest probability score. The proposed integrated system is intended to address scalability, precise QRS-complex extraction, timely disease detection, and reliable classification.

Signal pre-processing
The pre-processing method uses 1D median filters of different widths and a second-order notch filter. The 1D median filters remove baseline wander artefacts from the input signal I without losing data. Two median filters, of 200 ms and 600 ms, are applied to the input ECG data to correct baseline wander induced by low-frequency respiration components. Subtracting the output of these filters from the original ECG signal yields the baseline-free ECG signal. Powerline interference, another ECG artefact, can be reduced by a simple and effective second-order notch filter applied to the 60 Hz frequency component. The notch cutoff is 35 Hz. Together, the two median filters and the notch filter eliminate noise, baseline drift, and powerline interference with little processing. We've summarised pre-processing here.
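The filtering chain described above can be sketched as follows. This is an illustration only, not the authors' listing: the function names, the cascading of the 200 ms and 600 ms median filters, the biquad coefficient formulas, and the quality factor `q=30` are assumptions. Only the two window lengths and the 60 Hz notch frequency come from the text.

```python
import numpy as np


def median_filter(x, width):
    """Sliding median of x; width is forced odd and edges are padded by repetition."""
    width = int(width) | 1  # make the window length odd
    pad = width // 2
    padded = np.pad(np.asarray(x, dtype=float), pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width)
    return np.median(windows, axis=1)


def remove_baseline(x, fs):
    """Estimate baseline wander with 200 ms and 600 ms median filters, then subtract it."""
    stage1 = median_filter(x, round(0.2 * fs))         # 200 ms window
    baseline = median_filter(stage1, round(0.6 * fs))  # 600 ms window (cascade is assumed)
    return np.asarray(x, dtype=float) - baseline


def notch_filter(x, f0, fs, q=30.0):
    """Second-order IIR notch centred on f0 Hz (q is an assumed quality factor)."""
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b1, b2 = 1.0 / a0, -2.0 * np.cos(w0) / a0, 1.0 / a0
    a1, a2 = -2.0 * np.cos(w0) / a0, (1.0 - alpha) / a0
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0  # previous two inputs and outputs
    for n, x0 in enumerate(x):
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        y[n] = y0
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return y


def preprocess(x, fs, powerline_hz=60.0):
    """Baseline removal followed by powerline notch filtering."""
    return notch_filter(remove_baseline(x, fs), powerline_hz, fs)
```

Calling `preprocess(signal, fs=1000)` would match the 1000 Hz sampling rate reported for the dataset. The sliding median builds one window per sample, so memory grows with signal length times window width; a production version would likely use an optimised library filter instead.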

Hybrid features extraction
This process comprises heartbeat segmentation, CNN feature extraction, and feature optimization. We apply an adaptive transform-domain function to segment the ECG data and obtain QRS beats. The CNN extracts automated features from the pre-processed ECG signal. In the final optimization step, the handcrafted and CNN features are combined, and dimensionality reduction and feature normalization are performed on the combined feature vector.

Dynamic heartbeat segment
Third-level SWT decomposition is used for segmenting pre-processed ECG data and extracting QRS characteristics. Algorithm 1 shows how the proposed handcrafted feature vector is created using the Normalized SWT (NSWT). Before the SWT decomposition is applied, the pre-processed ECG signal is normalised, which helps with reliable wave detection in complex ECG data. The pre-processed ECG signal I_P is normalized, and the result is denoted I_N. Once the signal has been normalised, a third-level NSWT decomposition is performed using the Haar wavelet to provide approximation and detail coefficients. In this study, we use the third-level approximation (AAA) and detail (DDD) coefficients to identify A_QRS and D_QRS, respectively. The A_QRS and D_QRS beats are extracted using a dynamic thresholding method. This adaptive approach to QRS extraction addresses scalability and data loss concerns. The QRS complex is represented by the fusion of the approximation and detail coefficients, F_AD, which allows accurate estimation of the QRS characteristics of the normalised input ECG signal.
After the third-level coefficients, AAA and DDD, are obtained, the dynamic thresholds T_A and T_D are calculated for each coefficient.
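The normalisation and third-level Haar decomposition steps can be illustrated with the sketch below. The threshold equations are not covered; only the decomposition is shown. Everything here is an assumption made for illustration: the function names, the peak normalisation (the paper's normalisation formula is not available in this text), the periodic boundary handling, and the reading of AAA and DDD as the level-3 approximation and detail of a standard undecimated transform.

```python
import numpy as np


def normalise(signal):
    """Placeholder peak normalisation (an assumption, not the paper's formula)."""
    signal = np.asarray(signal, dtype=float)
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal


def haar_swt(signal, levels=3):
    """Undecimated (stationary) Haar transform with periodic extension.

    Returns (approximation at the last level, [detail_1, ..., detail_levels]).
    Every returned array has the same length as the input.
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    for level in range(levels):
        step = 2 ** level                 # filter taps are 2**level samples apart
        shifted = np.roll(approx, -step)  # approx[n + step], wrapping around
        details.append((approx - shifted) / np.sqrt(2.0))
        approx = (approx + shifted) / np.sqrt(2.0)
    return approx, details


def third_level_coefficients(i_p):
    """Normalise a pre-processed signal and return its level-3 coefficients."""
    i_n = normalise(i_p)
    approx3, details = haar_swt(i_n, levels=3)
    return approx3, details[-1]
```

Because the transform is undecimated, both coefficient arrays stay aligned sample-for-sample with the input, which is what makes per-sample thresholding of QRS locations possible.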

CNN-features extraction
After the handcrafted features are extracted, lightweight CNN features are extracted automatically. The automatically extracted CNN features, F_CNN, are produced using the pre-processed ECG signal I_P as input. To extract features from pre-processed ECG data at minimal computational cost, we propose a 3L-CNN model, as shown in Table 1. The first convolutional layer has a 40-element kernel, whereas the second and third layers each have a 3-element kernel, which keeps the computation low. We employed batch normalisation after each 1D convolutional layer to address the problems of parameter growth and vanishing gradients. For efficient feature extraction in the 3L-CNN, we employed a 1D ReLU followed by max-pooling. The 1 × 10,000 input ECG signal is used to generate 128 × 4 features at the third layer.
Using this 3L-CNN framework, we extracted the feature vector. Layers of batch normalisation, ReLU, and max pooling are combined with the 1D convolutional layer to form the squashing layer. With the additive bias applied to the max-pooling output, the layer output is:
$$F_j^l = \tanh\left(\mathrm{pool}_{\max}\left(\sum_i f_j^{l-1} * k_{ij}\right) + b_j^l\right)$$
where
• F_j^l are the feature maps produced by ReLU layer l for the j-th max-pooling kernel,
• f_j^{l-1} are the feature maps of the previous ReLU layer l − 1,
• k_ij are the i trained convolution kernels,
• b_j^l is the additive bias,
• pool_max(·) is the max-pooling operation,
• tanh(·) is the hyperbolic tangent activation function.
The final layer output F_j^l is stored in the output variable F_CNN as a 2D feature vector.
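A rough sketch of one squashing-layer step, built from the operations named above (a sum of convolutions over the previous feature maps, max pooling, an additive bias, and tanh), is given below. The array shapes, the pooling width of 2, and the use of cross-correlation for the convolution (as is common in CNN libraries) are assumptions; Table 1 is not reproduced in this text, so the real layer sizes may differ.

```python
import numpy as np


def max_pool(x, size):
    """Non-overlapping max pooling along the last axis (trailing samples are dropped)."""
    usable = (x.shape[-1] // size) * size
    return x[..., :usable].reshape(*x.shape[:-1], -1, size).max(axis=-1)


def squashing_layer(prev_maps, kernels, bias, pool=2):
    """One layer step: tanh(max_pool(sum of convolutions) + bias).

    prev_maps: (C_in, L) feature maps from the previous layer
    kernels:   (C_in, C_out, K) trained kernels
    bias:      (C_out,) additive bias, one value per output map
    """
    c_in, c_out, k = kernels.shape
    out_len = prev_maps.shape[1] - k + 1
    summed = np.zeros((c_out, out_len))
    for j in range(c_out):
        for i in range(c_in):
            # 'valid' sliding dot product of input map i with kernel (i, j)
            summed[j] += np.correlate(prev_maps[i], kernels[i, j], mode="valid")
    pooled = max_pool(summed, pool)
    return np.tanh(pooled + bias[:, None])
```

For example, a single-channel input of length 10,000 with a 40-element first-layer kernel gives maps of length 9,961 before pooling; the later 3-element kernels shorten each map by only two samples per layer.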

Features selection and normalization (FSN)
In this stage, the extracted feature vectors F_AD and F_CNN are resized to a standard size for the full dataset. F_AD and F_CNN are then merged to produce a 256 × 1 feature vector, stored in the variable F_Ens.
Because the number of extracted features in F_Hyb is large, feature selection becomes critical for improving prediction accuracy while minimising computational effort. We applied manifold learning to the extracted features for feature selection. Compared with other feature selection strategies, this technique yields reliable and distinctive features. Using the manifold technique, up to 50 features were selected from each vector.
Following feature reduction, we normalise the features using the log10 technique. This considerably improves heart disease diagnosis performance while also helping to reduce space and time complexity.
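The resizing, merging, and log10 scaling steps might look like the sketch below. Several details are assumptions made purely for illustration: the equal 128 + 128 split of the 256-element vector, linear interpolation as the resizing method, and the signed `log10(1 + |x|)` form, which stands in for the paper's log10 expression because that expression is not available in this text. The manifold-learning selection step is omitted because the specific method is not named.

```python
import numpy as np


def resize(vec, length):
    """Linearly resample a flattened feature vector to a fixed length (assumed method)."""
    vec = np.asarray(vec, dtype=float).ravel()
    old_grid = np.linspace(0.0, 1.0, num=len(vec))
    new_grid = np.linspace(0.0, 1.0, num=length)
    return np.interp(new_grid, old_grid, vec)


def fuse(f_ad, f_cnn, half=128):
    """Concatenate resized handcrafted and CNN features into one 2*half vector."""
    return np.concatenate([resize(f_ad, half), resize(f_cnn, half)])


def log10_normalise(features):
    """Stand-in log10 scaling: sign(x) * log10(1 + |x|), defined for any real x."""
    features = np.asarray(features, dtype=float)
    return np.sign(features) * np.log10(1.0 + np.abs(features))
```

The signed form is chosen here only because a plain `log10(x)` is undefined for zero or negative feature values, which wavelet detail coefficients can take.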

Classifiers
In this work, we use a CNN for automated feature extraction and an LSTM sequential classifier for disease prediction. Hidden LSTM units are made up of memory cells, input and output gates, forget gates, and peephole connections. The LSTM, fully connected, and softmax layers classify input data into two heart diseases. The LSTM design uses 150 hidden units and 27 epochs. Other classifiers were also used in the evaluation. We used 70% of the data for training and 30% for testing.
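To make the LSTM, fully connected, and softmax stages concrete, here is a minimal forward-pass sketch with randomly initialised weights. It is not the authors' trained model: peephole connections are left out for brevity, treating the figure of 150 as the hidden state size is an assumption, and so are the six output classes (taken from the dataset description), the feeding of features one value per time step, and all function and parameter names.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def init_params(input_dim, hidden, n_classes, seed=0):
    """Random, untrained weights; the gate order is input, forget, output, candidate."""
    rng = np.random.default_rng(seed)
    return {
        "hidden": hidden,
        "W": rng.normal(0.0, 0.1, (4 * hidden, input_dim)),
        "U": rng.normal(0.0, 0.1, (4 * hidden, hidden)),
        "b": np.zeros(4 * hidden),
        "W_out": rng.normal(0.0, 0.1, (n_classes, hidden)),
        "b_out": np.zeros(n_classes),
    }


def lstm_last_state(seq, p):
    """Run a standard LSTM cell over seq (T, D) and return the final hidden state."""
    n = p["hidden"]
    h = np.zeros(n)
    c = np.zeros(n)
    for x_t in seq:
        z = p["W"] @ x_t + p["U"] @ h + p["b"]
        i = sigmoid(z[:n])           # input gate
        f = sigmoid(z[n:2 * n])      # forget gate
        o = sigmoid(z[2 * n:3 * n])  # output gate
        g = np.tanh(z[3 * n:])       # candidate cell update
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


def classify(seq, p):
    """LSTM -> fully connected -> softmax; returns one probability per class."""
    logits = p["W_out"] @ lstm_last_state(seq, p) + p["b_out"]
    return softmax(logits)
```

The predicted class is then `int(np.argmax(classify(seq, params)))`, that is, the class with the highest probability score.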

Simulation results
We implemented and evaluated the proposed model on a computer running Windows 10 with an i3 processor and 4 GB of RAM. Every experiment is run on the PTB Diagnostic ECG Database, a publicly accessible research dataset [30]. This dataset is made up of ECG data gathered from 290 people. The ECG recordings are categorized into the following diagnostic classes: myocardial infarction, cardiomyopathy/heart failure, bundle branch block, dysrhythmia, myocardial hypertrophy, valvular heart disease, myocarditis, miscellaneous, and healthy control. Every ECG record has a total of 15 signals, consisting of the 12 conventional leads and 3 Frank leads, sampled at 1000 Hz. We prepared the training ECG data in 6 major classes, namely Bundle Branch Block (BBB), Cardiomyopathy (CMP), Hypertrophy (HPT), Myocardial Infarction (MCI), Other Heart Diseases (OHD), and Healthy Control (HC), for a total of 530 ECG samples. Table 2 shows the number of ECG samples collected for each class.
The OHD class contains other heart diseases such as dysrhythmia, heart failure, myocarditis, and valvular heart disease. For performance evaluation, the total dataset was divided into 70% (371) training and 30% (159) testing ECG samples for each classifier. Using various classifiers, including LSTM, Ensemble Classifier (EC), ANN, SVM, and KNN, we first assess the effectiveness of the proposed hybrid feature extraction method. We then present a comparison with state-of-the-art techniques. The performance metrics measured are precision, recall, accuracy, F1-score, and prediction time.
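The split sizes and the reported metrics can be reproduced from a confusion matrix with a few lines of code. The sketch below is illustrative: the function names are hypothetical, and macro averaging over classes is an assumption, since the text does not say how per-class scores are combined.

```python
def split_counts(n_total, train_fraction=0.7):
    """Return (training, testing) sample counts for a given training fraction."""
    n_train = round(n_total * train_fraction)
    return n_train, n_total - n_train


def macro_metrics(confusion):
    """Accuracy plus macro-averaged precision, recall, and F1.

    confusion[t][p] is the number of samples of true class t predicted as class p.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted = sum(confusion[t][k] for t in range(n))  # column sum
        actual = sum(confusion[k])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if (p + r) else 0.0)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

With the 530 samples used here, `split_counts(530)` gives 371 training and 159 testing samples, matching the figures quoted above.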
The heart disease classification accuracy, precision, recall, and F1-score obtained with the various classifiers are displayed in Tables 3, 4, 5, and 6. These findings suggest that FSN with LSTM outperforms the other classifiers. This is because, in comparison to the other classifiers, LSTM handles the issues of exploding gradients, overfitting, and class imbalance more successfully. The performance improvement is primarily due to the probabilistic LSTM classifier taking sequential hybrid features as input for the early prediction of heart disease. The EC classifier outperforms the SVM, KNN, and ANN classifiers on all of these metrics because it reduces misclassification more effectively. Among all the classifiers, KNN has the worst performance. The F1-score provides another view of classifier predictability.
We also contrasted the FSN features with the raw hybrid features to demonstrate how feature selection and normalization improve classification performance. Using FSN considerably improves the results for each classifier. The hybrid feature engineering strategy uses both CNN-based and handcrafted features for efficient disease classification. Thus, the proposed model improves classification accuracy and F1-score over all current techniques. By choosing significant and distinctive features from the high-dimensional hybrid feature vector, manifold learning reduces redundant features. The selected features are then normalized to further decrease classification errors. Together, feature normalization and manifold learning reduce training and misclassification problems. Additionally, in machine learning approaches, feature normalization simplifies distance computation, and the normalized range of all features guides effective weight computation during training and classification. Table 7 shows the impact of using manifold learning in the proposed model: applying manifold learning improves the overall classification performance by approximately 4%.
We conclude by comparing the proposed model with state-of-the-art deep learning methods for ECG-based cardiac disease classification. The methods compared are STFT-CNN [20], GOA-CNN [21], 5L-CNN [22], and 18C-CNN [23]. Table 8 displays the average training and detection time, accuracy, and F1-score. This comparison suggests that the proposed model achieves efficient ECG-based cardiac disease classification with substantially lower computational overhead than the other deep learning-based approaches. The proposed model overcomes their limitations by adopting an efficient pre-processing method, adaptive QRS complex extraction, lightweight CNN features, manifold learning and normalization, and an LSTM for classification. Existing deep learning-based models such as STFT-CNN, GOA-CNN, 5L-CNN, and 18C-CNN rely only on automatic feature extraction, which neglects some vital heartbeat-related features. The proposed hybrid feature engineering approach considers both handcrafted and CNN-based features. The results achieved with the proposed model therefore show a significant improvement in accuracy and F1-score compared with all the existing methods. Another reason for the performance improvement is the pre-processing of raw ECG signals. The existing deep learning-based methods work directly on the raw ECG signal, which contains redundant noisy beats and artifacts that affect feature extraction and classification performance. The proposed CNN-based model also has lower computational complexity than the existing methods because of its reduced number of layers and smaller kernel sizes.

Conclusion and future works
The purpose of this article was to offer a new method for classifying heart disease using ECG data. We created an integrated system that allows for automation while preserving important cardiac wave data. Combined with the CNN's automatic feature learning, adaptive heartbeat segmentation yields an accurate depiction of heart performance, which helps reduce misclassification errors. Because the feature vector is constructed from CNN and QRS complex features, the FSN method offers a more effective and reliable feature set for the precise classification of cardiac diseases. The experimental results show that the proposed model performs better than the prior deep learning-based methods. We recommend looking at more data sets in the future to see how reliable the proposed model's performance is.

Fig. 1
ECG-based heart disease diagnosis framework proposed

• Dynamic hand-crafted and comprehensive CNN feature extraction model proposed as a hybrid feature engineering approach. Dynamic threshold-based QRS complex extraction with third-level SWT decomposition extracts user-defined features. A lightweight 3-layer CNN model extracts CNN-based automated features from pre-processed ECG data. Combining, reducing, and normalizing CNN and handcrafted characteristics improves categorization.
• The sequential deep learning classifier called LSTM is designed for automatic heart disease classification from the hybrid features.
• We extensively test the suggested model using state-of-the-art methods on the publicly available dataset.

Table 1
3L-CNN configuration for automatic feature extraction

Table 2
ECG heart disease dataset classification

Table 3
Accuracy performance analysis

Table 4
Precision performance analysis

Table 5
Recall performance analysis

Table 6
F1-score performance analysis

Table 7
Proposed model analysis with manifold and without manifold learning

Table 8
Performance analysis with state-of-art methods