CORONARY HEART DISEASE CLASSIFICATION USING IMPROVED EMPEROR PENGUIN OPTIMIZATION-BASED LONG SHORT-TERM MEMORY NETWORK

ABSTRACT
Ventricular fibrillation (VF) is the most life-threatening and dangerous type of cardiac arrhythmia (CA), with an annual mortality rate of 10-15%. Early detection of cardiac arrhythmia is therefore important to reduce the mortality rate. Many machine learning algorithms have been proposed and have proven useful in the classification and detection of heart problems. In this research manuscript, a Long Short-Term Memory (LSTM) classifier with Improved Emperor Penguin Optimization (IEPO) is implemented for VF classification. IEPO is used to find optimal hyperparameters that mitigate the overfitting problem. The presented model is trained, tested, and validated using two publicly available standard datasets: the Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) dataset and the China Physiological Signal Challenge (CPSC) 2018 dataset. Both consist of five-second ECG recordings of coronary heart disease (CHD) patients. Furthermore, Fuzzy C-Means with an Enhanced Fuzzy Rough Set method (FCM-ETIFRST) is used for feature selection, extracting informative features and clustering on membership, non-membership, and hesitancy degrees. On the MIT-BIH dataset, the proposed model achieved high accuracy, sensitivity, specificity, precision, and Matthews correlation coefficient (MCC).


INTRODUCTION
As per the World Health Organization (WHO), coronary heart disease (CHD) is a substantial global epidemic and the most common cause of death worldwide, accounting for one in every three deaths. In 2002, 16.7 million cardiac disease (CD)-related deaths were reported, and this figure is expected to rise to 23.3 million by 2030 [1]. Age, obesity, smoking, lack of exercise, hypertension, diabetes, and high blood cholesterol are all risk factors for CD. Heart disease diagnosis is conducted using a variety of medical tests, the medical history, and an examination of the patient's lifestyle. There are many variables to consider when diagnosing heart disease, so a specialist is usually involved [2][3]. Contraction of the ventricles exceeding 100 bpm causes ventricular tachycardia (VT), and when the ventricular contraction rate exceeds 500 bpm it is referred to as ventricular fibrillation (VF), the most dangerous ventricular arrhythmia, which can lead to cardiac arrest. VT and VF are the two subsets of ventricular arrhythmias (VAs) [4]. Nearly all countries face a significant coronary artery disease (CAD) burden, due to the limited resources available for providing comprehensive health care to CAD patients and insufficient awareness campaigns on such diseases [5]. Various machine learning methods have been developed to detect CHD at an early stage, using different software libraries and platforms to extract large amounts of information from large datasets. The predictive analysis of CHD includes preprocessing the data collected from the datasets, preparing the data for suitable algorithms, training, testing, and validating the chosen model, and making final predictions with fine-tuned parameters [6,7].
Atrial fibrillation (AF) detection has been addressed in many studies based on the activity of the atrial region of the heart, including detection of the P-wave, the entropy of wavelet samples, and detection of F-waves [8]. Currently available AF algorithms detect P-wave or R-R interval irregularities with manual instructions on long-term ECG recordings. Any change of the P waveform in shape, or blunting in Leads 2 and 4, results in irregularity and relates directly to the condition of atrial activity, since the P wave is the depolarization wave of the atrial region [9][10]. In this work, a methodology to detect and classify CHD is developed using ECG signals from the MIT-BIH and CPSC 2018 datasets. The major contributions of the paper are described as follows:
• To resolve the problems of class imbalance and the curse of dimensionality, the FCM-ETIFRST method is included in the feature selection process.
• For accurate classification of the MIT-BIH arrhythmia and CPSC 2018 datasets with reduced overfitting, an efficient LSTM classifier is employed.
• An Improved Emperor Penguin Optimization (IEPO) is proposed in this research for finding the optimal hyperparameters so that the LSTM operates without overfitting. This is done by improving the exploitation and exploration stages of EPO with the Levy flight and Gaussian mutation mechanisms.
https://doi.org/10.31436/iiumej.v24i2.2698

LITERATURE REVIEW
Li et al. [11] developed a novel feature fusion framework that used phonocardiogram (PCG) signals from 175 subjects to identify CAD features. A total of 110 features from different domains were extracted, reduced, and selected. The obtained images were taken as CNN input for feature learning. The selected features and the deep-learning features were then combined and fed to a multilayer perceptron for classification. This framework outperformed deep-learning features and multi-domain features alone, with the highest accuracy, sensitivity, and specificity. A limitation of this work is that too many features inhibit the model learning process and result in poor generalization. Future work can focus on improving CAD detection accuracy by using dynamic features and multichannel PCG signals. Nguyen et al. [12] developed a stacking method to predict atrial fibrillation (AF) from ECG signals, performing statistical segment-based feature recognition with a Support Vector Machine (SVM) on segmentation units produced by a convolutional neural network. To validate this method, the ECG dataset from the PhysioNet/Computing in Cardiology Challenge 2017 was used, which contained 8528 ECG recordings. On the same dataset and metric, the proposed method outperformed state-of-the-art methods with a high F1 score. A limitation of the method was the nature of the chosen dataset: some AF signal segments were not labeled as AF due to the absence of AF information or the occurrence of other ECG signals. The method could also be applied to other medical signal-related problems. Jahmunah et al. [13] proposed an automated system (AS) for categorization of ECG signals with a CNN, classifying signals into normal, CAD, myocardial infarction (MI), and congestive heart failure (CHF) classes. The proposed GaborCNN achieved better performance with 98.5% accuracy.
The use of GaborCNN resulted in low computational complexity, which was an added advantage of this work. Furthermore, the system was validated on a large database and showed high potential to assist clinicians in screening for CVDs using ECG signals. A limitation of this work is that only a few subjects and smaller datasets were used for CAD and CHF. To improve the classification accuracy of GaborCNN, more data needs to be used to train the network in the future, so that the onset of CAD is detected early and prevented from progressing to MI or CHF.
Tseng et al. [14] developed a novel deep-learning method for VF prediction. ECGs from the MIT-BIH datasets were used for training and validation. The results demonstrated that the proposed two-dimensional short-time Fourier transform (2D STFT)/continuous wavelet transform (CWT) convolutional neural network (CNN) model achieved the highest recall and accuracy. The proposed model was also compared with 1D CNN and 2D time-domain models and again achieved the highest accuracy. One limitation of this work was that the database did not identify concomitant diseases, so the influence of these diseases on ECG signals was unknown. In the future, larger datasets should be used to build a more accurate model, and additional data should be collected from conventional and portable hospital vital sign monitors as well as wearable devices. Panigrahy et al. [15] proposed a novel approach to detect the VF rhythm involving the Support Vector Machine (SVM), Adaptive Boosting (AdaBoost), and Differential Evolution (DE) algorithms, implemented with optimal variable combinations. The training and test data were taken and validated from three databases: the arrhythmia database, the CUDB database, and the MIT-BIH malignant ventricular arrhythmia database. The proposed approach achieved the highest accuracy, sensitivity, and specificity in detecting the VF rhythm, and the method was suitable for real-time detection of VF. Çınar et al. [16] developed a hybrid AlexNet-SVM deep learning architecture for the classification of Normal Sinus Rhythm (NSR), abnormal arrhythmia (ARR), and Congestive Heart Failure (CHF) ECG signals. This approach was implemented on 192 ECG signals, of which 96 were arrhythmias, 30 were CHFs, and 36 were NSRs.
SVM and KNN algorithms were implemented on the ARR, CHF, and NSR signals for classification. The signals were then classified using Long Short-Term Memory (LSTM). Chen et al. [17] proposed a CNN model for CA classification from ECG signals of the China Physiological Signal Challenge (CPSC) 2018 dataset, which provided 6,877 12-lead ECG recordings, in which different types of diagnoses for 476 patients were predicted. The proposed technique obtained the overall highest F1-score and first rank in the classification challenge competition. A limitation of this work is that the approach still needed refinement to achieve an equal level of performance on other datasets. Support for various CA types and wearable ECG devices could be developed for this approach in the future.

METHODOLOGY
The framework of the proposed work combines feature selection with FCM-ETIFRST and LSTM classification with Improved Emperor Penguin Optimization (IEPO) hyperparameter optimization, in order to reduce the problems of the curse of dimensionality and overfitting in CHD analysis; it is illustrated in Fig. 1. Because of the time-step order inherent in its several hidden layers, a CNN is used to extract features from the complex ECG signal data. The temporal characteristics are then captured by the LSTM layers during classification rather than at the feature extraction stage. To optimize computational resources and detect CHD more efficiently, the proposed IEPO algorithm is utilized to optimize the hyperparameters of the LSTM. The steps of the proposed methodology are designed as shown in Fig. 1.

Data Pre-processing
ECG recordings of VF patients are taken from two datasets, MIT-BIH and CPSC 2018. The MIT-BIH dataset consists of 23 publicly available ECG recordings that include four types of rhythms: AF, VF, AV junction, and normal. In this work, however, only VF signals are classified, sampled at a frequency of 250 Hz. The proposed algorithm detects VF by pre-processing ECG signals in windows of 5 seconds. The CPSC 2018 dataset also consists of ECG signals in which all forms of arrhythmias are available. All signals are segmented into episodes, each labeled by a clinician, and divided into training, test, and validation sets. Two leads are selected from the dataset for this work, and the data is divided into VF and non-VF categories. The ECG signals from the CPSC dataset are sampled at a frequency of 500 Hz and segmented into 5-second episodes. A lead 1 sample signal from the CPSC dataset is shown in Fig. 2 and is segmented into a normal signal and a VF signal as shown in Fig. 3(a) and Fig. 3(b). The lead 2 normal and VF signals are shown in Fig. 4(a) and Fig. 4(b), respectively.
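The 5-second windowing described above can be sketched as follows; this is a minimal illustration, not the authors' code, and the function name `segment_ecg` is ours:

```python
import numpy as np

def segment_ecg(signal, fs, window_s=5.0):
    """Split a 1-D ECG recording into non-overlapping fixed-length episodes.

    signal : 1-D array of ECG samples
    fs     : sampling frequency in Hz (250 for MIT-BIH, 500 for CPSC 2018 here)
    """
    win = int(window_s * fs)
    n_windows = len(signal) // win          # drop the trailing partial window
    return signal[:n_windows * win].reshape(n_windows, win)

# A 60 s recording sampled at 250 Hz yields twelve 5 s episodes of 1250 samples.
ecg = np.random.randn(60 * 250)
episodes = segment_ecg(ecg, fs=250)
print(episodes.shape)  # (12, 1250)
```

Each row of the returned array is one episode, ready for feature extraction.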

Feature Extraction
Here, feature extraction is performed using a CNN layer architecture based on a 1-D framework. The timing characteristics and sequence of the data are retained in the CNN-based feature representation of the proposed framework. The following features are extracted in the proposed CHD framework.

1-D Approximation and Details Coefficients
By applying low-pass and high-pass filtering operations to the group of transform coefficients, two sets of coefficients are obtained: 'approximation' from the low-pass branch and 'detail' from the high-pass branch. Let the approximation and detail coefficients at level j be cA_j and cD_j, respectively. For a stationary signal, they are obtained from the transform as in Eq. (1) and Eq. (2):

cA_j = (1/√N) Σ_{n=1}^{N} x(n) φ_j(n)        (1)
cD_j = (1/√N) Σ_{n=1}^{N} x(n) ψ_j(n)        (2)

where cA_j and cD_j are the coefficient parameters at level j, x(n) represents the signal at the j-level coefficients, φ_j(n) and ψ_j(n) represent the low-pass (scaling) and high-pass (wavelet) coefficient transformation functions, and N represents the total number of wavelet samples obtained from the signal.
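The approximation/detail split can be illustrated with a single-level Haar transform, the simplest wavelet; this is a generic sketch, and the paper does not specify which wavelet family is used:

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT: low-pass -> approximation, high-pass -> detail."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                       # pad to even length
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # smoothed, half-rate version of the signal
    detail = (even - odd) / np.sqrt(2)   # high-frequency residual
    return approx, detail

cA, cD = haar_dwt([4.0, 2.0, 6.0, 8.0])
print(cA)  # [4.24264069 9.89949494]
print(cD)  # [ 1.41421356 -1.41421356]
```

Deeper decomposition levels are obtained by re-applying the transform to the approximation coefficients.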

Hjorth Activity
Signal power is represented by the activity parameter, which is the variance of the signal as a function of time, as given in Eq. (3). The activity corresponds to the surface of the power spectrum in the frequency domain.

Activity = var(y(t))        (3)

where y(t) denotes the signal.

Hjorth Mobility
The mobility parameter represents the mean frequency, or the proportion of the standard deviation, of the power spectrum. It is the square root of the variance of the first derivative of the signal y'(t) divided by the variance of the signal y(t), as given in Eq. (4):

Mobility = sqrt( var(y'(t)) / var(y(t)) )        (4)

Hjorth Complexity
The complexity parameter denotes the change in frequency. It compares the similarity of the signal to a pure sine wave, and its value converges to 1 when the signal is similar, as given in Eq. (5):

Complexity = Mobility(y'(t)) / Mobility(y(t))        (5)
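The three Hjorth parameters (Eqs. 3-5) can be computed directly from sample variances; a minimal numpy sketch using finite differences for the derivatives:

```python
import numpy as np

def hjorth(y):
    """Hjorth activity, mobility, and complexity of a 1-D signal."""
    dy  = np.diff(y)                 # first derivative (finite differences)
    ddy = np.diff(y, 2)              # second derivative
    activity   = np.var(y)                                # signal power, Eq. (3)
    mobility   = np.sqrt(np.var(dy) / np.var(y))          # Eq. (4)
    complexity = np.sqrt(np.var(ddy) / np.var(dy)) / mobility  # Eq. (5)
    return activity, mobility, complexity

t = np.linspace(0, 1, 500, endpoint=False)
act, mob, comp = hjorth(np.sin(2 * np.pi * 5 * t))
print(round(comp, 2))  # ≈ 1 for a pure sine wave
```

As the text notes, complexity approaches 1 for a pure sine wave, since the derivative of a sinusoid has the same mobility as the sinusoid itself.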

Mean Curve Length (MCL)
The Katz fractal dimension is approximated using MCL, which measures signal activity, as given in Eq. (6):

MCL = (1/L) Σ_{i=n-L+1}^{n} |x[i] - x[i-1]|        (6)

where n denotes the last sample of the epoch, L denotes the window length, and x[i] denotes the signal time series.

Mean Teager Energy (MTE)
The MTE feature has been widely used in EEG research and is adopted here for ECG signals, as given in Eq. (7):

MTE = (1/L) Σ_{i=n-L+2}^{n-1} (x[i]^2 − x[i-1] x[i+1])        (7)

where n denotes the last sample of the epoch, L denotes the window length, and x[i] denotes the signal time series.

Zero Crossing Rate
The Zero-Crossing Rate (ZCR) is the rate at which the signal changes from negative through zero to positive, or from positive through zero to negative, as in Eq. (8):

ZCR = (1/(L−1)) Σ_{i=2}^{L} 1_{x[i-1] x[i] < 0}        (8)

where 1_{x[i-1]x[i]<0} denotes the indicator function and L denotes the length of the signal.
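Under the definitions above, MCL, MTE, and ZCR reduce to short numpy expressions; the exact summation limits in Eqs. (6)-(8) may differ slightly from this sketch, which averages over the whole window:

```python
import numpy as np

def mean_curve_length(x):
    # Eq. (6): average absolute first difference over the window
    return np.mean(np.abs(np.diff(x)))

def mean_teager_energy(x):
    # Eq. (7): average Teager energy x[i]^2 - x[i-1]*x[i+1]
    return np.mean(x[1:-1] ** 2 - x[:-2] * x[2:])

def zero_crossing_rate(x):
    # Eq. (8): fraction of consecutive sample pairs with a sign change
    return np.mean(x[:-1] * x[1:] < 0)

x = np.array([1.0, -1.0, 1.0, -1.0])
print(zero_crossing_rate(x))  # 1.0 — every consecutive pair changes sign
```

These scalar features are computed per 5-second episode and appended to the wavelet and Hjorth features.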
The above-mentioned features are extracted and forwarded to the feature selection process to further reduce dimensionality and to mitigate classifier overfitting.

Feature Selection
The FCM-ETIFRST feature selection method is applied to the extracted features to select relevant features for arrhythmia classification.

Fuzzy C Means
Feature selection in the proposed method is based on clustering with Fuzzy C-Means (FCM) and the Enhanced Fuzzy Rough Set method. FCM has the advantage of producing better results for overlapping datasets than the k-means method. Each data point is assigned a membership to every cluster in order to form the appropriate clusters [18]. The membership of a data point is computed from the distance between the data point and the cluster center v_j; the nearer the data is to v_j, the higher its membership in that cluster.
The estimated membership is updated at every iteration. An enhanced fuzzy C-means algorithm is implemented in the proposed method, as follows: Step 1: The number of clusters c varies from 2 to its maximum value. A stopping threshold ε > 0 is chosen and the initial class prototypes are selected.
Step 2: The new linearly-weighted sum image ξ is calculated in terms of the original image and its local neighborhood using Eq. (9):

ξ_k = (1/(1+α)) ( x_k + (α/N_R) Σ_{j∈N_k} x_j )        (9)

where x_k and ξ_k are the gray values of the k-th pixel of the original and weighted-sum images, respectively (gray values are usually encoded with 8-bit resolution). N_k is the set of neighbors falling in a window around pixel k, and N_R is its cardinality. The mean-filtered pixel value is (1/N_R) Σ_{j∈N_k} x_j, and the parameter α controls the effect of the neighbor terms, with a value approximately near 1.
The i-th cluster prototype is v_i, and u_il is the fuzzy membership of gray value l with respect to cluster i. The value l varies from 1 to the number of gray levels q of the image (with a maximum of 256 levels), and the weighting exponent m on each fuzzy membership determines the fuzziness of the final classification.
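The FCM update loop (alternating center and membership updates) can be sketched as follows. This is the standard algorithm, not the paper's enhanced variant; the fuzzifier `m` and iteration count are illustrative defaults:

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    """Minimal Fuzzy C-Means: returns cluster centers V and membership matrix U.

    X : (n, d) data, c : number of clusters, m : fuzzifier (> 1).
    """
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                   # memberships sum to 1 per data point
    for _ in range(iters):
        W = U ** m
        V = W @ X / W.sum(axis=1, keepdims=True)          # weighted centers
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        # u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        U = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2 / (m - 1)), axis=1)
    return V, U

X = np.vstack([np.zeros((20, 2)), np.ones((20, 2)) * 5])
V, U = fcm(X, c=2)
print(np.round(np.sort(V[:, 0]), 1))  # centers near 0 and 5
```

The soft membership matrix U, rather than a hard assignment, is what the subsequent rough-set step operates on.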

Enhanced Fuzzy Rough Set Feature Selection
The FCM has a wide range of applications and is used in numerous fuzzy clustering methods. The membership degree of every sample point to the class centers is used in the optimization of the objective function [19]. The Euclidean distance between the sample points and the clustering centers is applied in the objective function, and solving it yields the minimum of the non-similarity index over all clustering centers. The generalized objective function is given in Eq. (12):

J = Σ_{i=1}^{n} Σ_{j=1}^{c} u_ij^m d_ij^2        (12)

where m denotes the weighted index number, d_ij = ||x_i − v_j|| is the Euclidean distance from the i-th sample point to the j-th clustering center, v_j is the clustering center of the fuzzy set, and u_ij lies between 0 and 1. The constraint formula with the Lagrangian multiplier is constructed, and the derivatives with respect to the input parameters in Eq. (13) and Eq. (14) are set to zero to reach the minimum. The FCM outputs are the centers V and the membership matrix U. Every object has a degree u_ij with which it belongs to center v_j. The fuzzy rough set lower approximation is given in Eq. (15).
where the hesitancy, non-membership, and membership degrees of an object are denoted as π(x), ν(x), and μ(x), respectively, and w_π, w_ν, and w_μ are the corresponding weighting factors.
Each pixel's membership degree is initialized by normalization as in Eq. (16):

μ(x) = (x − min(I)) / (max(I) − min(I))        (16)

where min(I) and max(I) denote the minimum and maximum intensity of the image pixels. The normalization reduces complexity by scaling the values into the range between 0 and 1.
A non-membership value is introduced in the new method for cases where uncertainty is present. A high grade of certainty is indicated when the membership value is near 0 or 1, and a high grade of uncertainty when the membership value is near 0.5. The non-membership value ν(x) is measured in Eq. (17), where the standard deviation σ of the membership value μ(x) lies in the range 0.39-0.41. The hesitancy degree is measured as π(x) = 1 − μ(x) − ν(x).
Through fuzzy clustering, every object obtains a membership degree for every feature, as given in Eq. (18).
Model equality is based on equivalence relations, while approximate similarity or equality is measured with fuzzy equivalence. A fuzzy equivalence relation is used, as in Eq. (19), where the value σ denotes the size of the opening on one side. If σ equals 0, the function guarantees that the two objects are balanced; otherwise, it cannot balance the two objects.
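The membership/non-membership/hesitancy triple can be illustrated as follows. The normalization follows Eq. (16); the non-membership generator shown (a Sugeno-type complement with an assumed parameter `lam`) stands in for the paper's standard-deviation-based Eq. (17), which is not fully specified here:

```python
import numpy as np

def intuitionistic_degrees(x, lam=0.8):
    """Membership, non-membership, and hesitancy degrees for a feature vector.

    `lam` is an assumed fuzzification parameter (hypothetical, for illustration).
    """
    mu = (x - x.min()) / (x.max() - x.min())   # Eq. (16): normalize into [0, 1]
    nu = (1 - mu) / (1 + lam * mu)             # assumed non-membership generator
    pi = 1 - mu - nu                           # hesitancy: pi = 1 - mu - nu
    return mu, nu, pi

x = np.array([0.0, 2.0, 4.0, 8.0])
mu, nu, pi = intuitionistic_degrees(x)
print(np.round(mu, 2))
```

Whatever generator is used, the three degrees must sum to 1 for each object, which is the invariant the rough-set approximations rely on.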

Improved Emperor Penguin Optimization for Hyperparameter Optimization
In IEPO, three strategies are incorporated into EPO to maintain a suitable balance between exploration and exploitation, addressing the drawbacks of standard EPO [20]: opposition-based learning, Levy flight, and Gaussian mutation. Equation (20) contains the fundamental formula of the EPO algorithm, which moves the penguins toward the center to locate the optimal value by measuring their distance from one another within the population. However, the EPO algorithm easily gets caught in local optima when optimizing the hyperparameters of the LSTM. The proposed strategy strengthens the bonds between penguins and improves each penguin's capacity for jumping out of local optima. The hyperparameters selected for the LSTM are given in Table 1. The distance between the penguins is represented mathematically in Eq. (20) with the term Depth. Because the weak randomness of EPO is a vulnerability, the proposed method incorporates the Gaussian Mutation (GM) into Eq. (20). The GM has excellent randomization capabilities, which strengthen the penguin relationships and help the search leave local optima; the updated mathematical model is given in Eq. (21). Individual penguins are otherwise observed to move slowly while readily guiding the swarm to a regional optimum. The proposed strategy therefore enhances each penguin's position-update formula by utilizing the strong stochasticity of the Levy flight. The Levy flight expands the swarm's search area while simultaneously improving each penguin's capacity to jump, so the best penguin can swiftly arrive at the best solution. In the formulation, P_i' represents the updated position of the i-th solution.
Through the Levy flight mechanism, a new candidate solution is generated, which increases knowledge diffusion across objects and yields a better solution, as the Levy flight is a random process whose jump sizes follow the Levy probability distribution. However, the best penguin may leave the search arena due to the addition of the Levy flight and the strong randomness of the Gaussian mutation, causing the swarm to fall into an infinite cycle and preventing the best solution from being updated. Opposition-Based Learning (OBL) is therefore added to the EPO algorithm to restrain the upgraded algorithm's jumping ability. The OBL broadens the exploration of the search domain, which in turn diversifies the swarm. The opposite solution is created using Eq. (22), where OP_i is the location of the i-th opposite penguin within the search domain, lb_j and ub_j are the lower and upper bounds of the j-th variable, P_best is the location of the best penguin, r is a random vector with elements in the range [0, 1], and P_i is the position vector of the i-th penguin in the population. The best penguin is then updated based on the fitness of the opposite positions. The new technique delivers greater performance and higher convergence velocity, enabling the penguins to change more quickly and discover the best values for the chosen hyperparameters.
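The three IEPO ingredients can be sketched as small operators. The Levy step uses Mantegna's algorithm, and the opposition operator shown is the standard lb + ub − x form; the paper's Eq. (22) additionally involves the random vector r and the best penguin's position, which are omitted in this sketch:

```python
import numpy as np
from math import gamma, sin, pi

rng = np.random.default_rng(1)

def levy_step(dim, beta=1.5):
    """Levy-distributed step sizes via Mantegna's algorithm."""
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)

def gaussian_mutation(pos, scale=0.1):
    """Perturb a penguin's position to help it escape a local optimum."""
    return pos + rng.normal(0.0, scale, pos.shape)

def opposite(pos, lb, ub):
    """Opposition-based learning: reflect a position inside the search bounds."""
    return lb + ub - pos

p = np.array([0.2, 0.7])
print(opposite(p, lb=np.zeros(2), ub=np.ones(2)))  # [0.8 0.3]
```

In a full IEPO loop, each candidate hyperparameter vector would be perturbed by these operators and kept only if its fitness (validation loss of the LSTM) improves.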

Classification using LSTM
Due to its outstanding performance in extracting temporal and spatial variables, the classification network used in the framework is an LSTM; its architecture is shown in Fig. 5. As shown in Fig. 5, the model is made up of an input layer, a single LSTM cell, a hidden layer, a normalization layer, and a classification layer.
Performance is improved by integrating the most recent data with older data to predict one step ahead. With the benefit of a hidden-layer self-feedback mechanism, the LSTM model solves the long-term dependency problem [21]. The input, output, and forget gates of the LSTM model are three distinct gates used to update the data to be stored in the memory cell.
2) The input gate modifies the memory state value depending on the data input at the current time; the sigmoid activation function, weight matrix, and bias are shown in Eq. (24).
3) The forget gate modifies the memory state value by using past data; the weight matrix and gate bias are given in Eq. (25).
4) As shown in Eq. (26), the value of the current memory state of the most recent LSTM unit is calculated, where "*" represents the element-wise (Hadamard) product. Depending on the values of the previous and candidate cell states, the input and forget gates regulate the update of the memory cell.
6) As shown in Eq. (28), the output of the LSTM is computed from the output gate and the memory state. Through its three control gates and memory state, the LSTM continuously stores, reads, resets, and updates the data. Because the internal parameters of the LSTM are shared, the dimensions of the weight matrices are changed to regulate the size of the output unit, and a significant amount of time may separate the LSTM's input and feedback. The internal state of the memory cell's architecture maintains a constant error flow, so the gradient neither explodes nor vanishes [22]. The LSTM classifier predicts the type of arrhythmia present in CHD, and the results are evaluated in the following section.
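The gate equations (Eqs. 24-28) can be sketched as a single forward step; the stacked-weight layout and random initialization here are illustrative, not the paper's configuration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. W, U, b hold the stacked input/forget/candidate/output
    parameters; `*` below is the element-wise product, matching Eqs. (24)-(28)."""
    z = W @ x + U @ h + b
    n = len(c)
    i = sigmoid(z[0 * n:1 * n])          # input gate
    f = sigmoid(z[1 * n:2 * n])          # forget gate
    g = np.tanh(z[2 * n:3 * n])          # candidate memory state
    o = sigmoid(z[3 * n:4 * n])          # output gate
    c_new = f * c + i * g                # updated memory state
    h_new = o * np.tanh(c_new)           # hidden output
    return h_new, c_new

n_in, n_hid = 3, 4
rng = np.random.default_rng(0)
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c,
                 rng.normal(size=(4 * n_hid, n_in)),
                 rng.normal(size=(4 * n_hid, n_hid)),
                 np.zeros(4 * n_hid))
print(h.shape)  # (4,)
```

Iterating this step over the samples of one 5-second episode yields the final hidden state fed to the classification layer; the IEPO-tuned hyperparameters (e.g. `n_hid`) set the sizes of W, U, and b.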

RESULTS
In this work, the proposed model is implemented using the enhanced LSTM with the stated system requirements. The performance of the model is analyzed by means of sensitivity, specificity, accuracy, F-score, and MCC on the two datasets. A feature length of 506 is obtained after feature selection. Specificity is defined as the ability of the test to correctly recognize negative (non-VF) cases, while sensitivity is defined as the ability of the test to correctly identify positive (VF) cases. Accuracy, the most important performance measure, is the ratio of correctly predicted observations to the total observations. Accuracy, sensitivity, specificity, and MCC are defined mathematically in the following equations.
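All five metrics can be computed from the confusion-matrix counts (TP, TN, FP, FN); the counts below are illustrative, not results from the paper:

```python
import numpy as np

def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, precision, and MCC from counts."""
    acc  = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn)                      # recall on VF episodes
    spec = tn / (tn + fp)                      # recall on non-VF episodes
    prec = tp / (tp + fp)
    mcc  = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return acc, sens, spec, prec, mcc

print([round(v, 3) for v in metrics(tp=90, tn=95, fp=5, fn=10)])
```

Unlike accuracy, MCC stays informative under the class imbalance between VF and non-VF episodes, which is why it is reported alongside the other metrics.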

Quantitative Analysis
The performance analysis of the enhanced LSTM without augmentation, in terms of accuracy, sensitivity, specificity, F-score, and MCC, with and without feature selection, is shown in Tables 2, 3, 4, and 5, respectively. Table 2 demonstrates the quantitative analysis of different classifiers with and without feature selection for the MIT-BIH dataset, and Table 3 does so for the CPSC dataset. The performance of the classifiers on the CPSC dataset with feature selection is higher than without feature selection. The FCM-ETIFRST method selects unique features to represent each class, which solves the problems of imbalance and overfitting. The modified non-membership function provides exploitation, and the modified membership function performs exploration; the similarity measure helps to maintain the balance of exploration and exploitation in the feature selection process. The LSTM shows higher sensitivity than the other classifiers due to its efficiency in handling the features. Table 4 describes the performance of optimizing the hyperparameters of the LSTM using various multi-objective optimization methods, namely Particle Swarm Optimization (PSO), Fruit Fly Optimization (FFO), Ant Colony Optimization (ACO), and the Salp Swarm Algorithm (SSA), with the selected features obtained from FCM-ETIFRST on the MIT-BIH arrhythmia dataset. The results show that the proposed IEPO algorithm chooses the best set of hyperparameters for the LSTM, compared with the parameters obtained from the other optimization algorithms, with an improvement of around 3% in accuracy over the existing SSA. The performance comparison of different hyperparameter optimization techniques on the MIT-BIH dataset, with and without feature selection, is graphically represented in Fig. 10 and Fig. 11.
Table 5 represents the performance analysis of K-fold validation in terms of performance metrics like accuracy, sensitivity, specificity, precision, and MCC on the MIT-BIH dataset and its graphical representation is specified in Fig. 12.

COMPARATIVE ANALYSIS
The comparative analysis of the proposed VF classification using IEPO-based LSTM against the existing boosted SVM-based DE algorithm and hybrid CNN-SVM deep neural networks, in terms of accuracy, specificity, sensitivity, and precision, is shown in Table 6. The proposed model is trained, tested, and validated using two publicly available standard datasets, the Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) dataset and the China Physiological Signal Challenge (CPSC) 2018 dataset, both of which consist of 5-second ECG recordings of coronary heart disease (CHD) patients. The accuracy improved to 99.75%, with a specificity of 98.29%, a sensitivity of 98.39%, and a precision of 98.35%. The proposed IEPO thus enables efficient VF classification while overcoming the overfitting problem.

CONCLUSION
In this paper, a novel classification method for VF rhythm has been developed on the MIT-BIH and CPSC datasets, which consist of ECG recordings of CHD patients. The method uses an LSTM classifier to classify VF and FCM-ETIFRST for feature selection, performing clustering over membership, non-membership, and hesitancy degrees. The method uses a window size of 5 seconds, which requires less memory, and obtained the highest accuracy of 99.75%. The proposed approach combines traditional machine learning algorithms with CNN-based feature extraction: the deep CNN successfully extracts useful deep features for ECG classification from raw ECG signals without prior knowledge of ECG signals or cardiac rhythm disorders. The proposed method outperformed the existing classification models for CHD by improving the efficiency of arrhythmia classification through suitable hyperparameter optimization of the LSTM algorithm. The improved performance of the proposed method on different test datasets suggests its potential for clinical application. Future work can focus on improving classification accuracy for both VF and AF by considering a window size of more than 5 seconds and analyzing real-time events.