A Brief Review on Electrocardiogram Analysis and Classification Techniques with Machine Learning Approaches

The electrocardiogram (ECG) captures the electrical activity of the heart. The recorded signal can be used for various purposes, such as emotion recognition, heart rate measurement and, above all, cardiac disease diagnosis. However, ECG analysis and classification require experienced specialists, since the signal presents high variability and is affected by noise and artefacts. As the amount of data grows with long-term records, which may contain long-term dependencies, the process becomes exhausting and error-prone. Automated systems combined with signal processing techniques aim to help with these tasks by improving data quality, extracting meaningful features, selecting the most suitable ones and training machine learning models to capture and generalize the signal's behaviour. This review gives a brief, stage-by-stage overview of how data flows through these approaches and which techniques are most used. It ends by presenting some of the countless applications found in the research community.


Introduction
Biomedical signals are a fundamental source of information about human health, since they provide a view of how the body's systems function. For the cardiac system specifically, the information most commonly used by clinicians to assess health is the electrical activity of the heart. An anomaly in one or more steps of the cardiac cycle, whether in electrical conduction or due to tissue malformation, can lead to arrhythmias or other issues. An arrhythmia occurs when the heart beats too early, too late, too slowly, too quickly or at irregular intervals. Among the many forms of arrhythmia, atrial fibrillation (AF) is the most common, affecting about 1% of the world population. Even though most arrhythmias are not life-threatening by themselves, they are associated with other disorders such as stroke, heart failure and coronary artery disease. In addition, the symptoms associated with arrhythmias, such as chest pain, fatigue and indisposition, reduce quality of life and can affect the performance of simple daily tasks. Early diagnosis is essential to avoid or mitigate severe conditions (Hagiwara et al. 2018;Borghi, Borges, and Teixeira 2021;Borghi 2020;Kaplan Berkaya et al. 2018). To diagnose arrhythmias, several cardiac examinations may be used, such as the stress test, the echocardiogram and the standard electrocardiogram (ECG). The ECG shows the electrical activity of the heart, recorded by electrodes placed on the patient's skin and connected to a suitable electronic system. By inspecting ECG records, specialists are able to detect abnormal patterns. However, since some arrhythmias are related to long-term behaviour and/or may occur unpredictably, records must extend over several hours, which makes inspection tedious and prone to human error. Moreover, the diagnosis depends heavily on the analyst's experience, making it susceptible to inter-observer variability, and it must also account for each patient's particularities (Hagiwara et al. 2018;Borghi, Borges, and Teixeira 2021;Borghi 2020;Li and Boulanger 2020).
Due to these challenges and the large amount of data to inspect manually, automated approaches based on signal processing and machine learning techniques have emerged as a good alternative, and several proposals have been reported in this field in recent years. This brief review describes some of the main processing stages and learning methods currently used for ECG analysis, enabling heartbeat and patient classification. Real-time monitoring of the heart's electrical activity with portable and wearable devices is also an interesting application that could help with treatment monitoring and emergency calls (Lyon et al. 2018;Borghi, Borges, and Teixeira 2021;Kaplan Berkaya et al. 2018;Wasimuddin et al. 2020;Borghi 2020). This paper is structured as follows. Section 2 presents the ECG analysis context, covering the main databases, preprocessing techniques and some feature crafting. Section 3 presents some of the most popular learning models used to classify feature samples. Section 4 discusses processing, learning and applications, and Section 5 gives a brief conclusion.

ECG Analysis
An electrocardiogram (ECG) is a simple, quick, safe and painless way of recording the electrical activity generated and conducted in the heart. It is an effective non-invasive tool for various biomedical applications, such as measuring heart rate, examining the rhythm of heartbeats, diagnosing heart abnormalities, emotion and physical recognition, and biometric identification (Kaplan Berkaya et al. 2018;Satija, Ramkumar, and Sabarimalai Manikandan 2018;Li and Boulanger 2020). Different electrode placements on the patient's skin provide different views of the heart, called leads. Some arrhythmias are more easily detected by inspecting a certain region of the heart, so it is desirable to have a full 12-lead ECG record covering all possible views. However, this requires storing much more data than a single-lead record, so it may not be suitable for all applications, e.g., mobile and real-time monitoring. Figure 1 presents an example comparing a normal and an AF segment of ECG (Borghi 2020).
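To put the storage cost in perspective, the arithmetic below compares an uncompressed single-lead record with a 12-lead one. The sampling rate, sample width and duration are illustrative assumptions, not values taken from the text.

```python
# Illustrative acquisition settings (assumptions, not taken from the text).
FS_HZ = 360            # samples per second per lead
BYTES_PER_SAMPLE = 2   # 16-bit storage per sample
HOURS = 24             # a day-long (Holter-style) recording


def record_size_bytes(n_leads: int, fs_hz: int = FS_HZ, hours: int = HOURS,
                      bytes_per_sample: int = BYTES_PER_SAMPLE) -> int:
    """Uncompressed size of an n-lead ECG record, in bytes."""
    samples_per_lead = fs_hz * 3600 * hours
    return n_leads * samples_per_lead * bytes_per_sample


single_lead = record_size_bytes(1)
twelve_lead = record_size_bytes(12)
print(f"1 lead: {single_lead / 1e6:.1f} MB, 12 leads: {twelve_lead / 1e6:.1f} MB")
# prints "1 lead: 62.2 MB, 12 leads: 746.5 MB"
```

Because size scales linearly with the number of leads, the 12-lead record is twelve times larger under any choice of these parameters.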

Database
The main public databases are available through the PhysioNet digital platform (Goldberger et al. 2000), with MIT-BIH (Massachusetts Institute of Technology-Beth Israel Hospital) being the most widely used in the main studies (Kaplan Berkaya et al. 2018;Borghi 2020). This collection comprises sub-databases, each specific to a certain behaviour. However, some patients' data overlap across sub-databases, since a patient may present more than one anomaly. Table 1 summarizes the main characteristics of the MIT-BIH databases.
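Because the same patient can appear in more than one sub-database, splitting data record by record can leak one patient's data into both the training and the test set. A minimal sketch of a patient-wise split follows; the record and patient identifiers are hypothetical, and in practice the record-to-patient mapping would have to be built from each database's documentation.

```python
import random
from collections import defaultdict


def patient_wise_split(records, test_fraction=0.3, seed=0):
    """Split (record_id, patient_id) pairs so that no patient is in both sets."""
    by_patient = defaultdict(list)
    for record_id, patient_id in records:
        by_patient[patient_id].append(record_id)
    patients = sorted(by_patient)             # deterministic order before shuffling
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])
    train = [r for p in patients if p not in test_patients for r in by_patient[p]]
    test = [r for p in patients if p in test_patients for r in by_patient[p]]
    return train, test


# Hypothetical records: patients P1 and P4 appear in two sub-databases each.
records = [("afdb_01", "P1"), ("mitdb_01", "P1"), ("afdb_02", "P2"),
           ("mitdb_02", "P3"), ("nsrdb_01", "P4"), ("afdb_03", "P4")]
train, test = patient_wise_split(records)
```

Splitting by patient rather than by record keeps all of a patient's recordings on the same side of the split, so the test accuracy reflects performance on unseen patients.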

Preprocessing
ECG recordings are usually contaminated by different types of noise and artifacts, and may present silent periods due to misplaced or disconnected electrodes. The preprocessing stage of ECG analysis aims to attenuate noise and artifacts and to handle the absence of information during silent periods. These steps are crucial for fiducial point estimation (the annotation of ECG events such as the beginning, peak and end of waves, and the intervals between waves) and can also help when comparing records from different patients by removing the offset (Kaplan Berkaya et al. 2018).
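Offset removal for cross-patient comparison is often done with Z-score normalization (also used by Dang et al. 2019, described later). A sketch of per-record normalization, assuming the record is a 1-D sequence of samples and that a flat, zero-variance record should be rejected rather than divided by zero:

```python
import numpy as np


def zscore(record):
    """Remove the record's offset (mean) and scale it to unit standard deviation."""
    x = np.asarray(record, dtype=float)
    std = x.std()
    if std == 0:
        raise ValueError("flat record: standard deviation is zero")
    return (x - x.mean()) / std


raw = np.array([1.2, 1.5, 2.8, 1.1, 1.3])   # hypothetical samples with a DC offset
normalized = zscore(raw)
```

Subtracting the mean alone removes the offset; dividing by the standard deviation additionally puts records with different amplitudes on a common scale.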

Noise Types
In practice, ECG signals are often corrupted by different types of noise, such as baseline wander (0.15 to 0.30 Hz; caused by electrode contact noise and motion artifacts such as the patient's breathing), power-line interference (50 or 60 Hz), electromyogram (EMG) noise (caused by the activity of muscles other than the heart), electrode contact noise (caused by poor contact between the electrode and the skin, sometimes producing silent periods) and instrumentation noise (between 100 kHz and 1 MHz; generated by the medical apparatus). Hence, applications and analyses such as morphological feature extraction or ECG event detection can benefit from improving signal quality by removing noise contamination (Satija, Ramkumar, and Sabarimalai Manikandan 2018;Kaplan Berkaya et al. 2018).
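A toy signal mixing three of these contaminants is handy for testing preprocessing code. The sketch assumes a 360 Hz sampling rate, 50 Hz mains and arbitrary amplitudes; instrumentation noise (100 kHz to 1 MHz) is omitted because it lies far above the Nyquist frequency of such a recording.

```python
import numpy as np


def add_typical_noise(clean, fs, mains_hz=50.0, seed=0):
    """Add baseline wander, power-line interference and EMG-like noise (toy amplitudes)."""
    rng = np.random.default_rng(seed)
    clean = np.asarray(clean, dtype=float)
    t = np.arange(clean.size) / fs
    baseline_wander = 0.3 * np.sin(2 * np.pi * 0.2 * t)   # inside the 0.15-0.30 Hz range
    power_line = 0.1 * np.sin(2 * np.pi * mains_hz * t)    # 50 or 60 Hz
    emg = 0.05 * rng.standard_normal(clean.size)           # broadband muscle activity
    return clean + baseline_wander + power_line + emg


fs = 360
noisy = add_typical_noise(np.zeros(10 * fs), fs)
spectrum = np.abs(np.fft.rfft(noisy))
freqs = np.fft.rfftfreq(noisy.size, d=1 / fs)
above_1hz = freqs > 1.0
# The strongest component overall is the baseline wander; above 1 Hz it is the mains line.
print(freqs[np.argmax(spectrum)], freqs[above_1hz][np.argmax(spectrum[above_1hz])])
```

Modelling the EMG component as white Gaussian noise is a simplification; it only illustrates that muscle noise spreads across the whole band instead of sitting at one frequency.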

Filtering Techniques
Filtering techniques may be applied to noisy ECG signals to strongly attenuate unwanted information. Before converting analogue sensor signals to the digital domain, it is highly recommended to apply a bandpass (or at least a lowpass) filter that limits the frequency band and avoids aliasing. In the digital domain, several works report the use of bandpass, lowpass, highpass, notch, median and adaptive filters, as well as the Hilbert and wavelet transforms, among others, all for noise and artifact removal. The most used function and bandwidth are, respectively, a bandpass filter with a 0.1-100 Hz passband, which attenuates baseline wander and other out-of-band low- and high-frequency components; since power-line interference (50 or 60 Hz) and part of the muscle noise fall inside this passband, they call for complementary techniques such as a notch filter. For artifact removal, the wavelet transform has been applied in different forms with satisfactory results (Kaplan Berkaya et al. 2018;Borghi 2020). To deal with silent periods, a threshold on the time between consecutive heartbeats can be used to decide whether beats are "connected". Beats closer than the threshold are connected; above this limit, beats are considered too far apart to be adjacent, and a silent period is flagged. During feature extraction or segmentation, such beats are treated as the end and the beginning of different recordings from the same patient (Borghi 2020).
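Both ideas can be sketched with SciPy: a zero-phase 0.1-100 Hz Butterworth bandpass followed by a notch at the mains frequency, and a gap threshold on R-peak positions that splits a record at silent periods. The filter order, the notch quality factor and the 2 s gap threshold are illustrative assumptions, not values from the reviewed works.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt


def clean_ecg(x, fs, band=(0.1, 100.0), mains_hz=50.0, order=2, notch_q=30.0):
    """Zero-phase bandpass, then a notch at the mains frequency (inside the passband)."""
    high = min(band[1], 0.95 * fs / 2)              # keep the upper edge below Nyquist
    sos = butter(order, [band[0], high], btype="bandpass", fs=fs, output="sos")
    y = sosfiltfilt(sos, np.asarray(x, dtype=float))
    b, a = iirnotch(mains_hz, notch_q, fs=fs)
    return filtfilt(b, a, y)


def split_at_silences(r_peaks, fs, max_gap_s=2.0):
    """Group R peaks into connected runs; a longer gap marks a silent period."""
    if len(r_peaks) == 0:
        return []
    segments, current = [], [r_peaks[0]]
    for prev, cur in zip(r_peaks[:-1], r_peaks[1:]):
        if (cur - prev) / fs > max_gap_s:
            segments.append(current)                # close the run before the silence
            current = []
        current.append(cur)
    segments.append(current)
    return segments


fs = 360
t = np.arange(60 * fs) / fs
wanted = np.sin(2 * np.pi * 10 * t)                          # in-band component
filtered = clean_ecg(wanted + 0.5 * np.sin(2 * np.pi * 50 * t), fs)
print(split_at_silences([0, 300, 600, 2000, 2300], fs))      # [[0, 300, 600], [2000, 2300]]
```

Forward-backward filtering (`sosfiltfilt`, `filtfilt`) avoids phase distortion, which matters when fiducial points are annotated afterwards; second-order sections are used for the bandpass because a 0.1 Hz edge at this sampling rate is numerically delicate in transfer-function form.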

Features
Since the ECG is a time record that is approximately periodic and non-stationary, alternative representations can be derived to highlight behaviours that help distinguish patterns. Thus, various feature extraction techniques have been proposed to expose the distinctive information in ECG signals. These features can be used individually or combined with other features. Three of the many possible classes are presented here: morphological, temporal and statistical (Kaplan Berkaya et al. 2018;Borghi, Borges, and Teixeira 2021;Borghi 2020).

Morphological
Morphological features describe the shape of the signal in terms of amplitude and can be computed for a single sample or within a context. They can be defined by calculating entropies, such as the Shannon, Rényi and logarithmic energy entropies, over well-located regions of the heartbeat, for instance the P and T waves and the QRS complex (Figure 2). This yields several indicators of the morphology of those regions, building a new representation that can be more descriptive than the ECG itself (Kaplan Berkaya et al. 2018;Borghi 2020;Wasimuddin et al. 2020). Methods such as the Hermite transform, the wavelet transform and the discrete cosine transform can be used individually or in combination to better represent the ECG waveform shapes (Lyon et al. 2018). Other morphological features that can be considered are the amplitude and slope of the P, R and T waves and of the QRS complex, potentially highlighting the strength of atrial and ventricular activation. By reducing the full time record of each heartbeat to a set of highly representative points, it is possible to gather enough information to describe the heartbeat pattern with much less data. The same reasoning extends to temporal and statistical features (Borghi 2020;Lyon et al. 2018).
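A sketch of the three entropies named above, computed on one heartbeat region such as a window around the QRS complex. Definitions vary across papers; here the Shannon and Rényi entropies are assumed to use the normalized energy p_i = x_i^2 / sum_j x_j^2, and the log-energy entropy is assumed to be sum_i log(x_i^2). The ±50 ms window in the usage comment is hypothetical.

```python
import numpy as np


def _energy_distribution(segment):
    """Normalized energy of each sample, p_i = x_i^2 / sum(x^2)."""
    energy = np.asarray(segment, dtype=float) ** 2
    total = energy.sum()
    if total == 0:
        raise ValueError("segment has zero energy")
    return energy / total


def shannon_entropy(segment):
    p = _energy_distribution(segment)
    p = p[p > 0]                              # convention: 0 * log(0) = 0
    return float(-np.sum(p * np.log(p)))


def renyi_entropy(segment, alpha=2.0):
    """Renyi entropy of order alpha > 0; alpha -> 1 recovers the Shannon entropy."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if alpha == 1:
        return shannon_entropy(segment)
    p = _energy_distribution(segment)
    return float(np.log(np.sum(p ** alpha)) / (1 - alpha))


def log_energy_entropy(segment, eps=1e-12):
    """Sum of log(x_i^2); eps avoids log(0) for zero-valued samples."""
    energy = np.asarray(segment, dtype=float) ** 2
    return float(np.sum(np.log(energy + eps)))


# Hypothetical usage on a +/-50 ms window around an R peak at sample r of `ecg`:
#   qrs = ecg[r - int(0.05 * fs): r + int(0.05 * fs)]
flat = np.ones(8)                             # uniform energy: both entropies equal log(8)
print(shannon_entropy(flat), renyi_entropy(flat))
```

Each entropy turns a whole wave region into a single number, which is what makes these features far more compact than the raw samples.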

Temporal
Temporal features are extracted from ECG records by measuring time distances between samples. This requires reference points or regions, for instance the P, R and T wave peaks. By measuring these distances, it is possible to evaluate heart rate variability, disturbances of signal periodicity (Jitter (Teixeira and Gonçalves 2016)) and the duration of events (waves) (Figure 2). These features are also used in fiducial point annotation, where the beginning, end and peak of each wave are determined by an algorithm (Borghi, Borges, and Teixeira 2021;Kaplan Berkaya et al. 2018). Because arrhythmias are characterized by temporal variability, temporal features are particularly relevant, since they highlight the time behaviour of the signals. A key feature commonly used in AF detection systems is the R-R interval, the period between two consecutive R peaks (Figure 2). Grouping several adjacent R-R intervals produces a long-term descriptive example that highlights the long-term time dependency of the ECG (Borghi 2020).
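A sketch of R-R interval extraction, relative jitter of the R-R series, and grouping of adjacent intervals into fixed-size examples. It assumes R-peak positions are already annotated as sample indices; the jitter formula follows the usual definition (mean absolute difference between consecutive periods, as a percentage of the mean period), and the window size is a free parameter.

```python
import numpy as np


def rr_intervals(r_peaks, fs):
    """R-R intervals in seconds from R-peak sample indices."""
    return np.diff(np.asarray(r_peaks, dtype=float)) / fs


def relative_jitter(periods):
    """Mean |T_i - T_(i-1)| divided by the mean period, in percent."""
    periods = np.asarray(periods, dtype=float)
    return float(np.mean(np.abs(np.diff(periods))) / np.mean(periods) * 100)


def rr_windows(rr, size, step=None):
    """Stack `size` adjacent R-R intervals per example (non-overlapping by default)."""
    step = size if step is None else step
    rr = np.asarray(rr, dtype=float)
    starts = range(0, len(rr) - size + 1, step)
    return np.array([rr[s:s + size] for s in starts]).reshape(-1, size)


fs = 360
rr = rr_intervals([0, 360, 648, 1008, 1296], fs)   # hypothetical R-peak indices
print(rr)                     # R-R intervals of 1.0, 0.8, 1.0 and 0.8 s
print(relative_jitter(rr))    # about 22.2 (%)
print(rr_windows(rr, size=2))
```

For an AF detector, `size` would be set to the number of beats per example (for instance, windows of 100 beats, as in a study described in the Applications section).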

Statistical
One or more heartbeats or ECG segments can be described by extracting statistical features such as energy, mean, standard deviation, maximum, minimum, kurtosis and skewness. These features can be widely used to analyse data according to their distribution and variability within a certain population or disease. Combining them with morphological features makes it possible to build an enriched set of representations. However, depending on how the morphological features were defined, this may result in redundant data with no real improvement (Kaplan Berkaya et al. 2018; Borghi, Borges, and Teixeira 2021; Borghi 2020).
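A compact sketch of these statistics for one segment. It uses population moments; skewness and kurtosis conventions (for example, excess versus non-excess kurtosis) differ between libraries, and the non-excess form is assumed here.

```python
import numpy as np


def statistical_features(segment):
    """Energy, mean, std, max, min, skewness and (non-excess) kurtosis of a segment."""
    x = np.asarray(segment, dtype=float)
    mean, std = x.mean(), x.std()             # population standard deviation
    if std == 0:
        raise ValueError("constant segment: skewness and kurtosis are undefined")
    z = (x - mean) / std
    return {
        "energy": float(np.sum(x ** 2)),
        "mean": float(mean),
        "std": float(std),
        "max": float(x.max()),
        "min": float(x.min()),
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4)),   # equals 3 for a Gaussian distribution
    }


feats = statistical_features([1, 2, 3, 4, 5])   # toy segment
```

Applied to every beat or window, the dictionary values become one row of the feature matrix that the learning model receives.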

Feature Selection
Considering the high-dimensional space produced by combining features extracted from ECG signals or from other sources (such as convolutional layers), a feature selection stage may be necessary. This process consists of ranking features by relevance in order to reduce data redundancy and dimensionality (a related alternative is to reduce dimensionality by transformation, e.g., using principal component analysis). As expected, the resulting dataset becomes smaller and sometimes denser, since the remaining information is supposed to be more relevant than what was discarded. Performing feature selection also speeds up learning, since the amount of data is reduced, and makes it more cost-effective (Kaplan Berkaya et al. 2018;Lyon et al. 2018).
The main feature selectors are based on filter, wrapper and embedded methods. Filter methods rely on a scoring scheme (such as feature correlation or the Fisher score) that is independent of the learning model, scalable and fast. Wrapper methods use a search algorithm (such as a genetic algorithm) around the learning model to find the most suitable subset, but they are more time-consuming than filter methods. Embedded methods perform the selection during the training of the learning model, so the selected subset is tailored to that model (Kaplan Berkaya et al. 2018;Hagiwara et al. 2018).
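As an example of the filter approach, a sketch of the Fisher score: for each feature, the spread of the class means around the overall mean, divided by the pooled within-class variance. Features are ranked without training any model. The toy data are made up for illustration, and the small `eps` guarding against zero within-class variance is an implementation choice.

```python
import numpy as np


def fisher_scores(X, y, eps=1e-12):
    """Per-feature score: between-class scatter over within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    return between / (within + eps)


def select_top_k(X, y, k):
    """Indices of the k highest-scoring features, best first."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]


# Hypothetical data: feature 0 separates the classes, feature 1 does not.
X = np.array([[0.0, 1.0], [0.1, -1.0], [5.0, 1.0], [5.1, -1.0]])
y = np.array([0, 0, 1, 1])
print(select_top_k(X, y, k=1))   # [0]
```

A wrapper method would instead retrain the classifier on candidate subsets, which explains why it is slower than this single pass over the data.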

Learning Models
Learning models, in this case machine learning (ML) models, are computational algorithms built from several types of basic structures. Depending on which structures are used and how they relate to each other, the model becomes more suitable for certain tasks, such as classification, prediction and encoding, and for different types of input data, such as time series, functions and images. For ECG recordings, which are time series, approaches using Support Vector Machine (SVM), Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) are the most commonly applied nowadays, due to their great capacity to generalize over high-dimensional, long and variable data. Despite their high accuracy, these models are very time-consuming to train and require substantial computational resources and large amounts of data, which are their main limitations for deployment in every solution (Kaplan Berkaya et al. 2018;Borghi 2020;Wasimuddin et al. 2020). Basically, learning can be supervised (the model learns from labelled dataset examples and is tested on unseen data), unsupervised (the algorithm learns the structure of the data by itself) or reinforcement-based (the model trains continuously by trial and error through goal-oriented interaction with an environment) (Lyon et al. 2018;Hagiwara et al. 2018). The learning process is carried out according to the type of model and the available computational resources. During this process, the model is updated iteratively to approximate the expected behaviour. Several techniques and their variants can be used for this, such as Levenberg-Marquardt, Resilient Backpropagation and Stochastic Gradient Descent. The trained model must be able to perform the task it was trained for satisfactorily on entirely new data sets (Borghi 2020;Kaplan Berkaya et al. 2018).
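To make the iterative update concrete, a minimal sketch of stochastic gradient descent on a single logistic neuron, where each update uses one training example. The toy features, labels, learning rate and number of epochs are hypothetical; real ECG models are far larger, but the update rule has the same form.

```python
import numpy as np


def train_logistic_sgd(X, y, lr=0.1, epochs=200, seed=0):
    """Fit w, b of p = sigmoid(w.x + b) by SGD on the log-loss."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):       # shuffle, then one example per update
            p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))
            error = p - y[i]                    # gradient of the log-loss w.r.t. the logit
            w -= lr * error * X[i]
            b -= lr * error
    return w, b


def predict(X, w, b):
    return (np.asarray(X, dtype=float) @ w + b > 0).astype(int)


# Hypothetical two-feature examples (e.g. normalized R-R statistics) with labels 0/1.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [2.0, 2.0], [2.2, 1.9], [1.8, 2.1]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = train_logistic_sgd(X, y)
print(predict(X, w, b))   # [0 0 0 1 1 1]
```

Full-batch gradient descent would average the gradient over all examples before each update; the stochastic variant trades noisier steps for much cheaper ones, which is why it scales to large ECG datasets.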
Regarding the tasks for which machine learning models are trained with ECG signals, classification of heartbeats and ECG segments is the most commonly proposed. It focuses on detecting abnormalities that may occur at unpredictable times and helps detect arrhythmias. Other studies focus on patient classification, based on the overall behaviour of the ECG (Lyon et al. 2018;Borghi 2020;Kaplan Berkaya et al. 2018;Hagiwara et al. 2018).

Traditional Machine Learning
Traditional neural network (NN) and kernel-based methods use handcrafted features as input to analyse ECG behaviour in prediction, detection and classification tasks. First designed as a mathematical model of a human neuron and later combined with other neurons to compose networks, the Multilayer Perceptron (MLP), or feed-forward network, can solve most series prediction and classification tasks with proper training. A recent study reached over 91% accuracy in AF detection using various handcrafted features extracted from ECG segments as the input set of an MLP network (Borghi 2020). SVM is a popular supervised learning algorithm that aims to find a suitable hyperplane separating the input classes. It shows good generalization and classification performance, and by changing the kernel function the model can fit both linear and nonlinear problems. Work on optimizing this method aims to speed up classification, reduce overfitting and improve the choice of its parameters (Kaplan Berkaya et al. 2018;Lyon et al. 2018;Wasimuddin et al. 2020). Although classification quality can be improved by building more representative examples, these models have several limitations, especially with strongly time-dependent data and increasing input complexity. Other types of networks emerged to address some of these limitations, such as the Recurrent Neural Network (RNN). However, RNNs also showed strong limitations, such as the vanishing gradient problem, which makes demanding applications unfeasible. The solution widely used nowadays relies on deep learning techniques (Wasimuddin et al. 2020;Kaplan Berkaya et al. 2018).

Deep Learning
Deep learning methods have addressed various difficulties of traditional approaches by using much more complex neuron cells and stacking several layers, greatly increasing the abstraction capability of networks. Features are now learnt automatically from raw data, without the need for feature engineering (Wasimuddin et al. 2020). Many works have reported accuracies between 98.0% and 99.9%, indicating a clear superiority of these models over traditional ones. However, they need a huge amount of training data and are very time-consuming (Wasimuddin et al. 2020;Parvaneh et al. 2019;Yildirim 2018;Borghi 2020). The LSTM network is a type of RNN that uses a different neuron structure as its cell. Each cell carries a context that is updated at every time step. The input information, preferably a time series, is analysed over time, and each new input is weighed against the previous context and the internal state of the cell. Each cell controls the flow of information through gates that apply element-wise multiplications to their inputs (Figure 3). Moreover, LSTM networks can be built to analyse the input context both forward and backward in time (bidirectional LSTM), increasing the network's potential to extract meaningful time dependencies from data. An advantage of LSTM is its ability to handle ECG in raw form, i.e., without any preprocessing (Lyon et al. 2018;Borghi 2020).
The CNN is another supervised artificial neural network with many applications to ECG signals. Basically, it consists of convolutional layers, pooling layers and a fully connected layer. The input data is fed to a convolutional layer, which transforms it into a more abstract, higher-dimensional representation based on the layer's kernels. The pooling layer then downsamples this representation, which is reduced to a 1D vector before the last layer. The last layer is simply a fully connected Perceptron layer used to estimate the outputs. The learning process updates the convolutional kernels iteratively. Stacking several convolutional layers yields an increasingly abstract, higher-dimensional representation of the data but, at the same time, makes the network heavier and more time-consuming to process (Hagiwara et al. 2018;Wasimuddin et al. 2020;Parvaneh et al. 2019).
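A minimal NumPy sketch of the two building blocks described above: one LSTM time step, in which the forget, input and output gates act by element-wise multiplication, and a "valid" 1-D convolution followed by max pooling. The gate ordering [i, f, o, g], the weight shapes, the random weights and the pooling size are assumptions for illustration; deep learning frameworks order and initialize these parameters in their own ways. As in most CNN libraries, the convolution is computed as a cross-correlation (the kernel is not flipped).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,); gate blocks ordered [i, f, o, g]."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[:H])               # input gate: how much new information enters
    f = sigmoid(z[H:2 * H])          # forget gate: how much old context is kept
    o = sigmoid(z[2 * H:3 * H])      # output gate: how much context is exposed
    g = np.tanh(z[3 * H:])           # candidate values for the context
    c_t = f * c_prev + i * g         # element-wise update of the cell context
    h_t = o * np.tanh(c_t)
    return h_t, c_t


def conv1d_valid(x, kernel):
    """'Valid' 1-D cross-correlation: slide the kernel over x without padding."""
    x, kernel = np.asarray(x, dtype=float), np.asarray(kernel, dtype=float)
    k = kernel.size
    return np.array([x[i:i + k] @ kernel for i in range(x.size - k + 1)])


def max_pool1d(x, size):
    """Non-overlapping max pooling; trailing samples that do not fill a window are dropped."""
    x = np.asarray(x, dtype=float)
    n = x.size // size
    return x[:n * size].reshape(n, size).max(axis=1)


# Run the LSTM over a short hypothetical sequence of 1-D samples (D=1, H=4).
rng = np.random.default_rng(0)
D, H = 1, 4
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for sample in [0.0, 0.2, 1.0, 0.1]:
    h, c = lstm_step(np.array([sample]), h, c, W, U, b)

# A slope-like kernel [1, -1] followed by pooling of size 2.
features = max_pool1d(conv1d_valid([0, 1, 3, 1, 0, 0], [1, -1]), 2)   # [-1.0, 2.0]
```

In a trained network, `W`, `U`, `b` and the convolution kernels are the parameters updated by the learning process; here they are fixed only to show how data flows through each block.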

Assessment
The metrics used to evaluate ECG analysis and classification tasks are mainly borrowed from the general pattern recognition field. They are based on the following basic variables:
• TP (true positive): the reference is positive and the model's output is positive;
• TN (true negative): the reference is negative and the model's output is negative;
• FP (false positive): the reference is negative and the model's output is positive;
• FN (false negative): the reference is positive and the model's output is negative.
After all classification tasks are performed, these variables hold accumulated counts that can be combined into the following metrics:
• Accuracy: ratio of correct classifications to the total number of classified samples;
• Precision (positive predictive value): proportion of samples predicted as positive that are actually positive;
• Recall (or sensitivity): proportion of actually positive samples that are predicted as positive;
• F-measure: harmonic mean of precision and recall;
• Specificity: proportion of actually negative samples that are predicted as negative;
• Matthews Correlation Coefficient: correlation between the actual and the predicted classes;
• Area Under the Curve (AUC): area under the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate against the false positive rate.
Among these metrics, accuracy and recall are the most used in the reviewed literature (Kaplan Berkaya et al. 2018).
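These definitions translate directly into code. The sketch computes them from accumulated counts and adds a rank-based AUC: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, which equals the area under the ROC curve. The counts and scores in the usage lines are invented, and non-zero denominators are assumed.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Common metrics from confusion-matrix counts (assumes no zero denominators)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # sensitivity
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn)
               / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(confusion_metrics(tp=40, tn=50, fp=5, fn=5))   # hypothetical counts
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))   # 0.75
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration; for large test sets a sort-based computation gives the same value faster.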

Applications
Much research has presented applications of ECG signals to emotion recognition, biometric identification and stress level detection. In addition, many other signals, such as the electroencephalogram, skin temperature, blood pressure, electromyogram, heart rate variability, cortisol levels and thermal imaging features, are being gathered to enable a comprehensive observation of behaviour (Kaplan Berkaya et al. 2018;Warrick, Lostanlen, and Nabhan Homsi 2019). This section presents six applications of ECG analysis, mainly related to the detection of abnormalities.
Using polysomnographic (PSG) signals (an aggregation of biomedical signals from different sources, such as some leads of the electroencephalogram (EEG), electro-oculogram (EOG), ECG and SaO2 (oxygen saturation)), Warrick, Lostanlen, and Nabhan Homsi (2019) proposed a system for detecting sleep arousals using features extracted with the second-order scattering transform (ST). The resulting representation is fed to a depthwise separable convolutional layer, which preserves the temporal dimension while adapting the dimensionality to the BLSTM network. The main contribution of this work was to show that the ST improved the data representation compared with the authors' previous work.
Faust et al. (2018) developed a system to assist medical diagnosis based on a bidirectional LSTM network (BLSTM). In their system, atrial fibrillation was searched for in windows of 100 heartbeats. The data were obtained from the PhysioNet MIT-BIH Atrial Fibrillation database. Apart from segmentation, no preprocessing was applied. The accuracy obtained on the test set with 10-fold cross-validation was 98.51%.
Dang et al. (2019) proposed combining a CNN and a BLSTM for automated ECG analysis to detect atrial fibrillation. They used the PhysioNet MIT-BIH Atrial Fibrillation data set, and the recordings went through event detection, segmentation and Z-score normalization stages. No handcrafted features were used. The proposed model achieved an accuracy of 96.59%.
Yildirim et al. (2019) proposed an arrhythmia classification system based on encoding heartbeats with Convolutional Auto-Encoders (CAE). The objective was to reduce the number of samples at the input of the learning model (an LSTM) while promoting differentiation between the classes, five in this case. The information loss in the CAE reconstruction was 0.70%. The classification accuracy was 99.11% for the model with the CAE and 99.23% for the model without it. However, encoding reduced the processing time (training and testing) sevenfold, already accounting for the time spent on encoding, which motivates an in-depth study of its real-time implementation.
Borghi (2020) developed and compared MLP and LSTM models based on several sets of morphological, temporal, statistical and time-frequency features extracted from the signals of the MIT-BIH Atrial Fibrillation database. As an innovation, the Jitter and Shimmer methods for analysing periodic signals were used to compose the feature set, as well as feature maps from the scattering transform, which had not previously been tested on ECG signals. The best accuracies obtained were 91.96% for the MLP and 98.17% for the LSTM. Borghi (2020) also developed a new methodology for detecting the R peaks of ECG signals based on the Hilbert transform. Before detection, the author applied to the database signals digital filters designed by the window method with a self-convolved Hamming window, which improved the detection results. With the annotations updated by the new method, the results of the learning model also improved, reflecting a consistent performance gain across all stages of the system.
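The general idea of Hilbert-transform-based R-peak detection can be sketched as follows: differentiate the ECG to emphasise the steep QRS slopes, take the magnitude of the analytic signal (the Hilbert envelope), keep well-separated envelope peaks, then refine each one to the local ECG maximum. This is a generic illustration, not the method of Borghi (2020); the 50% height threshold, the 250 ms refractory distance and the ±50 ms refinement window are assumptions, and the test signal is a synthetic pulse train rather than real ECG.

```python
import numpy as np
from scipy.signal import find_peaks, hilbert


def detect_r_peaks_hilbert(ecg, fs, min_rr_s=0.25, rel_height=0.5, search_s=0.05):
    """Generic Hilbert-envelope R-peak detector (illustrative sketch)."""
    ecg = np.asarray(ecg, dtype=float)
    derivative = np.gradient(ecg)                 # QRS complexes have the steepest slopes
    envelope = np.abs(hilbert(derivative))        # magnitude of the analytic signal
    candidates, _ = find_peaks(envelope,
                               height=rel_height * envelope.max(),
                               distance=max(1, int(min_rr_s * fs)))  # refractory period
    half = int(search_s * fs)
    refined = []
    for p in candidates:                          # move to the local ECG maximum
        lo, hi = max(0, p - half), min(ecg.size, p + half + 1)
        refined.append(lo + int(np.argmax(ecg[lo:hi])))
    return np.array(refined, dtype=int)


# Synthetic test signal: one narrow Gaussian "R wave" per second at 360 Hz.
fs = 360
t = np.arange(10 * fs)
true_peaks = 180 + 360 * np.arange(10)
ecg = sum(np.exp(-0.5 * ((t - p) / (0.01 * fs)) ** 2) for p in true_peaks)
detected = detect_r_peaks_hilbert(ecg, fs)
```

On real recordings the signal would first be filtered (as discussed in the Preprocessing section), and a fixed relative threshold would usually be replaced by an adaptive one to cope with amplitude changes along the record.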

Conclusions
The use of machine learning techniques in biomedical signal applications, especially those related to ECG analysis and classification, has advanced considerably in recent years, with improved reliability since deep learning became a solid alternative to traditional learning. Meanwhile, the continued testing of new feature combinations by the research community keeps traditional learning alive and still a good option, owing to its low complexity and light weight. One issue, however, has persisted for a long time: the lack of data and reference annotations to train machine learning algorithms and, once they are trained, to effectively test their robustness. Handcrafted features and data augmentation techniques should continue to attract researchers' attention for some time, since new methods and their combinations have made it possible to build lightweight systems that fit on mobile and wearable devices. At the same time, deep learning researchers have successfully worked to optimize the performance and resource management of their models, aiming to keep high accuracy with faster and more compact structures.