1 Introduction
Biosignals are electrical, mechanical, thermal, or other signals measured over time from the human body or from other organic tissue. They became applicable for medical diagnosis in 1895, when Willem Einthoven introduced electrocardiography (ECG) as a clinically usable, non-invasive technique. An ECG device measures the electrical activity of the heart muscle and depicts the complete cardiac cycle of an individual heartbeat using the electrical polarization-depolarization patterns of the heart[1 ]. Since then, a huge variety of signals have been discovered that can be derived from the surface (skin) or from inside the human body. Prominent examples include electroencephalography (EEG), which depicts the activity of the brain by recording voltage fluctuations on the scalp that result from ionic currents within the neurons of the brain[2 ]; electromyography (EMG), which records the electric potential generated by muscle cells when these cells are electrically or neurologically activated[3 ]; photoplethysmography (PPG), which depicts the volumetric changes of an organ (e.g., the microvascular bed under the skin) over time by recording changes in light absorption[4 ]; and ballistocardiography (BCG), which monitors heart activity by recording ballistic forces (acceleration) on the chest[5 ].
Initially, analysis of biosignals was done purely manually. In the early 1980s, low-level signal processing was applied for noise reduction and filtering, followed by feature extraction and classification. However, these early systems were time-consuming and suffered from unreliable accuracy[6 ].
From the 1990s onward, time-series models and supervised expert systems were used for feature extraction, and statistical classifiers were applied to support diagnosis. Over the last few decades, automated analysis of biosignals has turned into a core component of computer-aided diagnosis (CAD) and clinical decision-making. However, existing approaches are not effective for the high-dimensional, complex, and noisy real-world data that is continuously monitored using portable devices[7 ]. Therefore, the major goal of current research is to increase the accuracy and speed of diagnostic systems towards event prediction from real-time signal analysis[7 ] [8 ].
Artificial intelligence and machine learning help in the automated and effective analysis of medical data[9 ]. Neural networks are one of the well-known techniques used to develop high-level expert systems for solving a wide range of medical tasks such as clustering, detection, and recognition of diseases[10 ]. Traditionally, most expert systems rely on hand-crafted features. As in many papers[10 ] [11 ] [12 ], we speak of “hand-crafted features” when the raw data is transformed before it is entered into the input layer of the neural network, and this transformation is performed or decided by a human. However, biosignals are generally non-linear, non-stationary, dynamic, and complex in nature[13 ]. Hand-crafted or manually selected features are time-consuming to obtain, not optimal, domain-specific, and they require specific expert knowledge[6 ].
Neurons are the basic processing units in a neural network; each performs a non-linear transformation of the data input from neurons connected in the previous layer. Such a structure is incapable of processing raw biosignals[14 ]. Therefore, automated extraction and selection of task-specific as well as robust features are necessary to solve complex real-world problems[15 ].
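The non-linear transformation performed by a single neuron can be written as y = f(w·x + b). A minimal NumPy illustration, with weights and inputs chosen arbitrarily for the example:

```python
import numpy as np

# One neuron: weighted sum of the activations from the previous layer,
# followed by a non-linear activation function (here: ReLU).
def neuron(x, w, b):
    return max(0.0, float(np.dot(w, x) + b))

x = np.array([0.5, -1.2, 3.0])   # activations from the previous layer
w = np.array([0.8, 0.1, -0.4])   # learned connection weights (example values)
b = 0.2

print(neuron(x, w, b))  # 0.4 - 0.12 - 1.2 + 0.2 = -0.72 -> ReLU -> 0.0
```

Stacking many such units in multiple layers is what allows the later, deeper architectures to learn their own feature transformations instead of relying on hand-crafted ones.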
Deep learning is a machine learning approach that is based on a deep network architecture composed of multiple hidden layers. We have considered machine learning, disregarding whether it is performed supervised or unsupervised, as “traditional” if it is composed of five or fewer hidden layers. In contrast, in deep learning, feature extraction and selection are performed within the network, which is fed with raw (or low-level-processed) data rather than with hand-crafted features. Each hidden layer transforms the data into representations that are learned automatically using a general learning procedure[16 ]. Outstanding performance has been obtained on a variety of benchmark datasets. In particular, convolutional neural networks (CNNs) have been designed for solving complex image analysis tasks[15 ] [17 ]. Such networks may be composed of several million neurons, which are interconnected in a two-dimensional (2-D) matrix-like structure and hence can perform spatial convolutions within their internal structure. However, in supervised learning, a huge amount of training data is required to fit the millions of parameters, and such data is usually not available in the medical domain. Medical applications solve that problem using pre-trained networks from other domains, and they have demonstrated outstanding results[18 ]. Inherent to this concept, the filter coefficients of the convolution operations, which were previously used for the hand-crafting of features, are determined intrinsically by the network.
However, most biosignals do not provide any 2-D structure, and as a result, deep learning models have not been used much in biosignal analytics. Some preliminary research has achieved positive outcomes for the analysis of biomedical signals using deep learning approaches. Recently, Kiranyaz et al.[19 ] have proposed a 1-D CNN for ECG signal analysis. Similarly, recurrent neural networks (RNNs) are used to describe time dependencies in time-series data, for example in phonocardiography (PCG) signals.
This survey offers a comprehensive overview of deep learning models applied to 1-D biosignals from both a methodology-driven and an application-focused perspective. In many papers, EEG is considered as a 2-D signal. The same holds for biosignals such as functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG) signals. To focus the review on 1-D biosignals, we have excluded such matrix-based spatial measures.
4 Deep Learning on Biosignals
In this section, we describe the method implemented to select relevant papers, the categories used to classify the papers, biosignals and their applications, and the clustering of papers according to the dimension and types of biosignals. Three clinical applications are particularly highlighted.
4.1 Selection of Papers
In this survey, 437 research papers were reviewed ([Fig. 2 ]). Existing databases (PubMed, Scopus, and ACM) were queried with search terms for title, keywords, and abstract (see [Appendix 1 ]). Only papers published from January 2010 to December 2017 were considered. After duplicates were removed, a total of 382 records were obtained. Based on the title and the abstract of each paper, contributions that did not relate to deep learning (defined as having more than five hidden layers) or 1-D biosignals were excluded. Based on a full-text assessment, work related to EEG, fMRI, or MEG as source signal as well as review papers were excluded. After careful inspection of the architecture of the deep learning models, 35 papers were excluded because the number of hidden layers was less than five. This process yielded a final collection of 71 research papers.
Fig. 2 Paper selection process.
4.2 Categories to Classify Papers
Our analysis of the literature identified several criteria to categorize papers and approaches. The most important is the biosignal to which deep learning is applied. Besides ECG and EMG, some papers use a combination of multiple signals as input for the neural network. Moreover, biosignals can be 1-D (single lead) or composed of multiple leads. In the case of EEG, for instance, the multiple leads are arranged in a spatial matrix, which makes CNNs directly applicable. A 2-D spatial structure can also be generated using 2-D frequency transforms. Therefore, the origin, dimension, and type of biosignals are coded as B for “biosignal”, and denoted B(origin, dimension, type). We use simple numbers to indicate the instances in each of the criteria ([Fig. 3 ]).
Fig. 3 Classification of the parameters used for the selection of deep learning models. The dependencies are color coded. Note that A(..x) = N(x..) for all x in {1,2}.
The second category used to distinguish the various approaches is the application domain. When deep learning is applied to a biosignal, it can be used for simple signal enhancement, detection of uncertain patterns (computer-aided detection, CADe), clustering of the signal or parts of the signal, recognition of given patterns (computer-aided diagnostics, CADx), or prediction of future signal alterations or events. We call this the goal of the application. To train the network, data is needed. Such datasets are sometimes quite small (less than 100 records or less than 5 hours of total recording time), sometimes medium (up to 1,000 records or 50 hours of recording time), and sometimes relatively large (up to 10,000 records or 500 hours of recording time). They may or may not carry labels indicating the ground truth (GT). Therefore, the application is coded A for “application”, and denoted A(goal, GT-size, GT-type); again, simple numbers are used within the criteria to code the instances ([Fig. 3 ]).
Finally, the networks that are used for biosignal analysis differ. For instance, the learning type of the network may be supervised or unsupervised. Note that this criterion is strictly correlated with the type of GT data, which can be labeled or unlabeled, respectively. The training of the network can be scheduled offline, online, or in real time. Of course, the topology of the network is another important criterion, and the instances we have chosen here correspond to Section 2.2. Consequently, network categories are denoted N for “network”, and coded N(L-type, L-schedule, topology) ([Fig. 3 ]).
In summary, three categories of deep learning on biosignals have been identified, each comprising three criteria. Since the type of GT data is directly linked to the type of network learning, only eight effective criteria remain. In total, 37 different instances are suggested. These instances have been used to code the papers retrieved from the literature review. For example, the paper by Rahhal et al.[44 ] on active classification of ECG signals is coded as B(121)A(322)N(212).
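The nine-digit code can be unpacked mechanically. The sketch below decodes only those digit meanings that are stated explicitly in this text (application goal, learning type, lead dimension, input type) and passes the remaining criteria through as raw digits, since their full instance lists are defined in [Fig. 3 ]:

```python
import re

# Digit meanings taken from the survey text; criteria whose instance
# lists are only given in Fig. 3 are returned as plain digits.
GOALS = {1: "enhancement", 2: "detection", 3: "clustering",
         4: "diagnostics", 5: "prediction"}
LEARNING = {1: "unsupervised", 2: "supervised"}
DIMENSION = {1: "single-lead", 2: "multi-lead"}
INPUT_TYPE = {1: "raw data", 2: "features", 3: "2-D transform"}

def parse_code(code):
    """Decode a paper code like 'B(121)A(322)N(212)' into its criteria."""
    m = re.fullmatch(r"B\((\w)(\d)(\d)\)A\((\d)(\d)(\d)\)N\((\d)(\d)(\d)\)",
                     code)
    if not m:
        raise ValueError("not a valid B(...)A(...)N(...) code: " + code)
    origin, dim, typ, goal, gt_size, gt_type, l_type, l_sched, topo = m.groups()
    return {
        "origin": origin,                    # e.g., 'd' = multiple sources
        "dimension": DIMENSION[int(dim)],
        "input": INPUT_TYPE[int(typ)],
        "goal": GOALS[int(goal)],
        "gt_size": int(gt_size),
        "gt_type": int(gt_type),
        "learning": LEARNING[int(l_type)],
        "schedule": int(l_sched),
        "topology": int(topo),
    }

# Rahhal et al. [44]: active classification of multi-lead ECG signals
print(parse_code("B(121)A(322)N(212)"))
```

For the example code, this yields a multi-lead signal, features as input, a clustering goal, and supervised learning, consistent with the coding described above.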
4.3 Biosignals and their Application
Supervised learning is used in most applications: N(2..). Supervised learning refers to the ability of deep learning models to learn from annotated data. However, annotation (labeling) of physiological signals requires expert knowledge and is often expensive and time-consuming. Unsupervised learning, coded as N(1..), is sometimes ineffective for multivariate inputs and ambulatory monitoring due to long-term time dependencies[6 ]. Based on the analysis of physiological signals, the choice between generative and discriminative network topologies is considered. Discriminative models are mainly used in applications coded as A(2..) - A(4..). They are effective for the detection, clustering, and diagnostics of physiological signals, and a discriminative model is capable of handling noisy training data. Generative models are mainly used for the enhancement and prediction of physiological signals. Models coded as N(..4) can predict and synthesize new partial input data at time t+1 based on the previous data at time t. Generative models are also more robust when analyzing noisy data. The characteristics of physiological signals play a vital role in the selection of deep learning models. If the physiological signal has a spatiotemporal structure, the model selected must incorporate both the spatial and the temporal coherence of the signal using regularization. A CNN is considered a good choice to handle both temporal and spatial data. However, selection criteria for the deep learning model should be more application-oriented and robust with respect to input data types.
[Table 2 ] shows the codes for all the 71 papers that have been considered in this survey. There are only a few duplicates, showing the diversity of research as well as demonstrating that our code is suitable for different approaches. Counting the network topologies yields 15, 12, 34, and 3 papers for RBMs, auto-encoders, CNNs, and RNNs, respectively. In addition, there are 7 “Other” types of network topology, where the authors have combined several deep learning networks to improve performance.
Table 2
Coding schemes for the 71 papers selected
Code: Reference(s)
B(111)A(212)N(213): [19 ] [45 ] [46 ] [47 ] [48 ] [49 ]
B(121)A(112)N(212): [50 ] [51 ] [52 ] [53 ]
B(311)A(312)N(214): [67 ]
B(111)A(312)N(213): [58 ] [59 ] [60 ]
B(121)A(212)N(213): [61 ]
B(311)A(332)N(213): [65 ]
B(111)A(312)N(214): [62 ]
B(121)A(222)N(213): [56 ]
B(311)A(512)N(214): [73 ]
B(112)A(212)N(211): [72 ]
B(121)A(222)N(215): [12 ]
B(312)A(312)N(212): [82 ]
B(112)A(212)N(214): [74 ]
B(121)A(322)N(212): [44 ]
B(312)A(412)N(213): [92 ]
B(112)A(212)N(213): [77 ]
B(121)A(332)N(213): [64 ]
B(312)A(422)N(213): [89 ]
B(112)A(312)N(215): [80 ]
B(122)A(212)N(211): [66 ]
B(312)A(422)N(215): [86 ]
B(112)A(312)N(211): [68 ]
B(122)A(311)N(111): [70 ]
B(313)A(332)N(213): [98 ] [99 ] [100 ]
B(112)A(122)N(214): [63 ]
B(122)A(312)N(211): [71 ]
B(412)A(112)N(211): [106 ]
B(112)A(222)N(213): [94 ]
B(122)A(312)N(215): [69 ]
B(412)A(212)N(211): [54 ]
B(113)A(211)N(113): [93 ]
B(122)A(412)N(212): [83 ]
B(413)A(211)N(123): [57 ]
B(122)A(422)N(213): [30 ]
B(413)A(212)N(213): [107 ]
B(122)A(522)N(213): [87 ]
B(512)A(212)N(213): [76 ]
B(122)A(532)N(215): [90 ]
B(612)A(212)N(213): [79 ]
B(123)A(312)N(212): [75 ]
B(711)A(412)N(212): [23 ]
B(211)A(221)N(111): [78 ]
B(812)A(512)N(215): [96 ] [108 ] [109 ]
B(211)A(322)N(213): [81 ]
B(d11)A(112)N(215): [101 ]
B(212)A(312)N(211): [84 ] [85 ]
B(d11)A(312)N(211): [102 ]
B(213)A(222)N(215): [88 ]
B(d11)A(312)N(213): [55 ]
B(213)A(312)N(213): [91 ]
B(d11)A(412)N(223): [103 ]
B(213)A(412)N(213): [97 ]
B(d11)A(512)N(215): [10 ]
B(213)A(422)N(213): [95 ]
B(d12)A(412)N(212): [104 ] [105 ]
4.4 Clustering by Application and Biosignals
[Table 3 ] visualizes six clusters of current research with respect to the goal of application and the biosignal considered in the paper. A more comprehensive table with respect to the network architecture and the optimizers and regularizers used is given in [Appendix 2 ].
Table 3
Deep learning on biosignals with respect to the goal of the application and the origin of the biosignal (colors indicate the six clusters).
Columns (origin of the biosignal): n-ECG = B(12.), 1-ECG = B(11.), EMG = B(21.), PCG = B(31.), PPG = B(41.), Others = B(c1.), Multiple Sources = B(d1.)
Enhancement A(1..): n-ECG [50 ] [51 ] [52 ] [53 ]; 1-ECG [63 ]; PPG [106 ]; Multiple Sources [101 ]
Detection A(2..): n-ECG [61 ] [12 ] [56 ]; 1-ECG [45 ] [46 ] [19 ] [47 ] [48 ] [49 ] [62 ] [72 ] [74 ] [66 ] [68 ] [77 ] [94 ] [93 ]; EMG [78 ] [88 ]; PCG [82 ]; PPG [54 ] [57 ] [107 ]; Others [76 ] (B(512)) and [79 ] (B(612))
Clustering A(3..): n-ECG [44 ] [64 ] [70 ] [69 ] [71 ] [75 ]; 1-ECG [58 ] [60 ] [59 ] [80 ]; EMG [81 ] [84 ] [85 ] [91 ]; PCG [67 ] [65 ] [98 ] [99 ] [100 ]; Multiple Sources [102 ] [55 ]
Diagnostics A(4..): n-ECG [30 ] [83 ]; EMG [95 ] [97 ]; PCG [86 ] [92 ] [89 ]; Others [23 ] (B(711)); Multiple Sources [103 ] [104 ] [105 ]
Prediction A(5..): n-ECG [87 ] [90 ]; PCG [73 ]; Others [96 ] [108 ] [109 ] (all B(812)); Multiple Sources [10 ]
4.4.1 Multi-lead ECG: B(12.)
Generally, multi-lead ECG signals are highly complex and large in volume. Moreover, their multi-lead structure may allow CNN and RNN architectures to be applied directly.
Deep Learning Applied to Multi-lead ECG RawData: B(121)
We have identified nine papers that apply deep learning methods to enhancement[50 ] [51 ] [52 ] [53 ], detection[12 ] [56 ] [61 ], and clustering[44 ] [64 ]. This is due to the capability of deep learning methods to extract strong features and better data representations. All approaches used raw data as input but different deep learning methods for processing. For instance, Pourbabaee et al.[61 ] and Zhou et al.[12 ] suggested applying a CNN and an LSTM to biosignals, while others used different types of auto-encoders for the analysis. Pourbabaee et al. divided 30-min signals into six segments of 5 min, which were used as input to the CNN. In the case of Zhou et al., individual heartbeats were extracted from the ECG and fed to the LSTM model.
All papers, except one[61 ], used publicly available standard ECG databases to evaluate the performance of their proposed methods. For example, Zhou et al. applied a lead CNN and LSTM to the Massachusetts Institute of Technology, Beth Israel Hospital (MIT-BIH) database[110 ] and validated the trained model on the Chinese cardiovascular disease database (CCDD)[111 ]. Similarly, Rahhal et al.[44 ] validated their method on two other databases, namely the St. Petersburg Institute of Cardiological Technics (INCART) database[112 ] and the supraventricular arrhythmia database (SVDB)[113 ]. Liu et al.[56 ] used a multi-lead CNN to detect myocardial infarction on ECG signals obtained from the German Physikalisch-Technische Bundesanstalt (PTB) database[114 ]. [Appendix 2 ] provides a more comprehensive list of the databases used in the papers for the evaluation of their methods as well as the results and performances obtained, as reported by the authors.
It is observed that CNNs and LSTMs are used for the clustering task, while auto-encoders are used for signal enhancement and reconstruction. Besides, Gogna et al.[50 ] used the Split-Bregman optimization technique to overcome the issues of error backpropagation in auto-encoder learning, while the other authors applied standard auto-encoder models. Jin and Dong[64 ] developed and implemented a multi-lead CNN model for the classification of normal/abnormal signals; further, a rule inference approach was applied to improve the performance of the trained model. In general, the results obtained by the nine papers were above 90% accuracy. Zhou et al. obtained average accuracies of 99.41% and 98.03% on MIT-BIH and CCDD, respectively, by combining a CNN and an LSTM as base classifiers in the proposed network.
Deep Learning Applied to Features from Multi-lead ECG: B(122)
A cluster of seven papers applied features derived from multi-lead ECG to deep learning networks. Such features may be the RR interval[70 ], the QRS complex[69 ], or morphological and temporal features[83 ]. In contrast to raw-data-based approaches, enhancement is not a target application; instead, the derived features improved detection, clustering, and diagnostics. Features extracted from biosignals are used as a key element for the identification of boundary conditions to separate classes in the multi-dimensional feature space. Five applications were evaluated on publicly available databases[30 ] [69 ] [70 ] [71 ] [83 ], while the others recorded private data[87 ] [90 ]. All papers applied to clustering used deep belief networks for processing[66 ] [75 ] [76 ]. The results reported by the papers were all above 85%. In particular, Majumdar and Ward[69 ] showed an average accuracy of 97.0% for classification. Zhang et al.[30 ] considered eight diverse ECG databases to evaluate the performance of their system and obtained 93.5%, 96.5%, and 90.5% for the overall human identification task, normal subject identification, and abnormal subject identification, respectively.
4.4.2 Single-lead Biosignals: B(x1.)
This type of signal is formed from a single sampled measurement, usually derived as an electrical potential, and usually of pseudo-periodic nature, disregarding whether the biosignal is cardiac, respiration, or blood pressure. Therefore, our analysis is pooled with x = {1, 2, 3, 4, 5, 6, 7, 8, 9} where x represents different physiological signals as shown in [Table 3 ].
Deep Learning Applied to Single-lead Bio-signal Raw Data: B(x11)
In this group of 16 papers[19 ] [23 ] [45 ] [46 ] [47 ] [48 ] [49 ] [58 ] [59 ] [60 ] [62 ] [65 ] [67 ] [73 ] [78 ] [81 ], single-lead raw data is fed as input to the deep learning methods for applications that focus on detection; for instance, the paper by Acharya et al.[46 ] aimed at automated detection of abnormalities in ECG signals, and the approach of Kiranyaz et al.[19 ] focused on a personalized monitoring system for arrhythmias. The high number of research papers may come from the fact that single-channel biosignals are simple, easy to acquire, and effective in decision-making systems. Out of the 16 papers, 12 used 1-D CNN models[19 ] [45 ] [46 ] [47 ] [48 ] [49 ] [58 ] [59 ] [60 ] [62 ] [65 ] [81 ]; two were based on auto-encoders[23 ] [73 ]; one used an RNN model[67 ]; and one used an RBM model[78 ].
Out of the 10 papers using single-lead ECG inputs, seven[19 ] [47 ] [49 ] [58 ] [59 ] [60 ] [62 ], two[45 ] [46 ], and one[114 ] were evaluated on the MIT-BIH, the German Physikalisch-Technische Bundesanstalt (PTB), and the Fantasia database, respectively (see [Appendix 2 ]).
In general, the overall results were above 94% accuracy. In particular, Kiranyaz et al.[60 ] obtained an average accuracy of 99.00% on the MIT-BIH database, and Lei et al.[45 ] reported 99.33% accuracy on the PTB database[114 ]. In an interesting study, Schozel and Dominik[73 ] used an RNN model to study PCG signals and generated artificial ECG signals from PCG. Papers describing work on biosignals other than ECG obtained an overall accuracy above 79%. For instance, Zhang et al.[23 ] reported an average accuracy of 80.22% for emotion recognition using respiration signals. Similarly, Ryu et al.[65 ] obtained an overall accuracy of 79.55% for the classification of heart sounds using PCG signals, which may be due to the unbalanced datasets used to validate the proposed model. In another study, Atzori et al.[81 ] proposed to classify the movements of a prosthetic hand using a CNN model. EMG signals from amputee subjects were recorded and validated using the proposed models. They obtained the lowest accuracy of 38.09% for the amputee datasets.
Deep Learning Applied to Features from Single-lead Biosignals: B(x12)
The number of scientific papers with features as input to the deep learning method is high. In this cluster, features were extracted from single-lead biosignals: 21 papers have been retrieved on this topic, covering applications such as enhancement[63 ] [106 ], detection[54 ] [66 ] [68 ] [72 ] [74 ] [76 ] [77 ] [79 ] [82 ] [94 ], clustering[80 ] [84 ] [85 ], diagnostics[86 ] [89 ] [92 ], and prediction[96 ] [108 ] [109 ]. Examples of features were the RR interval[72 ] [74 ], the mean absolute value of the signal[85 ], or frequency spectral coefficients[92 ]. Nine out of the 21 papers used unsupervised learning and deep belief networks to perform detection[54 ] [66 ] [68 ] [72 ] [82 ] [94 ], clustering[84 ] [85 ], diagnostics[86 ], or prediction[96 ] [108 ] [109 ], while the others used supervised learning[63 ] [74 ] [76 ] [79 ] [80 ] [89 ] [92 ].
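As an illustration of such pre-extracted features, the sketch below computes RR intervals from given R-peak positions and the mean absolute value of a signal window. The peak positions are made up for the example; real pipelines obtain them from a dedicated QRS detector.

```python
import numpy as np

FS = 360  # sampling rate in Hz (MIT-BIH recordings use 360 Hz)

def rr_intervals(r_peaks, fs=FS):
    """RR intervals in seconds, from R-peak sample indices."""
    return np.diff(np.asarray(r_peaks)) / fs

def mean_absolute_value(window):
    """Mean absolute value, a common amplitude feature (e.g., for EMG)."""
    return float(np.mean(np.abs(window)))

# Hypothetical R-peak positions (sample indices) for a steady 75 bpm rhythm
peaks = [100, 388, 676, 964]
print(rr_intervals(peaks))                                    # [0.8 0.8 0.8]
print(mean_absolute_value(np.array([0.5, -0.5, 1.0, -1.0])))  # 0.75
```

Feature vectors like these are much smaller than the raw signal, which is exactly the trade-off discussed in the conclusion: less training data is needed, but the representation is fixed by the human rather than learned by the network.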
In 10 papers, the authors conducted experiments on public databases[66 ] [68 ] [76 ] [77 ] [80 ] [84 ] [85 ] [86 ] [94 ] [96 ]. In two papers, signals were simulated[63 ] [79 ]. In four papers, signals were acquired from the 2016 PhysioNet Challenge database[116 ]. Only four papers presented an evaluation using the authors' own data[54 ] [74 ] [89 ] [92 ].
The results were diverse. In particular, Mohebi et al.[79 ] generated continuous glucose monitoring (CGM) signals to identify type 2 diabetes patients using a CNN model. Lee and Chang[96 ] combined a deep belief network with bootstrapping to estimate blood pressure; bootstrapping enhanced the performance by about 10%. In general, the results were above 83% in accuracy for the various applications. Specifically, Jindal et al.[54 ] obtained an accuracy of 96.1% for biometric identification using PPG signals. Some papers with inputs other than ECG signals obtained lower accuracies, e.g., 71.39% for congestive heart failure detection[72 ].
In an interesting study, Lee and Chang[108 ] [109 ] developed an ensemble deep belief network-deep neural network (DBN-DNN) approach for the estimation of oscillometric blood pressure with few data samples. Further, artificial features were synthesized to overcome the issue of limited training data for deep learning models. To improve the prediction rate of cardiovascular events, Kim et al.[106 ] used a DBN model on blood pressure signals to reduce artifacts. Similarly, Zhenjie et al.[77 ] proposed a multiscale CNN model for the detection of atrial fibrillation from ECG signals and achieved 98.18% accuracy on the MIT-BIH database[110 ].
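The bootstrapping idea can be sketched as resampling a small training set with replacement to form many surrogate sets, one per ensemble member. Everything below (the data values and set sizes) is illustrative, not Lee and Chang's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_sets(samples, n_sets, set_size=None):
    """Draw surrogate training sets by sampling with replacement."""
    samples = np.asarray(samples)
    set_size = set_size or len(samples)
    idx = rng.integers(0, len(samples), size=(n_sets, set_size))
    return samples[idx]

# A small, hypothetical set of oscillometric blood-pressure measurements
measurements = np.array([118.0, 122.0, 131.0, 109.0, 125.0])
sets = bootstrap_sets(measurements, n_sets=100)
print(sets.shape)  # (100, 5)

# Each surrogate set could train one member of a DBN-DNN ensemble;
# averaging the members' estimates then reduces variance.
print(sets.mean())
```

Resampling of this kind is a standard remedy when a deep model must be trained on far fewer records than it has parameters, which is the recurring constraint in the medical domain noted in the Introduction.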
Deep Learning Applied to 2D-transformed Biosignals: B(x13)
Ten papers describe utilizing a 2-D spectrum as input to the deep learning system[57 ] [88 ] [91 ] [93 ] [95 ] [97 ] [98 ] [99 ] [100 ] [107 ]. It is reported that 2-D spectra of transformed biosignals describe both the spatial and the temporal information of the signals; they also allow the direct use of CNNs. All the papers used supervised learning to train a CNN, and the applications are detection[57 ] [88 ] [93 ] [107 ], classification[91 ] [98 ] [99 ] [100 ], and diagnostics[95 ] [97 ].
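Such a 2-D time-frequency representation can be obtained with a short-time Fourier transform. A minimal NumPy sketch, where the window length, hop size, and the synthetic test signal are arbitrary choices for illustration:

```python
import numpy as np

def spectrogram(x, win_len=128, hop=64):
    """Magnitude STFT: rows = frequency bins, columns = time frames."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (win_len//2 + 1, n_frames)

# Synthetic stand-in for a biosignal: 1 s of a 10 Hz oscillation at 1 kHz
fs = 1000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 10 * t)

S = spectrogram(x)
print(S.shape)  # (65, 14): 65 frequency bins x 14 time frames
```

The resulting matrix has exactly the 2-D structure that image-oriented CNNs expect, which is why this transform is the entry point for the papers in this cluster.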
Out of the 10 papers, five used publicly available databases to evaluate the performance of their systems[91 ] [97 ] [98 ] [99 ] [100 ], and the other five used experimental protocols for signal acquisition[57 ] [88 ] [93 ] [95 ] [107 ] (see [Appendix 2 ]). In general, the results obtained were above 90% accuracy. In particular, Dominguez-Morales et al.[98 ] obtained 97% accuracy using a modified version of the AlexNet model[40 ]. Interestingly, the authors converted the PCG signal obtained from a neuromorphic auditory sensor into address-event representations, which were later plotted as a 2-D sonogram for deep learning. In order to improve the prediction ability of their deep learning model, Xia et al.[88 ] used a hybrid model combining a CNN with an RNN.
Most of the papers in this cluster utilized supervised learning. Out of eight papers, six used publicly available databases to validate the proposed model. Cote-Allard et al.[95 ] developed and tested a transfer-learning-based hand gesture learning system: a model trained on other datasets was used to detect and improve the recognition of different hand gestures. Xia et al.[93 ] used 2-D-transformed ECG signals as input for the detection of pathological conditions with an average accuracy of 98.63%.
4.4.3 Multiple Source Biosignals
Seven papers proposed approaches to manage multiple-source biosignals. Except for two papers[104 ] [105 ], all used raw data to feed the deep learning system for signal enhancement[101 ], clustering[55 ] [102 ], diagnostics[103 ] [104 ] [105 ], and prediction[10 ]. For instance, Bengio et al.[10 ] presented a major breakthrough applying a CNN for affect classification from raw physiological signals. The accuracy obtained by all papers was above 85%. The performance of the systems was dependent on the combination of biosignals used for the analysis. For example, Zhang et al.[102 ] obtained 98.49% accuracy for the automated classification of sleep stages using sparse DBNs. The authors used EMG, EEG, and ECG collectively to investigate the impact of different biosignals on the algorithm performance. In[10 ], the authors used a CNN model on specific signals from the database for emotion analysis using physiological signals (DEAP)[13 ], namely skin conductance and blood volume pulse. Out of the seven papers, three used a CNN model[10 ] [55 ] [103 ], two were based on an ensemble approach[104 ] [105 ], one used a DBN model[102 ], and another used a DNN model[101 ]. Belo et al.[101 ] proposed an interesting approach for generalized biosignal learning and synthesis using DNN models. Chow[103 ] proposed an online biometric recognition system using ECG and EDA signals; in this approach, physiological signals are acquired and trained in an online mode using pre-trained networks. Three papers[10 ] [104 ] [105 ] used the publicly available DEAP database for validation (see [Appendix 2 ]).
4.5 Clinical Application
Although most of the papers are focused on the methodology of biosignal analysis using deep learning, there are some examples that present a real-world application of such methods. We have selected three of them as examples.
HeartID[30 ] used a multiresolution CNN for ECG-based biometric identification of humans in smart health applications. First, the ECG stream was blindly split into segments of two seconds, disregarding the R-wave positions. Then, the segments were transformed to the wavelet domain to reveal more detailed time and frequency characteristics in multiple resolutions. An auto-correlation was applied to each wavelet component to remove the phase shift introduced by the blind segmentation. Rather than feeding the wavelet-transformed output (image) into a 2-D CNN, each wavelet component was treated as a 1-D feature vector and fed into a 1-D CNN to learn the intrinsic patterns of the individuals. To evaluate the system, several publicly available databases were re-sampled to 360 Hz, yielding 100 normal and 120 abnormal datasets (arrhythmia, malignant ventricular ectopy, ST depression). The correct identification rate was 96.5% and 93.5%, respectively. The system could be generalized to other quasi-periodic biosignals such as PPG, BCG, or a multi-modal combination of them.
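The blind segmentation and auto-correlation steps can be sketched as follows; the signal is a synthetic stand-in for an ECG stream, and the wavelet decomposition and the CNN itself are omitted:

```python
import numpy as np

FS = 360  # Hz, the rate to which HeartID re-sampled all databases

def blind_segments(ecg, seconds=2, fs=FS):
    """Split the stream into fixed 2 s segments, ignoring R-wave positions."""
    seg_len = seconds * fs
    n = len(ecg) // seg_len
    return ecg[: n * seg_len].reshape(n, seg_len)

def autocorrelate(segment):
    """Auto-correlation removes the phase shift introduced by blind cuts."""
    ac = np.correlate(segment, segment, mode="full")
    return ac[len(segment) - 1 :]  # keep non-negative lags only

# Illustrative quasi-periodic signal standing in for 10 s of ECG
t = np.arange(10 * FS) / FS
ecg = np.sin(2 * np.pi * 1.2 * t)  # ~72 bpm fundamental

segs = blind_segments(ecg)
print(segs.shape)                  # (5, 720)
ac = autocorrelate(segs[0])
print(len(ac))                     # 720
```

Because the auto-correlation of a segment is independent of where the blind cut fell within the cardiac cycle, the downstream 1-D CNN sees phase-aligned inputs without any R-peak detector.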
Another remarkable system focused on personalized event prediction to detect the occurrence of arrhythmias or abnormal beats in the ECG signal[19 ]. Facing the challenge that an ECG of a healthy person without any history of cardiac arrhythmias should exhibit no abnormal beats, the authors developed a set of 464 filters performing regularized least-squares optimization to synthesize individual pathologic patterns out of healthy ECG cycles. For the early detection of cardiac arrhythmias, a personalized training dataset was created synthetically from the subject's average normal beat. A one-dimensional (1-D) CNN dedicated to that specific subject was trained and used to monitor streamed real-time recordings. The system's evaluation was based on the MIT-BIH database[110 ], where the first 5 minutes of each recording were used to form the average normal beats. Overall, 34 patient records with a total of 63,341 beats were selected for the evaluation. The probability of detecting at least one of the first three abnormal beats was 99.4%, supporting a meaningful clinical application of the system.
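The degradation idea can be sketched as filtering a subject's average normal beat to synthesize abnormal-looking training examples. The beat template, the noise level, and the single smoothing filter below are arbitrary stand-ins, not one of the 464 learned filters:

```python
import numpy as np

def average_beat(beats):
    """Average normal beat from the first minutes of a recording."""
    return np.mean(np.asarray(beats), axis=0)

def synthesize_abnormal(beat, kernel):
    """Degrade the normal beat with a filter to mimic a pathological shape."""
    return np.convolve(beat, kernel, mode="same")

# Hypothetical beats (3 beats x 200 samples) scattered around a template
rng = np.random.default_rng(1)
template = np.exp(-((np.arange(200) - 100) ** 2) / 50.0)  # R-wave-like bump
beats = template + 0.01 * rng.standard_normal((3, 200))

normal = average_beat(beats)
smoothing = np.ones(15) / 15                # arbitrary low-pass stand-in
abnormal = synthesize_abnormal(normal, smoothing)

# The degraded beat is flatter than the subject's normal template
print(normal.max() > abnormal.max())  # True
```

Pairs of (normal, synthesized-abnormal) beats give the subject-specific 1-D CNN both classes to learn from, even though the subject's own recording contains no real abnormal beats.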
Generally, deep learning is considered to be computationally expensive. Pourbabaee et al.[61 ] developed a computationally efficient screening system for patients with paroxysmal atrial fibrillation (PAF). The authors used a large volume of ECG time-series data and a deep CNN to identify unique feature patterns for screening patients. The 30-minute raw ECG signals were first divided into six equal segments. These short ECG segments were then used as input to the CNN to obtain robust deep features, which were passed through standard classifiers, namely an end-to-end CNN, k-nearest neighbors (kNN), a support vector machine (SVM), and a multilayer perceptron. To evaluate the performance of the system, signals were taken from the PAF prediction challenge database[117 ]. The authors showed that the CNN-learned features can be classified conventionally, which is computationally inexpensive and efficient. The CNN did not require any prior domain knowledge for learning the features and, therefore, the approach could potentially be utilized for other biosignals. The network yielded an accuracy of 93.60% and 92.96% using the end-to-end CNN and a Gaussian kernel SVM, respectively.
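The preprocessing arithmetic can be sketched directly: a 30-minute recording is split into six equal, non-overlapping 5-minute segments, each of which becomes one CNN input. The sampling rate below is an illustrative assumption, not taken from the paper:

```python
import numpy as np

FS = 128  # Hz; illustrative sampling rate, assumed for this sketch

def split_recording(signal, n_segments=6):
    """Divide a recording into n equal, non-overlapping segments."""
    seg_len = len(signal) // n_segments
    return signal[: n_segments * seg_len].reshape(n_segments, seg_len)

thirty_minutes = 30 * 60 * FS        # 230,400 samples
ecg = np.zeros(thirty_minutes)       # placeholder for a real recording

segments = split_recording(ecg)
print(segments.shape)                # (6, 38400): six 5-minute segments
# Each row is one CNN input; the learned deep features are then handed
# to a conventional classifier (kNN, SVM, or multilayer perceptron).
```

Classifying compact CNN-derived feature vectors with a kNN or SVM is what keeps the screening step inexpensive once the network has been trained.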
5 Discussion and Conclusion
A comprehensive literature research was performed to identify works combining biosignals of any nature (except EEG, due to its 2-D spatial structure) with deep learning networks. The examined bibliographic databases (PubMed, Scopus, and ACM) provided relevant papers, including works published in IEEE or SPIE conference proceedings. We have developed a scheme to differentiate existing research by biosignals, applications, and networks that yields 30 instances from seven independent categories. Such a graph may yield more than 25,000 different codes, but due to internal dependencies (network topology and application goal), that figure reduces to 6,480. As shown in [Table 2 ], research is diverse, but there are two codes occurring multiple (more than three) times, B(111)A(212)N(213) and B(121)A(112)N(212). The code B(111)A(212)N(213) is used six times on single-lead ECG signals to perform pattern detection. Similarly, B(121)A(112)N(212) is used four times for multi-lead ECG targeting signal enhancement for better clinical decision support.
The overall accuracy obtained by methods using raw ECG signals is higher than that of methods with features as input. This may be due to the ability of deep learning to capture features which optimally represent the biosignal specific to the problem. The overall performance of the papers that utilized the authors' own signal acquisition protocols was lower than that of papers experimenting with publicly available databases. This may be due to the presence of noise and artifacts in the recordings. To date, most work has been performed on relatively small datasets. Five papers used 3,126 recordings from the 2016 PhysioNet Challenge database for heart sound analysis, and one paper mentioned utilizing 1,075 records[92 ]. This finding is in line with Deserno and Marx[7 ], who reported the need for more realistic reference databases.
Most of the papers (n = 30) use features which are pre-extracted from the biosignals as input to the deep learning models. This may result from the assumption that the performance of machine learning improves when using pre-computed features as compared to noisy raw data, since a feature vector is much smaller than the signal itself and hence reduces the required volume of training data. However, the basic idea of deep learning is to automatically identify patterns from large volumes of biosignals without human involvement. On the other hand, only 11 papers (less than 15%) applied a 2-D transform to take advantage of the CNN matrix-like architecture. This figure was much lower than anticipated, but may be the result of the fact that 1-D CNNs perform well.
In conclusion, after having shown their impact on image and video analysis, deep learning approaches are now successfully applied to the analysis of 1-D biosignals. The number of published works increases continuously, and the results are promising. However, there is a large variety of signals, applications, and reference data, which makes an objective comparison of the published approaches difficult.
In the future, standardization of network topologies and parameters is expected. Reference data recorded with novel devices is required, representing not only pathology but also normal recordings from healthy subjects, as well as outliers, drop-out sequences, and noise.