1 Introduction
Biosignals are electrical, mechanical, thermal, or other signals measured over time
from the human body or from other organic tissue. They became applicable for
medical diagnoses in 1895 when Willem Einthoven invented electrocardiography (ECG)
as a clinically usable, non-invasive device. An ECG device measures the electrical activity
of the heart muscle and depicts the complete cardiac cycle of an individual heartbeat
using the electrical polarization-depolarization patterns of the heart [1]. Since then,
a huge variety of signals have been discovered that can be derived
from the surface (skin) or from inside the human body. Prominent examples include
electroencephalography (EEG), which depicts the activity of the brain by recording voltage
fluctuations on the scalp that result from ionic currents within the neurons of the
brain [2]; electromyography (EMG), which records the electric potential generated by muscle
cells when these cells are electrically or neurologically activated [3];
photoplethysmography (PPG), which depicts the volumetric changes of an organ (e.g.,
the microvascular bed under the skin) over time by recording changes in light
absorption [4]; and ballistocardiography (BCG), which monitors heart activity by recording
ballistic forces (acceleration) on the chest [5].
Initially, analysis of biosignals was done purely manually. In the early 1980s, low-level
signal processing was applied for noise reduction and filtering. Then, feature extraction
and classification were implemented. However, these early systems were time-consuming
and suffered from unreliable accuracy [6].
Later, from the 1990s onward, time-series models and supervised expert systems were used
for feature extraction, and statistical classifiers were applied to support diagnosis.
Over the last few decades, automated analysis of biosignals has turned into a core
component of computer-aided diagnosis (CAD) and clinical decision-making. However,
existing approaches are not effective for high-dimensional, more complex, and noisy
real-world data that is continuously monitored using portable devices [7]. Therefore,
the major goal of current research is to increase the accuracy and speed
of diagnostic systems towards event prediction from real-time signal analysis [7], [8].
Artificial intelligence and machine learning support the automated and effective analysis
of medical data [9]. Neural networks are a well-known technique for developing high-level
expert systems that solve a wide range of medical tasks such as clustering, detection,
and recognition of diseases [10]. Traditionally, most expert systems rely on hand-crafted
features. As in many papers [10], [11], [12], we speak of “hand-crafted features” when
the raw data is transformed before it is fed to the input layer of the neural network,
and this transformation is performed or decided by a human. However, biosignals are
generally non-linear, non-stationary, dynamic, and complex in nature [13]. Hand-crafted
or manually selected features are time-consuming to design, not optimal, domain-specific,
and require specific expert knowledge [6].
Neurons are the basic processing units of a neural network; each performs a non-linear
transformation of the input it receives from the connected neurons in the previous layer.
Such a structure is incapable of processing raw biosignals [14]. Therefore, automated
extraction and selection of task-specific as well as robust
features are necessary to solve complex real-world problems [15].
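As a minimal sketch of the single-neuron computation described above, the following Python snippet forms a weighted sum of the inputs plus a bias and passes it through a non-linear activation. The function name `neuron_output`, the example weights, and the choice of `tanh` as the activation are illustrative assumptions; none of them is taken from the surveyed papers.

```python
import math
from typing import Sequence


def neuron_output(inputs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Weighted sum of the inputs followed by a non-linear activation (tanh, assumed)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return math.tanh(weighted_sum)


# Hypothetical outputs of three neurons in the previous layer
previous_layer = [0.5, -1.0, 2.0]
print(neuron_output(previous_layer, weights=[0.4, 0.3, -0.1], bias=0.1))
```

The weights and the bias are the quantities that training adjusts; the activation function is what makes the transformation non-linear.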
Deep learning is a machine learning approach that is based on a deep network architecture
composed of multiple hidden layers. We consider machine learning, regardless of
whether it is supervised or unsupervised, as “traditional” if the network is composed
of five or fewer hidden layers. In contrast, with deep learning, feature extraction and
selection are performed within the network, which is fed with raw (or low-level processed)
data rather than with hand-crafted features. Each hidden layer transforms the data into
representations that are learned automatically using a general learning procedure [16].
Outstanding performance has been obtained on a variety of benchmark datasets.
In particular, convolutional neural networks (CNNs) have been designed for solving
complex image analysis tasks [15], [17]. Such networks may be composed of several million
neurons, which are interconnected in a two-dimensional (2-D) matrix-like structure and
hence can perform spatial convolutions within their internal structure. However, in
supervised learning, a huge amount of training data is required to fit the millions of
parameters, and such data is usually not available in the medical domain. Medical
applications address that problem by using networks pre-trained in other domains, and
they have demonstrated outstanding results [18]. Inherent to this concept, the filter
coefficients of the convolution operations, which were previously used for the
handcrafting of features, are determined intrinsically
by the network.
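To illustrate what "filter coefficients of a convolution" means for a 1-D signal, the sketch below applies a hand-crafted three-point moving average with NumPy. The signal values, the kernel, and the helper name `filter_signal` are assumptions made for illustration only. In a CNN, the kernel entries would not be chosen by a human but fitted during training.

```python
import numpy as np


def filter_signal(signal: np.ndarray, coefficients: np.ndarray) -> np.ndarray:
    """Convolve a 1-D signal with a filter kernel, keeping the input length."""
    return np.convolve(signal, coefficients, mode="same")


# Toy signal with two isolated peaks (illustrative values only)
signal = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0])

# Hand-crafted coefficients: a 3-point moving average chosen by a human.
# In a CNN, these three numbers would be free parameters learned from data.
moving_average = np.array([1.0, 1.0, 1.0]) / 3.0

smoothed = filter_signal(signal, moving_average)
print(smoothed)
```

Each peak is spread evenly over three neighboring samples, which is the smoothing behavior encoded by this particular kernel; a different kernel would extract a different feature from the same signal.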
However, most biosignals do not provide any 2-D structure, and as a result, deep learning
models have not been used much in biosignal analytics. Some preliminary research has
achieved positive outcomes for the analysis of biomedical signals using deep learning
approaches. Recently, Kiranyaz et al. [19] proposed a 1-D CNN for ECG signal analysis.
Similarly, recurrent neural networks (RNNs) are used to describe time dependency in
time-series data, namely phonocardiography (PCG) signals.
This survey offers a comprehensive overview of deep learning models applied to 1-D
biosignals from both a methodology-driven and an application-focused perspective. In
many papers, EEG is considered a 2-D signal. The same holds for biosignals such
as functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG)
signals. To focus the review on 1-D biosignals, we have excluded such matrix-based
spatial measures.
4 Deep Learning on Biosignals
In this section, we describe the method implemented to select relevant papers, the
categories used to classify the papers, biosignals and their applications, and the
clustering of papers according to the dimension and types of biosignals. Three clinical
applications are particularly highlighted.
4.1 Selection of Papers
In this survey, 437 research papers were reviewed (Fig. 2). Existing databases (PubMed,
Scopus, and ACM) were queried with search terms for title, keywords, and abstract
(see Appendix 1). Only papers published from January 2010 to December 2017 were
considered. After duplicates were removed, a total of 382 records were obtained. Based
on the title and the abstract of each paper, contributions that did not relate to deep
learning (defined as having more than five hidden layers) or 1-D biosignals were excluded.
Based on a full-text assessment, work using EEG, fMRI, or MEG as source signal and review
papers were excluded. After careful inspection of the architecture of the deep learning
models, 35 papers were excluded because the number of hidden layers was less than five.
This process yielded a final collection of 71 research papers.
Fig. 2 Paper selection process.
4.2 Categories to Classify Papers
Our analysis of the literature identified several criteria to categorize papers and
approaches. The most important is the biosignals to which deep learning is applied.
Besides ECG and EMG, some papers use a combination of multiple signals as input for
the neural network. Moreover, biosignals can be 1-D (single lead) or composed of multiple
leads. In the case of EEG, for instance, the multiple leads are arranged in a spatial
matrix, which makes CNNs directly applicable. A 2-D spatial structure can also be
generated using 2-D frequency transforms. Therefore, the origin, dimension, and type
of biosignals are coded as B for “biosignal”, and denoted B(origin, dimension, type).
We use simple numbers to indicate the instances in each of the criteria (Fig. 3).
Fig. 3 Classification of the parameters used for the selection of deep learning models.
The dependencies are color coded. Note that A(..x) = N(x..) for all x in {1,2}.
The second category used to distinguish the various approaches is the application
domain. When deep learning is applied to a biosignal, it can be used for simple signal
enhancement, detection of uncertain patterns (computer-aided detection, CADe), clustering
of the signal or parts of the signal, recognition of given patterns (computer-aided
diagnostics, CADx), or prediction of future signal alterations or events. We call
this the goal of the application. To train the network, data is needed. Such datasets
may be small (fewer than 100 records or less than 5 hours of total recording
time), medium (up to 1,000 records or 50 hours of recording time), or relatively
large (up to 10,000 records or 500 hours of recording time). They may or may not carry
a label that indicates the ground truth (GT). Therefore, the application is coded A for
“application”, and denoted A(goal, GT-size, GT-type), and again simple numbers are
used within the criteria to code the instances (Fig. 3).
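The size classes above can be written down as a small lookup. The sketch below is one possible reading of the thresholds: "less than" is treated as exclusive and "up to" as inclusive, and records and recording hours are classified separately because the text does not say how to combine them. The function names are hypothetical.

```python
from typing import Optional, Tuple

# Upper bounds as stated in the text: "small" lies strictly below its bound,
# "medium" and "large" include theirs ("up to").
RECORD_BOUNDS = (100, 1_000, 10_000)
HOUR_BOUNDS = (5.0, 50.0, 500.0)


def _size_label(value: float, bounds: Tuple[float, float, float]) -> Optional[str]:
    small, medium, large = bounds
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < small:
        return "small"
    if value <= medium:
        return "medium"
    if value <= large:
        return "large"
    return None  # beyond any size class named in the text


def size_by_records(n_records: int) -> Optional[str]:
    """Size class of a dataset given its number of records."""
    return _size_label(n_records, RECORD_BOUNDS)


def size_by_hours(hours: float) -> Optional[str]:
    """Size class of a dataset given its total recording time in hours."""
    return _size_label(hours, HOUR_BOUNDS)


print(size_by_records(80), size_by_hours(12.5))
```

Returning `None` for anything above 10,000 records or 500 hours is a deliberate choice here, since the text defines no class for such datasets.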
Finally, the networks that are used for biosignal analysis differ. For instance, the
learning type of the network may be supervised or unsupervised. Note that this criterion
is strictly correlated with the type of GT data, which can be labeled or unlabeled,
respectively. The training of the network can be scheduled offline, online, or in
real time. Of course, the topology of the network is another important criterion,
and the instances we have chosen here correspond to Section 2.2. Consequently, network
categories are denoted N for “network”, and coded N(L-type, L-schedule, topology)
(Fig. 3).
In summary, three categories of deep learning on biosignals have been identified,
each comprising three criteria. Since the type of GT data is directly linked to
the type of network learning, only eight effective criteria remain. In total, 37 different
instances are suggested. These instances have been used to code the papers retrieved
from the literature review. For example, the paper of Rahhal et al. [44] on active
classification of ECG signals is coded as B(121)A(322)N(212).
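The coding scheme lends itself to a small parser. The sketch below splits a code such as B(121)A(322)N(212) into its nine instances and decodes only those digits whose meaning is stated in this survey: the goal (row labels of Table 3), the learning type (N(1..) unsupervised, N(2..) supervised), and the dimension and type of the biosignal (headings of Section 4.4). All names (`PaperCode`, `parse_code`, `describe`, `gt_matches_learning`) are hypothetical, and the remaining digits are deliberately left undecoded.

```python
import re
from typing import NamedTuple

# Digit meanings that the survey text states explicitly; anything else stays raw.
DIMENSION = {"1": "single-lead", "2": "multi-lead"}
SIGNAL_TYPE = {"1": "raw data", "2": "features", "3": "2-D transformed"}
GOAL = {"1": "enhancement", "2": "detection", "3": "clustering",
        "4": "diagnostics", "5": "prediction"}
LEARNING = {"1": "unsupervised", "2": "supervised"}

# Instances are digits, or letters such as "c" and "d" used for some origins.
_I = r"([0-9a-z])"
CODE_RE = re.compile(rf"B\({_I}{_I}{_I}\)\s*A\({_I}{_I}{_I}\)\s*N\({_I}{_I}{_I}\)")


class PaperCode(NamedTuple):
    origin: str
    dimension: str
    signal_type: str
    goal: str
    gt_size: str
    gt_type: str
    learning_type: str
    schedule: str
    topology: str


def parse_code(code: str) -> PaperCode:
    """Split a B(...)A(...)N(...) code into its nine single-character instances."""
    match = CODE_RE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a B(...)A(...)N(...) code: {code!r}")
    return PaperCode(*match.groups())


def describe(code: str) -> str:
    """Human-readable summary of the instances this sketch knows how to decode."""
    p = parse_code(code)
    return ", ".join([
        f"{DIMENSION.get(p.dimension, '?')} {SIGNAL_TYPE.get(p.signal_type, '?')}",
        GOAL.get(p.goal, "?"),
        LEARNING.get(p.learning_type, "?"),
    ])


def gt_matches_learning(code: str) -> bool:
    """Check the dependency A(..x) = N(x..) noted in the caption of Fig. 3."""
    p = parse_code(code)
    return p.gt_type == p.learning_type


print(describe("B(121)A(322)N(212)"))  # multi-lead raw data, clustering, supervised
```

For the example of Rahhal et al., the parser reports multi-lead raw data, clustering, and supervised learning, and `gt_matches_learning` confirms that the GT type and the learning type carry the same instance.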
4.3 Biosignals and their Application
Supervised learning is used in most applications: N(2..). In supervised learning, deep
learning models learn from annotated data. However, annotation (labeling) of
physiological signals requires expert knowledge and is often expensive
and time-consuming. Unsupervised learning, coded as N(1..), is sometimes ineffective
for multivariate inputs and ambulatory monitoring due to long-term time dependencies [6].
Based on the analysis of the physiological signals, a generative or a discriminative
network topology is selected. Discriminative models are coded as A(2..) - A(4..).
They are effective for the detection, clustering, and diagnostics of physiological
signals. A discriminative model is capable of modeling noisy data for training.
Generative models are mainly used for the enhancement and prediction of physiological
signals. Models coded as N(..4) can predict and synthesize new partial input data
at time t+1 from the previous data at time t. Generative
models are also more robust when analyzing noisy data. The characteristics of physiological
signals play a vital role in the selection of deep learning models. If the physiological
signal has a spatiotemporal structure, the selected model must incorporate both the
spatial and the temporal coherence of the signal using regularization. A CNN is
considered a good choice for handling both temporal and spatial data. However, the
selection criteria for the deep learning model should be more application-oriented and
robust to the input data types.
Table 2 shows the codes for all 71 papers that have been considered in this survey.
Only a few codes occur more than once, which shows the diversity of the research and
demonstrates that our coding scheme is suitable for different approaches. Counting the
network topologies yields 15, 12, 34, and 3 for RBMs, auto-encoders, CNNs, and RNNs,
respectively. In addition, there are 7 “Other” types of network topology, where the
authors have combined several deep learning networks to improve performance.
Table 2
Coding schemes for the 71 papers selected
Code                  References
B(111)A(212)N(213)    [19] [45] [46] [47] [48] [49]
B(111)A(312)N(213)    [58] [59] [60]
B(111)A(312)N(214)    [62]
B(112)A(212)N(211)    [72]
B(112)A(212)N(214)    [74]
B(112)A(212)N(213)    [77]
B(112)A(312)N(215)    [80]
B(112)A(312)N(211)    [68]
B(112)A(122)N(214)    [63]
B(112)A(222)N(213)    [94]
B(113)A(211)N(113)    [93]
B(121)A(112)N(212)    [50] [51] [52] [53]
B(121)A(212)N(213)    [61]
B(121)A(222)N(213)    [56]
B(121)A(222)N(215)    [12]
B(121)A(322)N(212)    [44]
B(121)A(332)N(213)    [64]
B(122)A(212)N(211)    [66]
B(122)A(311)N(111)    [70]
B(122)A(312)N(211)    [71]
B(122)A(312)N(215)    [69]
B(122)A(412)N(212)    [83]
B(122)A(422)N(213)    [30]
B(122)A(522)N(213)    [87]
B(122)A(532)N(215)    [90]
B(123)A(312)N(212)    [75]
B(211)A(221)N(111)    [78]
B(211)A(322)N(213)    [81]
B(212)A(312)N(211)    [84] [85]
B(213)A(222)N(215)    [88]
B(213)A(312)N(213)    [91]
B(213)A(412)N(213)    [97]
B(213)A(422)N(213)    [95]
B(311)A(312)N(214)    [67]
B(311)A(332)N(213)    [65]
B(311)A(512)N(214)    [73]
B(312)A(312)N(212)    [82]
B(312)A(412)N(213)    [92]
B(312)A(422)N(213)    [89]
B(312)A(422)N(215)    [86]
B(313)A(332)N(213)    [98] [99] [100]
B(412)A(112)N(211)    [106]
B(412)A(212)N(211)    [54]
B(413)A(211)N(123)    [57]
B(413)A(212)N(213)    [107]
B(512)A(212)N(213)    [76]
B(612)A(212)N(213)    [79]
B(711)A(412)N(212)    [23]
B(812)A(512)N(215)    [96] [108] [109]
B(d11)A(112)N(215)    [101]
B(d11)A(312)N(211)    [102]
B(d11)A(312)N(213)    [55]
B(d11)A(412)N(223)    [103]
B(d11)A(512)N(215)    [10]
B(d12)A(412)N(212)    [104] [105]
4.4 Clustering by Application and Biosignals
Table 3 visualizes six clusters of current research with respect to the goal of the
application and the biosignal considered in the paper. A more comprehensive table with
respect to the network architecture and the optimizers and regularizers used is given in
Appendix 2.
Table 3
Deep learning on biosignals with respect to the goal of the application and the origin
of the biosignal (colors indicate the six clusters).
Columns (signal origin with its three type codes):
n-ECG: B(121), B(122), B(123)
1-ECG: B(111), B(112), B(113)
EMG: B(211), B(212), B(213)
PCG: B(311), B(312), B(313)
PPG: B(411), B(412), B(413)
Others: B(c11), B(c12), B(c13)
Multiple Sources: B(d11), B(d12), B(d13)
Rows (application goal), with references listed in column order:
Enhancement A(1..): [50] [51] [52] [53] [63] [106] [101]
Detection A(2..): [61] [12] [56] [45] [46] [19] [47] [48] [49] [62] [72] [74] [66] [68] [77] [94] [93] [78] [88] [82] [54] [57] [107] [76](1) [79](2)
Clustering A(3..): [44] [64] [70] [69] [71] [75] [58] [60] [59] [80] [81] [84] [85] [91] [67] [65] [98] [99] [100] [102] [55]
Diagnostics A(4..): [30] [83] [95] [97] [86] [92] [89] [23](3) [103] [104] [105]
Prediction A(5..): [87] [90] [73] [96](4) [108](4) [109](4) [10]
(1) B(512); (2) B(612); (3) B(711); (4) B(812)
4.4.1 Multi-lead ECG: B(12.)
Generally, multi-lead ECG signals are highly complex and large in volume.
In addition, the multi-lead structure may allow CNN and RNN architectures to be applied
directly.
Deep Learning Applied to Multi-lead ECG Raw Data: B(121)
We have identified nine papers that apply deep learning methods to enhancement [50],
[51], [52], [53], detection [12], [56], [61], and clustering [44], [64]. This is due to
the capability of deep learning methods to extract strong features
and better data representations. All approaches used raw data as input but different
deep learning methods for processing. For instance, Pourbabaee et al. [61] and Zhou et
al. [12] suggested applying CNN and LSTM to biosignals, while others used different types
of auto-encoders for the analysis. Pourbabaee et al. divided 30-min signals into
six segments of 5 min, which were used as input to the CNN. In the case of Zhou et
al., individual heartbeats were extracted from the ECG and fed to the LSTM model.
All papers except one [61] used publicly available standard ECG databases to evaluate
the performance of their proposed methods. For example, Zhou et al. applied lead CNN and
LSTM on the Massachusetts Institute of Technology, Beth Israel Hospital (MIT-BIH)
database [110] and validated the trained model on the Chinese cardiovascular
disease database (CCDD) [111]. Similarly, Rahhal et al. [44] validated their method on
two other databases, namely the St. Petersburg Institute
of Cardiological Technics (INCART) database [112] and the supraventricular arrhythmia
database (SVDB) [113]. Liu et al. [56] used a multi-lead CNN to detect myocardial
infarction on ECG signals obtained from
the German Physikalisch-Technische Bundesanstalt (PTB) database [114]. Appendix 2
provides a more comprehensive list of the databases used in the papers for the evaluation
of their methods as well as the results and performances obtained, as reported by
the authors.
It is observed that CNN and LSTM are used for the clustering task, while auto-encoders
are used for signal enhancement and reconstruction. In addition, Gogna
et al. [50] used the Split-Bregman optimization technique to overcome the issues of error
backpropagation for unambiguous auto-encoder learning, while the other authors applied
the standard auto-encoder model. Jin and Dong [64] developed and implemented a multi-lead
CNN model for the classification of normal/abnormal
signals. Further, a rule inference approach was applied to improve the performance
of the training model. In general, the results obtained by the nine papers were above
90% accuracy. Zhou et al. obtained an average accuracy of 99.41% and 98.03% on MIT-BIH
and CCDD, respectively, by combining CNN and LSTM as base classifiers in the proposed
network.
Deep Learning Applied to Features from Multi-lead ECG: B(122)
A cluster of seven papers applied features derived from multi-lead ECG to deep learning
networks. Such features may be the RR interval [70], the QRS complex [69], or
morphological and temporal features [83]. In contrast to raw data-based approaches,
enhancement is not a target application; instead, the features improved detection,
clustering, and diagnostics. Features extracted from biosignals
are used as a key element for the identification of boundary conditions to separate
classes in the multi-dimensional feature space. Five applications were evaluated on
publicly available databases [30], [69], [70], [71], [83], while the others recorded
private data [87], [90]. All papers addressing clustering used deep belief networks for
processing [66], [75], [76]. The results reported by the papers were all above 85%. In
particular, Majumdar and Ward [69] showed an average accuracy of 97.0% for
classification. Zhang et al. [30] considered eight diverse ECG databases to evaluate the
performance of their system and obtained 93.5%, 96.5%, and 90.5% for the overall human
identification task, normal subject identification, and abnormal subject identification,
respectively.
4.4.2 Single-lead Biosignals: B(x1.)
This type of signal is formed from a single sampled measurement, usually derived as
an electrical potential and usually of pseudo-periodic nature, regardless of whether
the biosignal is cardiac, respiration, or blood pressure. Therefore, our analysis
is pooled over x = {1, 2, 3, 4, 5, 6, 7, 8, 9}, where x represents the different
physiological signals shown in Table 3.
Deep Learning Applied to Single-lead Biosignal Raw Data: B(x11)
In this group of 16 papers [19], [23], [45], [46], [47], [48], [49], [58], [59], [60],
[62], [65], [67], [73], [78], [81], single-lead raw data is fed as input to the deep
learning methods for applications that focus on detection; for instance, the paper of
Acharya et al. [46] aimed at automated detection of abnormalities in ECG signals. The
approach of Kiranyaz et al. [19] focused on a personalized monitoring system for
arrhythmias. The high number of
research papers may come from the fact that single-channel biosignals are simple,
easy to acquire, and effective in decision-making systems. Out of the 16 papers, 12
used 1-D CNN models [19], [45], [46], [47], [48], [49], [58], [59], [60], [62], [65],
[81]; two were based on auto-encoders [23], [73]; one used an RNN model [67]; and one
used an RBM model [78].
Out of 10 papers using single-lead ECG inputs, seven [19], [47], [49], [58], [59], [60],
[62], two [45], [46], and one [114] were evaluated on the MIT-BIH, the German
Physikalisch-Technische Bundesanstalt (PTB), and the Fantasia database, respectively
(see Appendix 2).
In general, the overall results were above 94% accuracy. In particular, Kiranyaz et
al. [60] obtained an average accuracy of 99.00% on the MIT-BIH database and Lei et al.
[45] reported 99.33% accuracy on the PTB database [114]. In an interesting study, Schozel
and Dominik [73] used an RNN model to study PCG signals and generated artificial ECG
signals from PCG. Papers describing work on biosignals other than ECG obtained an overall
accuracy above 79%. For instance, Zhang et al. [23] reported an average accuracy of
80.22% for emotion recognition using the respiration
signal. Similarly, Ryu et al. [65] obtained an overall accuracy of 79.55% for the
classification of heart sounds using
PCG signals, which may be due to the unbalanced datasets used to validate the proposed
model. In another study, Atzori et al. [81] proposed to classify the movements of a
prosthetic hand using a CNN model. EMG signals
from amputee subjects were recorded and used to validate the proposed models. They
obtained the lowest accuracy of 38.09% for the amputee datasets.
Deep Learning Applied to Features from Single-lead Biosignals: B(x12)
The number of scientific papers with features as input to the deep learning method
is high. In this cluster, features were extracted from single-lead biosignals:
21 papers have been retrieved on this topic, covering applications such as enhancement
[63], [106], detection [54], [66], [68], [72], [74], [76], [77], [79], [82], [94],
clustering [80], [84], [85], diagnostics [86], [89], [92], and prediction [96], [108],
[109]. Examples of features are the RR interval [72], [74], the mean absolute value of
the signal [85], or frequency spectral coefficients [92]. Nine out of 21 papers used
unsupervised learning and deep belief networks to perform
detection [54], [66], [68], [72], [82], [94], clustering [84], [85], diagnostics [86],
or prediction [96], [108], [109], while the others used supervised learning [63], [74],
[76], [79], [80], [89], [92].
In 10 papers, the authors conducted experiments on public databases [66], [68], [76],
[77], [80], [84], [85], [86], [94], [96]. In two papers, signals were simulated [63],
[79]. In four papers, signals were acquired from the 2016 Physionet Challenge database
[116]. Only four papers presented an evaluation using the authors' own data [54], [74],
[89], [92].
The results were diverse. In particular, Mohebi et al. [79] generated continuous glucose
monitoring (CGM) signals to identify type 2 diabetes
patients using a CNN model. Lee and Chang [96] combined a deep belief network with
bootstrapping to estimate blood pressure; bootstrapping
enhanced the performance by about 10%. In general, the results were above 83% in accuracy
for the various applications. Specifically, Jindal et al. [54] obtained an accuracy of
96.1% for biometric identification using PPG signals. Some
papers with inputs other than ECG signals obtained an accuracy of 71.39%, e.g., for
congestive heart failure detection [72].
In an interesting study, Lee and Chang [108], [109] developed an ensemble deep belief
network-deep neural network (DBN-DNN) approach
for the estimation of oscillometric blood pressure from few data samples. Further,
artificial features were synthesized to overcome the lack of data samples for
training deep learning models. To improve the prediction rate of cardiovascular events,
Kim et al. [106] used a DBN model on blood pressure signals to reduce artifacts.
Similarly, Zhenjie et al. [77] proposed a multiscale CNN model for the detection of
atrial fibrillation from ECG
signals and achieved 98.18% accuracy on the MIT-BIH database [110].
Deep Learning Applied to 2D-transformed Biosignals: B(x13)
Ten papers describe utilizing a 2-D spectrum as input to the deep learning system [57],
[88], [91], [93], [95], [97], [98], [99], [100], [107]. It is reported that 2-D spectra
of transformed biosignals describe the spatial
and temporal information of the signals. They also allow the direct use of CNNs. All
the papers used supervised learning to train a CNN, and the applications are detection
[57], [88], [93], [107], classification [91], [98], [99], [100], and diagnostics [95],
[97].
Out of the 10 papers, five used publicly available databases to evaluate the performance
of their systems [91], [97], [98], [99], [100] and the other five used experimental
protocols for signal acquisition [57], [88], [93], [95], [107] (see Appendix 2). In
general, the results obtained were above 90% accuracy. In particular, Dominguez-Morales
et al. [98] obtained 97% accuracy using a modified version of the AlexNet model [40].
Interestingly, the authors converted the PCG signal obtained from a neuromorphic
auditory sensor into address-event representations, which were later plotted as 2-D
sonograms for deep learning. In order to improve the prediction ability of their deep
learning model, Xia et al. [88] used a hybrid model combining CNN with RNN.
Most of the papers in this cluster utilized supervised learning. Out of eight papers,
six papers used publicly available databases to validate the model proposed. Cote-Allard
et al. [95] developed and tested a transfer-learning-based hand gesture learning system.
A trained model, obtained from other datasets, was used to detect and improve different
hand gestures. Xia et al. [93] used 2-D-transformed ECG signals as input for the
detection of pathological conditions
with an average accuracy of 98.63%.
4.4.3 Multiple Source Biosignals
Seven papers proposed approaches to manage multiple source biosignals. Except for two
papers [104], [105], all fed raw data to the deep learning system for signal enhancement
[101], clustering [55], [102], diagnostics [103], [104], [105], and prediction [10]. For
instance, Bengio et al. [10] presented a major breakthrough by applying a CNN for affect
classification from raw
physiological signals. The accuracy obtained by all papers was above 85%. The performance
of the systems depended on the combination of the biosignals used for the analysis.
For example, Zhang et al. [102] obtained 98.49% accuracy for the automated classification
of sleep stages using
sparse DBNs. The authors used EMG, EEG, and ECG collectively to investigate the impact
of different biosignals on the algorithm performance. In [10], the authors applied a CNN
model to specific signals from the database for emotion
analysis using physiological signals (DEAP) [13], namely skin conductance and blood
volume pulse. Out of the seven papers, three
used a CNN model [10], [55], [103], two were based on an ensemble approach [104], [105],
one used a DBN model [102], and another used a DNN model [101]. Belo et al. [101]
proposed an interesting approach for generalized biosignal learning and synthesis
using DNN models. Chow [103] proposed an online biometric recognition system using ECG
and EDA signals. In this
approach, physiological signals are acquired and trained in an online mode using pre-trained
networks. Three papers [10], [104], [105] used the publicly available DEAP database for
validation (see Appendix 2).
4.5 Clinical Application
Although most of the papers are focused on the methodology of biosignal analysis using
deep learning, there are some examples that present a real-world application of such
methods. We have selected three of them as examples.
HeartID [30] used a multiresolution CNN for ECG-based biometric identification of humans
in smart health applications. First, the ECG stream was blindly split into segments of
two seconds, disregarding the R-wave positions. Then, the segments were transformed to
the wavelet domain to reveal more detailed time and frequency characteristics at multiple
resolutions. An auto-correlation was applied to each wavelet component to remove the
phase shift caused by the blind segmentation. Despite using the wavelet-transformed
output (image) as input to the CNN
approach, a 1-D CNN was applied to each individual wavelet component to clearly learn
the local patterns. Furthermore, each wavelet component was considered a 1-D image
(feature vector) and was fed as input into the 1-D CNN to learn the intrinsic pattern
of the individuals. To evaluate the system, several publicly available databases were
re-sampled to 360 Hz, yielding 100 normal and 120 abnormal datasets (arrhythmia, malignant
ventricular ectopy, ST depression). The correct identification rate was 96.5%
and 93.5%, respectively. The system could be generalized to other quasi-periodic biosignals
such as PPG, BCG, or a multi-modal combination of them.
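The first and third preprocessing steps described for HeartID can be sketched with NumPy as follows. This is not the authors' implementation: the wavelet transform is omitted, the autocorrelation is applied directly to each two-second window, and a circular autocorrelation (computed through the power spectrum) is assumed because it is exactly invariant to a circular shift of the window. The function names and the synthetic test stream are illustrative assumptions.

```python
import numpy as np


def blind_segments(signal: np.ndarray, fs: int = 360, seconds: float = 2.0) -> np.ndarray:
    """Cut the stream into consecutive fixed-length windows, ignoring R-wave positions."""
    window = int(round(fs * seconds))
    n_windows = signal.size // window
    # Drop the incomplete tail so that every window has the same length
    return signal[: n_windows * window].reshape(n_windows, window)


def circular_autocorrelation(segment: np.ndarray) -> np.ndarray:
    """Normalized circular autocorrelation computed through the power spectrum."""
    centered = segment - segment.mean()
    spectrum = np.fft.rfft(centered)
    power = (spectrum * np.conj(spectrum)).real  # phase information is discarded here
    acf = np.fft.irfft(power, n=segment.size)
    return acf / acf[0]


fs = 360
t = np.arange(10 * fs) / fs
# Synthetic quasi-periodic stand-in for an ECG stream (not real data)
stream = np.sin(2 * np.pi * 1.2 * t) + 0.5 * np.sin(2 * np.pi * 2.4 * t)

segments = blind_segments(stream, fs=fs)
features = np.array([circular_autocorrelation(s) for s in segments])
print(segments.shape, features.shape)
```

Because the power spectrum discards phase, circularly shifting a window leaves its autocorrelation unchanged. With real blind segmentation, neighboring windows contain different samples rather than shifted copies, so the invariance holds only approximately for quasi-periodic signals.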
Another remarkable system focused on personalized event prediction to detect the
occurrence of arrhythmias or abnormal beats in the ECG signal [19]. Facing the challenge
that the ECG of a healthy person without any history of cardiac
arrhythmias should exhibit no abnormal beats, the authors developed a set of 464 filters
performing regularized least-squares optimization to synthesize individual pathologic
patterns out of healthy ECG cycles. For the early detection of cardiac arrhythmias,
a personalized training dataset was created synthetically over the subject's average
normal beat. A one-dimensional (1-D) CNN dedicated to that specific subject was trained
and used to monitor streamed real-time recordings. The system's evaluation was based
on the MIT-BIH database [110], where the first 5 minutes of recordings were used to form
the average normal beats.
Overall, 34 patient records with a total of 63,341 beats were selected for the evaluation.
The probability of detecting at least one among the first three abnormal beats was
99.4%, supporting a meaningful clinical application of the system.
Generally, deep learning is considered to be computationally expensive. Pourbabaee
et al. [61] developed a computationally efficient screening system for patients with
paroxysmal atrial fibrillation (PAF). The authors used a large volume of ECG time-series
data and a deep CNN to identify unique feature patterns for screening patients. The
30-minute raw ECG signals were first divided into six equal segments. These short ECG
segments were then used as input to the CNN to obtain robust deep features, which were
passed through standard classifiers, namely end-to-end CNN, k-nearest neighbors (kNN),
support vector machine (SVM), and multilayer perceptron. To evaluate the performance
of the system, signals were taken from the PAF prediction challenge database [117]. The
authors showed that the CNN-learned features can be classified conventionally,
which is computationally inexpensive and efficient. The CNN did not require
any prior domain knowledge for learning the features and, therefore, the approach
could potentially be utilized for other biosignals. The network yielded an accuracy
of 93.60% and 92.96% using an end-to-end CNN and a Gaussian kernel SVM, respectively.
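The segmentation step of this pipeline, dividing a 30-minute recording into six equal segments, can be sketched as below. The sampling rate of 128 Hz, the random stand-in signal, and the function name `split_equal_segments` are assumptions for illustration; the CNN and the downstream classifiers are not reproduced here.

```python
import numpy as np


def split_equal_segments(signal: np.ndarray, n_segments: int = 6) -> np.ndarray:
    """Split a 1-D recording into n_segments consecutive segments of equal length."""
    samples_per_segment = signal.size // n_segments
    usable = samples_per_segment * n_segments  # drop any remainder at the end
    return signal[:usable].reshape(n_segments, samples_per_segment)


fs = 128  # sampling rate in Hz (assumed for this sketch)
recording = np.random.default_rng(0).standard_normal(30 * 60 * fs)  # stand-in for 30 min of ECG

segments = split_equal_segments(recording)
print(segments.shape)  # (6, 38400), i.e. six segments of 5 min each
```

Each row of `segments` would then serve as one input to the feature-learning network, so a single recording contributes six training examples.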
5 Discussion and Conclusion
A comprehensive literature search was performed to identify works combining biosignals
of any nature (except EEG due to its 2-D spatial structure) with deep learning networks.
The examined bibliographic databases (PubMed, Scopus, and ACM) provided relevant papers
including works published in IEEE or SPIE conference proceedings. We have developed
a scheme to differentiate existing research by biosignals, applications, and networks
that yields 30 instances from seven independent categories. Such a graph may yield
more than 25,000 different codes, but due to internal dependencies (network topology
and application goal), that figure reduces to 6,480. As shown in Table 2, research is
diverse, but there are two codes occurring multiple (more than three)
times, B(111)A(212)N(213) and B(121)A(112)N(212). The code B(111)A(212)N(213) is
used six times on single-lead ECG signals to perform pattern detection.
Similarly, B(121)A(112)N(212) is used four times for multi-lead ECG targeting signal
enhancement for better clinical decision support.
The overall accuracy obtained by methods using raw ECG signals is higher than that of
methods with features as input. This may be due to the ability of deep learning to capture
features which optimally represent the biosignal specific to the problem. The overall
performance of the papers that utilized the authors’ own signal acquisition protocol
was lower than that of those experimenting with publicly available databases. This may be
due to the presence of noise and artifacts in the recordings. To date, most work has
been performed on relatively small datasets. Five papers used 3,126 recordings from
the 2016 Physionet Challenge database for heart sound analysis, and one paper mentioned
utilizing 1,075 records [92]. This finding is in line with Deserno and Marx [7], who
reported the need for more realistic reference databases.
Most of the papers (n = 30) use features which are pre-extracted from the biosignals
as input to the deep learning models. This may result from the assumption that the
performance of machine learning improves when using pre-computed features rather than
noisy raw data, since a feature vector is much smaller than the signal itself and
hence reduces the required volume of training data. However, the basic idea of
deep learning is to automatically identify patterns from large volumes of biosignals
without human involvement. On the other hand, only 11 papers (less than 15%) applied
a 2-D transform to take advantage of the CNN matrix-like architecture. This figure
was much lower than anticipated, but may result from the fact that 1-D CNNs perform
well.
In conclusion, after having shown their impact on image and video analysis, deep learning
approaches have been successfully applied to the analysis of 1-D biosignals. The
number of published works is increasing continuously, and the results are promising.
However, there is a large variety of signals, applications, and reference data, which
makes an objective comparison of the published approaches difficult.
In the future, standardization of network topology and parameters is expected. Reference
data recorded with novel devices is required, which represents not only pathology
but also normal recordings from healthy subjects, as well as outliers, dropout sequences,
and noise.