Design of DNN-Based Low-Power VLSI Architecture To Classify Atrial Fibrillation For Wearable Devices


IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 31, NO. 3, MARCH 2023

Rushik Parmar, Meenali Janveja, Student Member, IEEE, Jan Pidanic, Senior Member, IEEE, and Gaurav Trivedi, Member, IEEE

Abstract— Atrial fibrillation (AF) is a recurrent and life-threatening disease leading to rapid growth in the mortality rate due to cardiac abnormalities. It is challenging to manually diagnose AF using electrocardiogram (ECG) signals due to complex and varied changes in its characteristics. In this article, for the first time, an end-to-end edge-enabled machine learning-based VLSI architecture is proposed to classify ECG excerpts having AF from normal beats. Over the decades, researchers have found that abnormal atrial activity is confined to the low-frequency range. Therefore, in the proposed work, this frequency band is directly analyzed for AF detection, which has not previously been discussed. The proposed architecture is implemented using 180-nm bulk CMOS technology, consumes 11.098 µW at 25 kHz, and exhibits an accuracy of 92.37% for class-oriented classification and 81.60% for subject-oriented classification. The low-power realization of the proposed design, as compared to the state-of-the-art methods, makes it suitable for wearable devices.

Index Terms— Application specific integrated circuit (ASIC), atrial fibrillation (AF), deep neural network (DNN), electrocardiogram (ECG) signal, wavelet transform, wearable devices.

Manuscript received 15 September 2022; revised 21 December 2022; accepted 6 January 2023. Date of publication 19 January 2023; date of current version 24 February 2023. This work was supported in part by Electronics and ICT Academy at IIT Guwahati funded by the Ministry of Electronics and Information Technology, India and in part by the Programme INTER-EXCELLENCE (LTAIN19100) funded by the Ministry of Education, Youth and Sports, Czech Republic, “Artificial Intelligence Enabled Smart Contact-less Technology Development for Smart Fencing” under Project LTAIN19100. (Corresponding author: Rushik Parmar.)
Rushik Parmar, Meenali Janveja, and Gaurav Trivedi are with the Department of Electronics and Electrical Engineering, IIT Guwahati, Guwahati 781039, India (e-mail: [email protected]; meena176102001@iitg.ac.in; [email protected]).
Jan Pidanic is with the Department of Electrical Engineering, Faculty of Electrical Engineering and Informatics, University of Pardubice, 532 10 Pardubice, Czech Republic (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TVLSI.2023.3236530.
Digital Object Identifier 10.1109/TVLSI.2023.3236530
1063-8210 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

ATRIAL fibrillation (AF) is the most common type of cardiac arrhythmia, with rising incidence and prevalence worldwide. AF is an abnormal heart rhythm caused by the irregular contraction of the atria. Moreover, the likelihood of developing AF increases with age [1]. Contemporary research shows that around 30% of patients with ischemic heart disease are also diagnosed with AF [2]. AF increases the risk of other heart-related diseases, such as congenital heart disease, myocardial infarction, congestive heart failure, etc. AF can also lead to noncardiac diseases, viz. cognitive dysfunction, depression, and chronic kidney disease, making a timely diagnosis of AF imperative to increase life expectancy [3], [4]. We aim to develop biomedical signal processing architectures for wearable or hand-held devices that can continuously analyze and detect AF in real time.

Previously reported research primarily focuses on the detection and classification of different types of cardiac arrhythmia as per Association for the Advancement of Medical Instrumentation (AAMI) standards, viz. detection of normal beat (N), supraventricular ectopic beat (SVEB), ventricular ectopic beat (VEB), fusion beat (F), and unknown beat (U). It is important to note that previous works have not included AF while classifying arrhythmia. The Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) arrhythmia database [23] is one of the standard datasets used by researchers for detecting and classifying different types of cardiac arrhythmia. This dataset, however, contains few subjects with AF episodes: of its 47 subjects, only eight have suffered from AF episodes. The MIT-BIH atrial fibrillation (AF) database [19], which consists of 25 ECG recordings of subjects diagnosed with AF, was released to support research in AF classification. Our previous study [24] targeted real-time classification of cardiac arrhythmia but did not include classification of AF, as diagnosing AF is challenging, especially at an early stage due to its asymptomatic nature. Note that AF remains undetected during the diagnosis with conventional ECG monitoring devices. The current method of diagnosis includes a medical expert inspecting and interpreting ECG data, which is a tedious and time-consuming task. Moreover, the scarcity of medical expertise in developing countries increases the mortality rate. Therefore, it is imperative to design smart wearable devices equipped with artificial intelligence to analyze ECG for timely detection of AF without any medical expert intervention. These devices should have low power consumption, low computational load, and a high detection accuracy to operate in real time while avoiding false alarms and unnecessary visits to the hospital.

In recent years, researchers have used two approaches to detect AF. The first approach involves using handcrafted features from ECG data like a spectrogram [8], P-wave morphology [9], RR-interval series [22], entropy [22], and heart rate variability (HRV) [10]. With these features, studies have also explored the use of dominant atrial frequency [5], [6], [7] to detect AF. It is important to note that feature extraction plays a vital role in determining the computation

cost of the classifier. Complex features, such as spectrogram, entropy, power spectral density, and root mean square, demand higher resources, leading to higher area utilization and power consumption. The second approach relies on the capabilities of ML models and directly uses raw ECG data to detect AF events [21]. Although ML algorithms can detect different patterns in a signal with high accuracy, they have various pitfalls, including high computational costs and increased sensitivity to noise. This makes using raw data as a feature vector for an ML model inefficient in smart wearable devices. On the contrary, a feature-based approach with selected features and an efficient feature extraction methodology can drastically reduce model size and complexity. This helps developers to reduce computations, realizing low-power wearable devices. It is also important to mention that previously reported state-of-the-art methods primarily focus on software implementation of the classifier using complex features. These methods exhibit higher area utilization and power consumption. Andersson et al. [22] have proposed an application specific integrated circuit (ASIC) for AF detection. The model proposed in [22] accepts an RR-interval series as input to extract several features and then uses a threshold detector to classify a beat as normal or AF. However, this model does not extract the RR-interval series from ECG data and relies on an external processing unit for ECG data processing. Moreover, the detector uses nontunable hard-coded thresholds, which might not adapt to varied ECG signals in real-world applications. Therefore, the need of the hour is an edge-enabled low-power and computationally efficient VLSI architecture of an AF classifier. Hence, to overcome the above-mentioned issues, we propose an end-to-end AF classifier that accepts raw ECG data to extract wavelet coefficients using a simplified technique to classify AF beats from the normal beats.

The rest of this article is organized as follows. Section II describes the contribution of this article. Section III depicts the implementation details of feature extraction and the machine learning model. Experimental analysis and results are presented in Section IV. Finally, Section V concludes the manuscript.

II. NOVELTY OF THE PROPOSED WORK

1) In this article, for the first time, an end-to-end VLSI architecture is proposed for AF classification. The previously reported state-of-the-art methods were primarily implemented on a software platform or relied on an external processing unit for ECG data processing [22]. The proposed classifier accepts raw ECG data for analysis and classification of AF, making it an end-to-end classifier.
2) The proposed classifier uses low-frequency wavelet coefficients for AF classification instead of a complete spectrum. The use of these coefficients reduces the size of the feature vector, which in turn reduces the complexity and size of the machine learning model.
3) A novel wavelet transform approach using integer Haar wavelet is proposed in this article. It simplifies the implementation of wavelet transform from Mallat’s decomposition to simple operations, such as add, subtract, and shift. This significantly reduces the computation cost.
4) The neural network architecture employs an improved Mitchell’s multiplier. It uses two error reduction techniques to reduce the average error of Mitchell’s algorithm from 3.88% to 1.79%.

III. PROPOSED DESIGN OF THE AF CLASSIFIER

A typical system for the detection of AF using ECG consists of two blocks. The first block is the feature extraction block, and the second is the classifier block, which can process features and perform the classification. Fig. 1 showcases the proposed neural network, and Fig. 2 illustrates the complete architecture of the proposed AF classification algorithm. The details of the work proposed in this manuscript are described below.

Fig. 1. Proposed neural network for AF classification. $N_{i,j}$ represents the jth neuron of the ith layer. $X_i$ depicts the input feature.

A. Feature Extraction Block

Wearable applications require low-energy dissipation in both idle and active modes. Idle energy is dominated by the leakage current drawn by memories, whereas active energy is minimized by reducing complexity. Thus, an efficient feature extraction block is proposed in this article, which does not require storing the entire ECG samples. Moreover, the computational complexity of the feature extraction block is reduced by simplifying the wavelet transform, as explained in the subsequent section. The variations in the ECG due to abnormal atrial activity are usually confined to the signal’s low-frequency range (less than 12 Hz) [12]. Discrete wavelet transform (DWT) is used for the analysis of the low-frequency ECG signal components. It is preferred over Fourier transform due to the salient features mentioned below.
1) Adaptive time-frequency windows.
2) Lower aliasing distortion for signal processing applications.
3) Reduced computational complexity.
4) Efficient VLSI implementation.

Fig. 2. Proposed architecture of AF classifier.

Fig. 3. Mallat’s wavelet decomposition method.

DWT is a multiresolution analysis tool initially used for image and speech compression. It is now widely used to analyze nonstationary signals, such as ECG [15]. Using DWT, a signal can be decomposed into different subbands to extract local spectral and temporal information simultaneously using a mother wavelet. Among all the mother wavelets, “integer Haar” is the simplest wavelet function. Our previous work proposed using DWT with “integer Haar” as a mother wavelet [15] for the following reasons: First, it has the simplest filter coefficients, making it efficient for hardware implementation. Second, it is fast and memory efficient. Third, it does not exhibit edge effects. Hence, in the proposed work, the analysis of low-frequency ECG signal components, which contain critical information about AF, is performed using DWT employing “integer Haar” as the mother wavelet. Using DWT decomposition with integer Haar, we extract wavelet coefficients, which provide information about various frequency bands. Later, the wavelet coefficients of the ECG signal in the relevant frequency band, having information about AF, are taken as an input feature vector for the deep neural network (DNN) to classify ECG waves as normal or AF.

To extract decomposed wavelet coefficients in different frequency bands, the input signal needs to be passed through a chain of low-pass and high-pass filters as per Mallat’s decomposition algorithm [16] shown in Fig. 3. The output of each level is required to be downsampled by two to remove redundant information. The coefficients of low- and high-pass filters are calculated according to the mother wavelet. However, the integer Haar wavelet, an approximation to the Haar wavelet [15], is utilized among all the wavelet functions in the filters. The transfer functions of the low-pass and high-pass filters for Haar wavelet are illustrated by the following:

$$CA[n] = \frac{1}{\sqrt{2}} X[2n] + \frac{1}{\sqrt{2}} X[2n+1] \tag{1}$$

$$CD[n] = \frac{1}{\sqrt{2}} X[2n] - \frac{1}{\sqrt{2}} X[2n+1]. \tag{2}$$

Although the Haar wavelet is the simplest among all the mother wavelets, it involves floating-point arithmetic, making it computationally expensive. To overcome this, the integer Haar wavelet, an approximation to the Haar wavelet, is proposed [15]. Further, the transfer functions of the low-pass and high-pass filters, CA[n] and CD[n] of the integer Haar wavelet, can be stated by the following:

$$CA[n] = \left\lfloor \frac{1}{2} X[2n] + \frac{1}{2} X[2n+1] \right\rfloor \tag{3}$$

$$CD[n] = X[2n] - X[2n+1]. \tag{4}$$

Note that (3) and (4) can be implemented using simple add and right shift ($\gg 1$) operations, as represented by (5). This avoids the floating-point arithmetic utilized in the conventional Haar wavelet transform

$$CA[n] = (X[2n] + X[2n+1]) \gg 1. \tag{5}$$

As mentioned above, the approximation of filter coefficients of the conventional Haar wavelet to the integer Haar wavelet helps in reducing the computational complexity, making the integer Haar wavelet an adequate transform for extracting AF frequency bands. In the proposed work, an ECG signal of a length of 500 samples is considered at a time. Its wavelet coefficients up to the 5th scale are calculated, as they contain the frequency components essential for AF detection. The conventional implementation of Mallat’s algorithm up to the 5th scale requires the realization of seven filters, which is resource intensive. Thus, in the proposed work, we present a filter-less feature extraction method employing a simplified wavelet decomposition scheme described below.

Proposed Wavelet Decomposition: Several VLSI architectures for the discrete wavelet transform have been proposed in the literature [27], [28]. Parhi and Nishitani [27] have proposed a folded three-level DWT architecture for an efficient wavelet decomposition, which is shown in Fig. 4. It employs several


adders, multipliers, and registers, resulting in higher resource utilization. In the proposed work, instead of employing conventional decomposition architectures, a new architecture is depicted employing the integer Haar wavelet as the mother wavelet. This architecture is optimized using approximate and detailed coefficients of the integer Haar wavelet illustrated in Table I. Equations (6), (8), and (9) showcase the simplified 4th- and 5th-level wavelet coefficients

$$CD_4[j] = \left( \sum_{k=16j}^{16j+7} x[k] - \sum_{k=16j+8}^{16j+15} x[k] \right) \gg 3 \tag{6}$$

$$CA_4[j] = \left( \sum_{k=16j}^{16j+15} x[k] \right) \gg 4 \tag{7}$$

$$CD_5[j] = \left( \sum_{k=32j}^{32j+15} x[k] - \sum_{k=32j+16}^{32j+31} x[k] \right) \gg 4 \tag{8}$$

$$CA_5[j] = \left( \sum_{k=32j}^{32j+31} x[k] \right) \gg 5. \tag{9}$$

TABLE I
CALCULATION OF THE COEFFICIENTS USING INTEGER HAAR WAVELET

Fig. 4. Folded three-level wavelet decomposition architecture. “$R_i$” and “D” both represent word-level registers. “i” denotes the time allocation of the register “$R_i$” [27].

Utilizing the approximate and detailed coefficients shown in Table I, the proposed wavelet decomposition architecture is simplified by incorporating add, subtract, and shift operations. This eliminates the need for a clock divider network for downsampling and the multipliers to extract the wavelet coefficients. Fig. 5 exhibits the proposed discrete wavelet decomposition implementation, in which coefficients at a specific level are estimated without utilizing the previous outputs. It employs three adders, two subtractors, three registers, three shifters, and five switches. Note that the switches are realized as multiplexers on hardware, and the shifters are used to simplify the division operation. This architecture accepts raw data and provides the approximate and detailed coefficients, including $CD_4$, $CD_5$, and $CA_5$.

The design shown in Fig. 4 implements level-three DWT while the proposed implementation in Fig. 5 realizes level-five DWT with reduced hardware utilization. The implementation


depicted in Fig. 4 requires several multipliers, while the proposed design employs only shifting operations, thus reducing complexity. Estimating wavelet coefficients in this manner utilizes fewer registers and requires neither a downsampling circuit nor multipliers. In our previous work [15], DWT using integer Haar wavelet is implemented as per Mallat’s decomposition algorithm, which includes downsampling by two at every stage. This downsampling logic requires the realization of clock division. The present work further optimizes DWT implementation to eliminate the need for complex filters and clock division for downsampling. Table I showcases the detailed and approximate coefficients at every stage using the integer Haar wavelet.

Fig. 5. Proposed wavelet decomposition architecture. “REG” represents word-level register and “$\gg n$” represents a right shift by n operation.

Loh et al. [29] and Jobst et al. [30] have also used DWT for the feature extraction block. Loh et al. [29] propose a digital signal processing accelerator for cardiac arrhythmia detection. It employs discrete wavelet transform with CNN to classify cardiac arrhythmia. The DWT-based feature extraction is performed using Daubechies (db) wavelets and is implemented as per Mallat’s decomposition algorithm. In this design, each filter requires four multipliers. The proposed work does not utilize multipliers because the integer Haar wavelet uses division by 2, which can easily be implemented using a shift operation. The work presented in [29] also requires a downsampling logic at each stage. For subsampling in the DWT stages, Loh et al. [29] employ duty cycling of the clock signal, i.e., corresponding filter taps are gated every second or third cycle relative to their previous components. In contrast, our proposed work eliminates the need for the clock division and realizes (6), (8), and (9), representing the detailed coefficients at level 4, detailed coefficients at level 5, and approximate coefficients at level 5, respectively. Note that [29] requires level-4 wavelet coefficients for cardiac arrhythmia detection. The level-4 wavelet coefficients are extracted using five 4-tap filters and three downsampling units. This can easily be reduced to (6) and (7), which can be realized using add, subtract, and shift operations only.

A flexible hardware for ECG classification with configurable time-domain filters is proposed in [30]. The filters are used to extract features, which are then fed into a recurrent neural network for classification. The feature extraction module is a filter bank that has multiple stages with several filters each. Each stage performs a subsampling by a factor of 2 for data reduction. The implementation presented in [30] can be reused for low-frequency analysis, but it requires offline retraining of the model and updating the hardware parameters. In addition, the filters are implemented as per Mallat’s decomposition algorithm, which includes downsampling by two at each stage. This downsampling logic requires the implementation of clock division. On the contrary, in our proposed implementation, the complex filter and downsampling logic for Mallat’s decomposition reduces to counter-based shift, add, and subtract operations, eliminating clock division for downsampling. Our proposed design also requires less memory as approximate and detailed coefficients of each stage are directly a function of the input samples. Thus, the proposed realization of DWT provides a threefold improvement. First, it reduces the overall latency of the system. Second, it reduces hardware complexity, and third, it also simplifies the synthesis of the clock tree network.

B. Implementation of DNN-Based AF Classifier

The proposed architecture of the DNN-based AF classifier is presented in Fig. 2. During the feature extraction, using $CD_4$, $CD_5$, and $CA_5$, 64 wavelet coefficients are obtained, which are fed as a 1-D input feature vector to a DNN to classify AF. These 64 wavelet coefficients cover frequency components of the ECG signal below 12 Hz. Note that for the ECG analysis, the samples need to be passed through a bandpass filter of 0.5–50 Hz [33] to filter out noise. The 4th- and 5th-level DWT coefficients used in the proposed work contain low-frequency information. Since the high-frequency wavelet coefficients are not utilized, the high-frequency noise does not propagate to the DNN classifier. In the hidden layers of the DNN, “ReLU” is utilized as an activation function, which not only reduces hardware complexity to a simple multiplexer but also addresses the gradient vanishing problem faced by other activation functions.

Each layer of a DNN comprises neurons, which mainly consist of multiply and accumulate (MAC) units. Mathematically, a neuron is modeled using (10), where $x_i$, $w_i$, and $b_i$ are the input to the neuron, the weight of each branch, and node bias, respectively,

$$y = \sum_{i=1}^{N} w_i x_i + b_i. \tag{10}$$

The complexity and hardware resource utilization of a DNN primarily depend on the number of network layers and nodes in each layer. Therefore, we utilize a minimal input feature vector of size 64 as per the requirement, which helps in optimizing the size of a DNN to 64 × 45 × 30 × 15 × 2 after several epochs of training. After optimizing and training the network, this pre-trained network is implemented on hardware using Verilog HDL. Note that the signed floating-point arithmetic has a high resource utilization and power consumption


and also impacts the performance of a system. Therefore, the floating-point weights and biases obtained using a pre-trained network are converted to 18-bit fixed-point arithmetic in the proposed method. Each neuron consists of a MAC unit, and multiplication is the most computationally intensive operation in any hardware design; therefore, to optimize the MAC unit of a neuron, an approximate multiplier (AM) is proposed in this article, which is described in Section III-C.

C. Proposed Optimized AM

Several designs of accurate multipliers and AMs are proposed in the literature. However, AMs have better power, area, and speed efficiency than accurate multipliers. Moreover, Mitchell’s algorithm [17] is the most simplified approach for integer and fixed point multiplication among all the available methods of approximate multiplications.

Mitchell proposes an efficient AM, which utilizes a log multiplication property. It computes approximate multiplication by linearly approximating log and antilog values. In this method, first, the characteristic part of the approximate log is obtained by finding the position of the leading “1,” i.e., the leftmost one in the binary sequence. Next, the remaining values are used as an approximate fractional part of the log. Later, these two operands are added, and the approximate antilog operation is performed on this summation, which generates an approximate product.

Consider an N-bit integer B with bits $b_{N-1}, b_{N-2}, \ldots, b_0$, which is represented as $B = \sum_{i=0}^{N-1} 2^i b_i$. Assuming that the leading one occurs at position k, where $(N-1) \ge k \ge 0$, B can be written as (11) without any loss of accuracy

$$B = 2^k \left( 1 + \sum_{i=0}^{k-1} 2^{i-k} b_i \right). \tag{11}$$

Equation (11) can further be represented as $B = 2^k (1 + x)$, where $x = \sum_{i=0}^{k-1} 2^{i-k} b_i$. Hence, log of B can be described as (12), where k is essentially an integer representing the characteristic value of log, and $\log_2(1 + x)$ is the fractional part

$$\log_2 B = k + \log_2(1 + x). \tag{12}$$

Since $0 \le x < 1$, the linear approximation of $\log_2 B$ can be depicted as (13), where function $\widetilde{\log}$ represents the approximate binary log

$$\widetilde{\log}_2 B = k + x. \tag{13}$$

Further, assuming there are two N-bit binary numbers $B_1$ and $B_2$ with leading ones at $k_1$ and $k_2$, they can be represented as $B_1 = 2^{k_1}(1 + x_1)$ and $B_2 = 2^{k_2}(1 + x_2)$. Thus, the approximate product can be represented as mentioned in the following:

$$\log_2(\tilde{P}) = k_1 + k_2 + x_1 + x_2 \tag{14}$$

where $\tilde{P}$ represents the approximate product. Later, to estimate the approximate antilog of (14), “1” is added to the fractional part x and is scaled with respect to the characteristic part. Note that the fractional part should be in the range [0, 1). Since $0 \le (x_1, x_2) < 1$, in this case $0 \le x_1 + x_2 < 2$. Therefore, the approximate product is decomposed into two cases. First, $0 \le x_1 + x_2 < 1$, which has no carry generated from the fractional part to the characteristic, and second, $1 \le x_1 + x_2 < 2$, which has a carry generated to the characteristic part. This can be expressed as follows:

$$\tilde{P} = \begin{cases} 2^{k_1+k_2}(x_1 + x_2 + 1), & x_1 + x_2 < 1 \\ 2^{k_1+k_2+1}(x_1 + x_2), & x_1 + x_2 \ge 1. \end{cases} \tag{15}$$

Fig. 6. Proposed architecture of AM.

Although the above-mentioned approximation reduces the entire multiplication to addition and shift operations, it introduces a significant error in the product obtained. Therefore, to obtain an efficient multiplier architecture for the MAC unit of a neuron, we propose an efficient error reduction technique and an optimal hardware implementation of (15). This obtains a more accurate result compared to the conventional Mitchell’s algorithm. The multiplier architecture proposed in this article is described in Fig. 6. The error reduction technique utilized in the proposed method is described below.

1) Error Reduction Scheme: Here, we propose a two-step method to reduce error in the approximated product. First, a bias is calculated by averaging errors across the entire range of the fractional part x. This average error is then compensated by adding its magnitude to the approximate product, improving its accuracy [25]. Error E in the approximate product is estimated by (16), where P is the exact (logarithmic) product of $B_1$ and $B_2$ and $\tilde{P}$ is the approximate product obtained by Mitchell’s algorithm

$$E = \tilde{P} - P = \begin{cases} -2^{k_1+k_2}(x_1 x_2), & x_1 + x_2 < 1 \\ -2^{k_1+k_2}(1 + x_1 x_2 - x_1 - x_2), & x_1 + x_2 \ge 1. \end{cases} \tag{16}$$


Later, the average error (bias) is calculated using the


following:
 1 1
1
E avg = Ed x 2 d x 1
(1 − 0)(1 − 0) 0 0
 1  1−x1
= −2k1 +k2 (x 1 x 2 )d x 2 d x 1
0 0
 1 1
+ −2k1 +k2 (1 + x 1 x 2 − x 1 − x 2 )d x 2 d x 1
0 1−x1
= −2k1 +k2 (0.083333). (17)
It can be inferred that the average error (bias) is always
negative and depends on k1 and k2 . Bias needs to be shifted and
added as per the position of the leading “1” in B1 and B2 . It is
to mention that the error got introduced in the product is due
to the approximation of the fractional part. Thus, in the second Fig. 7. Error in Mitchell’s multiplication algorithm.
step, a decomposition strategy similar to [18] is utilized in the
proposed multiplier. To reduce the error, the fractional part x is
decomposed into two small operands. The proposed decompo-
sition scheme minimizes the fractional part and the error in the
logarithm approximation. This decomposition methodology
further decreases switching activity [18], reducing power. The
proposed decomposition scheme is represented by (18), where
X = X n−1 , X n−2 , . . . , X 0 and Y = Yn−1 , Yn−2 , . . . , Y0 are
two N-bit binary numbers, and A, B, C, and D are the
decomposed operands depicted as follows:
A[i ] = X[i ] | Y [i ]
B[i ] = X[i ] & Y [i ]
C[i ] = X[i ] & Y [i ]
D[i ] = X[i ] & Y [i ]
XY = A ∗ B + C ∗ D. (18)
The implementation of the proposed multiplier is depicted in Fig. 6. To implement the AM, the leading-“1” finding module can be optimized and implemented in hardware as a simple priority encoder that determines k1 and k2. Next, the binary numbers are shifted and rearranged in fixed-point representation to obtain the respective fractional parts x1 and x2. Finally, the approximate product is obtained by adding and shifting x1, x2, k1, and k2 according to (15).

2) Error Analysis of the Proposed Multiplier: In this section, the estimated error of the outputs of Mitchell’s AM and the proposed AM is compared. The multipliers are numerically analyzed using Python, and the error is calculated by comparing their outputs when the same binary inputs are provided to both multipliers. In this method, 18-bit numbers are multiplied, and their products are represented by 36-bit numbers. Each product is stored in a 32-byte number by default. The entire process generates 2^18 × 2^18 = 68,719,476,736 multiplication products. Thus, the total memory used to store these products is 68,719,476,736 × 32 B = 2048 GB. The same amount of space is needed to save the calculated error. Additionally, extra memory is required for 3-D surface plot rendering. We employ a computer with 256 GB of RAM for the analysis of the proposed method. However, even this machine cannot plot the errors produced by Mitchell’s algorithm, operand decomposition, and operand decomposition with bias addition for 18-bit multiplications, because the memory requirement exceeds what is physically available. Therefore, we choose to plot the errors produced by the multiplication of 10-bit numbers to study the behavior of the error, which consumes 87% of the available memory.

The subsequent figures illustrate the error estimation, in which the x- and y-axes denote the inputs and the z-axis represents the multiplier error. Fig. 7 shows the error produced by Mitchell’s algorithm. Fig. 8 depicts the error of Mitchell’s algorithm when operand decomposition is used along with it. Fig. 9 illustrates the error of Mitchell’s algorithm when it incorporates both bias addition and operand decomposition. The average error for 18-bit multiplication using Mitchell’s algorithm is 3.88%. Bias addition reduces the average error to 2.19%. Furthermore, by incorporating operand decomposition, the average error drops to 1.79%. For certain inputs, the error may increase with the proposed multiplier, but the average and maximum errors reduce substantially. Thus, the proposed multiplier, with an average error of 1.79%, can be utilized in biomedical signal processing applications.

Fig. 8. Error correction using operand decomposition.

The details of the realization of the proposed AF classifier are explained in the next section.
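The baseline error measurement described in the error analysis can be illustrated at a small bit width, where an exhaustive sweep fits easily in memory. The sketch below is an illustration under stated assumptions, not the authors' script. It implements plain Mitchell's logarithmic multiplication in integer arithmetic, without the bias addition or operand decomposition of the proposed AM, and sweeps all nonzero operand pairs of a given width. The names `mitchell_multiply` and `error_stats` are hypothetical.

```python
def mitchell_multiply(x: int, y: int) -> int:
    """Approximate x*y for positive integers with Mitchell's algorithm."""
    k1 = x.bit_length() - 1          # position of the leading 1 in x
    k2 = y.bit_length() - 1          # position of the leading 1 in y
    f1 = x - (1 << k1)               # fractional part x1, scaled by 2**k1
    f2 = y - (1 << k2)               # fractional part x2, scaled by 2**k2
    # (x1 + x2) expressed on the common scale 2**(k1 + k2)
    frac_sum = (f1 << k2) + (f2 << k1)
    one = 1 << (k1 + k2)
    if frac_sum < one:               # case x1 + x2 < 1
        return one + frac_sum        # 2**(k1+k2) * (1 + x1 + x2)
    return frac_sum << 1             # 2**(k1+k2+1) * (x1 + x2)


def error_stats(n_bits: int):
    """Return (mean, max) relative error over all nonzero n_bits operands."""
    total, worst, count = 0.0, 0.0, 0
    for x in range(1, 1 << n_bits):
        for y in range(1, 1 << n_bits):
            exact = x * y
            err = (exact - mitchell_multiply(x, y)) / exact
            total += err
            worst = max(worst, err)
            count += 1
    return total / count, worst


mean_err, max_err = error_stats(8)   # 8 bits keeps the sweep fast
print(f"mean relative error: {mean_err:.4%}, max: {max_err:.4%}")
```

Accumulating the running mean and maximum avoids storing every product, which is what makes the full sweep memory-bound in the analysis above. Storing the full error surface is only needed when it has to be plotted.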


Fig. 9. Error correction using operand decomposition and bias addition.

IV. RESULTS AND DISCUSSION

In this section, a detailed analysis of the implementation of the proposed work is presented.

A. Hardware Implementation

In this section, a detailed discussion is presented on the evaluation of the proposed design. In this study, ECG signals are acquired from the Physionet database. The AF and non-AF signals are extracted from Physionet’s MIT-BIH AF database [19]. This dataset contains 25 ECG recordings sampled at 250 Hz, each with an approximate duration of 10 h. However, in this study, four recordings are omitted because two recordings (“00735” and “03665”) are not available, and the other two (“04936” and “05091”) have incorrect reference annotations as specified in the dataset. The dataset provides annotated data labeled normal and AF. The annotated normal and AF segments are used to create test and training data. Each segment is divided into subsegments using a sliding window of 500 samples. These subsegments are then partitioned into training and test sets as per two schemes (Experiment-1 and Experiment-2) to evaluate the performance metrics of the proposed classifier.

The class-oriented approach is utilized in Experiment-1. This approach splits all subsegments with AF and normal episodes in a 20:80 ratio: 20% of the segments are used for testing, and the remaining 80% are used for training. Experiment-1 using the integer Haar wavelet yields an average accuracy of 92.37%, an F1 score of 91.63, and sensitivity and specificity of 91.84 and 92.87. Further, the subject-oriented scheme is incorporated in Experiment-2, where the training and test sets are entirely different. Out of the 21 subjects, subsegments from five subjects are used as testing data and the rest for training the model. Experiment-2 gives a more realistic estimate of the classifier’s ability in practical scenarios, since the test data always remain blind to the model. It yields an average accuracy of 81.60% and an F1 score of 82.40, along with sensitivity and specificity of 82.06 and 81.14 using the integer Haar wavelet. Finally, the model generated using Experiment-2 is implemented further due to its realistic behavior.

The proposed AF classifier is first implemented and tested using Python to validate its correctness. Later, the complete design is realized using Verilog HDL and is synthesized on a Xilinx Virtex-7 FPGA board [20] to verify its performance by testing it on multiple test cases. The resource utilization of the proposed architecture on the Xilinx Virtex-7 FPGA board is given in Table II. It is observed that the proposed design utilizes 1.7% of the total available resources. It is the first end-to-end ML-based AF classifier on hardware. Since our goal is to realize an end-to-end solution for AF detection as an ASIC, no FPGA-specific power optimization techniques are employed while implementing the proposed design on FPGA. Therefore, the area and power of the proposed design are calculated using Synopsys IC Compiler and SCL 180-nm bulk CMOS PDKs, as reported in Table III.

TABLE II
FPGA IMPLEMENTATION OF AF CLASSIFIER ARCHITECTURE (XILINX VIRTEX-7 FPGA)

The complete design utilizes an area of 1.18 mm² and consumes 11.098 µW at 1.98 V and 25 kHz. Since we utilize ECG excerpts with a sampling frequency of 250 Hz, it is observed that an operating frequency of 25 kHz is suitable for the proposed design to process ECG beats in real time. Placement and routing are performed using Synopsys IC Compiler, and Fig. 10 presents the chip layout of the proposed classifier. A detailed comparative analysis of the proposed method with other implementations is given below.

B. Comparison With State-of-the-Art Methods

Table III presents the comparison of the proposed work with state-of-the-art methods. It is inferred from Table III that the proposed classifier yields higher average accuracy, F1 score, sensitivity, and specificity than the methods reported in [8], [10], and [12]. It is worth mentioning that Zihlmann et al. [8] use a complex CRNN with a spectrogram, making both feature extraction and classification computationally expensive. Lake and Moorman [10] and Alcaraz et al. [12] use complex features, including sample entropy, relative harmonic energy, and dominant atrial frequency, yet yield comparable performance. Couceiro et al. [6] use various features, including P-wave absence, HRV, Kullback-Leibler divergence, and wavelet transform with 12 beats, to classify AF. This approach yields sensitivity and specificity of 93.80% and 96.09%, respectively. However, it would demand more hardware resources owing to the computational and memory complexity of the feature extraction block and the analysis of 12 ECG complexes. This makes it suitable only for software platforms. Work proposed in [13]


also uses a set of complex features like log energy entropy, peak-to-average power ratio, and wavelet transform with a 30-s ECG sample window, making it unsuitable for wearable devices. It is important to note that most research focuses on software implementation, while only Andersson et al. [22] have implemented the AF classifier on ASIC to achieve sensitivity and specificity of 90.60% and 97.60%, respectively.

TABLE III
COMPARISON OF THE PROPOSED WORK WITH STATE-OF-THE-ART METHODS

It is worth mentioning that the work proposed in [22] accepts a time series of 128 RR intervals as input and includes only an AF detector. The detector in [22] depends on an external processing unit to analyze and extract the RR-interval time series from the ECG data. Implementing the RR-interval extraction block would substantially increase area and power due to its computation and memory requirements. Further, the design in [22] employs hard-coded threshold values, which are not adaptive to the ECG variations in practical scenarios.

Fig. 10. Layout photograph of co-processor and its specifications.

Andersson and Rodrigues [31] use an approach similar to that in [22]. The model proposed in [31] takes the RR-interval series as input to extract the turning point ratio (TPR), root mean square of successive difference (RMSSD),


and Shannon entropy (SE). Later, it utilizes a threshold detector to classify a beat as normal or AF. Similar to the model proposed in [22], this model does not extract the RR intervals from the ECG signal, and an external processing unit is required for RR-interval extraction. Further, the AF detector in [31] also uses nontunable hard-coded thresholds.

The methods proposed in [22] and [31] consume less area than ours because they implement only the classifier in hardware using 65-nm technology and do not realize the RR-interval extraction block. Further, their power consumption is lower than that of our proposed work because they use a very low supply voltage (VDD) in the subthreshold (sub-VT) region and a low operating frequency. Only the operating frequency of the classifier block is considered while estimating the power in [22] and [31]. Since dynamic power scales with the square of VDD and linearly with the operating frequency, the power consumption reported in [22] and [31] is lower. As stated above, the RR-interval series extraction is not realized in hardware in [22] and [31]. Due to this, the actual operating frequency of those designs would differ for real-time ECG analysis. Our proposed design implements a complete end-to-end solution for AF detection, including feature extraction and classification of the ECG signal. Therefore, it can be considered a better choice for real-time AF detection and can be employed in wearable and portable devices.

Additionally, Lim et al. [11] explore hardware–software co-design to achieve an accuracy of 95.30% but employ complex features including power spectral density, log energy entropy, and wavelet transform, with an ANN as the classifier and a 10-s input ECG sample window. Lim et al. [11] have prototyped the design on Intel’s DE2-115 FPGA board, which features a Cyclone IV FPGA and an onboard Nios II processor. The ECG-processing blocks and the classifier are executed in software on the Nios II processor, while only the fast Fourier transform (FFT) is hardware accelerated on the Cyclone IV FPGA. In contrast, our proposed design, including the complete ECG processing and the classifier, is realized on FPGA and then synthesized for an ASIC implementation. Sadasivuni et al. [32] have proposed an analog machine learning classifier IC to detect sepsis and AF from the ECG signal. The classifier achieves an average accuracy of 98.2% for AF detection and 90.7% for predicting sepsis. Using a 30-s ECG signal, a set of 63 time-domain features is computed for AF detection. It is important to note that only the ANN classifier is implemented on ASIC, while feature extraction is implemented off-chip. Extracting 63 features for AF detection would drastically increase the computational complexity of the feature extraction block. The on-chip ANN classifier [32] utilizes an area of 1.67 mm² at 65 nm, while our classifier, with its simplified DWT implementation and ML model, utilizes an area of 1.18 mm² at 180 nm.

The proposed classifier has 8% lower accuracy than the algorithm presented in [21]. The classifier proposed in [21] utilizes a feature vector of 2700 samples with a 13-layer convolutional neural network, making it compute-intensive and consuming more power and hardware resources. Such complex classifiers are not suitable for wearable devices and can only be employed in a medical setup using a software platform. Moreover, the method reported in [14] evaluates performance on blind data and reports an accuracy of 82.7% and an F1 score of 79. This method is similar to the approach adopted in Experiment-2 and yields performance comparable to our implementation. However, it employs 37 time- and frequency-domain features, making it inefficient for wearable applications. Our proposed classifier is optimized not only at the algorithmic level but also at the architectural level for use in low-power wearable devices.

It is observed from Table III that our proposed classifier performs better than or comparably to other state-of-the-art methods. However, multilead ECG data and more physiological variables should be employed to obtain a medically acceptable device. The primary purpose of a wearable device is to alert an individual to an anomaly, not to provide clinical suggestions. Therefore, the area- and power-optimal proposed AF classifier can be utilized to realize wearable devices.

V. CONCLUSION

This article proposes a hardware realization of an end-to-end area- and power-efficient AF classifier for wearable health care devices. Since abnormal atrial activity is confined to the low-frequency range (<12 Hz), this frequency band is, for the first time, directly analyzed for AF detection. Further, using the integer Haar wavelet and an efficient realization of the multilevel decomposition technique, the computational complexity of the proposed classifier is reduced significantly. This classifier is implemented in 180-nm CMOS technology. It utilizes an AM with an optimized DNN to classify AF in real time, consuming 11.098 µW at 25 kHz with an accuracy of 92.37% for the class-oriented classification and 81.60% for the subject-oriented classification. This makes our design a highly suitable candidate for wearable health care devices.

REFERENCES

[1] V. Fuster et al., “ACC/AHA/ESC 2006 guidelines for the management of patients with atrial fibrillation-executive summary: A report of the American College of Cardiology/American Heart Association task force on practice guidelines and the European Society of Cardiology Committee for practice guidelines (writing committee to revise the 2001 guidelines for the management of patients with atrial fibrillation),” Eur. Heart J., vol. 27, no. 16, pp. 1979–2030, 2006.
[2] P. Kirchhof et al., “2016 ESC guidelines for the management of atrial fibrillation developed in collaboration with EACTS,” Kardiologia Polska, Polish Heart J., vol. 74, no. 12, pp. 1359–1469, 2016.
[3] S. Nattel, “New ideas about atrial fibrillation 50 years on,” Nature, vol. 415, no. 6868, pp. 219–226, Jan. 2002.
[4] J. Oldgren et al., “Variations in cause and management of atrial fibrillation in a prospective registry of 15 400 emergency department patients in 46 countries: The RE-LY atrial fibrillation registry,” Circulation, vol. 129, no. 15, pp. 1568–1576, Apr. 2014.
[5] P. Langley, J. P. Bourke, and A. Murray, “Frequency analysis of atrial fibrillation,” in Proc. Comput. Cardiol., vol. 27, 2000, pp. 65–68.
[6] R. Couceiro, P. Carvalho, J. Henriques, M. Antunes, M. Harris, and J. Habetha, “Detection of atrial fibrillation using model-based ECG analysis,” in Proc. 19th Int. Conf. Pattern Recognit., Dec. 2008, pp. 1–5.


[7] N. Sasaki et al., “Frequency analysis of atrial fibrillation from the specific ECG leads V7–V9: A lower DF in lead V9 is a marker of potential atrial remodeling,” J. Cardiol., vol. 66, no. 5, pp. 388–394, 2015.
[8] M. Zihlmann, D. Perekrestenko, and M. Tschannen, “Convolutional recurrent neural networks for electrocardiogram classification,” in Proc. Comput. Cardiol. Conf. (CinC), Sep. 2017, pp. 1–4.
[9] S. Ladavich and B. Ghoraani, “Rate-independent detection of atrial fibrillation by statistical modeling of atrial activity,” Biomed. Signal Process. Control, vol. 18, pp. 274–281, Apr. 2015.
[10] D. E. Lake and J. R. Moorman, “Accurate estimation of entropy in very short physiological time series: The problem of atrial fibrillation detection in implanted ventricular devices,” Amer. J. Physiol.-Heart Circulatory Physiol., vol. 300, no. 1, pp. H319–H325, Jan. 2011.
[11] H. W. Lim, Y. W. Hau, M. A. Othman, and C. W. Lim, “Embedded system-on-chip design of atrial fibrillation classifier,” in Proc. Int. SoC Design Conf. (ISOCC), Nov. 2017, pp. 90–91.
[12] R. Alcaraz, F. Sandberg, L. Sörnmo, and J. J. Rieta, “Classification of paroxysmal and persistent atrial fibrillation in ambulatory ECG recordings,” IEEE Trans. Biomed. Eng., vol. 58, no. 5, pp. 1441–1449, May 2011.
[13] S. Asgari, A. Mehrnia, and M. Moussavi, “Automatic detection of atrial fibrillation using stationary wavelet transform and support vector machine,” Comput. Biol. Med., vol. 60, pp. 132–142, May 2015.
[14] R. Mahajan, R. Kamaleswaran, J. A. Howe, and O. Akbilgic, “Cardiac rhythm classification from a short single lead ECG recording via random forest,” in Proc. Comput. Cardiol. Conf. (CinC), Sep. 2017, pp. 1–4.
[15] M. Janveja and G. Trivedi, “An area and power efficient VLSI architecture for ECG feature extraction for wearable IoT healthcare applications,” Integration, vol. 82, pp. 96–103, Jan. 2022.
[16] S. G. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” in Fundamental Papers in Wavelet Theory. Princeton, NJ, USA: Princeton Univ. Press, 2009, pp. 494–513.
[17] J. Y. L. Low and C. C. Jong, “Unified Mitchell-based approximation for efficient logarithmic conversion circuit,” IEEE Trans. Comput., vol. 64, no. 6, pp. 1783–1797, Jun. 2015.
[18] M. Ito, D. Chinnery, and K. Keutzer, “Low power multiplication algorithm for switching activity reduction through operand decomposition,” in Proc. 21st Int. Conf. Comput. Design, 2003, pp. 21–26.
[19] G. Moody, “A new method for detecting atrial fibrillation using RR intervals,” in Proc. Comput. Cardiol., 1983, pp. 227–230.
[20] Virtex-7 FPGA Family. Accessed: Sep. 15, 2022. [Online]. Available: https://www.xilinx.com/products/silicon-devices/fpga/virtex-7.html
[21] S. Nurmaini et al., “Robust detection of atrial fibrillation from short-term electrocardiogram using convolutional neural networks,” Future Gener. Comput. Syst., vol. 113, pp. 304–317, Dec. 2020.
[22] O. Andersson, K. H. Chon, L. Sörnmo, and J. N. Rodrigues, “A 290 mV sub-VT ASIC for real-time atrial fibrillation detection,” IEEE Trans. Biomed. Circuits Syst., vol. 9, no. 3, pp. 377–386, Jun. 2015.
[23] G. B. Moody and R. G. Mark, “The impact of the MIT-BIH arrhythmia database,” IEEE Eng. Med. Biol. Mag., vol. 20, no. 3, pp. 45–50, May/Jun. 2001.
[24] M. Janveja, R. Parmar, M. Tantuway, and G. Trivedi, “A DNN-based low power ECG co-processor architecture to classify cardiac arrhythmia for wearable devices,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 69, no. 4, pp. 2281–2285, Apr. 2022.
[25] H. Saadat, H. Bokhari, and S. Parameswaran, “Minimally biased multipliers for approximate integer and floating-point multiplication,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 37, no. 11, pp. 2623–2635, Nov. 2018.
[26] J. Lee, B. A. Reyes, D. D. McManus, O. Mathias, and K. H. Chon, “Atrial fibrillation detection using an iPhone 4S,” IEEE Trans. Biomed. Eng., vol. 60, no. 1, pp. 203–206, Jan. 2013.
[27] K. K. Parhi and T. Nishitani, “VLSI architectures for discrete wavelet transforms,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 1, no. 2, pp. 191–202, Jun. 1993, doi: 10.1109/92.238416.
[28] M. Vishwanath, R. M. Owens, and M. J. Irwin, “VLSI architectures for the discrete wavelet transform,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 42, no. 5, pp. 305–316, May 1995, doi: 10.1109/82.386170.
[29] J. Loh, J. Wen, and T. Gemmeke, “Low-cost DNN hardware accelerator for wearable, high-quality cardiac arrythmia detection,” in Proc. IEEE 31st Int. Conf. Appl.-Specific Syst., Architectures Processors (ASAP), Jul. 2020, pp. 213–216, doi: 10.1109/ASAP49362.2020.00042.
[30] M. Jobst et al., “ZEN: A flexible energy-efficient hardware classifier exploiting temporal sparsity in ECG data,” in Proc. IEEE 4th Int. Conf. Artif. Intell. Circuits Syst. (AICAS), Jun. 2022, pp. 214–217, doi: 10.1109/AICAS54282.2022.9869958.
[31] O. Andersson and J. N. Rodrigues, “A 400 mV atrial fibrillation detector with 0.56 pJ/operation in 65 nm CMOS,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2015, pp. 2628–2631, doi: 10.1109/ISCAS.2015.7169225.
[32] S. Sadasivuni, S. P. Bhanushali, S. S. Singamsetti, I. Banerjee, and A. Sanyal, “Multi-task learning mixed-signal classifier for in-situ detection of atrial fibrillation and sepsis,” in Proc. IEEE Biomed. Circuits Syst. Conf. (BioCAS), Oct. 2021, pp. 1–4, doi: 10.1109/BioCAS49922.2021.9644994.
[33] W. J. Tompkins, Biomedical Digital Signal Processing. Upper Saddle River, NJ, USA: Prentice-Hall, 1993.

Rushik Parmar received the B.E. degree in electronics and telecommunication from the Institute of Engineering and Technology, DAVV, Indore, India, in 2019.
He worked as a Software Engineer with Accenture Digital, Mumbai, Maharashtra, India, from 2019 to 2020. He is currently a Research Scholar with IIT Guwahati, Guwahati, Assam, India. His research areas include VLSI system design, machine learning, and VLSI architectures for biomedical applications.

Meenali Janveja (Student Member, IEEE) received the B.Tech. degree in electronics and communication engineering from the Government Women Engineering College, Ajmer, India, in 2013, and the M.Tech. degree in VLSI design from Indira Gandhi Delhi Technical University for Women, India, in 2016.
She worked as an Assistant Professor with the Department of Electronics and Communication Engineering, G.L. Bajaj Institute of Technology and Management, India, from 2016 to 2017. She is currently a Research Scholar with the Department of Electronics and Electrical Engineering, IIT Guwahati, Guwahati, India. Her research areas include digital VLSI design, computer architecture, machine learning, and VLSI for biomedical signal processing.

Jan Pidanic (Senior Member, IEEE) was born in 1979. He received the M.Sc. and Ph.D. degrees from the University of Pardubice, Pardubice, Czechia, in 2005 and 2012, respectively.
His research interests include signal processing in passive radar systems, bistatic radars, clutter modeling, and optimization of signal processing algorithms with parallel processing techniques.

Gaurav Trivedi (Member, IEEE) received the Ph.D. degree in electrical engineering from IIT Bombay, Mumbai, Maharashtra, India, in 2007.
He is currently an Associate Professor with the Department of Electronics and Electrical Engineering, IIT Guwahati, Guwahati, India. He worked as a Senior Member of Technical Staff with Cadence Design System India Pvt. Ltd., Noida, UP, India and Mentor-Siemens (Earlier Berkeley Design Automation India), Bengaluru, India, for three years, and as a Post-Doctoral Fellow at IIT Bombay, for two years. His research interests include VLSI CAD, semiconductor devices, digital and analog circuit design, high-performance computing, computer architecture and algorithms, embedded and IoT, and quantum computing.
