Sensors 23 09612 v2
Article
Analog Convolutional Operator Circuit for Low-Power
Mixed-Signal CNN Processing Chip
Malik Summair Asghar 1,2 , Saad Arslan 3 and HyungWon Kim 1, *
1 Department of Electronics, College of Electrical and Computer Engineering, Chungbuk National University,
Cheongju 28644, Republic of Korea
2 Department of Electrical and Computer Engineering, COMSATS University Islamabad, Abbottabad Campus,
University Road, Tobe Camp, Abbottabad 22044, Pakistan
3 TSY Design (Pvt.) Ltd., Islamabad 44000, Pakistan
* Correspondence: [email protected]
Abstract: In this paper, we propose a compact and low-power mixed-signal approach to implementing convolutional operators that are often responsible for most of the chip area and power consumption of Convolutional Neural Network (CNN) processing chips. The convolutional operators consist of several multiply-and-accumulate (MAC) units. MAC units are the primary components that process convolutional layers and fully connected layers of CNN models. Analog implementation of MAC units opens a new paradigm for realizing low-power CNN processing chips, benefiting from less power and area consumption. The proposed mixed-signal convolutional operator comprises low-power binary-weighted current steering digital-to-analog conversion (DAC) circuits and accumulation capacitors. Compared with a conventional binary-weighted DAC, the proposed circuit benefits from optimum accuracy, smaller area, and lower power consumption due to its symmetric design. The proposed convolutional operator takes as input a set of 9-bit digital input feature data and weight parameters of the convolutional filter. It then calculates the convolutional filter's result and accumulates the resulting voltage on capacitors. In addition, the convolutional operator employs a novel charge-sharing technique to process negative MAC results. We propose an analog max-pooling circuit that instantly selects the maximum input voltage. To demonstrate the performance of the proposed mixed-signal convolutional operator, we implemented a CNN processing chip consisting of 3 analog convolutional operators, with each operator processing a 3 × 3 kernel. This chip contains 27 MAC circuits, an analog max-pooling circuit, and an analog-to-digital conversion (ADC) circuit. The mixed-signal CNN processing chip is implemented using a CMOS 55 nm process, occupies a silicon area of 0.0559 mm2, and consumes an average power of 540.6 µW. The proposed mixed-signal CNN processing chip offers an area reduction of 84.21% and an energy reduction of 91.85% compared with a conventional digital CNN processing chip. Moreover, another CNN processing chip is implemented with more analog convolutional operators to demonstrate the operation and structure of an example convolutional layer of a CNN model. Therefore, the proposed analog convolutional operator can be adapted to various CNN models as an alternative to its digital counterparts.

Keywords: mixed-signal convolutional operation; analog multiplier; neural network accelerator; convolutional neural network; artificial intelligence; neuromorphic engineering

Citation: Asghar, M.S.; Arslan, S.; Kim, H. Analog Convolutional Operator Circuit for Low-Power Mixed-Signal CNN Processing Chip. Sensors 2023, 23, 9612. https://doi.org/10.3390/s23239612

Academic Editor: Alfio Dario Grasso

Received: 31 October 2023; Revised: 24 November 2023; Accepted: 28 November 2023; Published: 4 December 2023
to ensure the longevity of the devices [1]. As a result, there is a trend towards exploring
high-performance neural processing units or accelerators with low power consumption.
Recently, neuromorphic architectures have been developed upon non-von Neumann architectures that can emulate the biological human brain network. Compared with traditional CPU/GPU designs established upon the von Neumann architecture, neuromorphic architectures often provide superior power efficiency and parallelism [2]. Recent work includes several neuromorphic and accelerated systems that make use of SNN [3–6] and CNN architectures [7–9].
Computing convolutional operations in the digital domain involves multipliers and
adders, i.e., the Multiply-and-Accumulate (MAC) operation. For concurrent processing,
the number of multipliers required equals the filter size, which can result in large area
consumption. Moreover, summing the outputs of these multipliers involves multiple
cascaded adders. Thus, digital MAC units occupy a large area with high power consumption. These area and power constraints have drawn researchers' interest toward a new paradigm of analog kernels for CNNs, which can not only perform convolution but also occupy significantly less area and consume less power. Therefore, exploring unconventional architectures for the MAC unit is necessary.
A number of recent studies have focused on developing accelerators for CNNs [10–15],
which attempt to improve the area, power consumption, and delay. In addition, some
researchers are exploring mixed-signal approaches for CNNs [12,15], where some are
integrating the analog compute units directly with the image sensor [13,15].
A 3 × 3 analog Convolutional Unit (CU) is implemented in [12], which requires
differential analog input for weights and image values. Similarly, the analog CU of [10]
is not a good choice for directly replacing the conventional digital CUs as it requires
additional DACs to convert the filters and image values to analog. An analog light-weight
CNN integrated with a CMOS image sensor is presented in [13], capable of performing
face detection. In [13], only a 2 × 2 switched-capacitor CU is realized, which can be
inadequate for even slightly complex feature extraction applications. A mixed-signal
cellular neural network accelerator is presented in [14], targeting MNIST and CIFAR-10
datasets. Reference [14], however, does not natively support filter sizes larger than 3 × 3.
Moreover, the cellular structure of [14] is not compatible with fully connected layers.
This paper presents and implements a compact mixed-signal CNN processing chip in a
55 nm CMOS process. The proposed analog convolutional operator (ACO) is implemented
for CNNs, comprising low-power MAC units, which directly expect digital inputs for
weights/filter values and image pixels. A compact and low-power multi-channel analog
convolutional operator unit (ACU) is proposed and implemented, consisting of three con-
volutional operators, a max-pooling circuit, and an ADC circuit to replace conventional
digital processing elements inside the CNN. The proposed ACO can also be adapted to fully
connected layers. Furthermore, an example convolutional layer based on a 3 × 3 convolu-
tional kernel of a CNN model has been constructed in a CNN processing chip to illustrate
the structure and functionality of the proposed ACU. This paper elucidates the architecture
and design methodology for mixed-signal CNN processing chip implementation, elemental
circuit designs, and simulation results. Firstly, in Section 2, the complete architecture of
the mixed-signal CNN processing chip implementation is illustrated. Subsequently, in
Section 3, the design and implementation of the underlying CMOS circuits for the pro-
posed analog convolutional operator are presented and validated by simulations. Section 4
describes the complete implementation of the proposed mixed-signal CNN processing
chip and simulation results. Finally, in Section 5, the overall performance is discussed and
compared with other digital and analog CNN accelerators before a conclusion is drawn in
Section 6.
Figure 1. Mixed-signal block diagram of the architecture of the analog convolutional operator.
Figure 2. Multi-channel analog convolutional operator unit comprising three input channels.
Finally, the output of the max-pooling operation is converted to a digital value using a low-power Successive Approximation Register (SAR) ADC. In the proposed analog convolutional operator unit, moving the max-pooling layer before the ADC and right after the four convolutional operations exploits the benefits of analog MACs while offering at least a four-fold reduction in analog-to-digital conversions. Moreover, this inherently curtails the precision requirements of the system by discarding the small computational values. The proposed analog convolutional operator unit integrates the operations of a MAC, a max-pooling, and an ADC to perform one-shot calculations for the convolutional and max-pooling layers. Therefore, the proposed analog convolutional operator unit offers advantages in area, power, energy, and speed.
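As a rough behavioral sketch of the dataflow just described (four convolution results pooled in the analog domain so that only one ADC conversion is needed), the following Python model may help; the reference voltage, charge-to-voltage gain, and 0.8 V ADC full scale here are illustrative assumptions, not chip parameters:

```python
import numpy as np

def conv3x3(patch, kernel):
    # One convolutional operation: nine multiply-and-accumulate results summed
    return float(np.sum(patch * kernel))

def acu_forward(patches, kernel, vref=0.4, gain=1e-6, vfs=0.8):
    # Four neighboring 3x3 convolutions -> analog max-pooling -> one ADC sample.
    # Pooling before the ADC means only the maximum voltage is digitized,
    # a 4x reduction in analog-to-digital conversions.
    volts = [vref + gain * conv3x3(p, kernel) for p in patches]  # four CACC voltages
    vmax = max(volts)                 # analog max-pooling picks the largest voltage
    return round(vmax / vfs * 4095)   # single 12-bit SAR ADC conversion
```

With four input patches, this model performs one conversion per pooled output instead of four, which is the source of the claimed speed and energy benefit.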
Figure 3. Analog MAC unit: (a) current steering DAC-based multiplier; (b) accumulator summing and integrating all the multipliers' output currents.
The accumulator circuit that follows the multipliers is shown in Figure 3b. The
accumulator circuit converts output voltages from n multiplier circuits into corresponding
currents. The outputs OUTx of n multipliers are connected to the nodes OUT0, OUT1, . . .,
OUTn of the accumulator. These currents are summed together at node ‘x’, which is
used to charge the accumulation capacitor CACC . Before the start of computation, the
accumulation capacitor is discharged through NMOS Mreset by applying the reset signal.
After accumulation, the expression for the final capacitor voltage can be derived from
Equation (1).
V_CACC = Q / C_ACC    (1)
Here, Q represents the charge stored in the capacitor while the inputs are applied and can
be expressed as Equation (2).
Q = ∫₀ᵀ I_x · dt = I_x × T    (2)
In Equation (2), Ix represents the summed current, and T is the duration for which
this current is applied (multiplier has valid inputs). Therefore, the final voltage on the
accumulation capacitor can be expressed by Equation (3).
V_CACC = (I_x × T) / C_ACC    (3)
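Equations (1)–(3) amount to a one-line behavioral model of the accumulator; the current, duration, and capacitance below are illustrative values only:

```python
def accumulated_voltage(i_sum, t, c_acc):
    # Equation (2): charge delivered by a constant summed current I_x over time T
    q = i_sum * t
    # Equations (1) and (3): V_CACC = Q / C_ACC = (I_x * T) / C_ACC
    return q / c_acc

# e.g., 100 nA applied for 100 ns onto a 700 fF accumulation capacitor -> ~14.3 mV
v_acc = accumulated_voltage(100e-9, 100e-9, 700e-15)
```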
Figure 4. The 8-bit multiplier design constituted from two 4-bit current steering DAC circuits (operands A and B are each split into MSB and LSB nibbles driving binary-weighted 8x/4x/2x/1x branches through current mirrors).
Figure 5. (a) The 8-bit multiplier layout design consisting of four 4-bit current steering DAC circuits; and (b) the layout design of a 4-bit DAC circuit.
3.1.2. Simulation Results for the Multiplier
The proposed 8-bit binary-weighted current steering multiplier was simulated to verify the operation, and the results are illustrated in Figure 6. Here, incrementing digital input data, with a step size of 15, are applied, shared by both multiplier operands. The simulation shows the output current I_A×B of the multiplier (blue waveform) and the digital product results of operands A and B (red waveform). It can be observed that upon applying the digital input data, the output current rapidly transitions to a value that closely matches the product of operands A and B. In addition, to achieve low power consumption, all the MOSFETs used in the MAC unit have a high threshold voltage, which offers low leakage and static current. Since the circuit deals with nano-amperes of current, any leakage would significantly impact the accuracy of the result. The proposed 8-bit multiplier consumes 1.44 µW of power to multiply two operands with a maximum product value of 255 × 255.
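Behaviorally, the binary-weighted current steering multiplier can be sketched as follows. This is an idealized, mismatch-free model assuming the current mirrors scale operand A's DAC current linearly by operand B's code; the 1 nA unit current is an arbitrary choice, not the chip's bias value:

```python
def dac_code_weight(code4):
    # 4-bit binary-weighted current steering DAC: the input bits switch
    # 8x/4x/2x/1x copies of a unit current onto the output node
    return sum(((code4 >> b) & 1) << b for b in range(4))

def multiplier_current(a, b, i_unit=1e-9):
    # Each 8-bit operand splits into MSB/LSB nibbles driving two 4-bit DACs,
    # with the MSB DAC weighted 16x relative to the LSB DAC (idealized model)
    weight_a = 16 * dac_code_weight(a >> 4) + dac_code_weight(a & 0xF)
    weight_b = 16 * dac_code_weight(b >> 4) + dac_code_weight(b & 0xF)
    return weight_a * weight_b * i_unit    # I_out proportional to A*B
```

At the maximum operands (255 × 255), the model's output current is 65,025 unit currents, mirroring the full-scale product mentioned above.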
Figure 6. Simulation results for current steering DAC-based multiplier's output current (in µA, against time in ns) plotted against the digital code for A×B.
3.1.3. Accumulator Circuit Design
The accumulator circuit in the proposed convolutional operator is designed using MOSFETs to steer the output current of the multiplier to be accumulated upon capacitors, as shown in Figure 3b. In the example circuit presented in this work, a four-capacitor array CACC is implemented to store the accumulation values of four different input image pixel values convolved with filter values with a stride of 1. The four capacitors in array CACC accumulate, respectively, the current amounts representing the four neighboring convolution results, which are passed over to the max-pooling circuit to select the maximum of the four convolution results. Each convolutional operation is conducted over a set of data in a 3 × 3 matrix of C channels out of the total input data channels, where C is set to three in our implementation in this work. For the first layer of the CNN, the input data matrix is the input image, while for other layers of the CNN, the input data are the feature data produced by the previous layer. Firstly, the first convolution result is obtained as follows: the three ACOs of the proposed analog convolutional operator unit simultaneously convolve the three 3 × 3 filter kernels over the top-left 3 × 3 matrix selected from the input data with three input channels. The result of the first convolution is stored in the first capacitor in array CACC in the form of voltage VACC. For the second convolution, the 3 × 3 filter kernel shifts in the right direction over the input data by a stride value S and stores this convolutional result in the second capacitor in CACC. In this work, we use a stride value S of 1. In a similar fashion, the 3 × 3 filter kernel moves down for the third convolution, and then it moves left for the fourth convolution. These results are stored in the third and fourth capacitors in CACC, respectively. In the implementation of CACC, metal–oxide–metal (MOM) capacitors are employed to benefit from their higher capacitance density and linear capacitance–voltage (CV) curve [17]. Each capacitor is implemented using configurable parallel capacitors, so each can be configured to have a size from 300 fF up to 700 fF in steps of 10 fF. Before accumulation, each capacitor's voltage is reset to a reference voltage Vref of value 400 mV by a reset signal provided by a digital controller. Afterward, the digital controller generates a start accumulation signal that enables the current from the ACOs to accumulate in the respective capacitors of CACC.
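The stride-1 window movement described above (top-left, shift right, shift down, shift left) can be mirrored in a short NumPy sketch; the array shapes are illustrative, and the voltages on CACC are abstracted as plain sums:

```python
import numpy as np

def four_neighboring_convs(x, k):
    # x: (3, H, W) three-channel input data; k: (3, 3, 3) filter kernel.
    # Window order follows the text: top-left, shift right (stride 1),
    # shift down, shift left -- one result per CACC capacitor.
    positions = [(0, 0), (0, 1), (1, 1), (1, 0)]
    return [float(np.sum(x[:, r:r + 3, c:c + 3] * k)) for r, c in positions]
```

The four returned values correspond to the 2 × 2 neighborhood of convolution results that the max-pooling circuit later reduces to a single maximum.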
3.1.4. Accumulator Circuit for Negative Values
In general, MAC operations of a convolutional layer of CNN models must handle the accumulation of both positive and negative MAC values. Conventional analog convolutional operators, however, do not provide efficient ways to compute the negative MAC values and accumulate the negative and positive MAC values into convolution results. In contrast, our proposed analog convolutional operator can handle both negative and positive MAC values as follows.
The proposed MAC unit implements a charge-sharing technique to achieve the multi-
plication of negative values, as illustrated in Figure 7. Each of the four capacitors in array
CACC is split into a pair of two capacitors, each having a capacitance of 700 fF in the current
implementation. The first capacitor (C1) of the pair accumulates the current IW+ from
the positive multiplication values. The second capacitor (C2) of the pair accumulates the
current IW− from the negative multiplication values. For this purpose, a sign bit is added to
the 8-bit value of operands to present the operands in a 9-bit signed magnitude format. The
lower 8 bits are directly applied to the multiplier, while the 9th bit (MSB), as a sign bit, is utilized for steering the multiplication currents either onto the positive capacitor (C1) or the negative capacitor (C2). The operation comprises the sampling mode and
subtracting mode. During the sampling mode, the equivalent currents IW+ and IW− are
accumulated as voltages VC1 and VC2, respectively, in capacitors C1 and C2 connected
in parallel formation. During the subtracting mode, on the other hand, the connection of
capacitors C1 and C2 is changed to series formation, and the charge sharing generates the
subtraction result Vout at the output.
To explain the operation of the subtraction, Figure 8 illustrates the simulation result
of subtracting VC1 and VC2, which are the voltage results from two multipliers. In this
example, the first multiplier takes a maximum positive operand A and B value (255 × 255)
to produce the highest output value. In contrast, the second multiplier takes a maximum
negative and positive value (−255 × 255) to produce the lowest negative value. The
subtraction example operates as follows. Firstly, the digital controller gives a reset signal,
which discharges all the accumulation capacitors to the Vref value (400 mV). Secondly, the
digital controller gives a Start Accumulation signal, which triggers the accumulation of
currents in the capacitor pairs. As a result, the current IW+ charges the capacitor voltage
VC1 up to the equivalent VC1+, while IW − charges VC2 up to VC2+. Finally, the digital
controller gives a subtract signal, which subtracts the two voltages between VC1 and
VC2, and then generates an equivalent Vout . The proposed analog MAC unit utilizes the
existing accumulation capacitors to implement the charge-sharing technique for negative
multiplication. Therefore, it does not require extra circuitry to implement the multiplication
of the negative values, unlike the conventional methods of using complementary circuits.
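A behavioral sketch of the sign-steering and charge-sharing subtraction may clarify the sequence; the 400 mV reference voltage follows the text, while the charge-to-voltage gain is a hypothetical linear factor standing in for the real capacitor charging:

```python
def signed_mac(products, vref=0.4, gain=1e-5):
    # Sampling mode: the sign bit (MSB of the 9-bit operand) steers each
    # product's current onto C1 (positive products) or C2 (negative
    # products), charging both capacitors upward from Vref.
    vc1 = vref + gain * sum(p for p in products if p > 0)
    vc2 = vref + gain * sum(-p for p in products if p < 0)
    # Subtracting mode: C1 and C2 are reconnected in series, and charge
    # sharing yields the signed convolution result around Vref.
    return vref + (vc1 - vref) - (vc2 - vref)
```

For example, products of +64 × 64 and −32 × 32 leave a voltage above Vref, since the positive charge on C1 dominates the negative charge on C2.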
Figure 9. Analog max-pooling unit: (a) implemented voltage-mode max-voltage selection circuit;
(b) simulation results for max-pooling circuit showing Vout tracking maximum Vin .
Figure 10. The block diagram of the 32-bit digital parallel interface for the mixed-signal CNN
processing chip.
Figure 11. Micrograph and the demarcated layout of the ACU chip for mixed-signal CNN processing
chip implementation.
Figure 12. Timing diagram showing the control signals and the applied input signals for the
two operands.
Figure 13 shows the simulation results of the analog max-pooling operations followed
by analog convolutional operations. After the convolutional operation of 3 × 3, it performs
a 2 × 2 max-pooling operation followed by an ADC operation. The example of Figure 13
applies the identical operands of A [8:0] and B [8:0] to all nine multipliers in each analog
convolutional operator for the sake of simplicity. Each of the four convolutional operations
takes as input 64 × 64, 128 × 128, 64 × 64, and −32 × 32, respectively. Upon receiving
the first start accumulation signal, the multiplier produces a current value equivalent to the
product of 64 × 64 and charges the respective accumulation capacitor CACC , providing an
equivalent voltage VACC0 . Afterward, upon receiving the subtract signal, the output voltage
of 423 mV is obtained, which represents the first convolutional operation as indicated by
the brown waveform. Similarly, the second convolutional operation for the input operands
of A [8:0] = 128 and B [8:0] = 128 produces a resulting voltage VACC1 of 554 mV as indicated
by the blue waveform, which is equivalent to 128 × 128. For the third convolutional
operation, the resulting voltage VACC2 of 423 mV is obtained (purple waveform), which is
equivalent to 64 × 64. Lastly, for the fourth convolutional operation, the resulting voltage
VACC3 of 356 mV is obtained (orange waveform), which is equivalent to −32 × 32. The four
accumulation voltages corresponding to four convolutional operations are directly applied
to the max-pooling circuit, which selects the maximum voltage and produces a Vmax_out of
555 mV (red waveform). Afterward, upon receiving a start_ADC signal (black signal), the
ADC converts the max-pooling output into a 12-bit digital value. To verify the accuracy of
the ADC, an ideal DAC is added to the simulation, which converts the digital value back
to the analog value, resulting in 556 mV (green waveform). The error between the digital
and analog values is as small as 1 mV. The SNDR and ENOB of the ADC are measured as
68.45 dB and an 11.07-bit value, respectively [17].
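The reported ENOB is consistent with the standard conversion from SNDR, ENOB = (SNDR − 1.76)/6.02:

```python
def enob(sndr_db):
    # Effective number of bits implied by a measured SNDR (in dB)
    return (sndr_db - 1.76) / 6.02

# 68.45 dB SNDR -> about 11.08 bits, in line with the reported ~11.07-bit ENOB
```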
Figure 13. Simulation results verifying the operation of the analog convolutional operator unit.
Figure 14. Overall architecture of an example mixed-signal CNN chip consisting of eight analog
convolutional operator units with 3 input channels each.
Figure 15. (a) The micrograph of the mixed-signal CNN processing chip, (b) the demarcated full chip
layout, and (c) the zoomed-in structure of the one ACU.
5. Performance Analysis
To demonstrate the performance and cost of the ACU chip in comparison with con-
ventional architectures, we compare the implementation result of the ACU chip with an
implementation of a digital neural network processing unit (NPU) [20] in Table 1. The NPU
is implemented using the same 55 nm process technology as the proposed mixed-signal
CNN. The NPU comprises 288 processing elements (PEs), each of which is composed
of an 8-bit MAC operator. While the proposed ACU chip can be extended to cover the
whole CNN model, we restrict our experiment only to the first layer of the CNN model
for proof of concept. We consider an example CNN model called YOLOv2-tiny which
consists of nine convolutional layers. For simplicity, we analyze the computation of only
the first layer of the YOLOv2-tiny model. Since the proposed ACU chip comprises 27 MAC
units with a 9-bit configuration for a three-channel 3 × 3 convolutional filter, we have
downscaled the digital NPU to a small implementation consisting of 27 MAC PEs to make
a fair comparison.
Table 1. Performance comparison of the proposed analog convolutional operator unit with its digital counterpart.
Table 1 compares the area, power consumption, and energy consumption of the ACU
chip and the NPU implementations. It can be observed that the proposed ACU chip occupies 85% less chip area and consumes 72.4% less power. For the digital NPU with
the full CNN model, it takes 137.9 ms for inference computations through all the layers for
a 416 × 416 input image when operating at 200 MHz. After scaling down for the first layer
of the CNN model, the digital NPU consumes an average energy of 321.72 µJ. On the other
hand, for the case of the proposed ACU chip, it takes 48.5 ms, when operating at 200 MHz,
for inference computations of the first layer of the CNN model for a 416 × 416 input image.
Therefore, it consumes 26.19 µJ of average energy, which is 92% less than the energy
required by the digital NPU. Hence, the proposed analog convolutional operator unit can
provide a significant reduction in area, power, and energy consumption compared with its
digital counterpart.
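The energy comparison above can be cross-checked from power × time, using only figures quoted in the text:

```python
acu_power = 540.6e-6   # W, average power of the ACU chip
acu_time = 48.5e-3     # s, first-layer inference time at 200 MHz
acu_energy = acu_power * acu_time       # ~26.2 uJ, matching the reported 26.19 uJ

npu_energy = 321.72e-6                  # J, digital NPU energy for the first layer
reduction = 1 - 26.19e-6 / npu_energy   # ~0.919, i.e., the quoted ~92% saving
```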
Table 2 compares this work with other state-of-the-art analog implementations of
convolutional operators. Here, the chip area and power efficiencies are calculated based on
the method described in [21]. The mixed-signal cellular neural network presented in [14]
realizes the AlexNet CNN model and achieves high computation speed (GOPS). However,
it is less energy efficient than other works due to the use of operational transconductance
amplifiers as a primary computing unit. The hybrid architecture of [22] incorporates
64 analog convolutional operators integrated with an on-chip CMOS image sensor array
for object detection. It occupies a considerable amount of chip area, which is 123× less
area efficient than the proposed ACU chip. Its analog convolutional operator constitutes
a 4-bit multiplier, which consumes a high power of 18.75 µW, leading to a high energy
consumption of 61.98 µJ. In contrast, the proposed MAC unit consumes only 1.44 µW of
power, leading to an energy consumption of 26.19 µJ, which makes it 2.5× more energy
efficient. The CMOS image sensor integrated with a light-weight CNN presented in [13]
consumes less power and thus provides relatively high energy efficiency. However, it
suffers from excessively low computation speed due to its low frequency of 2 × 2 kernel
operations. Furthermore, it requires excessive chip area as it constitutes 180 computing
units, each occupying 4250 µm2 of area, integrated with an on-chip pixel memory. The
in-memory computing circuit based on capacitors presented in [23] employs an energy recy-
cling technique to achieve high power efficiency. However, our proposed ACU chip area is
2× smaller than [23] when normalized to a 28 nm technology node with the same number
of MAC operators. Moreover, the evolving state-of-the-art quantization techniques [20]
and the recent low-precision accelerators [22,23] pave the way for analog circuits to operate
without requiring high resolutions. The proposed ACU chip demonstrates a relatively
smaller chip area and lower power consumption than most of the previous works. There-
fore, the proposed architecture is a promising alternative to conventional digital NPUs and
previous analog convolutional operator circuits.
6. Conclusions
This work proposes a mixed-signal CNN processing chip implementation that aims to
replace the digital convolutional units of conventional CNN accelerators. The proposed
analog MAC unit comprising symmetric binary-weighted current steering DAC circuits
offers better matching, a compact layout, and low power consumption. The proposed
3 × 3 analog convolutional operator unit tightly integrates a MAC unit, a max-pooling
circuit, and an ADC to perform convolutional operations. The pooling operation before
an ADC in the analog domain reduces the number of ADCs and improves the speed by
one-shot convolutional computations. Therefore, the proposed implementation consumes
26.19 µJ of energy, which is 92% less than the fully digital NPU implementation. The
presented analog implementation occupies 0.0559 mm2 of chip area and consumes 540 µW
of power. Hence, the mixed-signal CNN system-on-chip (SoC) promises to be a beneficial
replacement as a computing unit in conventional digital CNNs.
Author Contributions: Conceptualization, M.S.A. and H.K.; methodology, M.S.A., S.A. and H.K.;
software, M.S.A. and S.A.; validation, M.S.A. and S.A.; formal analysis, M.S.A.; investigation, M.S.A.;
writing—original draft preparation, M.S.A.; writing—review and editing, S.A. and H.K.; supervision,
H.K.; project administration, H.K.; funding acquisition, H.K. All authors have read and agreed to the
published version of the manuscript.
Funding: This work was supported by the National Research Foundation of Korea (NRF) grant for
RLRC funded by the Korean government (MSIT) (No. 2022R1A5A8026986, RLRC), and was also sup-
ported by the Institute of Information & communications Technology Planning & Evaluation (IITP)
grant funded by the Korea government (MSIT) (No. 2020-0-01304, Development of Self-learnable
Mobile Recursive Neural Network Processor Technology). It was supported by the MSIT (Ministry of
Science and ICT), Korea, under the Grand Information Technology Research Center support program
(IITP-2023-2020-0-01462, Grand-ICT) supervised by the IITP (Institute for Information & commu-
nications Technology Planning & Evaluation). This research was supported by the National R&D
Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science
and ICT (No. 2020M3H2A1076786, System Semiconductor specialist nurturing), and supported by
the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT)
(No. 2021R1F1A1061314). In addition, this work was partly supported by Institute of Information &
communications Technology Planning & Evaluation (IITP) grant funded by the Korea government
(MSIT) (2020-0-01077, Development of Intelligent SoC having Multimodal IOT Interface for Data
Sensing, Edge computing analysis and Data sharing).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data are contained within the article.
Acknowledgments: We appreciate the collaboration and help provided by Muhammad Junaid for
the design of the Digital Controller.
Conflicts of Interest: Author Saad Arslan was employed by the company TSY Design (Pvt.) Ltd.
The remaining authors declare that the research was conducted in the absence of any commercial or
financial relationships that could be construed as a potential conflict of interest.
References
1. Kankanhalli, A.; Charalabidis, Y.; Mellouli, S. IoT and AI for smart government: A research agenda. Gov. Inf. Q. 2019, 36, 304–309.
[CrossRef]
2. Gupta, S. Neuromorphic Hardware: Trying to Put Brain into Chips. 30 June 2019. Available online: https://towardsdatascience.com/neuromorphic-hardware-trying-to-put-brain-into-chips-222132f7e4de (accessed on 6 February 2023).
3. Kim, H.; Hwang, S.; Park, J.; Yun, S.; Lee, J.-H.; Park, B.-G. Spiking Neural Network Using Synaptic Transistors and Neuron
Circuits for Pattern Recognition with Noisy Images. IEEE Electron Device Lett. 2018, 39, 630–633. [CrossRef]
4. Tang, H.; Kim, H.; Kim, H.; Park, J. Spike Counts Based Low Complexity SNN Architecture with Binary Synapse. IEEE Trans.
Biomed. Circuits Syst. 2019, 13, 1664–1677. [CrossRef] [PubMed]
5. Chen, G.K.; Kumar, R.; Sumbul, H.E.; Knag, P.C.; Krishnamurthy, R.K. A 4096-Neuron 1M-Synapse 3.8-pJ/SOP Spiking Neural
Network with On-Chip STDP Learning and Sparse Weights in 10-nm FinFET CMOS. IEEE J. Solid-State Circuits 2019, 54, 992–1002.
[CrossRef]
6. Asghar, M.S.; Arslan, S.; Al-Hamid, A.A.; Kim, H. A Compact and Low-Power SoC Design for Spiking Neural Network Based on
Current Multiplier Charge Injector Synapse. Sensors 2023, 23, 6275. [CrossRef]
7. Bachtiar, Y.A.; Adiono, T. Convolutional Neural Network and Maxpooling Architecture on Zynq SoC FPGA. In Proceedings of
the International Symposium on Electronics and Smart Devices (ISESD), Badung-Bali, Indonesia, 8–9 October 2019; pp. 1–5.
8. Sabogal, S.; George, A.; Crum, G. ReCoN: A Reconfigurable CNN Acceleration Framework for Hybrid Semantic Segmentation on
Hybrid SoCs for Space Applications. In Proceedings of the IEEE Space Computing Conference (SCC), Pasadena, CA, USA, 30
July–1 August 2019; pp. 41–52.
9. Halawani, Y.; Mohammad, B.; Abu Lebdeh, M.; Al-Qutayri, M.; Al-Sarawi, S.F. ReRAM-Based In-Memory Computing for Search
Engine and Neural Network Applications. IEEE J. Emerg. Sel. Top. Circuits Syst. 2019, 9, 388–397. [CrossRef]
10. Park, S.-S.; Chung, K.-S. CENNA: Cost-Effective Neural Network Accelerator. Electronics 2020, 9, 134. [CrossRef]
11. Kwon, H.; Samajdar, A.; Krishna, T. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable
Interconnects. SIGPLAN Not. 2018, 53, 461–475. [CrossRef]
12. Zhu, J.; Huang, Y.; Yang, Z.; Tang, X.; Ye, T.T. Analog Implementation of Reconfigurable Convolutional Neural Network Kernels.
In Proceedings of the 2019 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Bangkok, Thailand, 11–14 November
2019; pp. 265–268. [CrossRef]
13. Choi, J.; Lee, S.; Son, Y.; Kim, S.Y. Design of an Always-On Image Sensor Using an Analog Lightweight Convolutional Neural
Network. Sensors 2020, 20, 3101. [CrossRef] [PubMed]
14. Lou, Q.; Pan, C.; McGuinness, J.; Horvath, A.; Naeemi, A.; Niemier, M.; Hu, X.S. A Mixed Signal Architecture for Convolutional
Neural Networks. ACM J. Emerg. Technol. Comput. Syst. 2019, 15, 19. [CrossRef]
15. Wong, M.Z.; Guillard, B.; Murai, R.; Saeedi, S.; Kelly, P.H.J. AnalogNet: Convolutional Neural Network Inference on Analog
Focal Plane Sensor Processors. arXiv 2020, arXiv:2006.01765. Available online: http://arxiv.org/abs/2006.01765 (accessed on 8
November 2020).
16. Asghar, M.S.; Shah, S.A.A.; Kim, H. A Low Power Mixed Signal Convolutional Neural Network for Deep Learning SoC. IDEC J.
Integr. Circuits Syst. 2023, 9, 7–12.
17. Lee, J.; Asghar, M.S.; Kim, H. Low Power 12-bit SAR ADC for Analog Convolutional Kernel of Mixed-Signal CNN Accelerator.
Comput. Mater. Contin. 2023, 75, 4357–4375. [CrossRef]
18. Kazemi Nia, S.; Khoei, A.; Hadidi, K. High Speed High Precision Voltage-Mode MAX and MIN Circuits. J. Circuits Syst. Comput.
2007, 16, 233–244.
19. Soleimani, M.; Khoei, A.; Hadidi, K.; Nia, S.K. Design of high-speed high-precision voltage-mode MAX-MIN circuits with low
area and low power consumption. In Proceedings of the 2009 European Conference on Circuit Theory and Design, Antalya,
Turkey, 23–27 August 2009; pp. 351–354. [CrossRef]
20. Son, H.; Al-Hamid, A.A.; Na, Y.; Lee, D.; Kim, H. CNN Accelerator Using Proposed Diagonal Cyclic Array for Minimizing
Memory Accesses. Comput. Mater. Contin. 2023, 76, 1665–1687. [CrossRef]
21. Åleskog, C.; Grahn, H.; Borg, A. Recent Developments in Low-Power AI Accelerators: A Survey. Algorithms 2022, 15, 419.
[CrossRef]
22. Kim, J.-H.; Kim, C.; Kim, K.; Yoo, H.-J. An Ultra-Low-Power Analog-Digital Hybrid CNN Face Recognition Processor Integrated
with a CIS for Always-on Mobile Devices. In Proceedings of the 2019 IEEE International Symposium on Circuits and Systems
(ISCAS), Sapporo, Japan, 26–29 May 2019; pp. 1–5. [CrossRef]
23. Zhang, B.; Saikia, J.; Meng, J.; Wang, D.; Kwon, S.; Myung, S.; Kim, H.; Kim, S.J.; Seo, J.-S.; Seok, M. A 177 TOPS/W, Capacitor-
based In-Memory Computing SRAM Macro with Stepwise-Charging/Discharging DACs and Sparsity-Optimized Bitcells for
4-Bit Deep Convolutional Neural Networks. In Proceedings of the 2022 IEEE Custom Integrated Circuits Conference (CICC),
Newport Beach, CA, USA, 24–27 April 2022; pp. 1–2. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.