A Review of 0.18-μm Full Adder Performances for Tree Structured Arithmetic Circuits
Article in IEEE Transactions on Very Large Scale Integration (VLSI) Systems · July 2005
DOI: 10.1109/TVLSI.2005.848806 · Source: IEEE Xplore
Date 2005
URL http://hdl.handle.net/10220/6013
Abstract—The general objective of our work is to investigate the area and power-delay performances of low-voltage full adder cells in different CMOS logic styles for the predominating tree structured arithmetic circuits. A new hybrid style full adder circuit is also presented. The sum and carry generation circuits of the proposed full adder are designed with hybrid logic styles. To operate at ultra-low supply voltage, the pass logic circuit that cogenerates the intermediate XOR and XNOR outputs has been improved to overcome the switching delay problem. As full adders are frequently employed in a tree structured configuration for high-performance arithmetic circuits, a cascaded simulation structure is introduced to evaluate the full adders in a realistic application environment. A systematic and elegant procedure to scale the transistors for minimal power-delay product is proposed. The circuits being studied are optimized for energy efficiency at 0.18-μm CMOS process technology. With the proposed simulation environment, it is shown that some cells that survive in stand-alone operation at low voltage may fail when cascaded in a larger circuit, either due to the lack of drivability or unsatisfactory speed of operation. The proposed hybrid full adder exhibits not only the full swing logic and balanced outputs but also strong output drivability. The increase in the transistor count of its complementary CMOS output stage is compensated by its area efficient layout. Therefore, it remains one of the best contenders for designing large tree structured arithmetic circuits with reduced energy consumption while keeping the increase in area to a minimum.

Index Terms—Adders, CMOS digital integrated circuits, digital arithmetic, logic devices.

I. INTRODUCTION

[...] the development of cell libraries. They are likely to perpetuate the ability to further reduce the cost-per-function and improve the performance of integrated circuits. With the lowering of threshold voltage in ultra deep submicron technology, lowering the supply voltage appears to be the most eminent means to reduce power consumption. However, lowering supply voltage also increases circuit delay and degrades the drivability of cells designed with certain logic styles. Recently, clustered voltage scaling (CVS) and dual voltage supply (dual-VS) schemes have been proposed to maintain the chip throughput by selectively lowering the supply voltage for noncritical subcircuits [1], [2]. For such techniques to be effective, it is imperative that the performances of the basic cells dominating the critical path be characterized in the targeted technology and application environment over various ranges of supply voltage.

The full adder is the core element of complex arithmetic circuits for addition, multiplication, division, exponentiation, etc. [3]–[6]. The role of full adders in computer arithmetic can be classified into two main categories. One category involves the chain structured applications [7], [8], such as ripple carry adders (RCA) and array multipliers. In these applications, the critical path often traverses from the carry-in to the carry-out of the full adders. The carry-out signal must therefore be generated quickly. Otherwise, the slower carry-out generation will not only extend the worst case delay, but also create more glitches in the later stages and hence dissipate more power. The other category involves the tree structured applications, which are frequently used in Wallace–Dadda tree multipliers [...]
the lower stages. The published literature on full adder circuit optimization pays no attention to the specific layout requirements and the stringent drivability of the latter application. In this paper, we target the tree structured application for the evaluation of full adders with the optimization and simulation pursued in the proposed tree structure simulation environment.

To extend the battery life of portable electronics, the goal is to reduce the energy expended per arithmetic operation, but low power consumption does not necessarily imply low energy. To execute an arithmetic operation, a circuit can consume very low power by clocking at extremely low frequency but it may take a very long time to complete the operation. One of the objectives of our work is to investigate, as supply voltage reduces, the optimal energy efficiency of the full adders designed with different logic styles based on the same 0.18-μm CMOS process technology. We measure the energy consumption by the product of average power and worst case delay. The power-delay product (PDP) represents a tradeoff to be optimized between two conflicting criteria of power dissipation and circuit latency in transistor sizing. In this paper, a systematic and unified approach to size the transistors of different full adder cells to optimize their power-delay product performance is suggested. The proposed transistor sizing procedure is proven to be convergent. Through the review of the pros and cons of various CMOS logic design styles, a new hybrid full adder cell constructed with mixed logic styles in its constituent modules is proposed. The proposed hybrid full adder features balanced outputs, making it easy for large tree structured arithmetic circuits to maximize area efficiency without unduly degrading the VLSI power and delay. It dissipates next to the lowest energy and occupies the smallest layout area among all simulated full adder cells that are operable below 1 V.

The rest of this paper is organized as follows. Section II explores the full adder designs in different logic styles. In Section III, the proposed hybrid full adder cell is analyzed in three constituent modules. To optimize and analyze the performance of the different full adders, a tree structured setup is proposed for the simulation environment in Section IV. A transistor optimization procedure is also described. In Section V, the circuits are simulated for power, delay and power-delay product performances and the results are analyzed and compared.

II. REVIEW OF FULL ADDER DESIGN OF DIFFERENT CMOS LOGIC STYLES

Several variants of static CMOS logic styles have been used to implement low-power 1-b adder cells [5], [10]–[12]. In general, they can be broadly divided into two major categories: the complementary CMOS and the pass-transistor logic circuits.

The complementary CMOS full adder (C-CMOS) of Fig. 1(a) is based on the regular CMOS structure with pMOS pull-up and nMOS pull-down transistors. The series transistors in the output stage form a weak driver. Therefore, additional buffers at the last stage are required to provide the necessary driving power to the cascaded cells. The advantage of complementary CMOS style is its robustness against voltage scaling and transistor sizing, which is essential to provide reliable operation at low voltage and arbitrary transistor sizes. Moreover, the layout of complementary CMOS circuit is straightforward and area-efficient due to the complementary transistor pairs and smaller number of interconnecting wires.

The complementary pass transistor logic (CPL) [8] full adder with swing restoration is shown in Fig. 1(b). Its dual-rail structure uses 32 transistors (henceforth, “transistors” is abbreviated as “T”). The basic difference between the pass-transistor logic and the complementary CMOS logic styles is that the source side of the pass logic transistor network is connected to some input signals instead of the power lines [12], [13]. The advantage is that one pass-transistor network (either pMOS or nMOS) is sufficient to implement the logic function, which results in a smaller number of transistors and smaller input load. However, pass-transistor logic has an inherent threshold voltage drop problem. The output is a weak logic “1” when “1” is passed through an nMOS and is a weak logic “0” when “0” is passed through a pMOS. Therefore, output inverters are also used to ensure the drivability.

A transmission function full adder (TFA) [14] based on the transmission function theory is shown in Fig. 1(c). A transmission-gate adder (TGA) [15] using CMOS transmission gates is shown in Fig. 1(d). The transmission gate logic circuit is a special kind of pass-transistor logic circuit [13], [15]. It is built by connecting a pMOS transistor and an nMOS transistor in parallel, which are controlled by complementary control signals. The pMOS and nMOS transistors provide the path to the input logic “1” or “0,” respectively, when they are turned on simultaneously. Thus, there is no voltage drop problem whether the 1 or the 0 is passed through it. The main disadvantage of transmission gate logic is that it requires at least double the number of transistors of the standard pass-transistor logic to implement the same circuit. Adder circuits with smaller transistor counts have been proposed, most of which exploit the non-full-swing pass transistors with swing restored transmission gate techniques. This is exemplified by the state-of-the-art designs of 14 T in Fig. 1(e) [16] and 10 T in Fig. 1(f) [17]. These adders differ in their transistor counts and the way their intermediate nodes are generated.

Evidently, different logic styles tend to favor one performance aspect at the expense of the others. Layout regularity and wiring complexity also have a nonnegligible impact on the cost/performance of a large arithmetic circuit that uses many such instances. To summarize, the following performance criteria are considered in the design and evaluation of adder cells for the tree structured arithmetic circuits [5], [11]–[13], [18]: working supply voltage range, voltage swing, delay, power-delay product, output skew, driving capability, and silicon area. As an illustration, we will present the design of a novel low-voltage full adder with a hybrid logic style in the next section. The unique features possessed by this hybrid full adder will be analyzed.

III. HYBRID FULL ADDER

As shown in Fig. 2, the proposed hybrid full adder circuit can be decomposed and analyzed in three submodules as in [5]. The logic expressions for the intermediate signals and outputs are given as follows:

(1)
(2)
(3)
(4)
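As a quick sanity check of the three-module decomposition (XOR/XNOR, sum, and carry-out multiplexer), the following Python sketch evaluates one plausible set of logic expressions over all eight input combinations. The signal names `h`, `h_bar`, `s`, and `cout`, and the exact form of each expression, are assumptions based on the standard XOR/MUX formulation of a full adder; they are not a transcription of (1)–(4).

```python
from itertools import product

def hybrid_full_adder(a, b, cin):
    """Behavioral model of a full adder split into three modules (assumed form)."""
    h = a ^ b                           # Module 1: XOR of the two operand bits
    h_bar = 1 - h                       # Module 1: XNOR, the complement of h
    s = h ^ cin                         # Module 2: sum output
    cout = (a & h_bar) | (cin & h)      # Module 3: 2-to-1 MUX selected by h
    return s, cout

def matches_binary_addition():
    """Return True if the model agrees with a + b + cin for all 8 input patterns."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = hybrid_full_adder(a, b, cin)
        if 2 * cout + s != a + b + cin:
            return False
    return True

if __name__ == "__main__":
    print("decomposition matches binary addition:", matches_binary_addition())
```

The multiplexer view makes the role of Module 1 explicit: when the operands agree, the carry-out simply follows one of them, and when they differ, it follows the carry-in, so both the XOR and XNOR signals are consumed downstream.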
688 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 6, JUNE 2005
Fig. 1. Full adder cells of different logic styles. (a) C-CMOS. (b) CPL. (c) TFA. (d) TGA. (e) 14 T. (f) 10 T.
A. Module 1: XOR/XNOR

One approach to realize the exclusive OR and exclusive NOR (XOR/XNOR) functions is to synthesize the XOR function and generate the XNOR function through an inverter (e.g., TFA and TGA). This type of design has the disadvantage of delaying one of the XOR and XNOR outputs, giving rise to skewed signal arrival times at the successive modules. This increases the chance of producing spurious switching and glitches in the last two modules. A better approach is to use different sets of transistors to generate the XOR and XNOR functions separately, with the possibility of introducing a larger transistor count [19]. To reduce the number of transistors, we use a pass transistor circuit similar to those in [10] and [16], with only six transistors, to generate the balanced XOR and XNOR functions, as shown in Fig. 3(a). Compared with designs that use an inverter to generate the complement signal, the switching speed is increased by eliminating the inverter from the critical path. The two complementary feedback transistors restore the weak logic caused by the pass transistors. They restore the non-full-swing output by either pulling it up through pMOS to the power supply or down through nMOS to ground so that sufficient drive is provided to the successive modules. In addition, since there is no direct path between the power supply and ground, short-circuit current is reduced.

However, this circuit suffers from the same threshold voltage drop problem as any other pass-transistor logic circuit. The worst-case delay happens at the transition from 01 to 00 for the inputs. This could be explained with the aid of Fig. 3(b).
CHANG et al.: REVIEW OF 0.18-μm FULL ADDER PERFORMANCES FOR TREE STRUCTURED ARITHMETIC CIRCUITS 689
[...] circuit can also be used [10], [16]. However, it suffers from insufficient driving power due to the pass transistors. Therefore, we use a circuit similar to that of TFA and 14 T, but fully exploit the available XOR and XNOR outputs from Module 1 to allow only a single inverter to be attached at the last stage. The circuit is shown in Fig. 5(a). The output inverter guarantees that sufficient drive is provided to the cascaded cell.

C. Module 3: MUX

The smallest number of transistors for generating the carry-out signal is two (circuit 10 T), but it suffers from the threshold voltage drop problem. Although a 4-transistor circuit can be used to generate a full swing carry-out signal, it does not provide enough driving power. This can be proven in the later section when it is compared with our proposed circuit. The new circuit is based on complementary CMOS logic style, as shown in Fig. 5(b) [20]. Its logic expression is given by

(5)

This circuit has inherited the advantages of complementary CMOS logic style, which has been proven in [12] to be superior in performance to all pass transistor logic styles for all logic gates except XOR at high supply voltage. Its robustness against voltage scaling and transistor sizing (high-noise margins) enables it to operate reliably at low voltage and arbitrary (even minimal) transistor size.

IV. SIMULATION ENVIRONMENT AND TRANSISTOR SIZING OPTIMIZATION

A. Simulation Environment

It has been a common practice to treat the adder cell as a stand-alone cell in simulation [10], [14], [16], [17]. It is also not unusual that adder cells that perform well in such simulation still fail upon actual deployment because of the lack of driving power. This is because adder cells are normally cascaded to form a useful arithmetic circuit. Therefore, the adder cells must possess sufficient drivability to provide the next cell with clean inputs [5]. In short, the driving cell must provide almost full-swing outputs to the driven cells. Otherwise the performance of the circuit will be degraded dramatically, or the circuit will become nonoperative at low supply voltage. For this reason, the adder cells of TFA, TGA, 14 T, and 10 T cannot be cascaded without additional buffers attached to the outputs of each cell. This will be further verified.

The authors of [5] suggested one circuit structure, which is made of four cascaded adder cells, as shown in Fig. 6. This structure simulates circuits like regular multipliers and binary adders that use full-adder cells as the building block. The inputs are fed from buffers (two cascaded inverters) to give more realistic input signals and the outputs are loaded with buffers to give a proper loading condition. All the required input-pattern-to-input-pattern transitions are included in the test patterns. The power consumption value is measured for the four cascaded adder cells, in addition to the intermediate buffers, while the delay is measured from the moment the inputs are applied to the first cell until the latest of the sum and carry-out signals of the fourth cell is produced. However, this structure has some shortcomings. First, although the first adder has exercised all the input-pattern-to-input-pattern transitions, the subsequent adders may not have all the input-pattern-to-input-pattern transitions exercised. Thus, it is not appropriate to consider the four cascaded cells as a whole and then divide the average power by 4. As the last three adders are likely to consume lower power than the first adder, this simulation tends to produce an optimistic power dissipation figure. Instead, it would be better to measure only the power dissipated by the first adder. Second, the two outputs of each cell do not see the same fan-out: one of them drives two loads while the other drives only one. The loading of the two outputs is unbalanced.

Our proposed simulation structure is shown in Fig. 7, which emulates the tree structure of a parallel multiplier. Altogether 12 identical full adders are used, with one full adder (FA) marked as the cell of interest. The input signals of the cell of interest are fed from the outputs of FAs in the preceding stage, while its outputs are used to drive an FA in the following stage. This arrangement of full adders ensures that either the sum or the carry-out output of each FA drives only one input of the FA in the next stage. The reason for cascading three levels of FAs preceding the cell of interest is to examine the output drivability of the FA cells. If the FAs cannot provide enough driving power, the output signals after three successive stages will become very weak. In this situation, the cell of interest may fail to function.

B. Transistor Sizing Optimization

As shown in [21], the transistor sizing for optimal performance is technology dependent. To provide a fair and insightful evaluation of all the full adders presented earlier based on the same TSMC 0.18-μm CMOS process technology, a systematic and effective way of sizing the transistors for optimal performance is necessary. To provide a good tradeoff between the conflicting sizing requirements for power and delay performances, the goal of optimization is to minimize the power-delay product, i.e., the energy consumption. For a given technology the channel lengths of all transistors are fixed at the minimal feature size, 0.18 μm in our example, so the only variable to be optimized is the channel width of each transistor. The proposed procedure for sizing the transistors is described by the pseudocode in Fig. 8.

Initially, the sizes of the transistors in the circuit are set to reasonable values. The scaling operations are carried out in several iterations, transistor by transistor. Fig. 8 tracks the width of each transistor at each step and the power-delay product of the full adder circuit at each iteration. In every optimization iteration, one transistor at a time is tuned for minimal power-delay product in discrete steps of a given step resolution. The optimization stops when the performance difference between two successive iterations is smaller than a given error. More than one iteration may be necessary because each time a new transistor is sized in the current run, the other transistors sized in the previous run may no longer maintain their optimality.

In order to obtain enough coverage so that the optimal or quasi-optimal sizing will fall in the search region, the step resolution is made variable. A large step size is used in the first few iterations and a smaller step size is used for fine tuning in the remaining iterations. Two optimization strategies are adopted in the above transistor sizing procedure to accelerate the process. First, the corresponding pMOS and nMOS in a complementary pair are optimized in successive runs because the output transitions of the node driven by one transistor are often influenced most by the driving capability of its complementary counterpart. Second, series transistors or parallel transistors of the same type that source current to or sink current from the same node have equal size and can be optimized simultaneously.

V. RESULTS AND ANALYSIS

The six circuits C-CMOS, CPL, TFA, TGA, 14 T, and 10 T of Fig. 1 and the proposed hybrid adder cell are prototyped and simulated using the TSMC 0.18-μm CMOS process with Level 49 technology file. The threshold voltages of the pMOS and nMOS transistors are around 0.46 and 0.48 V, respectively. These full adder circuits are all optimized using the procedure presented in Section IV. For the six previously reported full adders, the starting sizes of the transistors are based on the aspect ratios reported in [8] and [14]–[17]. The initial sizes of the transistors of our hybrid cell are estimated from standard practices and past experience. The step size of the first optimization run of each transistor in our example is set to 0.05 μm, which is around 10% to 20% of the initial channel width. The step size of the subsequent iterations is reduced to 0.02 μm. Thus, the final transistor sizes have the precision of 10% of the channel length, which is 0.18 μm for our targeted technology. The termination error is set to 1%. Optimization of the transistor sizing is carried out at two different voltages, 0.8 V and 1.8 V, for C-CMOS, CPL, TFA, TGA and hybrid. As [...]
TABLE I
TRANSISTOR SIZES (μm) OF HYBRID FULL ADDER OPTIMIZED FOR PDP
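The sizing procedure of Section IV-B can be sketched as a coordinate-wise search over channel widths. The Python sketch below is only an illustration under stated assumptions: `toy_pdp` is a hypothetical stand-in for a circuit simulation, `W_MIN` is an assumed minimum width, and the function and parameter names are invented here. It is not the pseudocode of Fig. 8, and it does not model the two acceleration strategies (paired pMOS/nMOS runs and equal-size series or parallel transistors).

```python
W_MIN = 0.22  # assumed minimum channel width in micrometers (hypothetical value)

def toy_pdp(widths):
    """Hypothetical PDP model: power grows with width, delay shrinks with width."""
    power = 1.0 + sum(widths)
    delay = 1.0 + sum(1.0 / w for w in widths)
    return power * delay

def size_transistors(widths, pdp=toy_pdp, steps=(0.05, 0.02), tol=0.01, max_iters=50):
    """Tune one width at a time to lower the PDP; stop when an iteration gains < tol."""
    widths = list(widths)
    prev = pdp(widths)
    cur = prev
    for it in range(max_iters):
        # Coarse step in the first iteration, finer step in the remaining ones.
        step = steps[min(it, len(steps) - 1)]
        for i in range(len(widths)):
            best = pdp(widths)
            for direction in (1, -1):
                # Keep stepping transistor i in this direction while the PDP improves.
                while True:
                    trial = list(widths)
                    trial[i] = widths[i] + direction * step
                    if trial[i] < W_MIN:
                        break
                    value = pdp(trial)
                    if value >= best:
                        break
                    widths, best = trial, value
        cur = pdp(widths)
        # Terminate when the relative PDP change between iterations is within tol.
        if abs(prev - cur) <= tol * prev:
            break
        prev = cur
    return widths, cur

if __name__ == "__main__":
    start = [0.5, 0.5, 0.5]
    final_widths, final_pdp = size_transistors(start)
    print("initial PDP:", toy_pdp(start))
    print("final widths:", final_widths, "final PDP:", final_pdp)
```

Because a trial width is accepted only when it lowers the objective, the PDP never increases from one step to the next, which mirrors the reason the iteration count stays finite: each pass either improves the design or triggers the termination test.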
VI. CONCLUSION

For full adder cell design, pass-logic circuits are thought to dissipate minimal power and occupy a smaller area because they use fewer transistors. Thus, the CPL adder is considered to be able to perform better than the C-CMOS adder in [12]. However, in our opinion, a pass-logic circuit usually has an irregular structure, which increases the wiring complexity, and its performance is highly susceptible to transistor sizing. On the other hand, the complementary CMOS logic circuit has the advantages of layout regularity and stability at low voltage. Therefore, because applications impose different design constraints, each logic design style has its place in cell library development.

In the past, full adders were often evaluated in isolation without concern for how they are deployed in the actual circuit [10], [14], [16], [17]. We argue that the correct functioning of a 1-b full adder cell in stand-alone simulation is not sufficient to validate its actual performance or even functionality when it is integrated into a larger circuit. Inadequate consideration in simulation setup tends to produce a level of performance optimistically above the capability of the circuit being simulated. In this paper, we proposed a reasonably simple architecture to simulate the adder cell in an environment realistic to its actual deployment in the most frequently used parallel multiplier structure.

Based on the simulation environment, an effective circuit optimization algorithm is proposed. The proposed method systematically scales the channel width of each transistor of the cell for minimal energy consumption.

Lastly, a hybrid full adder cell consisting of the XOR/XNOR, sum, and carry-out subcircuits is proposed. The pass logic design style is used to efficiently generate the XOR and XNOR functions simultaneously, and a carry-out with good drivability is generated by a complementary CMOS style circuit with regular layout. In addition, the last-stage inverter decouples the output and input to improve the driving capability. Despite having a higher transistor count than the recently reported designs, the proposed circuit has been shown to be highly energy efficient over a wide supply voltage range. The balanced sum and carry outputs also offer considerable flexibility in allocating the adder cells in a tree structured circuit to eliminate as many cross-stage interconnections as possible and to reduce the maximum length of in-stage interconnections without affecting the critical path delay, as there is no discrimination on any port for any legitimate connections [9]. Although its power-delay performance is comparable to C-CMOS and poorer than CPL, the area efficient layout makes it a good choice for implementing large tree structured arithmetic circuits when the overall performance and area efficiency are prominent cost function elements.

REFERENCES

[1] T. Kuroda and M. Hamada, “Low-power CMOS digital design with dual embedded adaptive power supplies,” IEEE J. Solid-State Circuits, vol. 35, no. 4, pp. 652–655, Apr. 2000.
[2] K. Usami and M. Horowitz, “Clustered voltage scaling technique for low-power design,” in Proc. Int. Symp. Low Power Design, Apr. 1995, pp. 3–8.
[3] U. Ko, P. Balsara, and W. Lee, “Low-power design techniques for high-performance CMOS adders,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 3, no. 2, pp. 327–333, Jun. 1995.
[4] O. Kwon, K. Nowka, and E. E. Swartzlander, “A 16-b × 16-bit MAC design using fast 5:2 compressor,” in Proc. IEEE Int. Conf. Application-Specific Systems, Architectures, and Processors, Jul. 2000, pp. 235–243.
[5] A. Shams, T. Darwish, and M. Bayoumi, “Performance analysis of low-power 1-bit CMOS full adder cells,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 1, pp. 20–29, Feb. 2002.
[6] P. J. Song and G. De Micheli, “Circuit and architecture trade-offs for high-speed multiplication,” IEEE J. Solid-State Circuits, vol. 26, no. 9, pp. 1184–1198, Sep. 1991.
[7] M. Alioto and G. Palumbo, “Analysis and comparison on full adder block in submicron technology,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 6, pp. 806–823, Dec. 2002.
[8] A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design. Norwell, MA: Kluwer, 1995.
[9] Z. Wang, G. Jullien, and W. C. Miller, “A new design technique for column compression multipliers,” IEEE Trans. Comput., vol. 44, no. 8, pp. 962–970, Aug. 1995.
[10] D. Radhakrishnan, “Low-voltage low-power CMOS full adder,” Proc. IEE Circuits, Devices and Systems, vol. 148, no. 1, pp. 19–24, Feb. 2001.
[11] A. Shams and M. Bayoumi, “Performance evaluation of 1-bit CMOS adder cells,” in Proc. IEEE Int. Symp. Circuit and Systems, Jul. 1999, pp. 27–30.
[12] R. Zimmermann and W. Fichtner, “Low-power logic styles: CMOS versus pass-transistor logic,” IEEE J. Solid-State Circuits, vol. 32, no. 7, pp. 1079–1090, Jul. 1997.
[13] M. M. Vai, VLSI Design. Boca Raton, FL: CRC, 2001.
[14] N. Zhuang and H. Hu, “A new design of the CMOS full adder,” IEEE J. Solid-State Circuits, vol. 27, no. 5, pp. 840–844, May 1992.
[15] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design, A System Perspective. Reading, MA: Addison-Wesley, 1993.
[16] M. Vesterbacka, “A 14-transistor CMOS full adder with full voltage-swing nodes,” in Proc. IEEE Workshop Signal Processing Systems, Oct. 1999, pp. 713–722.
[17] H. T. Bui, Y. Wang, and Y. Jiang, “Design and analysis of low-power 10-transistor full adders using novel XOR-XNOR gates,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 49, no. 1, pp. 25–30, Jan. 2002.
[18] G. M. Blair, “Designing low-power digital CMOS,” IEEE Electron. Commun. Eng., vol. 6, pp. 229–236, Oct. 1994.
[19] H. T. Bui, A. Al-Sheraidah, and Y. Wang, “New 4-transistor XOR and XNOR designs,” in Proc. 2nd IEEE Asia Pacific Conf. ASICs, Cheju Island, Korea, Aug. 2000, pp. 25–28.
[20] M. Zhang, J. Gu, and C. H. Chang, “A novel hybrid pass logic with static CMOS output drive full-adder cell,” in Proc. 36th IEEE Int. Symp. Circuits and Systems, vol. V, Bangkok, Thailand, May 2003, pp. 317–320.
[21] M. Sayed and W. Badawy, “Performance analysis of single bit full adder cells using 0.18, 0.25 and 0.35 μm CMOS technologies,” in Proc. 35th IEEE Int. Symp. Circuits and Systems, vol. 3, Phoenix, AZ, May 2002, pp. 26–29.
[22] V. G. Oklobdzija, D. Villeger, and S. S. Liu, “A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach,” IEEE Trans. Comput., vol. 45, no. 3, pp. 294–306, Mar. 1996.

Chip-Hong Chang (S’92–M’98–SM’03) received the B.Eng. (Hons) from National University of Singapore, Singapore, in 1989 and the M.Eng. and Ph.D. degrees from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, in 1993 and 1998, respectively.
He worked as a Component Engineer for General Motors, Singapore, in 1989 and as a Technical Consultant of Flextech Electronics Pte. Ltd., Singapore, in 1998. In 1993, he joined the Electronics Design Center, Nanyang Polytechnic University. Since 1999, he has been with the School of Electrical and Electronic Engineering, Nanyang Technological University, where he is currently an Assistant Professor. He has served in a number of administrative and consultation roles during his academic career. He holds concurrent appointments at the university as the Deputy Director of the Center for High Performance Embedded Systems, and the Program Director of VLSI Design and Embedded Systems research group of the Center for Integrated Circuits and Systems. His current research interests include low power arithmetic circuits, design automation and synthesis, and algorithms and architectures for digital image processing. He has published around 100 refereed international journal and conference papers, and book chapters.
Jiangmin Gu (S’01) received the B.Sc. degree in physics and the M.Eng. degree in electronic engineering and information science from the University of Science and Technology of China, Hefei, China, in 1997 and 2000, respectively. He is currently working toward the Ph.D. degree at the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.
His research interests are low power VLSI design methodologies and optimization of CMOS arithmetic circuits.

Mingyan Zhang received the B.Eng. and M.Eng. (Hons.) degrees from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, in 2002 and 2004, respectively.
She is currently working as a Failure Analysis Engineer in Tech Semiconductor Singapore Pte. Ltd. Her research interests include low power VLSI digital circuit design and digital image processing.