A 10 Gb/s, 6 Vp-p, Digitally Controlled, Differential Distributed Amplifier MZM Driver
Yi Zhao, Member, IEEE, Leonardo Vera, and John R. Long, Senior Member, IEEE

Abstract—A 10 Gb/s, digitally controlled, differential distributed amplifier (DA) optical modulator driver is implemented in 0.18 µm SiGe-BiCMOS technology. The 2.87 mm² prototype integrates clock phase shifters, digital latches, limiting amplifiers, broadband n+/n-well back termination resistors and a substrate-shielded output line on chip. It produces 6 Vp-p differential output swing across 50 Ω loads. The output edge speed is trimmable, with 20–80% rise/fall times ranging from < 15 ps to 50 ps at 10 Gb/s. Minimum sensitivity of the ECL-compatible inputs is 65 mVp-p at 10 Gb/s single-ended, with negligible additive jitter. Measured output return loss is better than 10 dB below 35 GHz, sufficient to drive an external push-pull Mach-Zehnder optical modulator. Total power consumption is 2.13 W operating from –5.2 V and 5 V supplies. The fully-digital input interface supports scalability in the number of DA stages, output swing and to multiple output channels.

Index Terms—Digitally controlled input line, distributed amplifier, limiting amplifier, Mach-Zehnder modulator driver, on-chip back termination, optical communication, vector-sum phase shifter.

I. INTRODUCTION

OPTICAL data communication systems operating at multi-Gb/s rates across metro or long-haul spans rely upon external modulators to encode the transmitted optical carrier at a high extinction ratio and with minimal frequency chirp. The Mach-Zehnder modulator (MZM) is often used for this purpose; however, the electronic circuit required to drive the interferometer remains one of the most challenging blocks to design in the optical transceiver. A large voltage swing must be developed across a load at data rates up to 40 Gb/s (i.e., $V_\pi$, from 2 to 10 V, depending upon MZM design and length [1], [2]). The 50 Ω electrical interface between the driver and MZM when packaged separately requires an output return loss of better than 10 dB across a bandwidth that includes the third harmonic of the transmit pulse train (e.g., 15 GHz in a 10 Gb/s system) [3]. Implementation of the MZM driver requires active devices with breakdown voltage sufficient to realize the desired output swing and low capacitive parasitics for broad bandwidth, while circuit topologies must be selected which optimize these aspects of performance.

Advanced transistor technologies trade off gain, bandwidth and breakdown voltage. The product of unity-current-gain frequency and breakdown voltage (i.e., the Johnson limit, or $f_T \times BV_{CEO}$ for a bipolar transistor) is relevant to modulator driver circuit implementations employing HBTs because output swing and bandwidth must be maximized simultaneously. SiGe-HBTs in production BiCMOS technologies with $f_T$ above 200 GHz have collector-emitter breakdown voltages (i.e., $BV_{CEO}$) below 2 V [4], while the recommended supply voltages in silicon CMOS technologies with comparable device $f_T$ are less than 1.2 V. Thus, MZM drivers are usually implemented in III-V technologies where breakdown voltages are above 5 V in order to meet the output swing requirement for optical modulation [5], [6].

Modulator drivers for optical communication are often designed using the distributed amplifier (DA) topology because it offers broad bandwidth, a flat gain response, and controlled output return loss [7], [8]. Distributed amplifier modulator driver circuits have also demonstrated faster rise/fall times than drivers designed around a single output stage when fabricated in technologies with comparable $f_T$ [3]. The wideband return loss of a single-stage amplifier (e.g., as in [9] and [10]) is typically insufficient when attempting to stream data across a transmission line interface to an optical modulator at high data rates, which results in eye closure and unwanted bit errors.

This paper describes a new distributed limiting amplifier incorporating a fully-digital input interface that replaces the passive input line of a conventional DA, thereby mitigating many performance impairments inherent in current DA designs. A proof-of-concept prototype is implemented in a production silicon technology (i.e., IBM’s 180 nm SiGe-BiCMOS technology with peak $f_T$ of 60 GHz [11]) that targets 10 Gb/s with 6 Vp-p output swing and a return loss better than 10 dB below 20 GHz. The proposed topology is also amenable to scaling of the data rate, which is important to the development of high capacity optical systems in the future.

Factors limiting the performance of conventional DAs are outlined in Section II of this paper. The proposed driver topology, its operating principles and circuit design details are described in Section III. The physical layout, measurement set-up, and experimental results for the driver prototype are then reported in Section IV, followed by a comparison with the performance of existing commercial products and other recently published designs.

Manuscript received August 26, 2013; revised February 04, 2014; accepted May 03, 2014. Date of publication June 26, 2014; date of current version August 21, 2014. This paper was approved by Associate Editor Jared Zerbe.
Y. Zhao was with the Electronics Research Laboratory/DIMES, Delft University of Technology, 2628CD Delft, The Netherlands, and is now with Marvell Semiconductor, Santa Clara, CA 95054 USA.
L. Vera and J. R. Long are with the Electronics Research Laboratory/DIMES, Delft University of Technology, 2628CD Delft, The Netherlands.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JSSC.2014.2327036

0018-9200 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Fig. 1. Simplified model of a conventional distributed amplifier used as a modulator driver.

II. CONVENTIONAL DISTRIBUTED AMPLIFIER AS A MACH-ZEHNDER MODULATOR DRIVER

The simplified schematic of a modulator driver using a linear DA is shown in Fig. 1. Each transconductor is a unilateral amplifier (e.g., a cascode stage) with lumped parasitic capacitances at the input and output (see Fig. 1). Dissipation at the input of the stage is modeled by a resistance, while a second resistance represents the output resistance of the stage. The output transmission line sums the currents produced by the amplifying stages. The outputs of the stages add constructively when the time delays between stages along the input line and along the output line are synchronized. Distortion that degrades rise/fall times at the load results when the two delays are not equal.

The propagation delay time and attenuation of signals carried by an analog transmission line on chip vary due to manufacturing tolerances of the backend fabrication processes (e.g., layer thickness variations). Although the input and output lines are usually designed to have the same characteristic impedance when loaded by DA stages, the electrical properties of the 2 lines do not track each other with respect to manufacturing process, supply voltage, and temperature (PVT) variations, because the loading on each line (and hence its variability) differs. Moreover, the electrical properties of an on-chip analog transmission line are difficult to trim after manufacture. The quality and integrity of the output waveform is also affected by mismatch between the characteristic impedance of either transmission line and its terminating impedance (typically 50 Ω), with waveform distortion and timing jitter being typical impairments caused by mismatch. One resistor terminates the input line, while a back-termination resistor terminates the wave traveling away from the load on the output line. Mismatches arising from non-ideal packaging parasitics can be minimized when both are wideband resistors fabricated on chip.

The bandwidth of the conventional DA in Fig. 1 is typically dominated by the cut-off frequencies of the transmission lines and the terminations. The back-termination resistor suffers from capacitive parasitics when it is sized large enough to conduct the DC bias current and handle the AC power of the back-traveling wave. The cut-off frequency of a lossless, synthetic transmission line is

$$f_c = \frac{1}{\pi\sqrt{LC}} \qquad (1)$$

where $L$ is the series inductance and $C$ is the shunt capacitance for a lumped-element equivalent of each transmission line section between amplifier stages [12]. Capacitance is added to the line by active device loading, and $Z_0 = \sqrt{L/C}$ is the characteristic impedance of the loaded line. Simulations at 15 GHz, which is the third harmonic of an alternating 1–0 pattern at 10 Gb/s, were used to predict the loading presented by a cascode HBT stage in the 0.18 µm SiGe technology used in this work. The loading effect of the stage output on the output line is negligible compared to the 50 Ω terminations. However, loading on the input line from the HBT is significant, and it increases with increasing frequency, causing pulse dispersion. For the cascode HBT stage, the input line attenuation is 2.0 Np/mm at 15 GHz, which would severely affect the gain and bandwidth of the DA. As the input loading dominates for transistors generally, the input line limits the bandwidth of most DA designs [12]–[14]. Capacitance is often added to the DA stage outputs in an attempt to match the interstage delays [15]. This compensation lowers the output line bandwidth and increases the pulse rise/fall times seen at the load even further. Delay matching between the input and output lines in a fully-differential implementation also leads to a large chip area (e.g., 13.2 mm² in [5] and 3.5 mm² in [16]).

Design innovations that address the constraints on DA performance imposed by the passive input line are required in order to maximize bandwidth, improve signal quality (e.g., faster rise/fall times, reduced timing jitter and pulse distortion, etc.), and conserve chip area.

III. DIGITALLY CONTROLLED DA MODULATOR DRIVER

A block diagram of the proposed modulator driver prototype is shown in Fig. 2 [17]. It is a wideband DA with a fully-digital input interface consisting of three differential limiting amplifiers (LAs) embedded in a balanced, synthetic output transmission line. Two on-chip 50 Ω n+/n-well termination resistors supply bias current to each DA cell, eliminating the need for external bias-Ts and suppressing reflection of the back-traveling waves on the output line. The input master latch (M in Fig. 2), together with the slave latches driving each amplifier cell, form a master/slave flip-flop that isolates the data inputs from the LA cells. The outputs of the slave latches are identical in amplitude and waveshape, but the time delay between them is defined by the phases of the retiming clocks. Replicating the 10 Gb/s data digitally ensures that each LA is driven to saturation by a copy of the input signal, thereby eliminating dispersion, attenuation, ringing and pulse distortion inherent in a conventional analog implementation using a passive input line. The precise analog matching between input and output lines that is difficult to achieve in a conventional DA is not required in a DA with digitally controlled stage inputs. Also, layout restrictions imposed by the input transmission line and the chip area it consumes do not constrain the new design. It is readily scalable in the number of DA stages. However, the bandwidth on the input side of the new driver topology is limited by the speed of the digital latches or DA amplifier cells. For example, simulations predict that the maximum speed of the latch loaded by the LA in the SiGe-BiCMOS prototype (see Fig. 11) is 20 Gb/s, compared to 32.5 Gb/s for the LA stand alone.

Fig. 2. Block diagram of the digitally controlled modulator driver prototype.

A. Distributed Limiting Amplifier With a Periodically-Loaded Output Line

The output voltage and efficiency of the distributed limiting amplifier topology are projected in this section. In order to simplify the derivation of a compact result, it is assumed that amplification is distributed across the length of the output line in the DA. Given that assumption, the total voltage at the load (see Fig. 2) can be written as:

(2)

where the incremental current produced by the limiting stages is summed from the back termination to the load. The propagation constant for each section of the output line between stages is assumed identical. Note that only one-half of the driver current flows through the load, and that the DA is terminated properly, as the load is equal to the characteristic impedance of the output line. The uniformly distributed driver current can be expressed as

(3)

The exponential term in (3) accounts for the phase of the current injected into the output line, which is defined by the digital input to each stage. Ideally, the incremental current is the total bias current consumed by the LA stages divided by the total line length. Substitution of (3) into (2) and evaluation of the integral gives

(4)

where the amplitude driving each stage is assumed identical for the digitally controlled DA, and the input and output signals at each stage are assumed to be phase synchronized. Approximating the decaying exponential in (4) by a Taylor series expansion gives an expression for the magnitude of the single-ended output voltage,

(5)

Fig. 3. Simplified layout of one section of the substrate-shielded output line.

Note that (5) predicts that the effect of the output line is negligible when the total attenuation across the line is much less than 2. The power at the load (assuming random transmit data with an equal density of 1’s and 0’s) is

(6)

As long as the attenuation of the output line is sufficiently low, the power at the load approaches the driver power output at low frequency. Note that the output power for a differential driver is two times the power of (6), as the same bias current is steered between the 2 outputs.

The efficiency of a digitally controlled DA with differential outputs is given by

(7)

The terms in (7) are the global supply voltage, the current consumed by all circuits shared by the driver stages (e.g., input data retiming and I/Q generation in Fig. 2), the current drawn by the digital circuitry supplying the signal to each LA stage (e.g., latch, phase shifter and predriver in Fig. 2), and the total current consumed by the digitally controlled DA. The efficiency of just the driver output stages alone is typically well below 10%. DAs are ultra-wideband circuits, and as such do not offer efficiencies comparable to narrowband RF power amplifiers. Equation (7) indicates that the driver should be operated from the lowest supply voltage possible in order to maximize efficiency, and that the fixed power consumption terms should be kept as small as possible through efficient digital design techniques.

B. Output Transmission Line

Frequency-dependent losses from the output line cause dispersion and increased rise/fall times, resulting in eye closure and bit errors in an optical link. A substrate shield implemented using floating metal strips mitigates this problem by reducing losses. Electric induction from the differential signal traveling along the line creates a virtual ground on the shield.

The differential LA cell outputs are connected to the shielded 3.5 µm wide, balanced output line in the physical layout as shown in Fig. 3. This is the minimum width in copper topmetal that satisfies the current density limits specified by electromigration for the 0.18 µm SiGe-BiCMOS technology. Minimizing the conductor width increases the inductance per unit length of the output line, while minimizing the shunt capacitance to ground. The differential common-base transistors at the output of each LA cell (see Fig. 3) are centered in the gap between the line top conductors to preserve symmetry in the layout, while minimizing parasitics from the interconnects between the transistor outputs and the line. The space between the topmetal conductors is chosen to minimize the negative mutual coupling that reduces the self-inductance of the output line.

There is a trade-off between attenuation and shunt parasitic capacitance as the distance between the top conductor and underlying shield varies. Fig. 4 compares the simulated attenuation coefficient and shunt capacitance to ground for a 500 µm long transmission line, without and with shielding. The unshielded line exhibits the highest attenuation (see Fig. 4(a)). Capacitive loading of the line increases as the distance from the shield to the top conductor is diminished (see Fig. 4(b)). A shield distance is thus selected for the DA as a compromise. The increase in shunt capacitance is limited to just 25%, rather than 60% for a shield with a thinner insulator.

Fig. 4. Simulated attenuation coefficient and shunt capacitance for 500 µm long unshielded and shielded transmission lines. (a) Attenuation coefficient. (b) Shunt capacitance.

For a 3-stage DA, an inductance of 400 pH is required between two consecutive LA stages to absorb the 160 fF total shunt parasitic capacitance at each LA output, and synthesize an output line with a $Z_0$ of 50 Ω. The floating metal shield strips are implemented in metal-2 beneath the line. Metal-1 is used for interconnect wiring beneath the floating shield. The simulated time delay between two LA outputs for the loaded line is 7.5 ps. The group delay of the entire output line when loaded is 25.5–26.8 ps across DC–15 GHz, and the cut-off frequency is approximately 40 GHz from simulation.

Fig. 5. Simplified cross-section of the on-chip back termination resistor.

A multi-section, lumped-element model for the output line is used to capture its single-ended and differential behaviors for large-signal (transient) simulation of the DA circuit. The periodically loaded output line used in the driver prototype has a total attenuation of just 0.1 Np at 15 GHz from simulation. As the total attenuation across the DA output line is much less than 2 Np, (6) predicts that maximum output power should be developed across the load easily at 10 Gb/s. The time response of the output line was also investigated from simulation using a longer (i.e., 1 cm) section of line in order to highlight differences in performance between the shielded and unshielded designs. The simulated response of a 1 cm long transmission line to a 10 ps step shows rise/fall times of 46.3 ps when unshielded, compared to 16.3 ps when shielded.
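The section values quoted for the output line (400 pH between stages, 160 fF of shunt capacitance per LA output) can be cross-checked with the standard lossless lumped-element relations: characteristic impedance sqrt(L/C), cut-off frequency 1/(pi*sqrt(LC)), and a per-section delay of sqrt(LC). The Python sketch below is only an illustration. The function name `synthetic_line` and the treatment of each section as a single ideal L-C pair are assumptions made here, not details of the simulations reported in the paper.

```python
import math


def synthetic_line(l_section_h: float, c_section_f: float) -> dict:
    """Low-frequency parameters of one ideal (lossless) L-C line section.

    Assumption: each section is a single series inductance and a single
    shunt capacitance; losses and coupling are ignored.
    """
    lc = l_section_h * c_section_f
    return {
        # Characteristic impedance of the loaded line, in ohms.
        "z0_ohm": math.sqrt(l_section_h / c_section_f),
        # Cut-off frequency of the lumped line, in Hz.
        "f_cutoff_hz": 1.0 / (math.pi * math.sqrt(lc)),
        # Delay contributed by one section, in seconds.
        "delay_s": math.sqrt(lc),
    }


# Values taken from the text: 400 pH per section, 160 fF per LA output.
line = synthetic_line(400e-12, 160e-15)
print(f"Z0      = {line['z0_ohm']:.1f} ohm")
print(f"f_c     = {line['f_cutoff_hz'] / 1e9:.1f} GHz")
print(f"t_delay = {line['delay_s'] * 1e12:.1f} ps per section")
```

With these inputs the estimate is 50 Ω and a cut-off frequency close to 40 GHz, in line with the values quoted from simulation. The lumped delay estimate of 8 ps per section is close to, but not the same as, the 7.5 ps obtained from simulation; this sketch does not attempt to explain the difference.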

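As a rough numerical illustration of the output-swing argument in Section III-A, the sketch below assumes that half of the total stage current flows into a load equal to the line impedance, and that line loss scales the result by (1 - exp(-a))/a, where a is the total attenuation in nepers. The first-order Taylor expansion of that factor is 1 - a/2, which is negligible when a is much less than 2. This loss factor is an assumption reconstructed from the surrounding description (uniformly distributed current, matched load, Taylor expansion of a decaying exponential); it is not copied from (4) or (5). The function name `single_ended_swing` is hypothetical.

```python
import math


def single_ended_swing(i_total_a: float, z0_ohm: float, atten_np: float):
    """Return (ideal, loss-scaled, first-order) single-ended swing in volts.

    Assumptions (not taken verbatim from the paper):
      * half of the total driver current flows through the load,
      * the load equals the line impedance z0_ohm,
      * line loss scales the swing by (1 - exp(-a)) / a for total
        attenuation a in nepers, with 1 - a/2 as its Taylor approximation.
    """
    ideal = 0.5 * i_total_a * z0_ohm
    if atten_np > 0.0:
        loss_factor = (1.0 - math.exp(-atten_np)) / atten_np
    else:
        loss_factor = 1.0
    first_order = 1.0 - atten_np / 2.0
    return ideal, ideal * loss_factor, ideal * first_order


# Three stages at 40 mA each, 50 ohm line, 0.1 Np total attenuation (from the text).
ideal_v, lossy_v, approx_v = single_ended_swing(3 * 40e-3, 50.0, 0.1)
print(f"ideal swing per output    : {ideal_v:.2f} V")
print(f"with assumed loss factor  : {lossy_v:.3f} V")
print(f"first-order approximation : {approx_v:.3f} V")
```

Under these assumptions each 40 mA stage contributes 1 V per output, which matches the per-stage figure stated in the text, and an attenuation of 0.1 Np changes the estimate by only a few percent.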
C. On-Chip Back Termination

Fig. 5 shows a simplified cross-section of the n+/n-well termination resistor (not to scale). The shallow n-well diffusion is used for the resistor body because it has less parasitic capacitance to substrate than a buried layer capable of dissipating higher amounts of power (e.g., n+ sub-collector). Terminal 1 of the resistor is connected to AC ground in the prototype driver (see Fig. 2), so only parasitic capacitance associated with terminal 2 affects the DA performance. The metal interconnects to the resistor diffusion contacts use a stack from metal-1 to top copper in order to satisfy the current density rules for electromigration.

The full-custom n+/n-well resistor could not be modeled a priori. The impedance measured from a stand-alone test structure shows a first-order roll-off from 53 Ω at 40 MHz to 41 Ω at 40 GHz, as seen in Fig. 6. The frequency response from simulation of a 50 Ω polysilicon equivalent capable of dissipating the same power (i.e., 0.5 W) is included for comparison. The impedance of the polysilicon resistor rolls off much faster, with a 3 dB bandwidth of just 7.5 GHz. Characterization reveals that the n+/n-well resistor developed in this work has 100 fF lower parasitic capacitance than a polysilicon equivalent, giving it a much wider bandwidth (i.e., beyond 40 GHz in Fig. 6).

Fig. 6. Measured frequency response of the termination resistor.

Fig. 7. Quadrature clock generator block diagram.

The resistance of the n-well resistor increases with increasing temperature. Its dimensions satisfy the restrictions on current flow for an n-well diffusion in the SiGe-BiCMOS technology, with some margin added in order to suppress temperature variations at the expense of slightly narrower bandwidth. Characterization of the termination resistor separately reveals that the thermally induced rise in DC resistance is less than 20% when dissipating 0.5 W.
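To relate the measured impedance roll-off of the termination resistor (53 Ω at 40 MHz, 41 Ω at 40 GHz) to the 10 dB return-loss target, the sketch below computes the return loss of a termination against a 50 Ω reference. It is a simplification: the measured impedance is treated as purely resistive, and the contribution of the rest of the output network is ignored. Both simplifications, and the function name `return_loss_db`, are assumptions introduced here for illustration.

```python
import math


def return_loss_db(z_load_ohm: float, z0_ohm: float = 50.0) -> float:
    """Return loss (in dB, positive) of a real load against z0_ohm.

    Assumption: the load impedance is purely resistive, so the reflection
    coefficient is (Z - Z0) / (Z + Z0) with no phase term.
    """
    gamma = abs((z_load_ohm - z0_ohm) / (z_load_ohm + z0_ohm))
    if gamma == 0.0:
        return math.inf  # perfect match
    return -20.0 * math.log10(gamma)


# Impedance values quoted in the text for the n+/n-well resistor.
rl_low_freq = return_loss_db(53.0)   # at 40 MHz
rl_high_freq = return_loss_db(41.0)  # at 40 GHz
print(f"53 ohm vs 50 ohm: {rl_low_freq:.1f} dB")
print(f"41 ohm vs 50 ohm: {rl_high_freq:.1f} dB")
```

Under these assumptions the termination alone stays well above a 10 dB return loss at both ends of the measured range, so the resistor by itself would not be the limiting element for the 10 dB target.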
D. Quadrature Clock Generation

Quadrature (i.e., I and Q) clocks are summed proportionally to realize variable-phase retiming clocks for the second and third latches driving their respective LA cell inputs [19]. The I and Q clocks on chip are generated by a frequency doubler followed by a quadrature divide-by-two circuit (see Fig. 7) [20]. Both the frequency doubler and divider are power-efficient, injection-locked circuits consuming 20.5 mA and 8.7 mA, respectively, from the –5.2 V supply. The external clock that retimes data input to the master latch is injected into the doubler, and the doubler output at twice the clock frequency is then used to lock the divider.

The doubler of Fig. 8(a) consists of two buffered differential stages connected in a positive feedback loop to form a 2-stage ring oscillator. The simulated frequency-locking range of the doubler is 5.4 GHz (7.3–12.7 GHz) with a minimum single-ended output voltage swing of 150 mVp-p for a single-ended ECL input signal. Post-layout simulation was also used to predict the minimum amplitude of the differential injecting signal required to lock the doubler at a 10 GHz input. The regenerative divider in Fig. 8(b) reverses the doubler operation. The minimum output amplitude occurs at 11.85 GHz, and the locking range of the divider from post-layout simulation is 5.8 GHz (6.05–11.85 GHz) when driven by a differential, double-frequency signal.

Fig. 8. Schematics of the regenerative frequency doubler and quadrature divider. (a) Frequency doubler. (b) Quadrature divider.

The frequency locking range of the divider-doubler chain is 7.3 GHz to 11.8 GHz for a single-ended, injected clock swing of 800 mVp-p from post-layout simulation at 50 °C using nominal process technology models. The locking range varies from 6.7–11.1 GHz to 7.9–12.7 GHz when the load resistors in the doubler and divider circuits are at their high and low values, respectively.

E. Clock Phase Control

The clock phase control logic is illustrated in the system block diagram of Fig. 2. Differential I and Q clocks (i.e., 90° phase resolution) are selected via a 2:1 ECL multiplexer using the two most significant phase control bits. Current-weighted summation of the selected I and Q phases under control of the 4 LSBs produces a variable-phase, low-jitter clock with 1.67 ps resolution (i.e., 6° resolution) and 360° phase control range at 10 GHz. Simulations show that a 1.67 ps offset from the desired interstage delay of 7.5 ps in the DA causes a maximum increase in the rise/fall times of 0.5 ps in the 10 Gb/s output waveform, which is considered acceptable. Simplified schematics of the 2:1 selector, 4-bit DAC, current summer and clock buffer circuits are shown in Fig. 9. The total current consumption of these clock phase control blocks is 12.7 mA from the –5.2 V supply.

Fig. 9. Schematic of the clock phase control circuitry.

Timing relationships between the signals for the proposed circuit of Fig. 2 are illustrated in Fig. 10. A repetitive 1–0 pattern is applied to the data inputs of the three slave latches. The 50% duty cycle retiming clock and its inverse are also shown. Fig. 10 also defines the propagation delay time from the clock input to Q output of master latch M, and the delays of the data buffers driving the slave latches. The setup and hold times of the slave latches are annotated in Fig. 10 as well. The outputs of the slave latches are assumed to be loaded identically on chip, making the clock-to-Q delay the same for each stage. The master latch and slave latch 1 form a D-type flip-flop as they are driven by anti-phase clocks. Setting the phase selection logic so that the clock of the second or third slave latch rises before the rising edge of the first slave latch clock is possible in this scheme. However, the rising edges of the second and third clocks are timed to occur after it in normal operation. The time difference between the clocks driving consecutive slave latches must equal the signal delay seen between stages along the output line, which is predicted to be 7.5 ps from simulation. Simulations of the ECL latch designed in this work also predict its timing parameters. The ranges of timing adjustment for the two variable-phase clocks, both with respect to the rising edge of the reference clock, are indicated as shaded areas in Fig. 10. The timing adjustment range is 50 ps for one clock and 62 ps for the other. These ranges are much greater than the 7.5 ps propagation delay between stages along the output line (i.e., factors of 6.7x and 8.3x, respectively), allowing output waveform alignment under digital control. Simulations predict that the maximum interconnect delay in the data path that does not mistime data is approximately 8 ps. This is easily satisfied in the physical layout (about 3 ps in the prototype from simulation). Note that one of the two variable-phase clocks has a more stringent requirement on the interconnect delay in the data path than the other, due to the extra delay from data buffer 3 (see Fig. 2).

Fig. 10. Timing diagram of the latch signals in the modulator driver prototype.

Monte Carlo (MC) simulations across process and wafer mismatch (50 runs) predict that the delay between one pair of signals ranges from 7.2 ps to 12.2 ps at the extremes (8.8 ps nominal value), and that the delay between a second pair varies from 6.7 ps to 9.5 ps (extreme values, 7.8 ps nominal value). The simulated rise/fall times (10–90%) at the driver output range from 19 ps to 23 ps (16 ps nominal value). Parameter variations of the output line, which require EM models, are not included in the MC simulations. The simulated delay also varies with temperature, by 1.2 ps at 0 °C to 2.7 ps at 125 °C (50 °C nominal temperature). These variations are significant compared to the time delay along the output line, and can be compensated by: 1) well-known on-chip temperature compensation schemes (either open or closed loop), and/or 2) re-calibration of the input line, as the variations are much smaller than the clock adjustment range (i.e., 50 ps and 62 ps, as described previously).

Fig. 11. Schematic of the latch and limiting amplifier cell.

The ability to adjust and align the timing between DA stages to achieve a cut-off frequency of 40 GHz when capacitive
under digital control is a unique aspect of this circuit (e.g., loading of the output line in each stage does not exceed 160
at start-up or during continuous operation using an on-board fF. For 6 differential output swing, each LA in a 3-stage
microcontoller). When fully automated, external trimming of design consumes 40 mA bias current and adds 1 V swing to
the circuit would not be required. Developing such a scheme each output, as seen from (5).
is a subject for further study of the digitally controlled DA Grounding the bases of the cascode transistors at the output
concept. The procedure used to calibrate the prototype during extends the collector-emitter breakdown voltage of to
characterization is described in Section IV-A. near [21]. Using a cascode in the LA stages raises the
output impedance, thereby reducing loading on the output line,
F. Digital Latch, Predriver and Limiting Amplifier leading to a more efficient DA overall. Each CB transistor in the
LA output stage presents a total collector shunt capacitance of
Fig. 11 shows a simplified schematic of the latch and limiting approximately 115 fF that is embedded in the synthetic output
amplifier stages used in the DA. Data input to the bases of line.
transistors propagates to the outputs when the input clock
is low, and is held by latch transistors when the clock goes
high. The latch consumes 10.5 mA current from the 5.2 V IV. PROTOTYPE CHARACTERIZATION
supply, and has a differential output swing of 800
The driver prototype is implemented in IBM’s 0.18 m SiGe-
at 10 Gb/s. Double emitter followers with a small amount of
BiCMOS 7WL technology V , which features
negative feedback are added to reduce transient ringing effects
HBTs with peak- of 60/85 GHz and 6 metal intercon-
in the latch outputs. The LA in the DA gain cell consists
nect layers (i.e., 1 thick top copper and 5 aluminum) [11].
of a predriver , a differential Darlington pair driver
Fig. 12 shows a chip photomicrograph of the prototype. Input
and a common-base (CB) output stage .
data propagates across the chip from left to right. In order to
The latch outputs are preamplified by and buffered by
before feeding the signal to the switching transistors minimize loading effects on the data and clock paths, the digital
to ensure the fastest edge speed. The predriver and latch for each stage is placed close to the LA cell in the layout.
buffers consume 23.4 mA from the 5.2 V supply. Darlington The variable-phase clock control blocks (i.e., the 2:1 selectors,
pairs minimize the loading on transistors , and thus DAC, current adders and clock buffers in Fig. 2) are located
improve speed at the expense of doubling the input voltage close to their respective latch. The doubler-divider chain feeds
swing required for full switching. the clock phase selector sub-circuits, and it is placed equidistant
The current consumption in each LA stage, the number of from the second and third stages in the layout. As mentioned
stages, and the cut-off frequency of the output line are related previously, the CB transistors in the LAs are connected to the
to each other in a DA design. A three stage design was chosen output transmission line along its horizontal line of symmetry in
in this work as a trade-off between output line bandwidth, order to load each transistor equally. Ground for the LAs and the
DC power consumption, and design complexity. Fewer stages digital grounds are separated on chip, but connected together at
would require larger area transistors in each stage, which the bondpads. Metal-2 is used to implement the ground plane,
narrows the output line bandwidth of the DA as seen from (1). and bottom metal-1 is used to route the 12 clock phase con-
A design with more than 3 stages requires additional latches trol lines (i.e., in Fig. 2) to the bondpads at the bottom
and clock phase control circuitry, which increases the com- of the chip. The prototype occupies 2.87 mm total chip area,
plexity and decreases efficiency of the prototype unnecessarily with 0.95 mm of the area consumed by active circuity (i.e., ex-
(see eq. (7)). Three is the minimum number of stages necessary cluding bondpads and interconnections to chip periphery). The

current consumption is 295 mA from –5.2 V, and 120 mA from 5 V, for a total power consumption of 2.13 W.

Fig. 12. Chip photomicrograph of the driver prototype.

Fig. 13. Test setup for the driver prototype.

A. Test Setup

The driver prototype was mounted on a heat sink and all DC supplies were wirebonded to a full-custom PCB for testing (see Fig. 13). Three GSGSG RF probes were used to contact the differential clock, data and output on three sides of the die. Differential data and clock signals from an Anritsu MP1763C pulse pattern generator were fed to the chip via pairs of phase-matched cables. The outputs were AC-coupled to 70 GHz sampling modules (Tektronix 80E09) via 10 cm of semi-rigid coaxial cable and wideband 50 Ω attenuators. The vertical amplifier settings for each channel of the oscilloscope were adjusted to compensate for the different attenuations in the two output paths. The cut-off frequency of the output measurement path (on-wafer probe, connectors, etc.) is greater than 60 GHz. A Tektronix TDS-8000B digital sampling mainframe captured the output waveforms and processed the measured eye diagrams. The rise/fall times of the off-chip components in the measurement path at the output (i.e., AC coupling capacitor, attenuator, coaxial cable and sampling heads) are not de-embedded from measured data, making the actual rise/fall time performance of the IC slightly better than the measured data reported in this paper. A microcontroller driven by MATLAB code provided the signals used to control the phases of the clocks driving DA stages 2 and 3 in

order to optimize rise and fall times for the output waveform. A look-up table for clock phase control was created from simulation, and the delay calibration was performed in two steps during testing. The external clock was first aligned a few picoseconds after the data (i.e., after the setup time required for the latch). The second and third latch stages were disabled during this step. The 12 DAC control bits were then set to provide the desired interstage delay according to the look-up table, and fine-tuned to achieve the fastest edge speed at the driver outputs.

B. Experimental Results

Fig. 14 plots the measured driver output return loss across 10 MHz–40 GHz. Both single-ended and differential results are better than 20 dB up to 13 GHz, and better than 10 dB up to 35 GHz. The prediction from simulation using the measured S-parameters for the back termination resistor and EM simulation data for the output line agrees well with the measurements (see Fig. 14). Imperfect supply decoupling causes a spike in the measured single-ended return loss below 1 GHz, but this does not affect the differential return loss.

Fig. 14. Simulated and measured output return loss of the modulator driver.

The measured single-ended output transient waveforms at 10 Gb/s are shown in Fig. 15. The signal amplitudes from the differential path match very well and are 3.1 Vp-p at each output across 50 Ω at 10 Gb/s data rate.

Fig. 15. Measured single-ended output waveforms at 10 Gb/s.

Fig. 16 shows the measured eye patterns for non-return-to-zero (NRZ) pseudorandom data at 10 Gb/s for different clock phase settings. Duty cycle distortion is below 1% for single-ended input amplitudes larger than 120 mVp-p. The total rise/fall times (20–80%, including the output interconnections in Fig. 13) are less than 15 ps, and can be increased to 50 ps by varying the latch clock phases digitally (see Fig. 16(a)–(c)). The measured tuning range of 35 ps confirms that the prototype has sufficient range to compensate for any PVT variations anticipated from simulation (i.e., 10–90% rise/fall times varying from 19 ps to 23 ps, as described in Section III-E) or encountered on the bench during testing.

The slight distortion seen near the end of each transition in the eye pattern in Fig. 16(c) is likely caused by undesired negative magnetic coupling or insufficient self-inductance in the synthetic output line. Coupling between the two outputs causes one to respond to a changing current in the other path when a transition happens. Further output line modeling revealed that the coupling coefficient is near 0.15 for the output line, which was underestimated in the design phase. The simulated eye pattern in Fig. 17(a), which accounts for this stronger coupling and the self-inductance of each line section, is more consistent with the measured data. The larger coupling lowers the line inductance and results in the distortion seen in the measured eye pattern. Thus, increasing the self-inductance between stages by using a longer line would restore the desired electrical performance, giving a sharp, uniform transition between output levels (e.g., as seen in Fig. 17(b) for a 150 pH increase in self-inductance).

The eye pattern measured directly at the data output of the pulse pattern generator (i.e., thru connection between RF probes) is shown in Fig. 16(d). The 20–80% rise/fall times for the 10 Gb/s data are 12 ps. As Fig. 16(a)–(c) show, the driver prototype produces eye patterns with much less timing jitter than the generator data output. The root-mean-squared (RMS) and peak-to-peak jitter measured for the driver prototype are 0.9 ps and 5 ps, respectively, compared to 1.9 ps and 13.2 ps at the pulse pattern generator data output. Retiming the input data using the low-jitter clock from the generator reduces the jitter observed in the driver output. Additive jitter of the prototype is negligible, as the measured RMS and peak-to-peak jitter at the driver output are identical to the clock jitter measured at the pattern generator output.

The measured edge speed and jitter as a function of the single-ended input amplitude for NRZ data at 10 Gb/s are plotted in Fig. 18. The 20–80% rise/fall times are still less than 25 ps when the single-ended input amplitude is decreased to 80 mVp-p. The minimum single-ended input data amplitude measured at full output swing is 65 mVp-p. The measured input data rate ranges from 7.24 to 11.8 Gb/s, which tracks the locking range of the frequency doubler-divider chain. The 9.9–12.5 Gb/s

Fig. 16. Measured electrical output and input eye patterns for NRZ data at 10 Gb/s (timing indicated for 20–80% rise and fall times). (a)–(c) Clock phases set for three different rise/fall time settings, with (b) the medium setting. (d) Data output from pulse pattern generator.

Fig. 17. Simulated eye diagrams for different self-inductances of each output line section, shown in panels (a) and (b).

range (including tolerances anticipated from PVT variations) could be covered by increasing the self-resonant frequency of the doubler-divider chain (e.g., by reducing the load resistors in the regenerative circuits).

C. Performance Comparison

A survey of the driver circuits with high output swing published in the recent literature reveals that 10 Gb/s silicon drivers typically exhibit slower rise/fall times, greater timing jitter [9], [10], and inferior eye pattern quality [22] compared to III-V technology equivalents [23]. At the 40 Gb/s rate, 6 Vp-p output drivers reported in [24] and [6] demonstrated rise/fall times below 10 ps, as opposed to 15 ps for a silicon design [25].

Fig. 18. Measured rise/fall times and jitter as a function of the single-ended input data amplitude at 10 Gb/s (pseudorandom data).

Table I compares the prototype driver performance to commercial products (manufactured in III-V technologies), and other recently published modulator drivers. The proposed 10 Gb/s SiGe MZM driver prototype produces 6 Vp-p swing with waveform symmetry between outputs, the fastest (and trimmable) edge speeds among the 10 Gb/s examples, and negligible additive timing jitter. It should be noted that packaging parasitics affect the output waveforms. Simulation results predict that the 20–80% rise/fall times of the driver output degrade by 7 ps when 1 nH of bondwire inductance and 30 fF shunt capacitance (e.g., estimated parasitics for a high-speed BGA package) are included at each output. The digitally controlled DA, with a demonstrated output return loss better than 10 dB below 35 GHz, is well-suited to the optical modulator driver application. It occupies one-fifth of the chip area of a commercial, differential DA modulator driver that uses synthetic input and output transmission lines and LA stages [5]. As predicted from (7), further development of the

proposed concept to operate from a lower supply would reduce the power consumption of the new DA driver in proportion to the drop in supply voltage.

The TGA4954-SL driver from TriQuint Semiconductor [26] consumes one-half of the power dissipated by the prototype (i.e., 1.1 W vs. 2.13 W), but it is not designed to drive a balanced Mach-Zehnder modulator. Two single-ended amplifiers, or a wideband balun (which attenuates the output swing), would therefore be required to drive a balanced optical modulator using the single-ended commercial products [27], [28]. By contrast, the TGA4957-SM differential driver (also from TriQuint) [29] dissipates as much power as the prototype differential driver, although it is capable of greater output swings (4–8 Vp-p vs. 6 Vp-p) and higher data rates (28 Gb/s). Nevertheless, examination of (7) suggests that power consumption of the proposed driver topology could be lowered to 1.31 W by making the following modifications: 1) using external bias-Ts to eliminate the DC power consumed by the back termination; 2) designing the ECL circuits to operate from –4.5 V (instead of –5.2 V); and 3) improving the efficiency of circuit blocks that were overdesigned for the prototype.

V. SUMMARY

A novel, digitally controlled distributed amplifier designed to drive a balanced Mach-Zehnder optical modulator at 10 Gb/s has been demonstrated successfully in IBM’s 0.18 µm SiGe-BiCMOS technology. The proposed topology eliminates many of the technical drawbacks that hamper the performance of conventional DAs implemented in silicon technology, by employing a fully-digital, scalable input interface in place of a passive transmission line. Digital latches driven by variable-phase clocks retime the input data stream to provide the precise delay required for signal reconstruction at the DA output. Moreover, the data replicated at the input of each LA stage is identical in waveshape and amplitude, allowing every stage in the DA to be driven to its maximum output. Impairments affecting passive analog transmission lines, such as dispersion, attenuation, ringing and pulse distortion, were eliminated by the new, all-digital input interface. The operating bandwidth of the proposed DA is no longer dominated by the input line but rather by the speed of the digital input circuits, which can support data rates approaching 100 Gb/s in silicon technology [30], [31].

The proof-of-concept, 2.87 mm2 prototype produces 6 Vp-p differential output swing at 10 Gb/s data rate when amplifying pseudorandom data. Measured 20–80% rise/fall times are less than 15 ps with negligible additive jitter. Measured output return loss is better than 20 dB below 10 GHz and better than 10 dB up to 35 GHz. The inputs are ECL-compatible with a minimum measured sensitivity of 65 mVp-p (single-ended) at 10 Gb/s. Total power dissipation is 2.13 W operating from dual –5.2 V and 5 V supplies.

The proposed modulator driver architecture can be scaled easily to produce higher output swings, or multiple outputs. Much of the input logic could be shared in a multi-channel driver design, as required for high bit-rate D-QPSK transmission system proposals employing multiple optical modulators [32].

ACKNOWLEDGMENT

The authors thank D. L. Harame and IBM Microelectronics for foundry access and chip fabrication, MOSIS for facilitating the MPW service, and W. Straver and A. Akhnoukh at Delft University of Technology for measurement, test fixture assembly, and technical support.

REFERENCES

[1] K. Noguchi et al., “Millimeter-wave Ti:LiNbO3 optical modulators,” J. Lightw. Technol., vol. 16, no. 4, pp. 615–619, Apr. 1998.
[2] L. Liao et al., “40 Gbit/s silicon optical modulator for highspeed applications,” Electron. Lett., vol. 43, no. 22, Oct. 2007.
[3] E. Säckinger, Broadband Circuits for Optical Fiber Communication. New York, NY, USA: Wiley, 2005.
[4] J. Yuan and J. D. Cressler, “Design and optimization of superjunction collectors for use in high-speed SiGe HBTs,” IEEE Trans. Electron Devices, vol. 58, no. 6, pp. 1655–1662, Jun. 2011.
[5] T. Y. K. Wong et al., “A 10 Gb/s AlGaAs/GaAs HBT high power fully differential limiting distributed amplifier for III-V Mach-Zehnder modulator,” IEEE J. Solid-State Circuits, vol. 31, no. 10, pp. 1388–1393, Oct. 1996.
[6] J. Dupuy et al., “A 6 Vpp, 52 dB, 30-dB dynamic range, 43 Gb/s InP DHBT differential limiting amplifier,” in Proc. IEEE CSICS 2011, Oct. 2011, pp. 1–4.
[7] K. Schneider et al., “Comparison of InP/InGaAs DHBT distributed amplifiers as modulator drivers for 80-Gbit/s operation,” IEEE Trans. Microw. Theory Tech., vol. 53, no. 11, pp. 3378–3387, Nov. 2005.
[8] J. Dupuy et al., “Distributed amplifiers in InP DHBT for 100-Gbit/s operation,” in Proc. IEEE MTT-S 2010, May 2010, pp. 920–923.
[9] S. Mandegaran and A. Hajimiri, “A breakdown voltage multiplier for high voltage swing drivers,” IEEE J. Solid-State Circuits, vol. 42, no. 2, pp. 302–312, Feb. 2007.
[10] D. Li and C. Tsai, “10-Gb/s modulator drivers with local feedback networks,” IEEE J. Solid-State Circuits, vol. 41, no. 5, pp. 1025–1030, May 2006.
[11] N. Feilchenfeld et al., “High performance, low complexity 0.18 µm SiGe BiCMOS technology for wireless circuit applications,” in Proc. IEEE BCTM, 2002, pp. 197–200.
[12] T. Y. Wong, Fundamentals of Distributed Amplification. Boston, MA, USA: Artech House, 1993.
[13] Y. Ayasli et al., “A monolithic GaAs 1-13-GHz traveling-wave amplifier,” IEEE Trans. Microw. Theory Tech., vol. MTT-30, no. 7, pp. 976–981, Jul. 1982.
[14] J. B. Beyer et al., “MESFET distributed amplifier design guidelines,” IEEE Trans. Microw. Theory Tech., vol. MTT-32, no. 3, pp. 269–275, Mar. 1984.
[15] M. D. Ker et al., “ESD protection design for 1- to 10-GHz distributed amplifier in CMOS technology,” IEEE Trans. Microw. Theory Tech., vol. 53, no. 9, pp. 2672–2681, Sep. 2005.
[16] J. Dupuy et al., “A 6.2-Vpp 100-Gb/s selector-driver based on a differential distributed amplifier in 0.7-µm InP DHBT technology,” in Proc. IEEE MTT 2012, Jun. 2012, pp. 1–3.
[17] Y. Zhao et al., “A 10 Gb/s 6 Vp-p differential modulator driver in 0.18 µm SiGe-BiCMOS,” in 2013 IEEE ISSCC Tech. Dig., Feb. 2013, pp. 132–133.
[18] T. S. D. Cheung and J. R. Long, “Shielded passive devices for silicon-based monolithic microwave and millimeter-wave integrated circuits,” IEEE J. Solid-State Circuits, vol. 41, no. 5, pp. 1183–1200, May 2006.
[19] L. Schmidt and H.-M. Rein, “Continuously variable gigahertz phase-shifter IC covering more than one frequency decade,” IEEE J. Solid-State Circuits, vol. 27, no. 6, pp. 854–862, Jun. 1992.
[20] J. P. Maligeorgos and J. R. Long, “A low voltage 5.1–5.8-GHz image-reject receiver with wide dynamic range,” IEEE J. Solid-State Circuits, vol. 35, no. 12, pp. 1917–1926, Dec. 2000.
[21] T. S. D. Cheung and J. R. Long, “A 21–26-GHz SiGe bipolar power amplifier MMIC,” IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2583–2597, Dec. 2005.

[22] B. Goll and H. Zimmermann, “10 Gbit/s SiGe modulator driver with 37 dB gain and 680 mW power consumption,” Electron. Lett., vol. 48, no. 15, pp. 938–940, Jul. 2012.
[23] J. Jeong and Y. Kwon, “10 Gb/s modulator driver IC with ultra high gain and compact size using composite lumped-distributed amplifier approach,” in Dig. IEEE GaAs IC 2003, Nov. 2003, pp. 149–152.
[24] H. Shigematsu et al., “A 54-GHz distributed amplifier with 6-Vpp output for a 40-Gb/s LiNbO3 modulator driver,” IEEE J. Solid-State Circuits, vol. 37, no. 9, pp. 1100–1105, Sep. 2002.
[25] C. Knochenhauer et al., “A compact, low-power 40-GBit/s modulator driver with 6-V differential output swing in 0.25-µm SiGe BiCMOS,” IEEE J. Solid-State Circuits, vol. 46, no. 5, pp. 1137–1146, May 2011.
[26] TriQuint, 9.9-12.5 Gb/s optical modulator driver. Part no. TGA4954-SL. [Online]. Available: http://www.triquint.com/products/p/TGA4954-SL
[27] TriQuint, 10.7Gb/s modulator driver amplifier. [Online]. Available: http://www.triquint.com/products/p/TGA4807
[28] Picosecond Pulse Labs, Model 5865 12.5 Gb/s modulator driver. [Online]. Available: http://www.picosecond.com/product/product.asp?prod_id=1
[29] TriQuint, 28 Gb/s differential modulator driver amplifier. Part no. TGA4957-SM. [Online]. Available: http://www.triquint.com/prod-
[30] J. Rieh et al., “SiGe heterojunction bipolar transistors and circuits toward terahertz communication applications,” IEEE Trans. Microw. Theory Tech., vol. 52, no. 10, pp. 2390–2408, Oct. 2004.
[31] M. Möller, “Challenges in the cell-based design of very-high-speed SiGe-bipolar ICs at 100 Gb/s,” IEEE J. Solid-State Circuits, vol. 43, no. 9, pp. 1877–1888, Sep. 2008.
[32] P. J. Winzer and R. Essiambre, “Advanced optical modulation formats,” Proc. IEEE, vol. 94, no. 5, pp. 952–985, May 2006.
[33] C. Schick et al., “40 Gbit/s differential distributed modulator driver realised in 80 GHz SiGe HBT process,” Electron. Lett., vol. 45, no. 8, pp. 408–409, Apr. 2009.

Yi Zhao (S’08–M’14) received the B.Sc. degree in electrical engineering from Fudan University, Shanghai, China, in 2005, and the M.Sc. and Ph.D. (cum laude) degrees in microelectronics from the Delft University of Technology, Delft, The Netherlands, in 2008 and 2013, respectively.
She was with NXP Semiconductors, Eindhoven, The Netherlands, working on ultra-wideband RF circuit design in 2008, and was with IBM, Essex Junction, VT, USA, in 2012–2013. Since November 2013, she has been with Marvell Semiconductor, Santa Clara, CA, USA. Her research interests include integrated transceiver circuits for wireless and high-speed communication in the RF and mm-wave domains.
Dr. Zhao was a recipient of Best Paper Awards from the IEEE Radio Frequency Integrated Circuits Symposium (RFIC) and the Topical Meeting on Silicon Monolithic Integrated Circuits in RF Systems (SiRF) in 2011.

Leonardo Vera received the B.S. degree in electronics from the Faculty of Science and Technology, Universidad Mayor de San Simon, Cochabamba, Bolivia, in 2004, and the M.Sc. degree in electrical engineering from Delft University of Technology, Delft, The Netherlands, in 2008. From 2008 to 2010, he worked as a research engineer in the Electronics Research Laboratory, Delft University of Technology, and is currently a Ph.D. candidate in electrical engineering at the same institution.
His research interests include broadband amplifiers, frequency multipliers and dividers, synthesizers, optical drivers, and RF IC design.

John R. Long (M’83–SM’14) received the B.Sc. degree in electrical engineering from the University of Calgary, Calgary, AB, Canada, in 1984, and the M.Eng. and Ph.D. degrees in electronics from Carleton University, Ottawa, ON, Canada, in 1992 and 1996, respectively, with distinction.
He was employed for 10 years by Bell-Northern Research designing ASICs for Gbit/s fibre-optic transmission systems, and from 1996 to 2001 as an Assistant and then Associate Professor at the University of Toronto, Toronto, ON, Canada. Since January 2002, he has been Chair of the Electronics Research Laboratory at the Delft University of Technology, Delft, The Netherlands. His current research interests include transceiver circuits for high-frequency, high-speed and low-power integrated wireless/wireline systems.
Prof. Long is Editor of the IEEE RFIC Virtual Journal, a new on-line journal dedicated to radio-frequency circuit technologies. He was a co-recipient of Best Paper Awards from the IEEE International Solid-State Circuits Conference (ISSCC) in 2000 and 2007 and the RFIC Symposium in 2006 and 2011, and is a member of the technical program committee for the 2014 European Solid-State Circuits (ESSCIRC) conference.

