# **On-chip High Performance Signaling Using Passive Compensation**

Yulei Zhang∗, Ling Zhang∗, Akira Tsuchiya†, Masanori Hashimoto‡, Chung-Kuan Cheng<sup>∗</sup>

∗University of California, San Diego, La Jolla, CA 92093-0404, USA

†Kyoto University, Kyoto, Kyoto 606-8501, JAPAN, ‡Osaka University, Suita, Osaka 565-0871, JAPAN

<sup>∗</sup>{y1zhang, ckcheng}@ucsd.edu, lizhang@cs.ucsd.edu

†tsuchiya@vlsi.kuee.kyoto-u.ac.jp, ‡hasimoto@ist.osaka-u.ac.jp

*Abstract***— To address the performance limitation brought by the scaling issues of on-chip global wires, a new configuration for global wiring using on-chip lossy transmission lines(T-lines) is proposed and optimized in this paper. Firstly, we use passive compensation and repeated transceivers composed by sense amplifier and inverter chain to compensate the distortion and attenuation of on-chip T-lines. Secondly, an optimization flow for designing this scheme based on eye-diagram prediction and sequential quadratic programming (SQP) is proposed. This flow is employed to study the latency, power dissipation and throughput performance of the new global wiring scheme as the technology scales from 90nm to 22nm. Compared with conventional repeater insertion methods, our experimental results demonstrate that, at 22nm technology node, this new scheme reduces the normalized delay by 85.1%, the normalized energy consumption by 98.8%. Furthermore, all the performance metrics are scalable as the technology advances, which makes this new signaling scheme a potential candidate to break the "interconnect wall" of digital system performance.**

## I. INTRODUCTION

As technology scales, interconnect planning has been widely regarded as a critical factor in determining system performance and total power consumption. According to the 2007 ITRS roadmap [1], the delay of a 1mm global RC wire without repeaters is 385ps, while the 10-stage FO4 delay is below 200ps. Since global wires of 1mm length or more are now very common in on-chip communication, there is clearly a large performance gap between interconnect and logic gates. Interconnects also consume a significant portion of the total power. In [2], Magen et al. found that interconnect power alone accounts for half of the total dynamic power of a 0.13µm microprocessor that was designed for power efficiency.

The conventional approach to the interconnect delay problem is buffer insertion, also referred to as repeated RC wires. Inserting buffers or repeaters along a long wire changes the relationship between wire delay and wire length from quadratic to linear. Repeaters improve RC wire performance but also introduce overhead in terms of power and wiring complexity. In [3], Zhang et al. compared repeated RC wires under different design goals across multiple technology nodes. They pointed out that, to minimize delay, the optimum repeated RC wire has equal amounts of wire capacitance and gate capacitance, which means that half of the dynamic power is dissipated in the repeaters.

On-chip global signaling using transmission lines (T-lines) has attracted intensive research in recent years. Compared with repeated RC wires, a T-line delivers signals at the speed of light in the medium and consumes much less power, since wave propagation eliminates the full-swing charging and discharging of wire and gate capacitance. However, inter-symbol interference (ISI) can be a barrier to higher data rates, and various approaches have been proposed to address it. [4] and [5] tuned the termination resistance to achieve an optimal eye diagram and derived analytical formulas. [6], [7] and [8] proposed the surfliner scheme, which intentionally inserts shunt resistors along the wire to minimize the distortion.

In this work, we propose a high-performance on-chip global signaling scheme using passive compensation. We use a parallel RC circuit at the driver side to compensate for the attenuation of high-frequency components, and adopt a double-tail sense amplifier followed by an inverter chain as the transceiver to recover the received signals. The proposed scheme is optimized and compared with repeated RC wires in terms of latency, power and bandwidth. The experimental results demonstrate that, at the 22nm node, the proposed signaling scheme reduces the normalized delay by 85.1% and the normalized energy consumption per bit by 98.8% compared with optimum repeated RC wires.

Our contributions include: 1) An on-chip global signaling scheme with passive compensation, 2) An optimization flow based on sequential quadratic programming (SQP) method for determining optimal design variables, 3) Comparison between the proposed on-chip T-line scheme and repeated RC wire under the design goal of minimum delay across different technologies.

## II. SIGNALING SCHEME FOR GLOBAL WIRING

Fig. 1(a) shows the proposed signaling scheme, which consists of parallel RC equalizers, differential wires, a termination resistance and transceivers. The parallel RC circuit serves as a high-pass filter that boosts the high-frequency components of the input signal and therefore compensates for the attenuation along the wires. The termination resistance $R_l$, which determines the saturation voltage, can be tuned together with $R_d$ and $C_d$ to achieve a better far-end eye opening. Two identical transceivers, each consisting of a double-tail sense amplifier followed by a differential inverter chain as indicated in Fig. 1(b), are used at the driver and receiver sides to amplify the signal and recover it to full swing.

Fig. 1. The proposed signaling scheme for global wiring: (a) one stage structure; (b) transceiver configuration.

### *A. On-chip T-line*

On-chip T-lines are very lossy because of the small wire cross section. Depending on the frequency and the wire dimensions, the wire operates in either the RC region or the LC region. In the RC region the frequency is low enough that $\omega L \ll R$ and $G \approx 0$, and the propagation constant can be written as:

$$
\gamma \approx \sqrt{j\omega RC} = \sqrt{\frac{\omega RC}{2}} + j\sqrt{\frac{\omega RC}{2}}\tag{1}
$$

Both the attenuation and the phase velocity are therefore frequency dependent in this region.

If the frequency increases such that $\omega L \gg R$ and $G$ approaches zero, the wire is in the LC region and the propagation constant becomes:

$$
\gamma \approx \sqrt{(R + j\omega L)j\omega C} \approx \frac{R}{2\sqrt{L/C}} + j\omega\sqrt{LC} \tag{2}
$$

Therefore we can approximate the attenuation constant as $\alpha = \frac{R}{2\sqrt{L/C}} = \frac{R}{2Z_0}$, where $Z_0$ is the characteristic impedance of the T-line, and the phase velocity as $v = \frac{\omega}{\beta} = \frac{1}{\sqrt{LC}} = \frac{c_0}{\sqrt{\varepsilon_r}}$, which is the speed of light in a medium with dielectric constant $\varepsilon_r$. In the LC region, both the attenuation and the phase velocity are independent of frequency.
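As an illustration of the two regions, the following minimal Python sketch compares the exact propagation constant with the approximations (1) and (2). The per-unit-length values `R`, `L` and `C` are hypothetical and are not taken from Table I; the function names are ours.

```python
import cmath
import math

def propagation_constant(r, l, c, freq_hz):
    """Exact gamma = sqrt((R + jwL) * jwC) for per-unit-length R, L, C and G = 0."""
    w = 2 * math.pi * freq_hz
    return cmath.sqrt((r + 1j * w * l) * (1j * w * c))

def rc_region_gamma(r, c, freq_hz):
    """Approximation (1): alpha = beta = sqrt(wRC / 2)."""
    w = 2 * math.pi * freq_hz
    part = math.sqrt(w * r * c / 2)
    return complex(part, part)

def lc_region_gamma(r, l, c, freq_hz):
    """Approximation (2): alpha = R / (2 * sqrt(L/C)), beta = w * sqrt(LC)."""
    w = 2 * math.pi * freq_hz
    return complex(r / (2 * math.sqrt(l / c)), w * math.sqrt(l * c))

# Hypothetical per-unit-length values, chosen only for illustration.
R, L, C = 10e3, 4e-7, 1e-10        # ohm/m, H/m, F/m

for f in (10e6, 100e9):            # first wL << R, then wL >> R
    exact = propagation_constant(R, L, C, f)
    print(f, exact, rc_region_gamma(R, C, f), lc_region_gamma(R, L, C, f))
```

With these assumed values, the RC approximation tracks the exact value at 10MHz and the LC approximation tracks it at 100GHz; the real part of the LC expression does not change with frequency.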

We adopt two parameters to determine the operating region of the wire. The boundary wire length $D_{le}$ separates the lumped-element region from the distributed-element region. It is the minimum wire length for which the distributed-element model applies and can be computed as follows [9]:

$$
D_{le} = \left| \frac{0.25}{\sqrt{(R + j\omega L)(j\omega C)}} \right| \tag{3}
$$

The other parameter is the corner frequency $f_{LC}$ between the RC region and the LC region, which is defined as:

$$
f_{LC} = \frac{1}{2\pi} \frac{R_{DC}}{L} \tag{4}
$$

where  $R_{DC}$  is the DC resistance of the wire.
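Both parameters are simple to evaluate once the per-unit-length values are known. The sketch below implements (3) and (4) directly; the frequency at which $D_{le}$ is evaluated is left as an argument, and the numbers in the example call are hypothetical rather than extracted wire data.

```python
import cmath
import math

def boundary_length(r, l, c, freq_hz):
    """Eq. (3): D_le = |0.25 / sqrt((R + jwL)(jwC))| for per-unit-length R, L, C."""
    w = 2 * math.pi * freq_hz
    gamma = cmath.sqrt((r + 1j * w * l) * (1j * w * c))
    return abs(0.25 / gamma)

def corner_frequency(r_dc, l):
    """Eq. (4): f_LC = R_DC / (2 * pi * L), with R_DC and L per unit length."""
    return r_dc / (2 * math.pi * l)

# Hypothetical values: R = 10 kohm/m, L = 0.4 uH/m, C = 100 pF/m.
d_le = boundary_length(10e3, 4e-7, 1e-10, 10e9)   # metres
f_lc = corner_frequency(10e3, 4e-7)               # hertz
print(d_le, f_lc)
```

A wire longer than `d_le` at the frequency of interest is treated as distributed, and a signal frequency above `f_lc` places it in the LC region.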

In our design, we tune the resistance, inductance and capacitance of the wire by selecting the wire dimensions, including width, spacing, thickness and dielectric height, which in turn determine the characteristic impedance, attenuation and phase velocity. Table I lists the 3 wire cases used in this work, including the wire dimensions, the boundary wire length $D_{le}$ and the corner frequency $f_{LC}$. The data show that all 3 cases can be modeled as T-lines in the LC region once the signal frequency reaches 13.20GHz or beyond, which is achieved as shown in Section IV.

TABLE I THE 3 DIFFERENT WIRE CASES USED IN THIS WORK.

| Id. | Length (L) /mm | Width (W) /µm | Spacing (S) /µm | Height (H) /µm | Thickness (T) /µm | $D_{le}$ /µm | $f_{LC}$ /GHz |
|-----|----------------|---------------|-----------------|----------------|-------------------|--------------|---------------|
| A   |                |               |                 | 0.6            | 0.6               | 348.9        | 13.20         |
| B   | 10             | 1.8           | 1.8             | 0.8            | 0.8               | 429.7        | 6.61          |
| C   |                | 2.0           | 2.0             |                | 1.0               | 554.5        | 4.12          |

Fig. 2. The eye-diagrams observed at the far-end of lossy on-chip T-line: (a) w/o parallel *RC* equalizer; (b) w/ parallel *RC* equalizer.

### *B. Parallel RC equalizer*

A parallel RC circuit was used in [10] to minimize the distortion of on-board T-lines. We adopt this approach at the driver side of the on-chip T-line to compensate for the attenuation of high-frequency components, since on-chip T-lines are very lossy, especially at high frequency.

Fig. 2 shows the qualitative effect of adding an RC equalizer at the driver side of an on-chip T-line. The line is 5mm long and the bit rate of the input signal is 20Gbps. Introducing the parallel RC equalizer improves the eye opening from less than 200mV to 400mV. For a different wire or bit rate, the values of $R_d$ and $C_d$ can be tuned to obtain a better eye opening.
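The high-pass behavior of the equalizer can be seen in a simplified lumped view. The sketch below assumes a source with output resistance $R_s$ driving the parallel $R_d$, $C_d$ network in series with the termination $R_l$, and deliberately ignores the T-line itself, so it only illustrates the trend and does not reproduce Fig. 2. The element values are the 22nm, $L=15$mm entries of Table II.

```python
import math

def equalizer_gain(freq_hz, r_s, r_d, c_d, r_l):
    """|V_load / V_source| of a lumped divider: R_s + (R_d || C_d) in series with R_l.

    Simplifying assumption: the transmission line between equalizer and
    termination is left out entirely.
    """
    w = 2 * math.pi * freq_hz
    z_eq = r_d / (1 + 1j * w * r_d * c_d)      # parallel RC impedance
    return abs(r_l / (r_s + z_eq + r_l))

# Table II, 22 nm node, L = 15 mm.
R_S, R_D, C_D, R_L = 22.07, 181.0, 0.63e-12, 133.9

for f in (0.0, 1e9, 10e9, 100e9):
    print(f, equalizer_gain(f, R_S, R_D, C_D, R_L))
```

In this lumped model the DC gain is $R_l/(R_s+R_d+R_l)$ and the high-frequency gain approaches $R_l/(R_s+R_l)$, so low-frequency components are attenuated relative to high-frequency ones.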



Fig. 3. Double-tail latch-type voltage sense amplifier.

### *C. Transceiver design*

The adopted sense amplifier (SA) is based on a double-tail latch-type scheme [11] (Fig. 3). This scheme achieves fast decisions by using positive feedback in the second stage. Furthermore, because of its high input impedance, full-swing output and absence of static power consumption, it can be used in a global wiring scheme to achieve a high-performance, low-power interconnect. Unlike single-stage SAs, the double-tail scheme employs two tail current sources that control the operating currents of the two stages, which gives the designer more flexibility in handling the tradeoffs between speed, power, input offset and other performance metrics.

To fully utilize the performance of this double-tail SA, we need to carefully tune the size of transistors. Firstly, the larger M12 and smaller M9 (as shown in Fig. 3, same for following notations) are typically set to achieve both large current in latching stage and small current in input stage, for fast switching and low offset. Secondly, the sizes of input transistors M5 and M6 are tuned to balance the SA delay during reset phase and decision phase. Finally, ratio of M2/M10 (M4/M11) is optimized to speed up the positive feedback, which is the dominant factor of SA delay. The analysis above provides a guideline to design the SA under a given technology.

For the inverter chain, the optimal number of stages and sizing ratio can be computed for different performance costs. In this work, to simplify the formulation, we fix the number of stages to 6 and the size ratio to $e \approx 2.7$ to minimize the total delay, and scale all the inverter sizes together according to the output resistance of the last inverter, referred to as $R_s$.
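The sizing rule can be sketched as follows. The code assumes, as a first-order model that is not stated in the text, that the output resistance of an inverter is inversely proportional to its size, so that choosing $R_s$ fixes every stage of the chain. Sizes are normalized to the last inverter.

```python
import math

def inverter_chain(r_s_last, stages=6, ratio=math.e):
    """Return [(relative_size, output_resistance), ...] from first to last stage.

    Assumption: output resistance scales as 1 / size, and each stage is
    `ratio` times larger than the one before it.
    """
    chain = []
    for i in range(stages):
        steps_from_last = stages - 1 - i
        size = ratio ** (-steps_from_last)          # last stage has size 1
        chain.append((size, r_s_last * ratio ** steps_from_last))
    return chain

for size, r_out in inverter_chain(20.0):            # R_s = 20 ohm, an example value
    print(f"size = {size:.4f}, R_out = {r_out:.1f} ohm")
```

Under this model, lowering $R_s$ enlarges every inverter in the chain by the same factor, which is why $R_s$ alone can serve as the design variable for the chain.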

We model the whole transceiver stage at the near-end of the T-line as a voltage source $V_S$ with output resistance $R_s$, where $V_S$ provides the full-swing output signal of the transceiver and $R_s$ corresponds to the output resistance of the inverter chain. $R_s$ is a design variable to be optimized in the following experiments, together with $R_d$, $C_d$ and $R_l$.

## III. PROBLEM FORMULATION AND OPTIMIZATION FLOW

We formulate this optimization problem as a constrained non-linear programming problem, and adopt Sequential Quadratic Programming (SQP) method [12] to solve it. The design goal is to minimize total latency. The optimization variables are  $R_s$ ,  $R_d$ ,  $C_d$  and  $R_l$  as defined in Section II.



Fig. 4. The cross section of a differential stripline

For a given technology node and a given wire type, the formulation can be written as:

$$
\begin{array}{ll}
\min & f = f_0 + a e^{k(V_0 - V_{eye})} \\
\text{s.t.} & R_{min}^s \le R_s \le R_{max}^s \\
& R_{min}^d \le R_d \le R_{max}^d \\
& C_{min} \le C_d \le C_{max} \\
& R_{min}^l \le R_l \le R_{max}^l
\end{array} \tag{6}
$$

where $f_0$ is the total latency, $a$ and $k$ are constants, and $V_0$ is the minimum input voltage difference required by the SA. We add the exponential term to handle the constraint on the eye opening. When the eye opening $V_{eye}$ is smaller than $V_0$, the exponential term dominates and forces the flow to find a larger $V_{eye}$; otherwise $f_0$ dominates and the total latency is minimized.
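The behavior of the penalty term can be checked numerically. In the sketch below the constants `a`, `k` and `v0` are hypothetical, since their values are not given in the text; only the form of the objective follows the formulation above.

```python
import math

def penalized_latency(f0_ps, v_eye, v0, a, k):
    """Objective f = f0 + a * exp(k * (V0 - V_eye))."""
    return f0_ps + a * math.exp(k * (v0 - v_eye))

# Hypothetical constants: V0 = 0.1 V, a = 1 ps, k = 100 / V.
V0, A, K = 0.1, 1.0, 100.0

print(penalized_latency(100.0, 0.05, V0, A, K))   # eye too small: penalty dominates
print(penalized_latency(100.0, 0.30, V0, A, K))   # eye large enough: f is close to f0
```

Because the penalty is smooth, the problem keeps only simple bound constraints on the four design variables, which suits a gradient-based method such as SQP.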

As discussed in Section II-C, we model the transceiver stage at the near-end of the T-line as a voltage source with output resistance $R_s$, and characterize the transceiver at the far-end as a look-up table indexed by $\Delta V_{in}$ and $R_s$ whose entries are delays. In each iteration of the optimization, we simulate the far-end step response of the T-line for a given set of $R_s$, $R_d$, $C_d$ and $R_l$ and adopt the method of [13] to estimate the eye opening, which corresponds to the $\Delta V_{in}$ of the following transceiver. Given $\Delta V_{in}$ and $R_s$, the delay of the transceiver stage is derived from the look-up table model. Finally, we add the wire delay and the transceiver delay to obtain the total delay.
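The look-up table step can be sketched with bilinear interpolation. The grid and the delay entries below are hypothetical placeholders for a characterized transceiver, and bilinear interpolation is our assumption about how values between grid points might be obtained; the wire delay and eye opening would come from the circuit simulation and the eye-diagram prediction of [13].

```python
from bisect import bisect_right

def interp2(xs, ys, table, x, y):
    """Bilinear interpolation of table[i][j] on the grid (xs[i], ys[j]), clamped to the grid."""
    def locate(axis, v):
        v = min(max(v, axis[0]), axis[-1])
        i = min(max(bisect_right(axis, v) - 1, 0), len(axis) - 2)
        return i, (v - axis[i]) / (axis[i + 1] - axis[i])
    i, tx = locate(xs, x)
    j, ty = locate(ys, y)
    low = table[i][j] * (1 - ty) + table[i][j + 1] * ty
    high = table[i + 1][j] * (1 - ty) + table[i + 1][j + 1] * ty
    return low * (1 - tx) + high * tx

# Hypothetical characterization data (illustrative numbers only).
DV_IN = [0.05, 0.10, 0.20]                 # eye opening seen by the SA, in V
R_S = [10.0, 30.0, 60.0]                   # output resistance, in ohm
DELAY_PS = [[60.0, 55.0, 52.0],            # DELAY_PS[i][j] for DV_IN[i], R_S[j]
            [45.0, 41.0, 39.0],
            [38.0, 35.0, 33.0]]

def total_delay_ps(wire_delay_ps, dv_in, r_s):
    """Total delay = wire delay + interpolated transceiver delay."""
    return wire_delay_ps + interp2(DV_IN, R_S, DELAY_PS, dv_in, r_s)

print(total_delay_ps(80.0, 0.10, 30.0))
```

The table avoids a transistor-level simulation of the receiver inside every optimization iteration.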

## IV. EXPERIMENTAL RESULTS

The proposed signaling scheme is optimized using 3 wire cases with different dimensions. Also, we study the performance scaling of this new scheme and compare the results with repeater-inserted RC wires.

We perform the optimization for 5 technology nodes: 90nm, 65nm, 45nm, 32nm and 22nm. At each technology node, we try 3 different wire types as shown in Table I.

### *A. Experiment settings*

A differential stripline configuration is used to model the on-chip T-lines, which is shown in Fig. 4. The resistivity of copper wire is  $\rho = 2.2 \times 10^{-6} \Omega \text{ cm}$  in this case. The dielectric constant and loss tangent are also shown in the figure. The 2D EM solver CZ2D from EIP tool of IBM [14] is employed to extract the frequency dependent RLGC values to build the tabular model, which could be simulated in SPICE.

The design and simulation of transceiver stage adopts a predictive transistor model including the process from 90nm to 22nm based on the work of [15]. The model is a Synopsys level3 MOSFET model and the parameters are tuned to follow the ITRS prediction.

We use HSPICE to simulate the whole circuit as well as measure the delay. The optimization flow is implemented in MATLAB. All the experiments are performed on a Linux Workstation with 3GHz CPU and 16GB memory.

### *B. Optimal solutions*

We use the proposed flow to optimize the signaling scheme for wires A to C in Table I at technology nodes from 90nm to 22nm. The optimal design variables $(R_s, R_d, C_d, R_l)$ in terms of minimum total delay are listed in Table II. During optimization, the ranges of $R_s$, $R_d$, $C_d$ and $R_l$ are set to [10Ω, 60Ω], [0, 500Ω], [0, 5pF] and [0, 500Ω], respectively. In total we study $5 \times 3 = 15$ cases, and a single optimization run takes about 300 to 1000 seconds. To avoid being trapped in a local minimum, we randomly choose three or four initial solutions and apply the SQP flow to each, so the total CPU time for one case varies from 0.5 hour to 1 hour.
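The multi-start strategy can be sketched independently of the local solver. In the code below the local step is a simple bounded pattern search that stands in for one SQP run; it is not SQP. The objective `toy_objective` is a hypothetical smooth function used only so that the example runs without a circuit simulator. The bounds are the ranges given above, with $C_d$ in pF.

```python
import random

BOUNDS = [(10.0, 60.0), (0.0, 500.0), (0.0, 5.0), (0.0, 500.0)]  # R_s, R_d, C_d/pF, R_l

def local_search(f, x0, bounds, step_frac=0.25, tol=1e-6):
    """Bounded pattern search; a simple stand-in for one SQP run."""
    x, fx = list(x0), f(x0)
    steps = [step_frac * (hi - lo) for lo, hi in bounds]
    while max(s / (hi - lo) for s, (lo, hi) in zip(steps, bounds)) > tol:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for d in (steps[i], -steps[i]):
                trial = list(x)
                trial[i] = min(max(x[i] + d, lo), hi)   # stay inside the bounds
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            steps = [s / 2 for s in steps]
    return x, fx

def multi_start(f, bounds, n_starts=4, seed=0):
    """Run the local search from random initial points and keep the best result."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
        candidate = local_search(f, x0, bounds)
        if best is None or candidate[1] < best[1]:
            best = candidate
    return best

def toy_objective(x):
    """Hypothetical objective with its minimum at an arbitrary interior point."""
    target = [20.0, 180.0, 0.6, 130.0]
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

print(multi_start(toy_objective, BOUNDS))
```

In the actual flow each objective evaluation involves a circuit simulation, which is why a handful of starts already accounts for the reported CPU time.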

The total delay, power consumption and energy per transmitted bit for the optimal solution of each case are summarized in Table III. The total delay includes the time of flight over the given wire length, the rise time of the far-end received signal (which corresponds to the cycle time $T_C$ at each technology node) and the transceiver delay, which is the part being optimized. Similarly, the total power consumption consists of the power consumed in the T-line, in the passive elements $R_d$ and $R_l$, and in the transceiver stage. Energy per bit, defined as power consumption divided by bit rate, is typically used to assess the power efficiency of an interconnect. In this scheme the bit rate is restricted by the bandwidth of the SA and is shown in the last column of Table III. The results demonstrate that, for a 15mm long global wire, the proposed signaling scheme achieves 120.6ps delay and an energy consumption as low as 0.032pJ/bit at the 22nm technology node.
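As a quick check of the energy-per-bit definition, the sketch below converts a power in µW and a bit rate in Gbps to pJ/bit, using the 22nm, $L=10$mm entry of Table III (1816µW at 50.0Gbps).

```python
def energy_per_bit_pj(power_uw, bit_rate_gbps):
    """Energy per bit in pJ: 1 uW / 1 Gbps = 1e-6 W / 1e9 bit/s = 1e-3 pJ per bit."""
    return power_uw / bit_rate_gbps * 1e-3

print(energy_per_bit_pj(1816.0, 50.0))   # about 0.036 pJ, as listed in Table III
```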

The effects of the design variables $R_s$ and $R_l$ on the eye opening observed at the far-end of the T-line are illustrated in Fig. 5, which gives physical intuition for how these variables are chosen to minimize the total delay. Starting from the optimal solutions of wire B, we sweep $R_s$ and $R_l$ while fixing the other variables to generate Fig. 5, which includes five curves corresponding to the technology nodes from 90nm to 22nm.

The effect of the transceiver output resistance $R_s$ is shown in Fig. 5(a). The eye opening decreases as $R_s$ increases because of the lower saturation voltage. To reduce the SA delay, the eye opening should be as large as possible, which means $R_s$ should be set to its lower bound. However, the delay of the inverter chain decreases as $R_s$ increases, so the optimal $R_s$ is chosen to be around 20$\Omega$ to balance this tradeoff. The figure also shows that the eye opening decreases as the technology scales down because of the increasing signal frequency.

The effect of the load resistance $R_l$ is shown in Fig. 5(b). The optimal $R_l$ in terms of maximum eye opening decreases from about 400$\Omega$ to 70$\Omega$ as the technology scales down from 90nm to 22nm. This trend can be explained by the view, introduced in [4], that distortion is minimized by matching the high-frequency and low-frequency components. As the technology scales from 90nm to 22nm, the signal frequency increases and the high-frequency components suffer larger attenuation. As a result, the optimal $R_l$ decreases, lowering the saturation voltage so that it matches the increased high-frequency attenuation and minimizes the distortion.

(a) The change of eye-opening when $R_s$ varies from 10$\Omega$ to 60$\Omega$.

(b) The change of eye-opening when $R_l$ varies from 20$\Omega$ to 400$\Omega$.

Fig. 5. Effect of design variables upon the eye-opening.

### *C. Evaluation and comparison of performance metrics*

For the $L = 15$mm wire case, we compare the normalized delay ($delay_n$), normalized energy consumption ($power_n$) and normalized throughput ($throughput_n$) of the proposed signaling scheme with those of optimal repeated RC wires, and summarize the results in Table IV. These three performance metrics are defined as follows:

TABLE II OPTIMAL SOLUTIONS OF DIFFERENT WIRE LENGTHS AND TECHNOLOGY NODES FOUND BY PROPOSED FLOW.

| Tech node | $R_s$ /Ω ($L=5$mm) | $R_d$ /Ω ($L=5$mm) | $C_d$ /pF ($L=5$mm) | $R_l$ /Ω ($L=5$mm) | $R_s$ /Ω ($L=10$mm) | $R_d$ /Ω ($L=10$mm) | $C_d$ /pF ($L=10$mm) | $R_l$ /Ω ($L=10$mm) | $R_s$ /Ω ($L=15$mm) | $R_d$ /Ω ($L=15$mm) | $C_d$ /pF ($L=15$mm) | $R_l$ /Ω ($L=15$mm) |
|------|-------|-------|------|-------|-------|-------|------|-------|-------|-------|------|-------|
| 90nm | 17.52 | 280.3 | .23  | 498.9 | 15.94 | 220.9 | 2.68 | 399.5 | 18.85 | 174.7 | 4.59 | 500.0 |
| 65nm | 19.90 | 87.6  | 3.94 | 499.5 | 15.38 | 143.8 | 3.66 | 268.6 | 19.97 | 181.4 | 3.79 | 493.7 |
| 45nm | 25.00 | .17.6 | 2.33 | 429.6 | 20.00 | 155.6 | 65   | 118.6 | 16.27 | 87.5  | 3.64 | 494.0 |
| 32nm | 10.00 | 52.8  | 2.97 | 175.6 | 15.51 | 235.4 | 0.80 | 58.8  | 18.04 | 270.0 | 1.54 | 289.0 |
| 22nm | 21.85 | 63.8  | 1.92 | 121.5 | 14.86 | 247.3 | 0.82 | 68.7  | 22.07 | 181.0 | 0.63 | 133.9 |



TABLE III

| Tech node | Delay /ps ($L=5$mm) | Power /µW ($L=5$mm) | Bit energy /pJ ($L=5$mm) | Delay /ps ($L=10$mm) | Power /µW ($L=10$mm) | Bit energy /pJ ($L=10$mm) | Delay /ps ($L=15$mm) | Power /µW ($L=15$mm) | Bit energy /pJ ($L=15$mm) | Bit rate /Gbps |
|------|-------|------|-------|-------|------|-------|-------|------|-------|------|
| 90nm | 281.2 | 2057 | 0.309 | 313.1 | 2443 | 0.366 | 343.7 | 2235 | 0.335 | 6.7  |
| 65nm | 83.0  | 2053 | 0.185 | 215.1 | 2433 | 0.219 | 245.9 | 2063 | 0.186 |      |
| 45nm | 11.3  | 1984 | 0.099 | 142.5 | 2273 | 0.114 | 73.7  | 2008 | 0.100 | 20.0 |
| 32nm | 78.0  | 716  | 0.051 | 110.0 | 1909 | 0.057 | 140.3 | 714  | 0.051 | 33.3 |
| 22nm | 58.2  | 1834 | 0.037 | 89.8  | 1816 | 0.036 | 120.6 | 579  | 0.032 | 50.0 |

TABLE IV PERFORMANCE COMPARISON BETWEEN PROPOSED ON-CHIP T-LINE SCHEME AND REPEATED RC WIRE AT L=15MM.



$$
delay_n = \frac{\text{Delay}}{\text{Wire Length}} \tag{7}
$$

$$
power_n = \frac{\text{Energy per Bit}}{\text{Wire Length}} \tag{8}
$$

$$
throughput_n = \frac{\text{Bit Rate}}{\text{Pitch}} \tag{9}
$$

where the definition of bit rate in (9) differs between the repeated RC wire and the proposed on-chip T-line. For the former, data pipelining could raise the bit rate to the inverse of the delay between two adjacent repeaters; in this work, however, we use the conventional definition, the inverse of the total delay. For the latter, the bit rate is the bandwidth of the SA in the transceiver stage, which is fixed by the design at each technology node. The data for repeated RC wires are computed from the analytical formulas derived in [3]. The wire dimensions and parameters follow the ITRS prediction for minimum-pitch global wires [1], and the transistor parameters are obtained from the same predictive model.
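The three normalized metrics can be evaluated for the proposed scheme at the 22nm node and $L=15$mm from the figures quoted in the text (120.6ps, 0.032pJ/bit, 50.0Gbps). The pitch of 8µm used below is our assumption: it corresponds to $2(W+S)$ for a differential pair with the 2.0µm width and spacing of wire C, and is not stated explicitly in the text.

```python
def normalized_delay(delay_ps, length_mm):
    """Eq. (7): delay per unit length, in ps/mm."""
    return delay_ps / length_mm

def normalized_energy(energy_pj, length_mm):
    """Eq. (8): energy per bit per unit length, in pJ/m."""
    return energy_pj / (length_mm * 1e-3)

def normalized_throughput(bit_rate_gbps, pitch_um):
    """Eq. (9): bit rate per unit pitch, in Gbps/um."""
    return bit_rate_gbps / pitch_um

PITCH_UM = 8.0   # assumed: 2 * (W + S) with W = S = 2.0 um

print(normalized_delay(120.6, 15.0))           # ps/mm
print(normalized_energy(0.032, 15.0))          # pJ/m
print(normalized_throughput(50.0, PITCH_UM))   # Gbps/um
```

Under this pitch assumption, the 6.7Gbps bit rate at the 90nm node gives about 0.84Gbps/µm.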

Our experimental results show that the normalized delay of repeated RC wires increases from 35.55ps/mm at the 90nm node to 60.44ps/mm at the 45nm node and then decreases to 54.11ps/mm at the 22nm node, owing to the reduction of the dielectric constant that ITRS predicts, whereas the delay of the proposed on-chip T-line is 22.91ps/mm at the 90nm node and decreases with technology scaling to 8.04ps/mm at the 22nm node. For normalized energy per bit, repeated RC wires consume 311.3pJ/m at the 90nm node and 179.0pJ/m at the 22nm node; the corresponding values for the proposed on-chip T-line are 22.33pJ/m and 2.13pJ/m. The throughput per pitch of repeated RC wires is 4.57Gbps/µm at the 90nm node and increases to 18.67Gbps/µm at the 22nm node because the pitch shrinks as technology advances. For the on-chip T-line, the normalized throughput is 0.84Gbps/µm at the 90nm node and increases to 6.25Gbps/µm at the 22nm node. If we change the aspect ratio (AR) of the wire from 0.5 to 2.0, the wire maintains the same resistance but the normalized throughput doubles, to 12.5Gbps/µm at the 22nm node. We list these results in the last row of Table IV. In summary, at the 22nm node the proposed on-chip T-line reduces the normalized delay by 85.1% and the normalized energy per bit by 98.8%, at the cost of 33.0% lower normalized bandwidth than repeated RC wires when the wire with AR of 2.0 is used.
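The summary percentages follow from the 22nm figures quoted above. The short check below recomputes them; the throughput comparison uses the doubled value for the wire with AR of 2.0.

```python
def reduction_percent(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return (baseline - proposed) / baseline * 100.0

delay_gain = reduction_percent(54.11, 8.04)            # ps/mm, repeated RC vs T-line
energy_gain = reduction_percent(179.0, 2.13)           # pJ/m
throughput_loss = reduction_percent(18.67, 2 * 6.25)   # Gbps/um, AR = 2.0 wire

print(round(delay_gain, 1), round(energy_gain, 1), round(throughput_loss, 1))
```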

The results are also illustrated using histograms in Fig. 6, 7 and 8. The figures show the improvements of the proposed on-chip T-line compared with repeated RC wires in terms of delay, energy consumption, and the tradeoff in terms of throughput. Also, it can be seen that, all the performance metrics of the proposed scheme are scalable as the technology advances from 90nm to 22nm.

## V. CONCLUSIONS AND FUTURE WORK

### *A. Conclusion*

In this paper, a new signaling scheme using on-chip lossy transmission lines (T-lines) for global point-to-point communication is proposed. The new scheme adopts a parallel RC equalizer combined with an optimal termination resistor to compensate for the distortion of the on-chip T-line, and employs a transceiver stage composed of a sense amplifier (SA) and an inverter chain to amplify and regenerate the full-swing digital signal. The analysis and design of such a scheme are discussed, and an optimization flow based on eye-diagram prediction and Sequential Quadratic Programming (SQP) is applied to determine the design variables under the objective function of minimum total delay. We optimized the scheme with three different wires at five technology nodes. The experimental results demonstrate that, compared with repeated RC wires, the proposed on-chip T-line scheme greatly improves the delay and power consumption at the cost of reduced throughput at advanced technology nodes. At the 22nm node, it reduces the normalized delay by 85.1% and the normalized energy per bit by 98.8%, and achieves 2/3 of the normalized throughput of repeated RC wires.

Fig. 6. Normalized delay comparison between repeated RC wire and proposed on-chip T-line for L=15mm.

Fig. 7. Normalized energy consumption between repeated RC wire and proposed on-chip T-line for L=15mm.

Fig. 8. Normalized throughput between repeated RC wire and proposed on-chip T-line for L=15mm.

### *B. Future work*

Future work includes further exploring the potential of the proposed signaling scheme by adopting other passive compensation approaches, such as a series $R$-$L$ at the termination, or by comparing the proposed scheme with schemes that do not use passive compensation, and improving the optimization flow to handle more design goals, such as the delay-power product and the delay²-power product. We also want to study wire cases with different spacing to reveal the tradeoffs between delay, power consumption and throughput for the proposed scheme, and to provide a guideline that helps designers make choices. In addition, more complex effects related to the system-level implementation of the proposed scheme, such as reliability and signal integrity, should be taken into consideration when modeling the T-line and the transceiver stage in future research.

## VI. ACKNOWLEDGMENTS

The work was supported in part under the grant of California *MICRO* program. The authors would like to thank the reviewers for their valuable comments.

# **REFERENCES**

- [1] S. I. Association. (2004,2006,2007) International technology roadmap for semiconductors. [Online]. Available: http://www.itrs.net
- [2] N. Magen, A. Kolodny, U. Weiser, and N. Shamir, "Interconnect power dissipation in a microprocessor," in *IEEE/ACM Int. Workshop on System Level Interconnect Prediction*, 2004, pp. 7–13.
- [3] L. Zhang, H. Chen, B. Yao, K. Hamilton, and C. Cheng, "Repeated on-chip interconnect analysis and evaluation of delay, power and bandwidth metrics under different design goals," in *IEEE Int. Symp. on Quality Electronic Design*, 2007, pp. 251–256.
- [4] M. Flynn and J. Kang, "Global signaling over lossy transmission lines," in *IEEE/ACM Int. Conf. on Computer-Aided Design*, Nov. 2005, pp. 985–992.
- [5] A. Tsuchiya, M. Hashimoto, and H. Onodera, "Design guideline for resistive termination of on-chip high-speed interconnects," in *IEEE Custom Integrated Circuits Conf.*, Sept. 2005, pp. 613–616.
- [6] H. Chen, R. Shi, and C. K. Cheng, "Surfliner: A distortionless electrical signaling scheme for speed-of-light on-chip communication," in *IEEE Int. Conf. on Computer Design*, Oct. 2005, pp. 497–502.
- [7] H. Zhu, R. Shi, C. K. Cheng, and H. Chen, "Approaching speed-of-light distortionless communication for on-chip interconnect," in *IEEE/ACM Asia and South Pacific Design Automation Conf.*, Jan. 2007, pp. 684–689.
- [8] C. C. Liu, H. Zhu, and C. K. Cheng, "Passive compensation for high performance inter-chip communication," in *IEEE Int. Conf. on Computer Design*, Oct. 2007, pp. 547–552.
- [9] H. Johnson and M. Graham, *High-speed signal propagation*. Prentice Hall, 2003.
- [10] J. Shin and K. Aygun, "On-package continuous-time linear equalizer using embedded passive components," in *IEEE Electrical Performance of Electronic Packaging*, Oct. 2007, pp. 147–150.
- [11] D. Schinkel, E. Mensink, E. Klumperink, E. van Tuijl, and B. Nauta, "A double-tail latch-type voltage sense amplifier with 18ps setup+hold time," in *IEEE Int. Solid-State Circuits Conf.*, Feb. 2007, pp. 314–316.
- [12] M. C. Biggs, "Constrained minimization using recursive quadratic programming: some alternative subproblem formulations," in *Towards Global Optimization*, L. C. W. Dixon and G. P. Szego, Eds. Amsterdam: North-Holland, 1975, pp. 341–349.
- [13] W. Yu, R. Shi, and C. Cheng, "Accurate prediction of eye-diagram characteristics based on step response," in *IEEE/ACM Int. Conf. on Computer-Aided Design*, 2008.
- [14] IBM, "IBM electromagnetic field solver suite of tools," in *http://www.alphaworks.ibm.com/tech/eip*.
- [15] S. Uemura, A. Tsuchiya, and H. Onodera, "A predictive transistor model based on itrs roadmap," in *General Conf. of IEICE*, Mar. 2006, p. 81.