# Transparent Repeaters<sup>1</sup>

Radu M. Secareanu and Eby G. Friedman Department of Electrical and Computer Engineering University of Rochester, Rochester, New York 14627-0231 radums@ece.rochester.edu, friedman@ece.rochester.edu

Abstract—The concept of a "transparent repeater<sup>1</sup>," which is an amplifier circuit designed to minimize the delay introduced by highly resistive interconnect lines in high speed digital circuits, is introduced and described in this paper. An insertion methodology for this circuit is also discussed. Defining characteristics of this circuit are: the input is connected to the output, the output generates the same sense transition as the corresponding input transition, the buffer output becomes high impedance after every transition, and the buffer may detect input transitions with low threshold voltages.

## I. INTRODUCTION

With continuous technology scaling into the deep submicrometer (DSM) range, increased resistance in the interconnect lines has made the passive interconnect impedances dominate over the active gate output impedances. Therefore, the speed of current processors is dominated by the RC interconnect lines. Solutions to overcome this performance limiting constraint have been developed. Optimal repeater insertion [1–4] is the most well known and widely used solution, and is schematically shown in Figure 1.



Fig. 1. Inserted repeaters (or buffers) along resistive RC interconnect

A distributed RC line can be optimally partitioned into k equal segments, each segment driven by an inverting buffer IB. The delay and power dissipation necessary to drive the RC line is therefore minimized. The total delay of the line can be characterized as

$$D = k(D_1 + D_B), \tag{1}$$

where $D_1$ is the delay introduced by an $RC_1$ segment, and $D_B$ is the delay introduced by an inverting buffer IB. Note that the k buffers introduce a total delay equal to $kD_B$. Note also that the total delay of the line depends upon the number of segments k in which the line is partitioned, the delay of the IB buffer $D_B$, and the geometric size of the IB buffer. Values for k, $D_B$, and the IB buffer size exist to obtain the minimal delay and power dissipation to optimally drive an RC interconnect line [3].

<sup>1</sup>Patent pending

The minimal delay of the line can be further improved using an HDR buffer [5], which has a voltage transfer characteristic (VTC) with low switching threshold voltages and hysteresis. Briefly, the HDR buffer detects an input transition with a low threshold voltage (for example, for a 5 volt system, the low-to-high signal transitions at 1 volt instead of approximately 3 volts for an inverter, and the high-to-low signal transitions at 4 volts instead of approximately 2 volts for an inverter). The total delay of an RC line using HDR buffers is

$$D_h = g(D_1 + D_{HDR}). \tag{2}$$

Maintaining the same segment delay  $D_1$ , the length of the line segment can be increased since the signal has to only reach the low threshold voltage of an HDR buffer [5]. A consequence of this capability is that the line can be partitioned into only g segments with g smaller than k. The delay of the HDR buffer  $D_{HDR}$  must also be minimized to minimize  $D_h$ . To achieve this minimal delay, the design of the HDR buffer is based on an HD buffer [6].
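The comparison between (1) and (2) can be illustrated with a minimal sketch. The numerical values below are illustrative assumptions chosen only to show the effect of partitioning the line into fewer segments; they are not taken from the paper, and the function name `line_delay` is hypothetical.

```python
def line_delay(num_segments: int, segment_delay: float, buffer_delay: float) -> float:
    """Total delay of a repeater-partitioned line, n * (D_1 + D_buffer), as in (1) and (2)."""
    return num_segments * (segment_delay + buffer_delay)


# Illustrative values in ns (assumptions, not results from the paper).
D1 = 0.1      # delay of one line segment, kept the same in both cases
D_B = 0.3     # delay of an inverting buffer IB
D_HDR = 0.3   # delay of an HDR buffer, assumed equal to D_B for simplicity
k = 10        # segments with classical repeaters
g = 6         # segments with HDR buffers, g < k

D = line_delay(k, D1, D_B)       # equation (1)
D_h = line_delay(g, D1, D_HDR)   # equation (2)
print(f"classical: {D:.2f} ns, HDR: {D_h:.2f} ns")
```

With the same $D_1$ and equal buffer delays, the reduction in total delay comes entirely from g being smaller than k.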

Briefly, the HD buffer introduces a minimal delay from the input transition to the output response by eliminating any parasitic capacitance within the buffer along the signal path. For example, in a tapered buffer, the output of a stage drives the gate capacitances of the N and P transistors of the next stage. For a low-to-high input transition, the output of the stage drives the parasitic gate capacitance of the N transistor of the following stage. The capacitance of the N transistor for this transition represents 25% of the total capacitance at the output of a stage. For a high-to-low input transition, the output of that stage drives the parasitic gate capacitance of the P transistor of the following stage. The capacitance of the P transistor for this transition represents 75% of the total capacitance at the output of a stage. Accordingly, the delay introduced by a stage of an optimal tapered buffer is 25% larger for a low-to-high input transition, and 75% larger for a high-to-low input transition. The HD buffer eliminates these parasitic capacitances, thereby decreasing the overall delay.

This research was supported in part by the National Science Foundation under Grant No. MIP-9610108, the Semiconductor Research Corporation under Contract No. 99-TJ-687, a grant from the New York State Science and Technology Foundation to the Center for Advanced Technology—Electronic Imaging Systems, and by grants from the Xerox Corporation, IBM Corporation, Intel Corporation, Lucent Technologies Corporation, and Eastman Kodak Company.

The proposed circuit, called a transparent repeater (TR), together with an insertion methodology, reduces the  $kD_B$  [see (1)] or  $gD_{HDR}$  [see (2)] delays. Therefore, the TR circuit significantly reduces the total delay of the line as compared to classical repeater [3] or HDR buffer insertion methodologies [5]. The name of the proposed circuit is intended to suggest that inserting a TR along the RC line introduces zero delay into the signal path, and therefore the repeater is transparent to the signal.

To better understand the operation of the TR buffer, the TR buffer insertion methodology is described in Section II. The operation of the TR buffer circuit is discussed in Section III. Simulation results are briefly summarized in Section IV. Finally, some conclusions are presented in Section V.

## II. THE TR BUFFER INSERTION METHODOLOGY

An important observation of a TR buffer is that the circuit operates as a local, controlled current source which sources or sinks current on the interconnect line at specific insertion points. The TR buffer is turned on by any signal transition detected along an interconnect line, producing an amplified transition at the output (a low-to-high output transition for a low-to-high input transition and a high-to-low output transition for a high-to-low input transition), and auto tri-states the output after the transition is completed while preparing the input to detect the next transition on the interconnect line.



Fig. 2. Insertion methodology for TR buffers

The insertion methodology for the TR buffers is shown in Fig. 2. An example is chosen to describe the insertion methodology where the TR buffer has the same switching threshold voltages as an *IB* buffer. Consider  $D_1$  equal to 0.1 ns, and the TR buffer delay (from the input transition detection to the complete output response) is equal to 0.3 ns. The distributed *RC* line is assumed uniform, *i.e.*, the length of a segment of the line is proportional to the delay introduced by that segment.

Initially, all of the TR buffers are off and the IB buffer drives the entire line. After $DR_1 = D_1 = 0.1$ ns, the TR threshold voltage is reached at A. TR1 forces the output active after 0.3 ns. The signal further propagates along B, C, and D through the line segments 2, 3, and 4, respectively. Other TR buffers are inserted at B, C, and D, where $DR_2 = DR_3 = DR_4 = 0.1$ ns. The length of the line segment 4 is smaller than the length of the line segment 3, which is smaller than the length of the line segment 2, which finally is smaller than the length of the line segment 1 due to the characteristics of the signal propagation along an RC line. When the signal reaches the TR threshold in D, TR4 is activated, TR3 and TR2 are progressively closer to the time when the output becomes active, while the output of TR1 has just become active. The current at the output of TR1 enhances the drive strength of the line (similar to having the *IB* buffer placed at point A of the line as shown in Fig. 2), thereby increasing the length of the segment 5 to provide a delay equal to 0.1 ns, as compared to the length of the segment 5 when only the IB buffer is active. After 0.1 ns, when the signal reaches E and TR5 is activated, the output of TR2 becomes active and enhances the drive strength of the line at point B. This cycle in which the next TR buffer along the line becomes active and enhances the drive strength of the line at the respective insertion point repeats until the line terminates. While segments 1, 2, 3, and 4 decrease in length, 5 and 6 increase. Segments after a certain rank become approximately constant in length. Towards the end of the line, the segments increase in length, since the current provided by the TR buffers is large for the remaining portion of the line. Alternatively, the length of the segments can be kept constant across the entire line by adjusting the size of the TR buffers.
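The timing in this example can be tabulated with a short sketch. It uses the two values stated above (a 0.1 ns segment delay and a 0.3 ns TR buffer delay) and assumes, as the example does, that every segment contributes the same 0.1 ns. The function name `tr_timeline` and the choice of six buffers are hypothetical.

```python
D1 = 0.1        # ns, delay of each line segment (from the example)
TR_DELAY = 0.3  # ns, TR delay from input detection to output response (from the example)
NUM_TR = 6      # number of TR buffers to tabulate (assumption)


def tr_timeline(num_tr, segment_delay, tr_delay):
    """Return (name, detect_time, active_time) in ns for each TR buffer along the line."""
    events = []
    for i in range(1, num_tr + 1):
        detect = i * segment_delay  # threshold reached at the i-th insertion point
        active = detect + tr_delay  # output of the i-th TR buffer starts driving the line
        events.append((f"TR{i}", round(detect, 3), round(active, 3)))
    return events


timeline = tr_timeline(NUM_TR, D1, TR_DELAY)
for name, detect, active in timeline:
    print(f"{name}: detects at {detect:.1f} ns, output active at {active:.1f} ns")
```

In this table the output of TR1 becomes active at 0.4 ns, the moment TR4 detects the transition at D, and the output of TR2 becomes active at 0.5 ns, when TR5 detects the transition at E.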

There are situations where the use of TR buffers is not efficient. Such a situation occurs when the delay of the line driven only by the IB buffer is similar to the TR buffer delay. This situation implies that by the time the output of the first TR buffer is active, the signal reaches the end of the line. Another situation is when the TR buffer delay is comparable to but smaller than the total delay of the line. For example, if the initial line is partitioned into k segments according to (1), the line using the TR buffers will be partitioned into k + p segments with a total line delay of

$$D_{TR} = D_1(k+p). \tag{3}$$

From (1) and (3), note that a gain in speed is achieved if

$$D_1 p \le D_B k. \tag{4}$$
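As a short worked step using only the symbols of (1) and (3), condition (4) follows from requiring that the TR-buffered line be no slower than the classically repeated line, $D_{TR} \le D$:

$$D_1(k+p) \le k(D_1 + D_B) \;\Longleftrightarrow\; D_1 k + D_1 p \le D_1 k + D_B k \;\Longleftrightarrow\; D_1 p \le D_B k.$$

The $D_1 k$ term cancels, so the comparison reduces to the delay of the p additional segments against the total delay $kD_B$ of the repeaters that the TR buffers remove from the signal path.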

A small gain in speed may not be efficient because of the increased power and area. Therefore, only a large gain in speed may be acceptable. However, this tradeoff can be improved and the efficiency of the line driven by the TR buffers increased if  $D_{TR}$  [see (3)] is decreased by reducing the number of line segments p. Reducing the number of line segments can be achieved by reducing the delay of the TR buffers, which can also be achieved by using an internal structure for the TR buffers similar to an HD buffer.

The efficiency of the line driven by the TR buffers can be further improved if the TR buffers incorporate low switching threshold voltages as in an HDR buffer [5]. If these low switching threshold voltage TR buffers drive a highly resistive RC line as shown in Fig. 2, and if the same delay $D_1$ is maintained for the TR buffer to reach the low switching threshold voltages as for the *IB* buffer to reach the *IB* threshold voltages, the line can be partitioned into fewer segments, possibly fewer than k. If m is the number of segments into which the line driven by the low threshold voltage TR buffers is partitioned, the total delay of the line is

$$D_{TR} = mD_1, \tag{5}$$

and the gain in delay as compared to a classical repeater insertion process as expressed by (1) is

$$G = (k - m)D_1 + D_B k. \tag{6}$$
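The gain in (6) is the difference between (1) and (5), which the following sketch checks numerically. The values of k, m, $D_1$, and $D_B$ are illustrative assumptions rather than results from the paper, and `tr_gain` is a hypothetical helper name.

```python
def tr_gain(k: int, m: int, d1: float, d_b: float) -> float:
    """Delay gain G = (k - m) * D_1 + D_B * k, as in (6)."""
    return (k - m) * d1 + d_b * k


# Illustrative values (assumptions): delays in ns.
k, m = 10, 8
D1, D_B = 0.1, 0.3

D = k * (D1 + D_B)   # classical repeater insertion, equation (1)
D_TR = m * D1        # low threshold TR buffers, equation (5)
G = tr_gain(k, m, D1, D_B)

# G should equal the difference between the two line delays.
assert abs(G - (D - D_TR)) < 1e-12
print(f"D = {D:.2f} ns, D_TR = {D_TR:.2f} ns, G = {G:.2f} ns")
```

For these assumed values most of the gain comes from the $D_B k$ term, that is, from removing the repeater delays from the signal path.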

Multiple tradeoffs exist to improve the response and increase the gain of the TR buffered line, such as using TR buffers with low threshold voltages and increased size of the final stage (as compared with an *IB* buffer) together with fewer line segments, or a combination of low threshold and normal threshold voltage TR buffers.



Fig. 3. A TR buffer driving a highly resistive RC line

## III. THE TR BUFFER CIRCUIT

Summarizing Section II, the following requirements are necessary for a TR buffer: 1) the input of the TR buffer is connected to the output, 2) the output is driven in the same sense as the input transition as a response to any input transition, sourcing or sinking current on the interconnect line at the insertion point, 3) the output must auto tri-state after a delay from when the output is driven so that the output does not create a conflict with the following signal transition, 4) the buffer should have minimal delay from the input to the output to increase the insertion efficiency, and 5) the buffer should detect the input transitions at low threshold voltages. Each of these requirements is satisfied by the TR buffers shown in Figs. 3, 4, and 5.

A basic configuration of a TR buffer is shown in Fig. 3. Note that the output is tri-stated by a delayed input signal. This simple circuit has the disadvantage that, because the output is tri-stated by the input signal, there is no control over the signal propagation inside the buffer. As a result, the output may be active either for too long a period of time, creating a conflict between the output and the next transition, or for too short a time, which is insufficient for a full output transition. The correct timing is controlled through proper transistor sizing.

The circuits depicted in Figs. 4 and 5 eliminate this drawback by creating the signal that tri-states the output stage of the buffer using a combinatorial circuit between the input and the signals that control the gates of the final transistors. Therefore, the output of the buffer is tri-stated after the output transistors are turned on, eliminating any uncertainty. The latch prevents oscillations induced by the feedback signals from the gate terminals of the final transistors to the input of the combinatorial circuit. The TR buffer circuit shown in Fig. 5 is a modified HDR buffer with low threshold voltages and minimum line loading [5].



Fig. 4. A TR buffer driving a highly resistive RC line that starts the nulling process [6] and tri-states the final stage of the buffer through a combinatorial circuit between the input of the buffer and an internal signal of the buffer

The TR buffer illustrated in Fig. 3 saves power and area as compared to the TR buffers shown in Figs. 4 and 5 at the expense of a more careful (and sensitive) design. The advantage of the TR buffers shown in Figs. 4 and 5 is that these circuits provide increased reliability as compared to the TR buffer shown in Fig. 3.

## IV. SIMULATION RESULTS

Circuit simulations based on Cadence-Spectre and a $1.2 \,\mu\text{m}$ CMOS technology are described in this section. A distributed RC line simulated by a $\Pi 48$ model is considered, where $R = 200 \ \Omega$ and $C = 20$ pF. Approximately 16 inverting repeaters of size $100 \,\mu\text{m}/300 \,\mu\text{m}$ are inserted according to a classical repeater insertion process [1], generating a total delay for the line of approximately 4 ns, with $D_1 = 0.04$ ns and $D_B = 0.21$ ns. A TR buffer as shown in Fig. 3, with the same thresholds as the inverting buffer driving the same distributed RC line and with an output stage of $100 \,\mu\text{m}/300 \,\mu\text{m}$, has a delay from input to output of approximately 0.23 ns. As discussed in Section II, each TR buffer is inserted along the distributed RC line approximately every 0.04 ns.
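As a quick consistency check, the reported baseline delay can be reproduced from (1) with the stated values. This is only arithmetic on the numbers quoted above; the variable names are hypothetical, and the repeater count of 16 is itself approximate.

```python
NUM_REPEATERS = 16  # approximate number of inverting repeaters (from the text)
D1 = 0.04           # ns, delay of one RC segment (from the text)
D_B = 0.21          # ns, delay of one inverting buffer (from the text)

total = NUM_REPEATERS * (D1 + D_B)            # equation (1)
buffer_share = NUM_REPEATERS * D_B / total    # fraction of the delay due to the repeaters

print(f"total line delay: {total:.2f} ns")
print(f"share due to repeaters: {buffer_share:.0%}")
```

The result matches the approximately 4 ns quoted for the classical repeater insertion process, and shows that the $kD_B$ term accounts for most of that delay.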



Fig. 5. A TR buffer driving a highly resistive RC line that is based on a modified HDR buffer with low input threshold voltages and high output current

There are 47 possible insertion points for the $\Pi 48$ interconnect model used to characterize the distributed *RC* interconnect line. Fourteen TR buffers have been inserted in the first part of the line after the $\Pi$ number 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, and 28. Six TR buffers have been inserted along the second part of the line at a $\Pi 3$ distance between consecutive TR buffers, namely at $\Pi$ number 31, 34, 37, 40, 43, and 46. The line delay between two consecutive TR buffers is, on average, 0.05 ns. The total delay of the line with the inserted TR buffers is approximately 1.2 ns, which is approximately 30% of the 4 ns delay of the line with the inserted inverter repeaters. If TR buffers are inserted after every $\Pi$ section, the total line delay is further decreased to approximately 0.9 ns.
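The insertion pattern and the reported ratio can be reproduced with a few lines of arithmetic. The sketch below only restates the numbers given above; the variable names are hypothetical.

```python
# Insertion points along the Pi-48 model, as listed in the text.
first_part = list(range(2, 29, 2))    # every second Pi section: 2, 4, ..., 28
second_part = list(range(31, 47, 3))  # every third Pi section: 31, 34, ..., 46
insertion_points = first_part + second_part

TR_LINE_DELAY = 1.2    # ns, line with TR buffers (from the text)
CLASSICAL_DELAY = 4.0  # ns, line with inverting repeaters (from the text)

ratio = TR_LINE_DELAY / CLASSICAL_DELAY    # TR delay as a fraction of the classical delay
speedup = CLASSICAL_DELAY / TR_LINE_DELAY  # how many times faster the TR line is

print(len(first_part), len(second_part), len(insertion_points))
print(f"delay ratio: {ratio:.0%}, speedup: {speedup:.1f}x")
```

The counts agree with the fourteen and six TR buffers described above, for a total of twenty buffers on the 47 available insertion points.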

This gain in speed can be further increased by using larger size TR buffers as well as a larger size of the IB buffer, together with low threshold voltage TR buffers. Note, however, that decreasing the delay increases the power dissipation and area required to drive the line. Nevertheless, the speed improvement achieved using TR buffers to drive the highly resistive interconnect line can be spectacular. As a rule of thumb, a large gain in speed is achieved if the delay of the line driven by the *IB* buffer is five to ten times or more the delay of a TR buffer.

## V. CONCLUSIONS

A novel repeater insertion methodology based on the use of transparent repeaters for driving high resistivity RC lines has been presented in this paper. The circuit structure of the transparent repeater together with the associated insertion methodology provides a significant gain in speed. One example exhibits a gain in speed of at least 300% as compared to a classical inverter-like repeater insertion methodology. Additional strategies to further improve the speed are also discussed.

## References

- [1] V. Adler and E. G. Friedman, "Repeater Design to Reduce Delay and Power in Resistive Interconnect," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, Vol. CAS II-45, No. 5, May 1998.
- [2] V. Adler and E. G. Friedman, "Delay and Power Expressions for a CMOS Inverter Driving a Resistive-Capacitive Load," *Analog Integrated Circuits and Signal Processing*, Vol. 14, No. 1/2, pp. 29-39, September 1997.
- [3] V. Adler and E. G. Friedman, "Repeater Insertion to Reduce Delay and Power in *RC* Tree Structures," *Proceedings of the Asilomar Conference on Signals, Systems, and Computers*, pp. 749-752, November 1997.
- [4] S. Dhar and M. A. Franklin, "Optimum Buffer Circuits for Driving Long Uniform Lines," *IEEE Journal of Solid-State Circuits*, Vol. SC-26, pp. 151-155, January 1991.
- [5] R. M. Secareanu, V. Adler, and E. G. Friedman, "Exploiting Hysteresis in a CMOS Buffer," *Proceedings of the IEEE International Conference on Electronics, Circuits, and Systems*, pp. 205-208, September 1999.
- [6] R. M. Secareanu and E. G. Friedman, "A High Speed CMOS Buffer for Driving Large Capacitive Loads in Digital ASICs," *Proceedings of the IEEE ASIC Conference*, pp. 365-368, September 1998.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. **GLSVLSI 2000 Evanston Illinois USA** 

Copyright ACM 2000 1-58113-251-4/00/04 ... \$5.00