#### Repeater Insertion for Driving Resistive Interconnect in CMOS VLSI Circuits

by

#### Victor Adler

Submitted in Partial Fulfillment
of the
Requirements for the Degree
Doctor of Philosophy

Supervised by Professor Eby G. Friedman

Department of Electrical and Computer Engineering
The College
School of Engineering and Applied Sciences

University of Rochester
Rochester, New York
1998

UMI Number: 9916575

UMI Microform 9916575 Copyright 1999, by UMI Company. All rights reserved.

This microform edition is protected against unauthorized copying under Title 17, United States Code.

UMI 300 North Zeeb Road Ann Arbor, MI 48103

# Dedication

This work is dedicated to my parents, Carolyn and Eric, and my sister, Sara.

# Curriculum Vitae

The author was born in Burlington, Vermont on February 5, 1970. He attended Duke University, Durham, North Carolina from 1988 to 1992 and graduated cum laude with a Bachelor of Science degree in Electrical Engineering and a Bachelor of Arts in Computer Science in 1992. He received the I.B.M. T. J. Watson Memorial Scholarship from 1988 through 1992. From 1988 to 1992, he worked preprofessionally at IBM Microelectronics, Burlington, Vermont. He came to the University of Rochester in the fall of 1992 and began graduate studies in Electrical Engineering. In 1993, he received his Master of Science degree. In 1997 he performed research with Intel Corporation, Santa Clara, California, and in 1998 with Ultima Interconnect Technologies, Sunnyvale, California. He pursued research in very large scale high performance integrated circuits under the direction of Professor Eby G. Friedman from 1994 to 1998 in the areas of superconductive design methodologies and repeater insertion in CMOS integrated circuits.

# Acknowledgments

The time that I have spent at the University of Rochester has enriched my life greatly. My thanks and acknowledgments to those who have made my stay here educational, challenging, and fun.

First, my greatest thanks to my advisor, Professor Eby G. Friedman. He has guided my research and my academic and personal growth with a care that few students are fortunate enough to receive. Beyond his knowledge and ideas about integrated circuit design, he has been the positive support required to complete a PhD. He has been an excellent foil for my pessimism, and I am grateful to have had the opportunity to work with him.

I thank Mark Bocko for service on my committee and working with me on my superconductive research. My thanks to David Albonesi for serving on my committee and taking his time out to talk to me about microprocessors and work related to my research. I would also like to thank Adam Frank for service on my committee.

To those who filled every day of my graduate school career with help, challenge, and distraction in every area, I thank my previous and current officemates: Brian Cherkauer, Jose Neves, Tolga Soyata, Ivan Kourtev, Radu Secareanu, Yehea Ismail, and Tianwen (Kevin) Tang. I also thank my water polo teammates throughout the years for keeping me physically fit.

Finally, I would like to thank my family for their support throughout my life. I have been able to accomplish what I have because of them.

This work was supported in part by the Army Research Office through Grant # DAAH04-MIP-9423886.

# Abstract

The progress of CMOS integrated circuit technology has permitted transistors to operate at extremely high speeds. Simultaneously, improvements in technology have enlarged circuit die sizes and the number of transistors, thus increasing the length (hence the delay) of on-chip interconnections. This decrease in transistor delay and increase in interconnect delay has shifted the performance bottleneck of CMOS integrated circuits from transistors to interconnect. The circuit level design strategy of repeater insertion to reduce the delay of on-chip interconnect and improve the performance of leading edge CMOS technologies is described in this dissertation. An inverter-interconnect model and optimization algorithms are presented to provide a repeater insertion methodology to reduce interconnect delay.

As the length of an interconnection increases, its resistance and capacitance each increase linearly with length. Thus the RC delay increases quadratically, severely degrading circuit performance. This RC delay can be reduced through the insertion of repeaters along an interconnect line. A CMOS inverter is used as a repeater to reduce this quadratic increase of RC delay.
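The quadratic growth and the benefit of segmenting a line can be illustrated with a minimal sketch. It uses a first-order lumped RC product and an assumed fixed delay per repeater, which is a simplification and not the $\alpha$-power law model developed in this dissertation. The function names and all numeric values are hypothetical.

```python
# Illustrative first-order model only; NOT the alpha-power-law model
# developed in this dissertation. All numeric values are hypothetical.


def wire_rc(r_per_mm, c_per_mm, length_mm):
    """RC product of an unbuffered line.

    R and C each grow linearly with length, so the product grows
    with the square of the length.
    """
    resistance = r_per_mm * length_mm    # ohms
    capacitance = c_per_mm * length_mm   # farads
    return resistance * capacitance      # seconds


def repeated_rc(r_per_mm, c_per_mm, length_mm, n_segments, t_repeater):
    """Total delay estimate when the line is split into n equal segments.

    Each segment contributes the RC product of a line of length/n plus an
    assumed fixed repeater delay t_repeater (a simplifying assumption).
    """
    segment = wire_rc(r_per_mm, c_per_mm, length_mm / n_segments)
    return n_segments * (segment + t_repeater)


if __name__ == "__main__":
    r, c = 100.0, 0.2e-12  # hypothetical: 100 ohm/mm and 0.2 pF/mm
    for length in (5, 10, 20):
        print(f"{length} mm unbuffered: {wire_rc(r, c, length):.3e} s")
    for n in (1, 2, 4, 8):
        total = repeated_rc(r, c, 20, n, 50e-12)
        print(f"20 mm with {n} segment(s): {total:.3e} s")
```

In this simplified picture the wire term falls as $1/n$ while the repeater term grows as $n$, so the total delay has a minimum at some finite number of repeaters.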

A repeater-interconnect model based on the short-channel $\alpha$-power law transistor model is developed to describe repeater insertion in a resistive interconnect line. A closed form expression describing the number and size of repeaters to insert along an interconnect line is presented. The analytical expression is generally within 10% of SPICE.

The repeater insertion model for RC lines is expanded for the more general purpose of repeater insertion in RC tree structures. Local and global optimization algorithms are presented to insert repeaters into RC trees. The repeater insertion methodologies can be applied either to minimize average delay or to achieve a target delay. Repeater insertion methods reduce delay over typical cascaded buffers by 25% to 60% and are accurate to within 10% of SPICE. Expressions to analytically determine dynamic and short-circuit power dissipation of repeaters in RC trees are also presented. Thus, an integrated methodology for repeater insertion composed of circuit models, insertion algorithms, and demonstration examples is presented for application to high performance VLSI circuits.

# Contents

| Section | Title |
|---------|-------|
|         | Dedication |
|         | Curriculum Vitae |
|         | Acknowledgments |
|         | Abstract |
|         | List of Tables |
|         | List of Figures |
| 1       | Introduction |
| 2       | Models of Transistors and Interconnect |
| 2.1     | High Level Repeater Models |
| 2.2     | Interconnect Models |
| 2.3     | CMOS Transistor I-V Models |
| 3       | Short-Channel Model for a Repeater Driving an RC Load |
| 3.1     | Transient Analysis of an RC Loaded CMOS Inverter |
| 3.1.1   | Derivation of Analytical Expressions |
| 3.1.2   | Analytical Delay Expressions |
| 3.1.3   | Analysis of Delay Expressions |
| 3.2     | Power Estimation of a CMOS Inverter |
| 3.2.1   | Dynamic Power |
| 3.2.2   | Short-Circuit Power |
| 3.2.3   | Resistive Power Dissipation |
| 3.3     | Determining the Parameters $I_{do}$ and $V_{do}$ |
| 3.4     | Conclusions |
| 4       | Repeater Design for Optimal Speed and [...] |
| 4.1     | Expressions for an Inverter Driving an R[...] |
| 4.2     | Delay of a Repeater Chain Driving an R[...] |
| 4.3     | Analytical Delay Model Versus SPICE |
| 4.4     | Uniform Repeaters Versus Tapered Buffers and Tapered-Buffer Repeaters |
| 4.5     | Power Dissipation in Repeater Chains |
| 4.6     | Conclusions |
| 5       | Repeater Insertion in RC Trees to Minimize Delay |
| 5.1     | Local Branch Repeater Insertion Algorithm |
| 5.2     | Global Tree Repeater Insertion Algorithm |
| 5.3     | Effectiveness, Accuracy, and Applications of Repeater Insertion Methodologies |
| 5.3.1   | Applications |
| 5.3.2   | Accuracy and Effectiveness |
| 5.3.3   | Comparison of Global Optimization to Exhaustive Search |
| 5.4     | Power Dissipation of Repeaters in RC Tr[...] |
| 5.5     | Conclusions |
| 6       | Conclusions |
| 7       | Future Research |
| 7.1     | Model Improvements |
| 7.1.1   | Modeling a Repeater Driving an RC Load with a Slow Ramp Input Signal |
| 7.1.2   | Consideration of Saturation Region |
| 7.1.3   | Improved RC Model |
| 7.2     | Optimization Algorithms |
| 7.2.1   | Simulated Annealing |
| 7.2.2   | Dominant Frontier |
| 7.3     | Calculation of the Overall Cost of Inserti[...] |
| 7.4     | Development of a CAD Tool |
| 7.4.1   | Simultaneous Wire Sizing |
| 7.4.2   | Including Placement Information |
| 7.4.3   | Clock Signal Variations |
| 7.5     | Conclusions |
|         | Bibliography |
|         | Appendix: Publications |

# List of Tables

| Table | Caption |
|-------|---------|
| 3.1 | Propagation delay $t_{PD}$ and transition time $t_t$ of a minimum-sized inverter driving an $RC$ load (0.8 $\mu$m CMOS technology) |
| 3.2 | Estimate of short-circuit power dissipated by a CMOS inverter (0.8 $\mu$m CMOS technology) |
| 3.3 | The resistive power dissipated by a CMOS inverter driving an $RC$ load (0.8 $\mu$m CMOS technology) |
| 4.1 | Per cent error between analytical total delay model (both 50% and 90% output delay) versus SPICE for a given number of repeater stages, a repeater size of ($W_N = 1\ \mu$m, $W_P = 3\ \mu$m), and an interconnect load of $R = 1$ K$\Omega$ and $C = 1$ pF. (0.8 $\mu$m CMOS technology) |
| 4.2 | Per cent error between analytical total delay model (both 50% and 90% output delay) versus SPICE for a given number of repeater stages, a repeater size of ($W_N = 3\ \mu$m, $W_P = 9\ \mu$m), and an interconnect load of $R = 1$ K$\Omega$ and $C = 1$ pF. (0.8 $\mu$m CMOS technology) |
| 4.3 | Per cent error between analytical total delay model (both 50% and 90% output delay) versus SPICE for a given number of repeater stages, a repeater size of ($W_N = 3\ \mu$m, $W_P = 9\ \mu$m), and an interconnect load of $R = 3$ K$\Omega$ and $C = 3$ pF. (0.8 $\mu$m CMOS technology) |
| 4.4 | The 90% output time for optimally sized uniform repeaters, tapered-buffer repeaters, and tapered buffers for various loads as compared [...] |
| 5.1 | The size and number of repeaters as determined by the local optimization algorithm for three different RC tree topologies. (The propagation delay is in nanoseconds, # is the number of repeaters in a branch, size is the geometric width of the N-channel device of the uniform repeater for that branch, and the P-channel to N-channel ratio is 3:1.) |
| 5.2 | The size and number of repeaters as determined by the global optimization (downhill simplex and simulated annealing) algorithms for three different RC tree topologies. (The propagation delay is in nanoseconds, # is the number of repeaters in a branch, size is the geometric width of the N-channel device of the uniform repeater for that branch, and the P-channel to N-channel ratio is 3:1.) |
| 5.3 | [...] optimization algorithm to meet a terminal branch target delay of 2.0 ns for the given RC tree topologies. (The propagation delay is in nanoseconds, # is the number of repeaters in a branch, size is the geometric width of the N-channel device of the uniform repeater for that branch, and the P-channel to N-channel ratio is 3:1.) |
| 5.4 | Repeater insertion as determined by the downhill simplex method and an exhaustive search for the RC tree shown in Figure 5.10. |

# List of Figures

| Figure | Caption |
|--------|---------|
| 1.1 | A timeline of some of the major events in integrated circuit development from 1920 to 2010 |
| 1.2 | An inverter. (a) The logical symbol for an inverter. (b) The equivalent CMOS transistor diagram. |
| 2.1 | $n$ equal sized CMOS inverting repeaters driving an $RC$ load |
| 2.2 | A schematic example of RC interconnect lines within an integrated circuit |
| 2.3 | Different models to describe $RC$ interconnect: (a) the distributed model representation of an $RC$ load, (b) lumped load model, (c) $\Pi$ model, (d) T model, (e) $\Pi$-2 model, (f) T-2 model, (g) $\Pi$-3 model |
| 2.4 | Coupling capacitance between adjacent interconnect lines |
| 2.5 | The characteristic current-voltage transfer curves of an enhancement N-channel MOSFET |
| 2.6 | A basic MOSFET with four terminals, gate, drain, source, and bulk (substrate) |
| 3.1 | A CMOS inverter driving a large RC load representative of a long interconnect |
| 3.2 | Comparison of $V_{DS}$ for a CMOS inverter driving different load resistances $R$ and a constant load capacitance ($C = 100$ fF) |
| 3.3 | Output response of a CMOS inverter driving an RC load |
| 3.4 | Non-step input driving CMOS inverter stage creates short-circuit power |
| 3.5 | Graphical estimation of short-circuit current (0.8 $\mu$m CMOS technology) |
| 3.6 | Ratio of short-circuit power to total transient power versus interconnect resistance for varying interconnect capacitance |
| 4.1 | A CMOS inverter driving an RC load |
| 4.2 | $n$ equal sized CMOS inverting repeaters driving an $RC$ load |
| 4.3 | The analytic and SPICE derived output waveforms of an 11-stage repeater chain driving an evenly distributed $RC$ load of 1 K$\Omega$ and [...] |
| 4.4 | The 90% output delay time for an interconnect line as a function of the number of repeaters and repeater width. ($R = 1$ K$\Omega$, $C = 1$ pF, 0.8 $\mu$m CMOS technology) |
| 4.5 | The analytical and simulated 50% and 90% delay times for a 1 K$\Omega$ and 1 pF load evenly distributed across a number of uniformly sized repeaters. |
| 4.6 | The per cent error of the analytical value of the 50% and 90% output delays versus SPICE for various loads and repeater sizes. |
| 4.7 | Two methods of driving interconnections with tapered buffers: (a) A single tapered buffer (b) A three stage tapered-buffer repeater system. The first stage is a minimum sized repeater. The tapering factor is $e$ |
| 4.8 | Short-circuit current and power dissipated in a four-stage repeater with $W_N = 5\ \mu$m and $W_P = 15\ \mu$m, $f = 10$ MHz |
| 4.9 | The short-circuit and dynamic power dissipation versus the number of stages in a repeater system. Note the small increase in short-circuit power from nine to ten stages due to the increase in peak current with negligible improvement in transition time |
| 5.1 | An example of an $RC$ tree. Ordered triplets $(i, j, k)$ are used to identify specific branches (note that the downstream nodes are to the right of the upstream nodes) |
| 5.2 | $n$ equal sized CMOS inverting repeaters driving a branch in an $RC$ [...] |
| 5.3 | The total delay for a branch as a function of the number of repeaters and repeater sizes. 0.8 $\mu$m CMOS technology, $C_{rep} = 0$, $R = 1$ K$\Omega$, and $C = 1$ pF |
| 5.4 | [...] |
| 5.5 | The RC tree shown in Figure 5.1 synthesized by the local branch repeater insertion system. The transistor widths are shown below the first repeater of each branch, and the number of repeaters per branch is shown inside the last repeater of each branch |
| 5.6 | A methodology for globally optimal repeater insertion |
| 5.7 | The RC tree shown in Figure 5.1 synthesized by the global repeater insertion system. The transistor widths are shown below the first repeater of each branch, and the number of repeaters per branch is shown inside the last repeater of each branch |
| 5.8 | Two possible solution spaces for a non-convex function. (a) An objective function with nearly equivalent minima. (b) Several outstanding minima among many ordinary minima |
| 5.9 | The delay from the input of the RC tree to specific leaves of the tree based on the repeater insertion system as compared to applying optimally tapered buffers. Triplets indicate the leaf nodes as labeled in Figure 5.5 |
| 5.10 | A section of the RC tree shown in Figure 5.1 used to compare the global optimization algorithms versus the exhaustive search |
| 7.1 | The N-channel transistor of a CMOS inverter driving a large RC load representative of a long interconnect. $V_{DS}$ is the output voltage of the operating transistor of a repeater. |
| 7.2 | The clock distribution network design flow of an integrated circuit modified to include repeater insertion. |
| 7.3 | A variation in the period of the clock signal T is clock jitter |
| 7.4 | Clock signal distribution in integrated circuits: (a) Schematic of the clock distribution network with three clocked elements x, y, z (b) The variation in clock arrival times between sequentially non-adjacent registers (x and y) and (x and z) is global clock skew, between sequentially adjacent registers (y and z) is local clock skew. |

# Chapter 1

#### Introduction

The miniaturization of transistor technology has so far been able to keep up with the desire for increased speed of operation in microelectronic circuits. Because the die size of integrated circuits has become large as compared to the size of the transistor, the speed roadblock has shifted from the speed of the transistor to the delay through the connections among the transistors [1–3].

When Bardeen, Brattain, and Shockley invented the transistor in 1947 [4], they not only changed the world of electronics but also provided the impetus for rapid change throughout the world. William Shockley soon developed the two types of transistors that would become widely used: the bipolar junction transistor (BJT) and, in 1952, the concept of the field-effect transistor (FET) [5–7]. Shockley's concept for the FET was implemented by Dacey and Ross in 1955 [8].

The widespread use of FET technology was delayed for several years by roadblocks in process technology. By the end of the 1950s, a good process for producing FETs had been developed by Atalla, Tannenbaum, and Scheibner [9]; it was later modified by Hoerni in 1960 [10]. This mature process allowed the mass production of silicon FETs to proceed [11].

Of the few initial scientists and engineers who pioneered the development of

Valley in Northern California. It was there, at the end of the 1950s, that Noyce of Fairchild Semiconductor produced the first fully integrated monolithic circuit. It should be noted, for historical purposes, that Noyce's claim to be the first to invent the integrated circuit did not go unchallenged. Jack Kilby of Texas Instruments produced a hybrid integrated circuit in which the interconnections were made of gold wire rather than as part of a planar process [11, 12]. Although these first circuits were initially made with BJTs, monolithic circuits using MOSFETs (metal-oxide-semiconductor FETs) were soon produced [11].



Figure 1.1: A timeline of some of the major events in integrated circuit development from 1920 to 2010

In 1962, Frank Wanlass developed the first Complementary MOS (CMOS) inverter. The CMOS inverter has become the ubiquitous circuit technology used today, and is fundamental to the research described in this dissertation. By the end of 1964, J. R. Burns provided further analysis of CMOS technology [13]. The advantages of CMOS were shown to be significantly lower power, ease of design, equal rise and fall times, and the ability to produce pull-up devices without having to fabricate area consuming resistors. A disadvantage of CMOS is that more devices are needed to implement a function as compared to NMOS technology, and thus more area is required. This aspect was a major disadvantage since physical area on a chip was at a high premium at the time and hindered the development of CMOS until the late 70's when the power consumption of the IC's began to become a significant problem, particularly since the level of integration was approaching many tens of thousands of transistors.

Today CMOS is the predominant technology in digital integrated circuits. Integrated circuits can contain tens of millions of transistors. These devices have become small enough that circuit speed is no longer dictated by the size of the transistors on the chip, but rather by the long interconnections carrying the data and control (e.g., clock) signals from one section of a chip to another [3]. As these signals propagate through the interconnect, the waveform shape degrades. These long interconnects therefore greatly affect both the speed and power of the microelectronic circuits.

The microelectronics industry has demonstrated tremendous gains since the early 60's, but in order to continue the advancement of both microelectronics and related application areas, such as computers, integrated circuits must not only operate fast, but also consume as little power as possible. One strategy for improving both criteria is to examine the methods by which both data and clock signals are distributed. With the continuously increasing size of integrated circuits, the distance that these signals travel has quickly become a limiting factor for both speed and power. Long wires not only create large capacitive loads but also create a non-negligible resistive component, degrading the signal waveform properties.

Although the effects of a capacitive load on CMOS circuits have been studied extensively, the importance of resistance is still being fully recognized due to the increasing size of integrated circuits and the smaller physical size of the transistor dimensions. These large resistances can occur in the long metal wires in large integrated circuits or in the shorter polysilicon connections. There are additional detrimental effects of large interconnect resistance, resulting in increased delay and power dissipation.

With respect to increased delay, as the interconnect length increases linearly, both the interconnect capacitance (C) and interconnect resistance (R) increase linearly, making the RC delay increase quadratically. Although the RC delay is not a precise measure of the time necessary for a signal to propagate through a wire, the total RC delay of a section of a line may be useful as a figure of merit. In order to increase the operating speed of an integrated circuit, it is necessary to reduce the RC delay.

In addition to increased signal propagation delay, increased power dissipation is another effect of large interconnect impedance. In addition to the inevitable dynamic switching power that may be dissipated, there is a passive power component dissipated by the resistive interconnect. This resistive power dissipation has gone unstudied, and, depending on the size of the interconnect resistance, can make a significant contribution to the total power dissipated within an IC.

Another contribution to the total power dissipation that has come under recent scrutiny is the power attributed to short-circuit (or cross-over) current [14]. Due to the slower, degraded waveforms (from long interconnections), the N-channel and P-channel transistors in the CMOS circuits switch on and off more slowly, forming a direct DC path from power to ground. This short-circuit power component can be a sizable portion of the total power dissipated by a CMOS-based IC.

Figure 1.2: An inverter. (a) The logical symbol for an inverter. (b) The equivalent CMOS transistor diagram.

The total RC delay of an interconnect line can be reduced drastically with the insertion of a signal amplifier known as a repeater. In CMOS technology, the simplest form of a repeater is produced from a two transistor inverter. Its symbol is shown in Figure 1.2a and the CMOS equivalent circuit is shown in Figure 1.2b. That is, the multiplicative effect that resistance and capacitance have on RC delay can be reduced with the placement of inverters in appropriate locations along an RC interconnect line, thus increasing the speed of the signal propagation. Repeaters accomplish this effect by breaking up the interconnect line such that the resistive and capacitive components do not become excessively large. For example, assume a long interconnect has 5 units of resistance and 10 units of capacitance. The total RC delay would be 50 units. However, if five repeaters are inserted within this line to break the interconnect into five equal pieces, the RC delay would be $1 \times 2 + 1 \times 2 + 1 \times 2 + 1 \times 2 + 1 \times 2 = 10$ units. If the delay of the five repeaters is less than 40 units, then there is a speed benefit to inserting CMOS repeaters.
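The arithmetic of this example can be sketched in a few lines of Python. This is only an illustration of the figure-of-merit calculation in the text; the function name and the per-repeater delay `t_repeater` are hypothetical and are not taken from the dissertation.

```python
def segmented_rc_delay(r_total, c_total, n_segments, t_repeater=0.0):
    """Figure-of-merit RC delay of a line split into n equal segments.

    Each segment has r_total/n of resistance and c_total/n of capacitance,
    so its RC product is r_total*c_total/n**2. t_repeater is a hypothetical
    fixed delay charged once per segment for the repeater driving it.
    """
    segment_rc = (r_total / n_segments) * (c_total / n_segments)
    return n_segments * (segment_rc + t_repeater)


unbroken = segmented_rc_delay(5, 10, 1)                       # 5 * 10 = 50 units
wire_only = segmented_rc_delay(5, 10, 5)                      # 5 * (1 * 2) = 10 units
with_repeaters = segmented_rc_delay(5, 10, 5, t_repeater=6)   # 10 + 5 * 6 = 40 units
print(unbroken, wire_only, with_repeaters)
```

With an assumed repeater delay of 6 units each, the five repeaters cost 30 units in total, which is below the 40 unit break-even point, so the segmented line is faster than the unbroken one.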

An additional benefit of increasing the signal speed due to the placement of repeaters is reducing short-circuit power. With faster signal transition times, the time during which a DC path exists between the power supply and ground, and therefore during which short-circuit current flows, is decreased.

In order to achieve desired clock frequencies of 100's of MHz or even speeds greater than a GHz, the problem of resistive interconnect must be overcome. Furthermore, given these target clock frequencies, minimal power must be dissipated in portable applications to increase battery life as well as for heat removal purposes.

The topic of the dissertation is research in circuit techniques, specifically repeaters, to reduce the problem of resistive interconnect in high performance CMOS circuits. Some background on CMOS inverter models, interconnect models, inverters, and repeaters is presented in Chapter 2. The interconnect model used in the repeater insertion algorithm described in the following chapters is also explained.

In Chapter 3, a model for an inverter driving a lumped resistive capacitive load is introduced as the foundation for the repeater analysis and placement found in later chapters. The derivation of the analytical expression for a repeater driving RC interconnect is described and compared to SPICE simulations. Dynamic, short-circuit, and resistive power dissipation in interconnect are also investigated. In addition, the short-channel device parameters of the  $\alpha$ -power law are outlined.

Repeater models that consider optimal speed and power are presented in Chapter 4. The model presented in Chapter 3 is expanded to describe repeater insertion in RC lines. The effectiveness of uniform repeater insertion in RC lines is compared to tapered buffers and tapered buffer repeater insertion. The accuracy of the repeater model is compared to SPICE. Power dissipation in repeater chains is also examined.

A repeater insertion methodology for RC trees rather than lines to minimize (or target) delay and an analysis of the power dissipated in RC trees is described in Chapter 5. Both a local and a global repeater insertion method are described for the tree structure. Applications of both methods and power dissipation characteristics in RC trees are discussed.

Conclusions of the thesis are presented in Chapter 6. And, finally, future work that would further improve these research results is discussed in Chapter 7.

## Chapter 2

# Models of Transistors and Interconnect

In order to determine the optimal placement and size of the repeaters to be inserted in an interconnect line, a physical model must be developed and applied. Before proceeding to this model, it is instructive to examine earlier models. A repeater is essentially a digital amplifier. In CMOS technology, the output of a two transistor inverter with an input signal below some threshold  $V_{IL}$  (input low voltage) will be restored to the supply voltage  $V_{OH}$  (output high voltage). Likewise, if the input is above some threshold voltage  $V_{IH}$ , the output will be forced to ground.

The transistor and inverter models which permit the analysis of the operation of a CMOS repeater are reviewed in this chapter. Although many transistor and inverter models have been proposed in the literature [15–22], most of these models are sufficiently complex such that the analysis of a circuit more complex than a simple inverter is often intractable. Therefore attention is focused on applying those models that provide a high degree of both tractability and accuracy. These transistor and interconnect models are used together to analyze various repeater systems. The very simplest models used for repeater analysis will be presented first to provide some motivation followed by more accurate models for interconnect and transistors.

A description of high level repeater models is presented in Section 2.1. Some models for interconnect are described in Section 2.2. Finally, two major transistor I-V equations are reviewed in Section 2.3.

#### 2.1 High Level Repeater Models

Before discussing interconnect models and transistor current-voltage (I-V) equations, it is useful to examine some higher level models. If the width of an interconnect line in an integrated circuit is constant, both the resistance and capacitance increase linearly with length  $(l_{int})$ ; therefore, delay is proportional to RC which quadratically increases with length as  $l_{int}^2$ . As previously discussed in Chapter 1, the insertion of repeaters can reduce the quadratic nature of the interconnect delay. A single inverter driving a large RC load  $(R_{total}, C_{total})$  is shown in Figure 2.1a. The same load broken up with n repeaters is shown in Figure 2.1b. A simple characterization of delay of an RC line with repeaters inserted along the line is

$$t_{total} = nt_{rep} + n\frac{t_{int}}{n^2} = nt_{rep} + \frac{t_{int}}{n} \qquad (2.1)$$

where  $t_{total}$  is the delay from the input of the first repeater to the output of the last repeater [3, 23].  $t_{rep}$  is the delay of each repeater and  $t_{int}$  is the total RC delay of the complete interconnect. n is the number of repeaters, where n has been placed both above and below the fraction of  $t_{int}$  to indicate that each of the n pieces of interconnect quadratically reduces the delay.

Figure 2.1: n equal sized CMOS inverting repeaters driving an RC load. Each of the n interconnect sections has resistance $R_{total}/n$ and capacitance $C_{total}/n$.

Equation (2.1) is differentiated and set equal to zero to determine a min/max value of $t_{total}$. The optimal value for n is

$$n = \sqrt{\frac{t_{int}}{t_{rep}}}. \tag{2.2}$$

This equation implies that the total delay through a long interconnection with repeaters is minimized when the delay of each repeater is equivalent to the delay of each section of interconnect. Frequently,  $t_{rep}$  is modeled as a resistance (the output resistance of the inverter) and capacitance (the gate capacitance of the inverter). This model may be acceptable for coarse timing information, but more precise knowledge is required for a more accurate placement of the repeaters.
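A minimal numerical sketch of (2.1) and (2.2) follows. Because the number of repeaters must be a whole number, the sketch compares the two integers adjacent to the real-valued optimum; that rounding step and the example values are assumptions made for illustration, not part of the model above.

```python
import math


def total_delay(n, t_rep, t_int):
    """Equation (2.1): n repeater delays plus n wire sections of t_int/n**2 each."""
    return n * t_rep + t_int / n


def optimal_repeater_count(t_rep, t_int):
    """Equation (2.2), followed by a choice between the neighboring integers."""
    n_real = math.sqrt(t_int / t_rep)
    candidates = {max(1, math.floor(n_real)), max(1, math.ceil(n_real))}
    return min(candidates, key=lambda n: total_delay(n, t_rep, t_int))


# Hypothetical values in arbitrary time units: t_rep = 1, t_int = 49.
n_opt = optimal_repeater_count(1.0, 49.0)
print(n_opt, total_delay(n_opt, 1.0, 49.0))
```

For these values the optimum is seven repeaters, at which point each wire section contributes 49/7² = 1 unit, equal to the delay of one repeater, as the preceding paragraph states.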

#### 2.2 Interconnect Models

In integrated circuits, the term interconnect refers to the wire that connects various points within the chip die. An example of where RC interconnect occurs on a chip is shown in Figure 2.2. In this example, long resistive interconnect commonly occurs in the clock distribution network that synchronously drives each block, and resistive interconnect is also common in the global data lines between blocks. In addition, there can be significant resistive interconnect within each block.

In a CMOS technology, different materials are used as wires depending on the type of interconnection. For instance, polysilicon can be used as an interconnection material. Polysilicon is the material used to define the gate electrode of a CMOS transistor and is characterized by a high resistance (approximately 40 ohms per square) and capacitance, so its use as a wire is typically limited. It is convenient to use polysilicon when connecting the gates of transistors with the same input because only a single wire is required. The same gate connection with metal requires much more area, thus the added area of using metal for a local connection often outweighs the reduced line impedance. Furthermore, situations arise where local metal wire can not be used due to the density of a circuit, thus polysilicon would be used as the interconnection material. These cases represent examples of where a high impedance interconnection might be used.

When long interconnections are required, such as global connections between two large functional blocks, some type of metal wire with low resistance and capacitance may be preferable. Polysilicon is typically avoided because the convenience of local connections described above does not exist, and the RC characteristics of polysilicon would significantly degrade the signal shape. Although metal has a low impedance (typically less than 0.1 ohms per square), the length of a metal wire may still produce a large RC delay. In either case, whether the interconnect is intrablock or interblock wiring, resistive interconnect poses a problem.

Figure 2.2: A schematic example of RC interconnect lines within an integrated circuit.

There are several ways to model resistive interconnect. A wire could be modeled as a transmission line. The characteristics of a transmission line model are described in [24, 25] and are appropriate when analyzing circuits operating at sufficiently high frequencies where the transition time of the signals is comparable to or less than the time of flight down a wire. A transmission line model is both highly complex and difficult to use in conjunction with a large-signal transistor I-V model, making intractable the development of closed form solutions of a repeater system. Furthermore, as is shown in [26], simpler interconnect models can provide sufficient accuracy while remaining useful at current operating frequencies of large integrated circuits (e.g., greater than 300 MHz). The point at which a resistive model will no longer be useful depends upon the geometry of the driven interconnect. Inductance will need to be considered in extremely long and wide lines with low resistance. Thus, a hard limit on frequency at which a transmission line model is necessary can not be given but is well above 500 MHz.

A discrete element model to describe the distributed nature of a resistive-capacitive interconnect load is presented in [26]. A small resistor-capacitor network can be used to analyze a distributed load with small error. The number and placement of the resistors and capacitors depends upon the ratio of the interconnect resistance and capacitance to the transistor output resistance and input capacitance, respectively. Some examples of RC networks used to model distributed RC loads are shown in Figure 2.3. The symbol for a distributed RC load (one with infinite sections) is shown in Figure 2.3a. A lumped RC load is shown in Figure 2.3b. RC models of increasing complexity and accuracy, approaching a distributed RC model, are shown in Figures 2.3c through 2.3g.

Another source of impedance in interconnect is coupling capacitance. Coupling capacitance occurs when two or more interconnect lines run adjacent to each other over some distance [27, 28] or are closely spaced with respect to each other [29]. A schematic description of coupling capacitance between adjacent lines is shown in Figure 2.4. The magnitude of the coupling capacitance or signal noise is determined by the following: the area of adjacency, hence the length of adjacency; the distance between the interconnect lines; and the direction of the adjacent signals. Inserting repeaters can drastically reduce the length of adjacency between two interconnect lines. Thus, decreased coupling capacitance is an added benefit of inserting repeaters. Although overcoming coupling capacitance is not a topic of research that is specifically discussed in this dissertation, this benefit of repeater insertion should be noted.

Figure 2.3: Different models to describe RC interconnect: (a) the distributed model representation of an RC load, (b) lumped load model, (c) $\Pi$ model, (d) T model, (e) $\Pi$-2 model, (f) T-2 model, (g) $\Pi$-3 model

The analysis used in the following chapters further simplifies these interconnect models. The interconnect is considered to be a lumped element, *i.e.*, a single resistor and capacitor. Although this model may create more error than a more complex model, it yields sufficiently accurate results as compared to similar models in SPICE [30]. A lumped load is generally a pessimistic approximation of a distributed load because in a lumped load, all of the interconnect resistance "sees" or is upstream to all of the interconnect capacitance. Practically, only the initial incremental portion of the interconnect, $\Delta R$, sees all of the downstream interconnect capacitance, with each successive $\Delta R$ seeing less capacitance while moving down the interconnect line. For instance, in Figure 2.3g, $\frac{1}{6}C$ is upstream to all of the resistance, $\frac{1}{3}C$ is upstream of $\frac{2}{3}R$, and so on.
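The pessimism of the lumped model can be illustrated numerically. The sketch below uses the Elmore delay of a simple ladder of n identical series-R, shunt-C sections as the figure of merit. Both the Elmore metric and this particular ladder topology are assumptions chosen for illustration; the ladder is not one of the $\Pi$ or T networks of Figure 2.3, and the component values are hypothetical.

```python
def elmore_delay_ladder(r_total, c_total, n_sections):
    """Elmore delay at the far end of an n-section RC ladder.

    Each section is a series resistor r_total/n followed by a shunt capacitor
    c_total/n. The k-th capacitor is charged through the k resistors upstream
    of it, so it contributes k * r_sec * c_sec to the delay.
    """
    r_sec = r_total / n_sections
    c_sec = c_total / n_sections
    return sum(k * r_sec * c_sec for k in range(1, n_sections + 1))


r, c = 1000.0, 1e-12                      # hypothetical 1 kOhm, 1 pF line
lumped = elmore_delay_ladder(r, c, 1)     # every ohm sees every farad: R*C
fine = elmore_delay_ladder(r, c, 1000)    # R*C*(n + 1)/(2n), close to R*C/2
print(lumped, fine, fine / lumped)
```

As the number of sections grows the result tends toward half of the lumped value, which is the sense in which the single-resistor, single-capacitor model overestimates the delay of a distributed line.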




Figure 2.4: Coupling capacitance between adjacent interconnect lines.

#### 2.3 CMOS Transistor I-V Models

In order to more accurately model a repeater system, I-V equations that describe the operation of a transistor are required. In 1968, the Shichman-Hodges transistor I-V equations were published [31] which were based on Shockley's original I-V model [6]. Three regions of operation are described: 1) cutoff, 2) linear, and 3) saturation. The operation of a transistor in these three regions is presented here with reference to Figures 2.5 and 2.6. The analysis that follows is for an N-channel silicon MOSFET with the source node connected to ground.

When the input voltage at the gate  $V_{GS}$  is less than some threshold voltage  $V_T$ , the transistor is said to be cutoff, and no drain current can flow. That is, other than a very small leakage current, no current flows from the source to the drain, so  $I_{DS} \approx 0$ .

Once the gate voltage increases above $V_T$, the region in which the transistor operates is determined by the relative values of the drain-to-source voltage $V_{DS}$ and the gate-to-source voltage $V_{GS}$. If $V_{DS} < V_{GS} - V_T$, the transistor operates in the linear region and the drain current is described by the expression $I_{DS} = K\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right]$. If $V_{DS} > V_{GS} - V_T$, the transistor operates in the saturation region, and $I_{DS} = \frac{K}{2}(V_{GS} - V_T)^2$ [32]. Summarizing,



Figure 2.5: The characteristic current-voltage transfer curves of an enhancement N-channel MOSFET

$$I_{DS} = \begin{cases} 0 & (V_{GS} \le V_T : \text{cutoff region}) \\ K\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right] & (V_{GS} > V_T,\ V_{DS} < V_{GS} - V_T : \text{linear region}) \\ \frac{K}{2}(V_{GS} - V_T)^2 & (V_{GS} > V_T,\ V_{DS} > V_{GS} - V_T : \text{saturation region}) \end{cases} \tag{2.3}$$
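The three regions of (2.3) translate directly into a small function. The sketch below is illustrative only: the function name is hypothetical, the linear-region branch uses $K[(V_{GS} - V_T)V_{DS} - V_{DS}^2/2]$, and the values of K and $V_T$ in the usage lines are assumed rather than taken from any process described here.

```python
def shichman_hodges_ids(v_gs, v_ds, k, v_t):
    """Drain current of an N-channel MOSFET (source grounded) per equation (2.3)."""
    v_ov = v_gs - v_t                                 # gate overdrive V_GS - V_T
    if v_ov <= 0:
        return 0.0                                    # cutoff region
    if v_ds < v_ov:
        return k * (v_ov * v_ds - v_ds ** 2 / 2)      # linear region
    return 0.5 * k * v_ov ** 2                        # saturation region


K, V_T = 100e-6, 0.7   # hypothetical: 100 uA/V^2 and a 0.7 V threshold
print(shichman_hodges_ids(0.5, 3.0, K, V_T))   # below threshold
print(shichman_hodges_ids(2.7, 0.5, K, V_T))   # V_DS below the overdrive
print(shichman_hodges_ids(2.7, 3.0, K, V_T))   # V_DS above the overdrive
```

The linear and saturation expressions agree at $V_{DS} = V_{GS} - V_T$, so the current is continuous across the boundary between the two regions.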

There are a number of limitations in the Shichman-Hodges equations. Two important limitations are that these equations 1) only work well for long-channel FETs and 2) become intractable when analyzing RC loaded inverters if no simplifying assumptions are made.

Figure 2.6: A basic MOSFET with four terminals, gate, drain, source, and bulk (substrate).

Sakurai has developed I-V equations for describing the behavior of a short-channel CMOS transistor [33]. The Sakurai $\alpha$-power law model overcomes some of the problems associated with the Shichman-Hodges model. First, the $\alpha$-power law model takes into account the important short-channel effect of velocity saturation. When a short-channel MOSFET operates in the saturation region, the drain current is no longer proportional to the square of the gate-to-source voltage $V_{GS}$ due to the effects of velocity saturation. Velocity saturation occurs because the electric field of a short-channel transistor is sufficiently great such that the current carriers are unable to travel any faster from the source to the drain due to collisions with the crystal lattice of the semiconductor. Therefore, an increase in the gate voltage only increases the number of carriers and not the velocity of the carriers, so the current no longer increases quadratically with the effective gate voltage $V_{GS} - V_T$ [34]. Secondly, parasitic drain and source resistances within the transistor must be taken into consideration. Lastly, the $\alpha$-power law provides fairly simple expressions that can be used to analyze the behavior of nonlinear digital circuits. The I-V equations for the $\alpha$-power law are:

$$I_{DS} = \begin{cases} 0 & (V_{GS} \leq V_T : \text{cutoff region}) \\ (I'_{do}/V'_{do})V_{DS} & (V_{DS} < V'_{do} : \text{linear region}) \\ I'_{do} & (V_{DS} \geq V'_{do} : \text{saturation region}) \end{cases} \tag{2.4}$$

where

$$I'_{do} = I_{do} \left( \frac{V_{GS} - V_T}{V_{DD} - V_T} \right)^{\alpha} \tag{2.5}$$

$$V'_{do} = V_{do} \left( \frac{V_{GS} - V_T}{V_{DD} - V_T} \right)^{\alpha/2} \tag{2.6}$$

In the  $\alpha$ -power law model,  $I_{do}$  represents the drive current of the MOS device,  $V_{do}$  represents the drain-to-source voltage at which velocity saturation occurs, and  $\alpha$  models the process dependent degree to which velocity saturation affects the drain-to-source current.
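As a rough illustration, (2.4) through (2.6) can be evaluated as follows. The function name, the argument names (`i_do` for the drive current, `v_do` for the drain saturation voltage at $V_{GS} = V_{DD}$), and every numerical value in the usage lines are hypothetical; they are not extracted parameters of any technology discussed in this dissertation.

```python
def alpha_power_ids(v_gs, v_ds, i_do, v_do, v_dd, v_t, alpha):
    """Drain current per equations (2.4)-(2.6) of the alpha-power law model."""
    if v_gs <= v_t:
        return 0.0                                   # cutoff region
    drive = (v_gs - v_t) / (v_dd - v_t)              # normalized gate overdrive
    i_do_prime = i_do * drive ** alpha               # equation (2.5)
    v_do_prime = v_do * drive ** (alpha / 2)         # equation (2.6)
    if v_ds < v_do_prime:
        return (i_do_prime / v_do_prime) * v_ds      # linear region
    return i_do_prime                                # saturation region


# Hypothetical device: 1 mA drive current, 1.5 V saturation voltage,
# 5 V supply, 0.8 V threshold, alpha = 1.3.
params = dict(i_do=1e-3, v_do=1.5, v_dd=5.0, v_t=0.8, alpha=1.3)
print(alpha_power_ids(5.0, 5.0, **params))    # full gate drive, saturation
print(alpha_power_ids(5.0, 0.75, **params))   # full gate drive, linear region
print(alpha_power_ids(3.0, 5.0, **params))    # reduced gate drive
```

At full gate drive the linear-region branch reduces to a fixed conductance `i_do / v_do` multiplied by $V_{DS}$, which is the form relied upon when the inverter is driven by a step input.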

With accurate and general transistor and interconnect models, a repeater model that describes CMOS devices driving long resistive interconnect is developed in Chapter 3. This repeater model is expanded to explore the speed and power advantages of inserting repeater chains in interconnect in the following chapters.

# Chapter 3

# Short-Channel Model for a Repeater Driving an RC Load

In this chapter a foundation for modeling an RC loaded repeater is presented. The development and analysis of an analytical expression describing a short-channel CMOS inverter driving a resistive interconnect is presented in Section 3.1. In Section 3.2, the power dissipated by this system is investigated. Short-circuit power is emphasized in this section since this power component has been given little attention until recently and can represent a significant contribution to the total power dissipation, important in today's portable applications. Finally, a procedure for determining the circuit parameter values used in the  $\alpha$ -power law I-V model is provided in Section 3.3.

# 3.1 Transient Analysis of an RC Loaded CMOS Inverter

An analytical expression describing the behavior of an inverter driving a lumped RC load based on Sakurai's $\alpha$-power law model is presented in [33]. A diagram of this circuit is shown in Figure 3.1. In subsection 3.1.1, the device model is described and an analytical expression for the output voltage or a transition time to reach $V_{out}$ is derived. In subsection 3.1.2, several expressions that characterize the temporal properties of the circuit are presented. In subsection 3.1.3, some results of the analytical expressions are presented along with comparisons to SPICE.



Figure 3.1: A CMOS inverter driving a large RC load representative of a long interconnect

# 3.1.1 Derivation of Analytical Expressions

The $\alpha$-power law model [33] accurately describes the effects of short-channel behavior, such as velocity saturation, while providing a tractable equation. The linear region form of the $\alpha$-power model is used to characterize the I-V behavior of the ON transistor sourcing or sinking an RC load since a large portion of the circuit operation occurs within this region under the assumption of a step or fast ramp input signal. When the input to the inverter is a unit step or fast ramp, $V_{out}$ is initially larger than $V_{GS} - V_T$ for a shorter period of time than if the input to the inverter is a slow ramp. Therefore, the circuit operates in the linear region for a greater portion of the total transition time for a large RC load, particularly for large load resistances. When the load resistance is large, a large IR voltage drop occurs across the load resistor once the capacitor begins to discharge, thus $V_{DS}$ is nearly immediately less than $V_{GS} - V_T$, as shown in Figure 3.2. The N-channel device operates in the linear region once the step input goes high when driving large RC loads. Note however if the input waveform increases slowly or the load impedance is small, the inverter operates in the saturation region for a longer time before switching into the linear region. In addition, the $\alpha$-power law model is less accurate with slow input waveforms.



Figure 3.2: Comparison of  $V_{DS}$  for a CMOS inverter driving different load resistances R and a constant load capacitance (C = 100 fF)

Only the falling output (rising input) waveform is considered in this chapter. The following analysis, however, is equally applicable to a rising output (falling input) waveform. The lumped load is modeled as a resistor in series with a capacitor. The current through the output load capacitance is the same magnitude and opposite sign as the N-channel drain current (the P-channel current is ignored under the assumption of a step or fast ramp input). The capacitive current is

$$i_C = C \frac{dV_{out}}{dt} = -i_d, \tag{3.1}$$

where C is the output capacitance,  $V_{out}$  is the voltage across the capacitance C,  $i_C$  is the current discharged from the capacitor, and  $i_d$  is the drain current through the N-channel device.

The N-channel linear drain current is given by [33]

$$-C\frac{dV_{out}}{dt} = i_d = \frac{I_{do}}{V_{do}} \left(\frac{V_{GS} - V_T}{V_{DD} - V_T}\right)^{\alpha} V_{DS}, \quad \text{for } V_{GS} \ge V_T,\ V_{GS} - V_T \ge V_{DS}. \tag{3.2}$$

In the  $\alpha$ -power law model,  $I_{do}$  represents the drive current of the MOS device and is proportional to W/L,  $V_{do}$  represents the drain-to-source voltage at which velocity saturation occurs with  $V_{GS} = V_{DD}$  and is a process dependent constant, and  $\alpha$  models the process dependent degree to which velocity saturation affects the drain-to-source current.  $\alpha$  is within the range  $1 \le \alpha \le 2$  where  $\alpha = 1$  corresponds to a device operating strongly under velocity saturation, while  $\alpha = 2$  represents a device with negligible velocity saturation.  $V_{DD}$  is the supply voltage, and  $V_T$  is the MOS threshold voltage (where  $V_{TN}$  ( $V_{TP}$ ) is the N-channel (P-channel) threshold voltage). An empirical method to determine technology specific values for  $I_{do}$  and  $V_{do}$  is described in Section 3.3.

Assuming a unit step input is applied to the circuit shown in Figure 3.1,  $V_{out}$  can be derived from (3.2). The equation, rewritten in Laplace form, is

$$SCV_{out} + S\mho_{do}RCV_{out} + \mho_{do}V_{out} = CV_{out}(0) + \mho_{do}RCV_{out}(0), \tag{3.3}$$

where  $\mho_{do} = \frac{I_{do}}{V_{do}}$  is the saturation conductance.

Equation (3.3) yields

$$V_{out}(t) = V_{out}(0)\,e^{-\frac{\mho_{do}}{\mho_{do}RC+C}t} \tag{3.4}$$
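One way to check (3.4) is to integrate the circuit equations directly and compare the result with the closed form. The sketch below assumes a step input with $V_{GS} = V_{DD}$, so that the N-channel device acts as a fixed conductance `g_do` (the saturation conductance), and uses forward-Euler integration. The integration scheme, the step count, and all component values are illustrative assumptions, not the procedure used for the SPICE comparisons.

```python
import math


def vout_closed_form(t, v0, g_do, r, c):
    """Equation (3.4); g_do is the saturation conductance I_do/V_do."""
    return v0 * math.exp(-g_do * t / (g_do * r * c + c))


def vout_simulated(t_end, v0, g_do, r, c, steps=20_000):
    """Forward-Euler discharge of C through R into a linear-region N-channel device."""
    dt = t_end / steps
    v_out = v0
    for _ in range(steps):
        # The load resistor drops i_d*R, so V_DS = V_out - i_d*R, and the
        # device gives i_d = g_do*V_DS. Solving the pair for i_d:
        i_d = g_do * v_out / (1.0 + g_do * r)
        v_out -= i_d / c * dt          # equation (3.1): C dV_out/dt = -i_d
    return v_out


# Hypothetical values: 5 V initial output, 1 mS conductance, 1 kOhm, 100 fF.
v0, g_do, r, c = 5.0, 1e-3, 1000.0, 100e-15
t = 200e-12                            # one time constant for these values
print(vout_closed_form(t, v0, g_do, r, c), vout_simulated(t, v0, g_do, r, c))
```

For these values the time constant $(C + \mho_{do}RC)/\mho_{do}$ is 200 ps, so both results should be close to $5/e \approx 1.84$ V.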

Graphs of $V_{out}(t)$ for a wide range of resistances and capacitances (within practical limits) driven by a minimum-sized inverter with balanced rise and fall times are shown in Figure 3.3. The analytical expression shown in (3.4) closely approximates SPICE for most of the region of operation for a wide range of load impedances from 10 $\Omega$ to 1000 $\Omega$ and from 10 fF to 1 pF. The maximum error of the output response derived from (3.4) as compared with SPICE (shown in Figure 3.3) is 25% for the specific case where the RC load is 10 $\Omega$ and 10 fF, approaching the unloaded case. As a means of comparison, the capacitance of metal interconnect in a 0.8 $\mu$m technology is approximately 0.02 fF/$\mu$m$^2$ with polysilicon being about three times as capacitive. The resistance of metal in the same technology is approximately 0.08 ohms per square with polysilicon being about 10 ohms per square.



Figure 3.3: Output response of a CMOS inverter driving an RC load

### 3.1.2 Analytical Delay Expressions

From (3.4), the propagation delay of a CMOS inverter calculated at the 50% point  $t_{PD}$  is

$$t_{PD} = .693 \frac{C + \mho_{do}RC}{\mho_{do}} \tag{3.5}$$

The transition time of a CMOS inverter driving a lumped RC load calculated at the 90% point $t_t$ is

$$t_t = 2.3 \frac{C + \mho_{do}RC}{\mho_{do}}. \tag{3.6}$$
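The coefficients .693 and 2.3 are $\ln 2$ and $\ln 10$: solving the exponential decay of (3.4) for the time at which the output has fallen to a given fraction of its initial value yields the time constant multiplied by the logarithm of the reciprocal of that fraction. The following sketch makes this explicit. The helper names and the example values are hypothetical.

```python
import math


def time_constant(g_do, r, c):
    """(C + g_do*R*C) / g_do, the time constant of the decay in equation (3.4)."""
    return (c + g_do * r * c) / g_do


def time_to_fraction(fraction_remaining, g_do, r, c):
    """Time for V_out to fall to fraction_remaining * V_out(0)."""
    return math.log(1.0 / fraction_remaining) * time_constant(g_do, r, c)


def propagation_delay(g_do, r, c):
    """50% point: ln(2) is about .693, matching equation (3.5)."""
    return time_to_fraction(0.5, g_do, r, c)


def transition_time(g_do, r, c):
    """90% point (10% of V_out(0) remaining): ln(10) is about 2.3, matching (3.6)."""
    return time_to_fraction(0.1, g_do, r, c)


# Hypothetical values: 1 mS saturation conductance, 1 kOhm, 100 fF.
g_do, r, c = 1e-3, 1000.0, 100e-15
tau = time_constant(g_do, r, c)
print(propagation_delay(g_do, r, c) / tau, transition_time(g_do, r, c) / tau)
```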

Additional delay expressions that are used in section 3.2.2 for determining the short-circuit power are

$$t_{V_{TN}} = \ln\left(\frac{V_{TN}}{V_{DD}}\right) \frac{C + \mho_{do}RC}{\mho_{do}} \tag{3.7}$$

and

$$t_{V_{TP}} = \ln\left(\frac{V_{DD} + V_{TP}}{V_{DD}}\right) \frac{C + \mho_{do}RC}{\mho_{do}} \tag{3.8}$$

These equations describe the time for the output voltage to change by a threshold voltage from either ground or  $V_{DD}$  for an N-channel or P-channel device, respectively. Note that  $V_{TP}$  is negative.

## 3.1.3 Analysis of Delay Expressions

The accuracy of the analytic model as compared with SPICE is tabulated in Table 3.1 for a wide range of output load resistances and capacitances. The interconnect resistance and capacitance are described in the first two columns of Table 3.1, respectively. The transition times determined by the analytical expression and by SPICE are shown in the third and fourth columns, respectively. The propagation delay times determined by the analytical expression and by SPICE are listed in columns five and six, respectively. The error of the analytical expressions versus SPICE for the transition time and propagation delay is shown in the final two columns. A 0.8 $\mu$m CMOS technology is assumed. Note that the maximum error of the transition time $t_t$ as compared with SPICE is 27%, and the maximum error of the propagation delay $t_{PD}$ as compared with SPICE is 25%.

| Load resistance | Load capacitance | $t_t$ (analytic) | $t_t$ (SPICE) | $t_{PD}$ (analytic) | $t_{PD}$ (SPICE) | $t_t$ error | $t_{PD}$ error |
|-----------------|------------------|------------------|---------------|---------------------|------------------|-------------|----------------|
| 10 Ω            | 0.01 pF          | 21 ps            | 22 ps         | 6.5 ps              | 8.7 ps           | 4%          | 25%            |
| 10 Ω            | 0.1 pF           | 215 ps           | 176 ps        | 65 ps               | 70 ps            | 22%         | 7%             |
| 10 Ω            | 1 pF             | 2.2 ns           | 1.7 ns        | 649 ps              | 680 ps           | 27%         | 4%             |
| 100 Ω           | 0.01 pF          | 24 ps            | 22 ps         | 7.2 ps              | 8.8 ps           | 6%          | 19%            |
| 100 Ω           | 0.1 pF           | 235 ps           | 187 ps        | 71 ps               | 73 ps            | 25%         | 2%             |
| 100 Ω           | 1 pF             | 2.4 ns           | 1.9 ns        | 712 ps              | 711 ps           | 25%         | 0%             |
| 1000 Ω          | 0.01 pF          | 44 ps            | 39 ps         | 13 ps               | 13 ps            | 13%         | 0%             |
| 1000 Ω          | 0.1 pF           | 444 ps           | 365 ps        | 133 ps              | 115 ps           | 22%         | 16%            |
| 1000 Ω          | 1 pF             | 4.4 ns           | 3.6 ns        | 1.3 ns              | 1.1 ns           | 22%         | 18%            |

Table 3.1: Propagation delay $t_{PD}$ and transition time $t_t$ of a minimum-sized inverter driving an RC load (0.8 $\mu$m CMOS technology)

As noted above, (3.5) and (3.6) can be used to estimate the propagation delay and transition time of a CMOS inverter driving a resistive-capacitive interconnect line. Since the shape of the output waveform is now known, (3.7) and (3.8) can also be used with (3.6) to estimate the short-circuit power dissipation of a CMOS gate loading the high impedance interconnect line, as is described in Section 3.2.

The maximum error for the transition time for RC loads ranging from 10 $\Omega$ to 1000 $\Omega$ and 10 fF to 1 pF and for two different short-channel CMOS technologies (0.8 $\mu$m and 1.2 $\mu$m CMOS) is 27%. The maximum error for the propagation delay is 25% over the same ranges and technologies. As the capacitance increases to 1 pF, the error of the propagation delay generally decreases to less than 20%. A similar decrease occurs for the transition time. Furthermore, both errors generally decrease with increasing load resistance.

The improved accuracy with increasing load resistance and capacitance is due to the RC load dominating the device parasitic impedances, specifically, the source and drain capacitance, thereby improving the accuracy of the transistor I-V model for large RC loads. These device parasitic impedances are not included in the I-V model described in (3.2) but are considered by SPICE. This behavior also explains why the accuracy improves as the geometric size of the transistors becomes smaller, making the parasitic device resistances and capacitances smaller. Thus, these expressions for the propagation delay and transition time of a CMOS inverter driving an RC load become more accurate for higher RC loads and more aggressive sub-micrometer technologies, the regime of greatest interest.

# 3.2 Power Estimation of a CMOS Inverter

Power consumption has become one of the premier issues in the design of VLSI circuits. There are two primary contributions to the total transient power dissipated by a CMOS inverter, dynamic power dissipation and short-circuit power dissipation [14, 35–39]. The short-circuit power is often neglected since the dynamic power is assumed to be dominant. As described below and in [14, 35–39], the magnitude of the short-circuit power is load dependent, and it is shown in this chapter that short-circuit power can be a significant portion of the total transient power dissipation.

Dynamic power is briefly discussed in subsection 3.2.1. In subsection 3.2.2, an analysis of short-circuit power is presented, and a closed-form model is proposed. In subsection 3.2.3, the power dissipated by the lossy resistive element of the RC load is discussed and modeled. Finally, some concluding remarks pertaining specifically to estimating the power of an RC loaded CMOS inverter are offered.

### 3.2.1 Dynamic Power

Dynamic power is due to the energy required to charge and discharge a load capacitance C and is characterized by the familiar equation,  $CV^2f$ , where V is the source voltage and f is the switching frequency. The dynamic power is independent of the load resistance. For example, the dynamic power dissipation of a single CMOS inverter driving an RC load ranges from 35  $\mu$ W to 125  $\mu$ W for capacitive loads ranging from 0.3 pF to 1 pF and assuming a 5 volt power supply with the inverter switching at 10 MHz.
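The arithmetic behind the dynamic power figures can be sketched as follows. This is a minimal illustration, not the author's code: the function name and the `per_transition` flag are hypothetical. The flag returns half of $CV^2f$; the range quoted above is closer to that half value, which may indicate that the figures are counted per transition, but that reading is an assumption and is not stated in the text.

```python
def dynamic_power(c_load, v_dd, freq, per_transition=False):
    """Return C*V^2*f in watts, or half of it when per_transition is True.

    The per_transition variant is an assumption about how the quoted
    figures were counted; the text itself only gives C*V^2*f.
    """
    p = c_load * v_dd ** 2 * freq
    return 0.5 * p if per_transition else p


for c in (0.3e-12, 1.0e-12):
    full = dynamic_power(c, 5.0, 10e6)
    half = dynamic_power(c, 5.0, 10e6, per_transition=True)
    print(f"C = {c * 1e12:.1f} pF: CV^2f = {full * 1e6:.1f} uW, "
          f"half = {half * 1e6:.1f} uW")
```

Note that the load resistance does not appear anywhere in the calculation, in line with the statement that the dynamic power is independent of R.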

### 3.2.2 Short-Circuit Power

In this subsection an expression for modeling the short-circuit power in a CMOS inverter is presented. This expression is also analyzed and compared to SPICE. Also in this subsection, a comparison of the short-circuit power to the total transient power dissipation as a function of load resistance is presented.

#### Analytic Expression of Short-Circuit Power

The logic stage following a large RC load may dissipate significant amounts of short-circuit power due to the degraded waveform originating from the CMOS inverter driving an RC load (see Figure 3.4). During the region where the input signal is transitioning between $V_{TN}$ and $V_{DD} + V_{TP}$, a DC current path exists between $V_{DD}$ and ground. The excess current dissipated during this region is called the short-circuit (or crossover) current [14]. Short-circuit current occurs due to a slow input transition, and for a balanced inverter, the peak current occurs near the middle of the input transition. An example of short-circuit current is shown by the solid line in the lower graph of Figure 3.5, *i.e.*, the SPICE-derived data.

Figure 3.4: Non-step input driving CMOS inverter stage creates short-circuit power

The total short-circuit current  $I_{SC}$  can be estimated by modeling  $I_{SC}$  as a triangle. Therefore, the integral of  $I_{SC}$  is the area of a triangle,  $\frac{1}{2}base \times height$ . In terms of the short-circuit current, the height can be modeled as  $I_{peak}$  and the base can be modeled as  $t_{base}$  (see Figure 3.5).  $I_{peak}$  is the maximum saturation current of the load transistor and depends on both  $V_{GS}$  and  $V_{DS}$ , therefore  $I_{peak}$  is both input waveform and load dependent.  $t_{base}$  is the time during which both the P-channel and the N-channel transistors are turned on, permitting a DC current path to exist between  $V_{DD}$  and ground. This time occurs over the region,  $V_{TN} \leq V_{in} \leq V_{DD} + V_{TP}$ . Therefore,  $t_{base}$  is found from the difference between (3.7) and (3.8),  $|(t_{V_{TP}} - t_{V_{TN}})|$ . The area defined by this triangle is  $\frac{1}{2}I_{peak} \times t_{base}$ , which models the total short-circuit current  $I_{SC}$  sourced by a CMOS inverter due to a non-step input [38].



Figure 3.5: Graphical estimation of short-circuit current (0.8  $\mu$ m CMOS technology)

The total short-circuit current multiplied by f and  $V_{DD}$  is the short-circuit power. The short-circuit power dissipation  $P_{SC}$  of the following stage for one transition (either rising or falling edge) can therefore be approximated by

$$P_{SC} = \frac{1}{2} I_{peak} t_{base} V_{DD} f. \tag{3.9}$$

Subtracting (3.7) from (3.8) forms the logarithmic quotient,

$$t_{base} = \left| \ln \left( \frac{V_{TN}}{V_{DD} + V_{TP}} \right) \right| \frac{C + \mho_{do}RC}{\mho_{do}}. \tag{3.10}$$

By inserting this expression for $t_{base}$ into (3.9), the short-circuit power dissipation $P_{SC}$ of a CMOS inverter following a lumped RC load over both the rising and falling transitions is

$$P_{SC} = \left| \ln\left(\frac{V_{TN}}{V_{DD} + V_{TP}}\right) \right| \frac{C + \mho_{do}RC}{\mho_{do}} I_{peak} f V_{DD}. \tag{3.11}$$
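As a rough illustration, (3.9) through (3.11) can be evaluated directly. The sketch below is not the author's implementation: the function names are hypothetical, and the numerical values for $\mho_{do}$, $I_{peak}$, and the threshold voltages are placeholders chosen only to exercise the formulas. $I_{peak}$ is treated as a given input, although in practice it depends on the input waveform and the load.

```python
import math


def t_base(r, c, g_do, v_dd, v_tn, v_tp):
    """Eq. (3.10): time during which both transistors conduct.

    g_do is the conductance written as mho_do in the text; v_tp is negative.
    """
    tau = (c + g_do * r * c) / g_do
    return abs(math.log(v_tn / (v_dd + v_tp))) * tau


def p_sc_single_edge(i_peak, tb, v_dd, freq):
    """Eq. (3.9): triangular current estimate for one transition."""
    return 0.5 * i_peak * tb * v_dd * freq


def p_sc_both_edges(r, c, g_do, i_peak, v_dd, v_tn, v_tp, freq):
    """Eq. (3.11): rising plus falling transitions."""
    return t_base(r, c, g_do, v_dd, v_tn, v_tp) * i_peak * freq * v_dd


# Placeholder values (assumptions, not taken from the text).
G_DO = 1.1e-3                      # siemens
I_PEAK = 50e-6                     # amperes
V_DD, V_TN, V_TP = 5.0, 0.8, -0.9  # volts
R_LOAD, C_LOAD, FREQ = 1000.0, 1e-12, 10e6

tb = t_base(R_LOAD, C_LOAD, G_DO, V_DD, V_TN, V_TP)
p_sc = p_sc_both_edges(R_LOAD, C_LOAD, G_DO, I_PEAK, V_DD, V_TN, V_TP, FREQ)
print(f"t_base = {tb * 1e9:.2f} ns, P_SC = {p_sc * 1e6:.2f} uW")
```

Since (3.11) is twice (3.9), the sketch implicitly assumes that the rising and falling edges share the same $t_{base}$ and $I_{peak}$.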

#### Analysis of the Expression for Short-Circuit Power Dissipation

The short-circuit power derived from (3.11) for a wide range of RC loads between the CMOS inverter stages shown in Figure 3.4 is compared with SPICE in Table 3.2. The RC load of the driving inverter is described in the first two columns of Table 3.2. The short-circuit power predicted by (3.11) and derived from SPICE is shown in the third and fourth columns, respectively. The per cent error between the analytical expression and SPICE is shown in the final column.

For smaller RC loads, and hence faster transition times, there is negligible short-circuit power since a direct path from the power supply to ground does not exist for any significant time. The short-circuit power becomes non-negligible when larger interconnect loads between the two CMOS stages cause a transition time of significant magnitude, e.g., a $t_t$ greater than 0.5 ns for a 0.8 $\mu$m CMOS inverter. At this borderline value, the analytical $P_{SC}$ differs from SPICE by a maximum of 41%. As the RC load and transition time increase, the analytical model more closely predicts the short-circuit current derived from SPICE. For RC delays exceeding 0.1 ns, errors less than 15% are attained. Furthermore, the short-circuit power becomes a significant portion of the total power dissipation when the CMOS inverter is loaded by larger RC loads, creating long transition times. It is this condition that is of greatest interest when considering short-circuit power in resistively loaded CMOS inverters.

The error of the analytical expression for $P_{SC}$ can be bounded by the RC time constant describing the interconnect load impedance. For a 0.8 $\mu$m CMOS technology, the per cent error is less than 15% for an RC time constant more than 0.1 ns. For an RC time constant less than 0.1 ns, the per cent error increases to approximately 40%.

Table 3.2: Estimate of short-circuit power dissipated by a CMOS inverter (0.8 $\mu$m CMOS technology, $f = 10\text{ MHz}$, $V_{DD} = 5.0\text{ V}$)

| Interconnect Resistance | Interconnect Capacitance | Power (µW), Analytic | Power (µW), SPICE | % Error |
|-------------------------|--------------------------|----------------------|-------------------|---------|
| 10 Ω                    | 0.3 pF                   | 1.4                  | 0.99              | 41%     |
| 10 Ω                    | 0.5 pF                   | 3.9                  | 3.22              | 21%     |
| 10 Ω                    | 1 pF                     | 12.4                 | 11.1              | 12%     |
| 100 Ω                   | 0.3 pF                   | 1.71                 | 1.23              | 39%     |
| 100 Ω                   | 0.5 pF                   | 4.68                 | 3.83              | 22%     |
| 100 Ω                   | 1 pF                     | 13.8                 | 12.7              | 9%      |
| 1000 Ω                  | 0.3 pF                   | 5.85                 | 5.2               | 12%     |
| 1000 Ω                  | 0.5 pF                   | 13.0                 | 12.2              | 7%      |
| 1000 Ω                  | 1 pF                     | 34.2                 | 33.8              | 1%      |

One source of error in estimating the short-circuit power derived from (3.9) can be found by examining the transition time. The analytical solution to the transition time, (3.6), generally yields pessimistic results when compared to SPICE (see Table 3.1). By inserting these pessimistic transition times into (3.9), the resulting short-circuit power is also pessimistic, as demonstrated in Table 3.2.

Another source of error is caused by signal undershoot of fast transient waveforms. This parasitic-induced undershoot may drive $V_{DS}$ above $V_{DD}$ or below ground. The undershoot occurs early during the transition time and causes current to flow opposite to the expected direction, thereby reducing the total short-circuit current. This behavior, in turn, reduces the total short-circuit power, increasing the discrepancy between SPICE and (3.11), which does not consider transient undershoot. The phenomenon of signal undershoot, where the current is negative, can be seen in Figure 3.5.

#### Short-Circuit Power as Compared to the Total Transient Power

For a given supply voltage and frequency, dynamic power dissipation depends only on the load capacitance and does not depend on the input waveform shape or load resistance. In contrast, the short-circuit power dissipation changes with both input waveform shape and output load resistance and capacitance. The ratio of the short-circuit power to the total transient power (the sum of the dynamic and short-circuit power) of a CMOS inverter with respect to the load resistance R for a given load capacitance C is shown in Figure 3.6. Note that with increasing load resistance, the short-circuit power dissipation cannot be neglected, since, as shown in Figure 3.6, it can comprise more than 20% of the total transient power dissipation.
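The ratio discussed above can be checked against numbers already quoted in this chapter. The sketch below pairs the analytic short-circuit power for the 1000 Ω, 1 pF load in Table 3.2 with the 125 µW dynamic power quoted in subsection 3.2.1 for a 1 pF load. Whether Figure 3.6 was generated from exactly these values is an assumption; the function name is hypothetical.

```python
def short_circuit_fraction(p_sc, p_dyn):
    """Fraction of total transient power (dynamic + short-circuit) due to P_SC."""
    return p_sc / (p_sc + p_dyn)


# 34.2 uW: Table 3.2 analytic value for R = 1000 ohm, C = 1 pF.
# 125 uW: dynamic power quoted in subsection 3.2.1 for C = 1 pF.
fraction = short_circuit_fraction(34.2e-6, 125e-6)
print(f"{100 * fraction:.1f}% of the total transient power")  # prints 21.5%
```

This single data point is consistent with the statement that short-circuit power can exceed 20% of the total transient power at high load resistance.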



Figure 3.6: Ratio of short-circuit power to total transient power versus interconnect resistance for varying interconnect capacitance

### 3.2.3 Resistive Power Dissipation

In resistive interconnect, power is not only dissipated when charging and discharging the load capacitance, but power is also dissipated by the load resistance. This power dissipation can be quantified by $f \int_t i^2 R \, dt$, where i is the current through the load resistance and f is the frequency of operation. The same current that discharges the capacitor flows through the resistor. This capacitive current is $I_C = C \frac{dV_{out}}{dt}$. Therefore, by taking the derivative of (3.4), the instantaneous current through a resistive load $i_R(t)$ is given by

$$i_R(t) = \frac{-\mho_{do}}{1 + \mho_{do}R} V_{out}(0) e^{\frac{-\mho_{do}}{\mho_{do}RC + C}t}, \tag{3.12}$$

and the average resistive power dissipation is given by

$$P_R = f \int_0^t \left( \frac{-\mho_{do}}{1 + \mho_{do}R} V_{out}(0) e^{\frac{-\mho_{do}}{\mho_{do}RC + C}t} \right)^2 R \, dt. \tag{3.13}$$

After integration, (3.13) becomes

$$P_R = \frac{fRC\mho_{do}V_{out}^2(0)}{2(1 + \mho_{do}R)} (1 - e^{\frac{-\mho_{do}}{\mho_{do}RC + C}2t}) \quad . \tag{3.14}$$

The resistive power dissipated for different RC loads calculated from (3.14) is shown in Table 3.3. The load resistance R and capacitance C are listed in the first two columns, respectively. The power dissipated by the interconnect resistance determined from (3.14) and from SPICE are shown in the third and fourth columns, respectively. The per cent error of the analytic expression as compared to SPICE is shown in the final column. Note that the per cent error is less than 15% and typically less than 6%.
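The step from (3.13) to (3.14) can be checked numerically. The sketch below evaluates the closed form and compares it with a midpoint-rule integration of the squared current in (3.12). The value $\mho_{do} = 1.1$ mS is an assumption, picked only because it gives results of the same order as the analytic column of Table 3.3; it is not stated in this section. Function names are hypothetical.

```python
import math


def resistive_power(r, c, g_do, v0, freq, t=math.inf):
    """Eq. (3.14): closed-form resistive power; t = inf is a full discharge."""
    prefactor = freq * r * c * g_do * v0 ** 2 / (2 * (1 + g_do * r))
    return prefactor * (1 - math.exp(-2 * g_do * t / (g_do * r * c + c)))


def resistive_power_numeric(r, c, g_do, v0, freq, t, steps=20000):
    """Eq. (3.13) integrated with the midpoint rule, as a cross-check."""
    dt = t / steps
    total = 0.0
    for k in range(steps):
        t_mid = (k + 0.5) * dt
        # Eq. (3.12): instantaneous current through the load resistance.
        i_r = (-g_do / (1 + g_do * r)) * v0 * math.exp(
            -g_do * t_mid / (g_do * r * c + c))
        total += i_r ** 2 * r * dt
    return freq * total


G_DO = 1.1e-3  # siemens; assumed value, not given in the text
closed = resistive_power(1000.0, 1e-12, G_DO, 5.0, 10e6, t=10e-9)
numeric = resistive_power_numeric(1000.0, 1e-12, G_DO, 5.0, 10e6, t=10e-9)
print(f"closed form: {closed * 1e6:.3f} uW, numeric: {numeric * 1e6:.3f} uW")
```

For transition windows much longer than the time constant $(\mho_{do}RC + C)/\mho_{do}$, the exponential term vanishes and the prefactor of (3.14) alone determines $P_R$.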

An expression for estimating the dynamic, short-circuit, and resistive power in CMOS inverter chains has been presented. For RC loads greater than 0.1 ns (assuming a 0.8 $\mu$m CMOS technology), the expression for the short-circuit power is accurate to within 15% of SPICE. These larger RC loads are of interest because short-circuit power can account for more than 20% of the total transient power dissipation. Furthermore, another source of power dissipation is introduced by the resistance of long interconnect. An expression for resistive power dissipation is also presented in this section. This expression has an error of less than 15% as compared to SPICE.

Table 3.3: The resistive power dissipated by a CMOS inverter driving an RC load (0.8 $\mu$m CMOS technology, $f = 10\text{ MHz}$, $V_{DD} = 5.0\text{ V}$)

| Load Resistance | Load Capacitance | Power (µW), Analytic | Power (µW), SPICE | % Error |
|-----------------|------------------|----------------------|-------------------|---------|
| 10 Ω            | 0.01 pF          | 0.0137               | 0.0135            | 1%      |
| 10 Ω            | 0.1 pF           | 0.137                | 0.139             | 1%      |
| 10 Ω            | 1 pF             | 1.37                 | 1.39              | 1%      |
| 100 Ω           | 0.01 pF          | 0.125                | 0.118             | 6%      |
| 100 Ω           | 0.1 pF           | 1.25                 | 1.29              | 3%      |
| 100 Ω           | 1 pF             | 12.5                 | 13.1              | 5%      |
| 1000 Ω          | 0.01 pF          | 0.658                | 0.703             | 6%      |
| 1000 Ω          | 0.1 pF           | 6.58                 | 7.61              | 13%     |
| 1000 Ω          | 1 pF             | 65.8                 | 76.8              | 14%     |

When considering power in interconnect, the resistive component cannot be neglected. The resistance of long interconnects not only contributes directly to the power dissipated due to the resistive component, but also causes longer transition times, leading to greater short-circuit power dissipation. Both short-circuit and resistive power dissipation along with dynamic power have been modeled with good accuracy.

# 3.3 Determining the Parameters $I_{do}$ and $V_{do}$

The  $\alpha$ -power law model parameters,  $I_{do}$  and  $V_{do}$ , describe the maximum drain current and drain saturation voltage, respectively, where  $V_{GS} = V_{DD}$  [33]. For increased accuracy of the delay expressions that are presented in section 3.1.2,  $I_{do}$  and  $V_{do}$  may need to be adjusted for a specific CMOS technology. These two parameters that are used as part of the  $\alpha$ -power law model are determined as explained by Sakurai in [40]. With these parameters, an initial estimate of the propagation delay and transition time for any RC load for a specific CMOS technology can be made using (3.5) and (3.6), respectively.

These analytical estimates are compared to SPICE for a variety of RC load impedances. In order to improve the accuracy of the analytical expressions,  $I_{do}$  and  $V_{do}$  can be curve fit to SPICE. This process is performed only once for a given technology.

The adjustment of  $I_{do}$  is performed by determining

$$k_{PD} = \frac{C}{\mho_{do}\left(\frac{t_{PDS}}{.693} - RC\right)} \tag{3.15}$$

and

$$k_{tt} = \frac{C}{\mho_{do}\left(\frac{t_{tS}}{2.3} - RC\right)}, \tag{3.16}$$

where  $t_{PDS}$  and  $t_{tS}$  are the SPICE derived propagation delay and transition times for the range of RC loads, i.e., C = 10 fF, 100 fF, 1 pF and R = 10, 100, 1000  $\Omega$ . The factors  $k_{PD}$  and  $k_{tt}$  across this range of loads are averaged, and the result is  $k_{avg}$ ,

$$k_{avg} = \frac{1}{2} \sum_{i=1}^{n} \frac{k_{PD}}{n} + \frac{1}{2} \sum_{i=1}^{n} \frac{k_{tt}}{n}. \tag{3.17}$$

$V_{do}$ is divided by $k_{avg}$ or $I_{do}$ is multiplied by $k_{avg}$. With this adjustment, the analytical delay expressions yield values for the propagation delay and transition time that are the least square error from SPICE for this specific CMOS technology.
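The fitting procedure in (3.15) through (3.17) can be sketched as follows. The SPICE delays are the 1 pF rows of Table 3.1; the starting conductance of 1.0 mS is a placeholder, and the final scaling step assumes that $\mho_{do} = I_{do}/V_{do}$, which is how the adjustment of $I_{do}$ or $V_{do}$ is read here. Function names are hypothetical.

```python
def k_factor(t_spice, r, c, g_do, ln_const):
    """Eq. (3.15) with ln_const = 0.693, or Eq. (3.16) with ln_const = 2.3."""
    return c / (g_do * (t_spice / ln_const - r * c))


def k_average(samples, g_do):
    """Eq. (3.17): samples is a list of (R, C, t_PD_spice, t_t_spice)."""
    n = len(samples)
    k_pd = sum(k_factor(t_pd, r, c, g_do, 0.693) for r, c, t_pd, _ in samples) / n
    k_tt = sum(k_factor(t_t, r, c, g_do, 2.3) for r, c, _, t_t in samples) / n
    return 0.5 * k_pd + 0.5 * k_tt


# SPICE columns of Table 3.1 for C = 1 pF: (R, C, t_PD, t_t).
spice_rows = [
    (10.0, 1e-12, 680e-12, 1.7e-9),
    (100.0, 1e-12, 711e-12, 1.9e-9),
    (1000.0, 1e-12, 1.1e-9, 3.6e-9),
]
G_DO_INITIAL = 1.0e-3  # siemens; placeholder starting estimate
k_avg = k_average(spice_rows, G_DO_INITIAL)
g_do_fitted = k_avg * G_DO_INITIAL  # assumes mho_do = I_do / V_do
print(f"k_avg = {k_avg:.3f}, fitted conductance = {g_do_fitted * 1e3:.3f} mS")
```

If the SPICE delays followed the analytic expressions exactly, every factor would equal the ratio of the true conductance to the initial estimate, so $k_{avg}$ acts as a single correction factor averaged over the load range.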

# 3.4 Conclusions

A simple yet accurate expression for the output voltage of a CMOS inverter as a function of time driving a resistive-capacitive load is presented. With this expression, equations characterizing the propagation delay and transition time of a CMOS inverter driving an RC load are presented. These expressions are accurate to within 25% of SPICE for the propagation delay and within 27% for the transition time for a wide variety of RC loads. Furthermore, since the output waveform of this circuit is accurately modeled, the short-circuit power dissipation of the following CMOS stage loading the interconnect line can be accurately estimated to within 15% for highly resistive loads. The resistive power dissipation can be modeled to within 15% error for RC loads ranging from 0.1 ps to 1 ns. Therefore, due to the simplicity and accuracy of these expressions, the delay and power characteristics of a CMOS inverter driving a high impedance RC interconnect line can be efficiently estimated.

# Chapter 4

# Repeater Design for Optimal Speed and Power in RC Lines

Several methods have been presented in the literature to reduce interconnect delay so that these impedances do not dominate the delay of a critical path. Bakoglu presents a method in which the delay of a repeater is characterized by the input capacitance and output resistance based on the geometric size of the repeaters [23, 32]. Bakoglu equalizes the delay of the repeaters and the interconnect delay to optimize the number and size of the repeaters for a specific *RC* interconnect impedance.

In [41,42], Wu and Shiau describe a repeater implementation to reduce interconnect delay. Their method uses a linearized form of the Shichman-Hodges equations [31] at a specific operating point to determine the proper repeater insertion locations. Nekili and Savaria consider optimal methods for driving resistive interconnect in [43]. They introduce the concept of parallel regeneration in [44] in which precharge circuitry is added to the repeaters to decrease the evaluation time. This technique requires fewer repeaters; however, extra area is necessary, adding parasitic capacitance. Furthermore, this technique requires a precharge signal to operate correctly.

Dhar and Franklin present a mathematical treatment for optimal repeater insertion in [45]. They present elegant solutions to optimize repeaters with and without area constraints; however, the repeater is modeled as a resistor and a capacitor and no closed form solution is developed. A semi-empirical approach describing the inverter current together with the interconnect is presented in [19]. Other repeater insertion methods are described in [46–52].

In this chapter, CMOS inverting repeaters are presented as a simple yet effective way of reducing the total propagation delay and transition time characteristics of a system with highly resistive interconnect. A methodology is presented for determining the number and size of the repeaters to attain the minimum propagation delay based on an analytical expression derived from the  $\alpha$ -power law model for short-channel devices [33,53]. Using the  $\alpha$ -power law model permits the development of a repeater design methodology that considers the short-channel transistor effect of velocity saturation which is not considered in any of the aforementioned repeater methodologies [23, 32, 41–48, 54, 55]. Furthermore, the proposed model is based on nonlinear I-V equations rather than modeling a CMOS inverter as a discrete resistor and capacitor. Unlike previous work, the method presented in this chapter does not separate the device model from the interconnect model.

Alternative methods to uniform repeaters driving RC loads are also considered in this chapter. A tapered-buffer repeater structure provides high drive capability with low input capacitance; however, the additional buffer stages may add significant delay. It is shown here that for even relatively small resistances, uniform repeaters are found to be more effective in driving RC loads than tapered buffers or tapered-buffer repeaters.

In addition to delay, power is considered. With the introduction of portable and massively parallel applications, power has become an increasingly important factor in the circuit design process [56]. For example, clock distribution networks can account for 40% of the total power dissipated on-chip [57]. A high performance clock distribution network can contain many thousands of repeaters due to the distributed RC nature of a clock tree. Thus, power consumption must be both accurately estimated and minimized when developing design techniques that improve the speed of the signal propagation through long resistive interconnects. Two components of the transient power dissipation are considered. The contributions of the dynamic power and the short-circuit power in a CMOS inverter driving an RC line are compared. An empirical analysis is presented for determining the optimal number of repeaters to attain the minimum power when considering both short-circuit and dynamic power dissipation.

The chapter is organized as follows: in Section 4.1, a timing model of a CMOS inverter driving a lumped RC load that forms the basis for the following repeater design methodology is reviewed. Equations characterizing the signal delay through a repeater chain are presented in Section 4.2. A comparison of these analytic design expressions versus SPICE is presented in Section 4.3. In Section 4.4, the use of tapered-buffer repeaters versus uniformly sized repeaters is discussed. Power dissipation in repeater chains is examined in Section 4.5. Finally, some conclusions are presented in Section 4.6.

# 4.1 Expressions for an Inverter Driving an RC Load

The foundation for the repeater model is reviewed in this section. An analytical model describing the output voltage of a CMOS inverter driving an RC load (see Figure 4.1) given a step input is presented. The information describing the waveform shape permits a more accurate delay estimation as compared to estimating the path delay based on the classical Elmore delay model [58]. Since the Elmore delay adds the products of a resistor (composed of the sum of the linearized model of an inverter and the interconnect resistance) and all of its downstream capacitors, the Elmore delay does not account for the interaction of an inverter with the RC interconnect nor does the Elmore delay consider the shape of the output signal waveform. Thus, by integrating a more accurate timing model of a CMOS inverter into a methodology for inserting repeaters into an RC line, a more efficient circuit implementation can be achieved.

Figure 4.1: A CMOS inverter driving an RC load

The expression for  $V_{out}$  can be rearranged to determine the time  $t_{out}$  required for a CMOS inverter to reach an output voltage  $V_{out}$  given a step input signal,

$$t_{out} = \frac{\mho_{do}RC + C}{\mho_{do}} \ln\left(\frac{V_{DD}}{V_{out}}\right). \tag{4.1}$$

Equation (4.1) can be used to express the 50% and 90% output delay with respect to a step input signal. These time delays are, respectively,

$$t_{50} = .693 \frac{(1 + \mho_{do} R_{int}) C_{int}}{\mho_{do}} \tag{4.2}$$

and

$$t_{90} = 2.3 \frac{(1 + \mho_{do}R_{int})C_{int}}{\mho_{do}}. \tag{4.3}$$

These expressions are used in the following section to model the total delay required by a repeater chain to drive a distributed RC load.
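The constants in (4.2) and (4.3) follow from (4.1) by setting $V_{out}$ to 50% and 10% of $V_{DD}$, since $\ln 2 \approx 0.693$ and $\ln 10 \approx 2.3$. A minimal sketch is shown below; the conductance value is a placeholder and the function names are hypothetical.

```python
import math


def t_out(r, c, g_do, v_dd, v_out):
    """Eq. (4.1): time for the output to fall from v_dd to v_out."""
    return (g_do * r * c + c) / g_do * math.log(v_dd / v_out)


def t_50(r_int, c_int, g_do):
    """Eq. (4.2): 50% delay, i.e. v_out = 0.5 * v_dd."""
    return t_out(r_int, c_int, g_do, 1.0, 0.5)


def t_90(r_int, c_int, g_do):
    """Eq. (4.3): 90% delay, i.e. v_out = 0.1 * v_dd."""
    return t_out(r_int, c_int, g_do, 1.0, 0.1)


G_DO = 1.1e-3  # siemens; placeholder value, not given in this section
for r in (10.0, 100.0, 1000.0):
    print(f"R = {r:6.0f} ohm, C = 1 pF: "
          f"t_50 = {t_50(r, 1e-12, G_DO) * 1e12:.0f} ps, "
          f"t_90 = {t_90(r, 1e-12, G_DO) * 1e12:.0f} ps")
```

Both delays depend only on the ratio $V_{DD}/V_{out}$, so the supply voltage itself drops out of (4.2) and (4.3).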

# 4.2 Delay of a Repeater Chain Driving an RC Load

Equations (4.1)-(4.3) presented in the previous section provide the analytic basis for modeling the total delay of a repeater chain driving an RC load. Two other expressions are also presented in this section to complete the repeater delay model. The resulting delay model for an n-stage repeater is compared to SPICE and presented in this section.

Analytical expressions describing the behavior of a CMOS inverter driving a lumped RC load (as shown in Figure 4.1) based on Sakurai's  $\alpha$ -power law model are presented in the previous section. Equation (4.1) can be expanded to include the parasitic capacitances of the following inverting repeater, as

$$t_{out} = \frac{(1 + \mho_{do}R)(C_{rep} + C_{int})}{\mho_{do}} \ln\left(\frac{V_{DD}}{V_{out}}\right), \tag{4.4}$$

where  $C_{rep}$  and  $C_{int}$  are the capacitances of the following inverter and the interstage load capacitance (see Figure 4.2), respectively.

The delay required to propagate a signal through a highly resistive interconnect can be reduced if the interconnect is broken up and distributed among a number of repeaters such as shown in Figure 4.2. However, the delay of this signal path will increase if a non-optimal number of repeaters is chosen. In order to choose the optimal number of repeaters for a given RC load, the delay from the input of the first repeater to the output of the last repeater must first be determined.

Figure 4.2: n equal sized CMOS inverting repeaters driving an RC load.

The analytical expression for the total time  $t_{total}$  from the input to the output of an n-stage repeater system is the sum of several expressions,

$$t_{total} = t_{first \ stage} + (n-2)t_{int. \ stage} + t_{final \ stage} \qquad (4.5)$$

Each term in (4.5) is characterized by a step input to a single inverter driving an RC load, permitting a tractable solution of the delay time. This assumption permits the output waveform to be approximated by (4.4). The output waveform of the first stage is the input waveform of the following repeater assuming that the second repeater turns on quickly when its input threshold is reached. An example of this series of piecewise connections is shown in Figure 4.3. The signal information describing the waveform shape permits a more accurate delay estimation as compared to estimating the path delay based on the classical Elmore delay model [58]. Since the Elmore delay adds the products of a resistor (composed of the sum of the linearized model of a repeater and the interconnect resistance) and all of its downstream capacitors, the Elmore delay does not account for the interaction of a repeater with the RC interconnect nor does the Elmore delay consider the shape of the output signal waveform. Thus, by integrating a more accurate timing model of a CMOS repeater into an algorithm for inserting repeaters into an RC tree, a more efficient circuit implementation can be achieved.

The first term  $t_{first\ stage}$  is the time required for the output of the first repeater to reach the turn-on voltage of the second repeater assuming the output voltage is initially at  $V_{DD}$ . The term  $t_{int.\ stage}$  describes the time required for each repeater between the first and last stage to transition from  $V_{DD} + V_{TP}$  to  $V_{TN}$  or vice versa. The time required for the output of the final repeater to reach either 10%, 90%, or 50% of  $V_{DD}$  from a threshold voltage is described by the third component of (4.5),  $t_{final\ stage}$  [59]. These three components of (4.5) are described in more detail below with reference to Figure 4.3.



Figure 4.3: The analytic and SPICE derived output waveforms of an 11-stage repeater chain driving an evenly distributed RC load of 1 K $\Omega$  and 1 pF.

The first component of $t_{total}$, $t_{first\ stage}$, is the time required for the output signal of the first repeater to drop from $V_{DD}$ to $V_{TN}$, the threshold voltage of the N-channel device (labeled 1 in Figure 4.3) assuming a step input signal. $V_{TN}$ is chosen as the end point because it is assumed during fast switching that the pull-up device of the following repeater turns on hard near the voltage at which the pull-down device turns off. In addition, it is assumed that the rising (falling) output of an inverting repeater reaches $V_{TN}$ ($V_{DD} + V_{TP}$) by the time the falling input reaches $V_{TN}$ ($V_{DD} + V_{TP}$). Thus the signal waveforms of the intermediate stages consistently operate between $V_{TN}$ and $V_{DD} + V_{TP}$. The time for this switching to occur is

$$t_{V_{TN}} = \frac{(1 + \mho_{do_N} R_{int})(C_{int} + C_{rep})}{\mho_{do_N}} \ln\left(\frac{V_{DD}}{V_{TN}}\right). \tag{4.6}$$

This equation also describes the time for the signal to transition from ground to  $V_{DD} + V_{TP}$  when each N-channel transistor is replaced by a P-channel transistor. All of the following equations can be similarly expressed for a P-channel device. Note that  $V_{TP}$  is the P-channel threshold voltage and is negative for an enhancement mode device.

The delay of each successive stage,  $(n-2)t_{int.\ stage}$ , excluding the final stage, is modeled as the time required for the signal to transition from  $V_{DD} + V_{TP}$  to  $V_{TN}$ . Equation (4.6) describes the time for the output signal to change from  $V_{DD}$  to  $V_{TN}$ . Therefore, the time for the signal to transition from  $V_{DD}$  to  $V_{DD} + V_{TP}$  must be subtracted from (4.6). Equation (4.7) describes the time for the output signal to change from  $V_{DD}$  to  $V_{DD} + V_{TP}$ .

$$t_{t_P} = \frac{(1 + \mho_{do_N} R_{int})(C_{int} + C_{rep})}{\mho_{do_N}} \ln \left(\frac{V_{DD}}{V_{DD} + V_{TP}}\right). \tag{4.7}$$

Therefore, an intermediate stage delay $t_{int.\ stage}$ is described by $(t_{V_{TP}} - t_{t_N})$ for a rising repeater output and $(t_{V_{TN}} - t_{t_P})$ for a falling repeater output (labeled 2 and 3, respectively, in Figure 4.3). The two preceding expressions are alternately added to the total delay for each corresponding repeater stage up to the input of the final stage of the chain. The expression $(t_{V_{TN}} - t_{t_P})$ reduces to

$$t_N = \frac{(1 + U_{do_N} R_{int})(C_{int} + C_{rep})}{U_{do_N}} \ln\left(\frac{V_{DD} + V_{TP}}{V_{TN}}\right) \tag{4.8}$$

The expression for $t_P$ has a similar form.
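The reduction to (4.8) can be checked directly, since (4.6) and (4.7) share the same prefactor. Writing that prefactor as $\tau_N$ (a shorthand introduced here for convenience, not a symbol used elsewhere in the text), and taking $U_{do_N}$ to be the N-channel saturation conductance:

$$\begin{aligned}
\tau_N &\equiv \frac{(1 + U_{do_N} R_{int})(C_{int} + C_{rep})}{U_{do_N}}, \\
t_{V_{TN}} - t_{t_P} &= \tau_N \ln\left(\frac{V_{DD}}{V_{TN}}\right) - \tau_N \ln\left(\frac{V_{DD}}{V_{DD} + V_{TP}}\right) \\
&= \tau_N \ln\left(\frac{V_{DD} + V_{TP}}{V_{TN}}\right) = t_N .
\end{aligned}$$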

The time $t_{total}$ describes the output of the complete repeater system in terms of either (1) the delay from the input for the output to reach 10% or 90% of $V_{DD}$, which is defined as the 90% output delay time $t_{90}$, or (2) the delay to 50% of $V_{DD}$, which is defined as the 50% delay $t_{50}$. In order to determine the total delay to the 90% point, $t_{final\ stage}$ (labeled 4 in Figure 4.3) is $t_{90}$ [from (4.3)] minus $t_{t_N}$, since (4.3) is from $V_{DD}$ to 10% and the signal transition time to $V_{DD} + V_{TP}$ must be included. Similarly, to determine the total delay to the 50% point, $t_{final\ stage}$ is $t_{50}$ [from (4.2)] minus $t_{t_N}$.

Having defined the delay of the components of the repeater system (labeled 1-4 in Figure 4.3), the total time from the step input at the first repeater to the output of an even number of repeaters (for a 90% output change) is

$$t_{total(even)} = t_{V_{TN}} + \frac{(n-2)}{2} (t_N + t_P) + (t_{90} - t_{t_N}) \tag{4.9}$$

and for an odd number of repeaters, the time is

$$t_{total(odd)} = \frac{(n-1)}{2} (t_N + t_P) + t_{90} \tag{4.10}$$

A plot of  $t_{total}$  versus the size and number of repeater stages n for an example CMOS technology and RC load is shown in Figure 4.4. The optimal implementation of the number and size of the repeaters for this specific RC load is the minimum point on this graph. A similar graph can be determined for any RC load. Thus, (4.9) and (4.10) describe the total delay through an n-stage repeater system. These expressions are compared to SPICE in the following section.
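As a rough illustration of how (4.6) through (4.10) combine, the following Python sketch evaluates the component times and the total delay for a few values of n. It is a sketch under stated assumptions rather than the author's implementation: all numerical values are hypothetical, the RC load is assumed to be split evenly among the repeaters, the P-channel terms are written by mirroring the N-channel forms (the text states only that they are similar), and a placeholder of the same logarithmic form stands in for $t_{90}$ from (4.3).

```python
import math


def tau(u_do, r_int, c_int, c_rep):
    """Common prefactor of (4.6)-(4.8): (1 + U_do*R_int)(C_int + C_rep)/U_do."""
    return (1.0 + u_do * r_int) * (c_int + c_rep) / u_do


def component_times(v_dd, v_tn, v_tp, tau_n, tau_p):
    """Return the component times used in (4.9) and (4.10).

    The N-channel terms follow (4.6)-(4.8). The P-channel terms are the
    mirrored forms, which is an assumption made for this sketch.
    """
    return {
        "t_vtn": tau_n * math.log(v_dd / v_tn),            # (4.6)
        "t_tp": tau_n * math.log(v_dd / (v_dd + v_tp)),    # (4.7)
        "t_n": tau_n * math.log((v_dd + v_tp) / v_tn),     # (4.8)
        "t_tn": tau_p * math.log(v_dd / (v_dd - v_tn)),    # mirrored (4.7)
        "t_p": tau_p * math.log((v_dd - v_tn) / (-v_tp)),  # mirrored (4.8)
    }


def total_delay(n, t, t_90):
    """Total 90% delay of n repeaters: (4.9) for even n, (4.10) for odd n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n % 2 == 0:
        return t["t_vtn"] + (n - 2) / 2 * (t["t_n"] + t["t_p"]) + (t_90 - t["t_tn"])
    return (n - 1) / 2 * (t["t_n"] + t["t_p"]) + t_90


# Hypothetical values, not taken from the thesis.
V_DD, V_TN, V_TP = 5.0, 0.8, -0.9
U_DO_N = U_DO_P = 2e-3                      # saturation conductance (S)
R_TOTAL, C_TOTAL, C_REP = 1e3, 1e-12, 20e-15

for n in (2, 3, 4, 5):
    tau_n = tau(U_DO_N, R_TOTAL / n, C_TOTAL / n, C_REP)
    tau_p = tau(U_DO_P, R_TOTAL / n, C_TOTAL / n, C_REP)
    times = component_times(V_DD, V_TN, V_TP, tau_n, tau_p)
    t_90 = tau_n * math.log(10.0)           # placeholder for (4.3)
    print(n, f"{total_delay(n, times, t_90) * 1e9:.3f} ns")
```

Sweeping the repeater width as well as n in the same loop would produce a surface of the kind plotted in Figure 4.4, although the numbers depend entirely on the assumed device parameters.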



Figure 4.4: The 90% output delay time for an interconnect line as a function of the number of repeaters and repeater width. ( $R=1~\mathrm{K}\Omega,~C=1~\mathrm{pF},~0.8~\mu\mathrm{m}$  CMOS technology)

# 4.3 Analytical Delay Model Versus SPICE

The accuracy of the delay model for a repeater chain presented in the previous section is compared to SPICE in this section. Two different RC loads have been chosen to exemplify the effects of the interconnect resistance and capacitance on the repeater design methodology (1 K$\Omega$ and 1 pF, and 3 K$\Omega$ and 3 pF). These simulations are based on a 0.8 $\mu$m CMOS technology. The plots shown in Figure 4.5 depict the 90% output delay $t_{90}$ and the 50% output delay $t_{50}$ of an RC load of 1 K$\Omega$ and 1 pF distributed evenly among one to 20 repeaters. The size of each repeater is uniform ($W_N = 3\ \mu m$ and $W_P = 9\ \mu m$), although this analysis does not restrict the geometric widths to be uniform. The device widths of each individual repeater are ratioed to maintain nearly equal rise and fall times.

The 50% output delay of a chain of repeaters driving an RC load as a function of the number of repeater stages is shown in Figure 4.5 for both the analytic expression and SPICE. The maximum error of the 50% and the 90% output delays is 12% and 8%, respectively. Note that the greatest error occurs when the repeater chain is two or three stages. The repeater model is most accurate when the loaded inverter operates predominantly in the linear region. With only two or three repeaters, the inverters operate outside of the linear region for a longer period of time than with more than three repeaters. As shown in Figure 4.5, there is close agreement between the analytical and experimental results for a repeater chain with more than four repeaters.

Figure 4.5: The analytical and simulated 50% and 90% delay times for a 1 K$\Omega$ and 1 pF load evenly distributed across a number of uniformly sized repeaters.

The error of the analytical delay as compared with the delay derived from SPICE for a given RC load, repeater size, and number of repeaters is shown in Tables 4.1, 4.2, and 4.3 and presented in graphical form in Figure 4.6. In Tables 4.1, 4.2, and 4.3, the number of stages into which the RC load is partitioned is shown in the first column. The propagation delay of the analytic expression and SPICE is shown in the second and third columns, respectively. The error of the analytic expression for the 50% output delay compared to SPICE is presented in the fourth column. The same information but for the 90% output delay time is listed in the fifth through seventh columns.

Table 4.1: Per cent error of the analytical total delay model (both 50% and 90% output delay) versus SPICE for a given number of repeater stages, a repeater size of ($W_N = 1\ \mu\text{m}$, $W_P = 3\ \mu\text{m}$), and an interconnect load of $R = 1\ \text{K}\Omega$ and $C = 1\ \text{pF}$. (0.8 $\mu\text{m}$ CMOS technology)

| # of stages | $t_{50}$ (ns), analytic | $t_{50}$ (ns), SPICE | $t_{50}$ error | $t_{90}$ (ns), analytic | $t_{90}$ (ns), SPICE | $t_{90}$ error |
|----|------|------|-----|------|------|-----|
| 1  | 1.98 | 2.37 | 16% | 6.59 | 6.70 | 2%  |
| 2  | 3.11 | 3.37 | 6%  | 5.68 | 5.67 | 1%  |
| 3  | 3.37 | 3.45 | 2%  | 4.55 | 4.70 | 2%  |
| 4  | 3.53 | 3.73 | 5%  | 4.71 | 4.80 | 0%  |
| 5  | 3.62 | 3.74 | 3%  | 4.29 | 4.46 | 2%  |
| 6  | 3.70 | 3.92 | 5%  | 4.47 | 4.61 | 1%  |
| 7  | 3.77 | 3.92 | 3%  | 4.23 | 4.43 | 3%  |
| 8  | 3.83 | 4.05 | 5%  | 4.40 | 4.56 | 2%  |
| 9  | 3.89 | 4.06 | 4%  | 4.24 | 4.45 | 2%  |
| 10 | 3.94 | 4.16 | 5%  | 4.39 | 4.57 | 2%  |
| 11 | 4.00 | 4.18 | 4%  | 4.28 | 4.51 | 4%  |
| 12 | 4.04 | 4.26 | 4%  | 4.42 | 4.61 | 3%  |
| 13 | 4.10 | 4.31 | 5%  | 4.34 | 4.58 | 4%  |
| 14 | 4.14 | 4.37 | 5%  | 4.46 | 4.67 | 3%  |
| 15 | 4.19 | 4.44 | 5%  | 4.40 | 4.66 | 4%  |
| 16 | 4.24 | 4.46 | 4%  | 4.51 | 4.73 | 3%  |
| 17 | 4.29 | 4.53 | 5%  | 4.47 | 4.74 | 5%  |
| 18 | 4.33 | 4.63 | 6%  | 4.57 | 4.88 | 5%  |

The deviation of the analytical result from SPICE for both $t_{50}$ and $t_{90}$ is shown as a function of the number of stages in Figure 4.6. As shown in Figure 4.6, for large RC loads (e.g., 3 K$\Omega$ and 3 pF), the model becomes less accurate since the repeaters operate for relatively less time within the linear region. At first glance, this behavior may seem to contradict the data indicated in Figure 3.2; however, when each repeater is driving a large RC load, the input waveforms driving the intermediate repeater stages degrade, causing those repeaters with slow input waveforms to operate in the saturation region rather than in the linear region. Nevertheless, as shown in Tables 4.1, 4.2, and 4.3, with most repeater configurations the error is typically much less than 15%.

Table 4.2: Per cent error of the analytical total delay model (both 50% and 90% output delay) versus SPICE for a given number of repeater stages, a repeater size of ($W_N = 3\ \mu\text{m}$, $W_P = 9\ \mu\text{m}$), and an interconnect load of $R = 1\ \text{K}\Omega$ and $C = 1\ \text{pF}$. (0.8 $\mu\text{m}$ CMOS technology)

| # of stages | $t_{50}$ (ns), analytic | $t_{50}$ (ns), SPICE | $t_{50}$ error | $t_{90}$ (ns), analytic | $t_{90}$ (ns), SPICE | $t_{90}$ error |
|----|------|------|-----|------|------|-----|
| 1  | 1.12 | 1.13 | 0%  | 3.73 | 3.61 | 3%  |
| 2  | 1.49 | 1.37 | 9%  | 2.62 | 2.39 | 8%  |
| 3  | 1.51 | 1.34 | 12% | 2.02 | 1.89 | 6%  |
| 4  | 1.53 | 1.46 | 5%  | 1.99 | 1.89 | 5%  |
| 5  | 1.56 | 1.47 | 5%  | 1.82 | 1.76 | 3%  |
| 6  | 1.58 | 1.56 | 1%  | 1.87 | 1.82 | 3%  |
| 7  | 1.62 | 1.58 | 2%  | 1.79 | 1.77 | 1%  |
| 8  | 1.65 | 1.64 | 0%  | 1.85 | 1.84 | 1%  |
| 9  | 1.69 | 1.67 | 4%  | 1.82 | 1.81 | 0%  |
| 10 | 1.72 | 1.73 | 0%  | 1.88 | 1.90 | 1%  |
| 11 | 1.76 | 1.77 | 0%  | 1.87 | 1.89 | 1%  |
| 12 | 1.80 | 1.82 | 1%  | 1.93 | 1.96 | 2%  |
| 13 | 1.84 | 1.86 | 1%  | 1.93 | 1.97 | 2%  |
| 14 | 1.88 | 1.91 | 1%  | 1.99 | 2.03 | 2%  |
| 15 | 1.92 | 1.96 | 2%  | 2.00 | 2.05 | 2%  |
| 16 | 1.96 | 2.00 | 2%  | 2.06 | 2.09 | 1%  |
| 17 | 2.00 | 2.06 | 3%  | 2.07 | 2.14 | 3%  |
| 18 | 2.04 | 2.11 | 3%  | 2.12 | 2.21 | 4%  |

Table 4.3: Per cent error of the analytical total delay model (both 50% and 90% output delay) versus SPICE for a given number of repeater stages, a repeater size of ($W_N = 3\ \mu\text{m}$, $W_P = 9\ \mu\text{m}$), and an interconnect load of $R = 3\ \text{K}\Omega$ and $C = 3\ \text{pF}$. (0.8 $\mu\text{m}$ CMOS technology)

| # of stages | $t_{50}$ (ns), analytic | $t_{50}$ (ns), SPICE | $t_{50}$ error | $t_{90}$ (ns), analytic | $t_{90}$ (ns), SPICE | $t_{90}$ error |
|--------|----------------------------------------------------------------------|-------|--------|-----------|-------|-------|
| 1      | 7.53                                                                 | 7.39  | 2%     | 25.0      | 24.4  | 2%    |
| 2      | 8.03                                                                 | 6.11  | 31%    | 13.8      | 11.5  | 20%   |
| 3      | 6.95                                                                 | 5.14  | 35%    | 9.55      | 7.79  | 22%   |
| 4      | 6.46                                                                 | 5.08  | 27%    | 8.45      | 6.94  | 21%   |
| 5      | 6.03                                                                 | 4.81  | 25%    | 7.20      | 6.01  | 20%   |
| 6      | 5.80                                                                 | 4.84  | 20%    | 6.92      | 5.87  | 18%   |
| 7      | 5.59                                                                 | 4.71  | 19%    | 6.32      | 5.47  | 15%   |
| 8      | 5.47                                                                 | 4.77  | 15%    | 6.24      | 5.47  | 14%   |
| 9      | 5.37                                                                 | 4.69  | 14%    | 5.88      | 5.22  | 13%   |
| 10     | 5.30                                                                 | 4.75  | 11%    | 5.88      | 5.29  | 11%   |
| 11     | 5.25                                                                 | 4.73  | 11%    | 5.64      | 5.13  | 10%   |
| 12     | 5.21                                                                 | 4.78  | 9%     | 5.67      | 5.20  | 9%    |
| 13     | 5.18                                                                 | 4.78  | 8%     | 5.50      | 5.11  | 8%    |
| 14     | 5.17                                                                 | 4.82  | 7%     | 5.55      | 5.18  | 7%    |
| 15     | 5.16                                                                 | 4.82  | 7%     | 5.43      | 5.18  | 5%    |
| 16     | 5.16                                                                 | 4.82  | 7%     | 5.49      | 5.11  | 5%    |
| 17     | 5.16                                                                 | 4.90  | 5%     | 5.39      | 5.20  | 4%    |
| 18     | 5.17                                                                 | 4.89  | 5%     | 5.46      | 5.13  | 6%    |

# 4.4 Uniform Repeaters Versus Tapered Buffers and Tapered-Buffer Repeaters

Depending on the magnitude of the RC load, the form of the repeater buffer structure to minimize the total delay may be expected to change. With larger RC loads or large capacitances, a tapered buffer or a tapered-buffer repeater system (as shown in Figs. 4.7a and 4.7b) may decrease the total delay required to propagate a signal along a resistive line. Intuitively, an interconnect line that is highly capacitive and non-negligibly resistive may exhibit characteristics similar to a purely capacitive line. Since a purely capacitive line is optimally driven by a tapered buffer (see Figure 4.7a) [32,60], a highly capacitive and moderately resistive line may possibly be more efficiently driven by a series of tapered buffers. The application of uniform repeaters versus tapered buffers and tapered-buffer repeaters to an RC line is therefore discussed in this section.

Figure 4.6: The per cent error of the analytical value of the 50% and 90% output delays versus SPICE for various loads and repeater sizes.

An estimate of the total delay of a tapered-buffer repeater system is performed in a manner similar to that presented for a uniform repeater system. Some modifications, however, are made to accommodate the use of tapered buffers. $C_{rep}$, for example, is now the capacitance of a minimum-sized inverter since the first stage of each tapered-buffer repeater is a minimum-sized inverter. The drive current $I_{DO}$ of the tapered-buffer repeater is related to the size of the final buffer in each tapered-buffer repeater stage.

Figure 4.7: Two methods of driving interconnections with tapered buffers: (a) A single tapered buffer (b) A three stage tapered-buffer repeater system. The first stage is a minimum sized repeater. The tapering factor is e.

The delay for a single tapered-buffer repeater is

$$t_{out} = t_{p,opt} + t_{rep} = \ln\left(\frac{C_L}{C_i}\right) t_{p0} + \frac{(1 + U_{do}R)(C_{min} + C_{int})}{U_{do}} \ln\left(\frac{V_{DD}}{V_{out}}\right) \tag{4.11}$$

$t_{out}$ for a tapered-buffer repeater is incorporated into an expression similar to (4.5). The components of (4.11) are as follows: $C_L$ is the gate capacitance of the final buffer in the repeater; $C_i$ is the input gate capacitance of a minimum-size inverter; and $t_{p0}$ is the propagation delay of a minimum-size inverter driving a capacitance $e \cdot C_i$ [61] since the tapering factor is assumed to be e. For each tapered buffer, the final inverter stage is of size $W_{opt}$ and the number of stages in the repeater is $\ln(W_{opt})$ (note that this value must be rounded to an integer).
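The two terms of (4.11) can be evaluated separately, as in the short Python sketch below. The function name and every numerical value are hypothetical choices made for illustration; only the form of the expression is taken from (4.11).

```python
import math


def tapered_buffer_repeater_delay(c_l, c_i, t_p0, u_do, r, c_min, c_int, v_dd, v_out):
    """Delay of one tapered-buffer repeater following the form of (4.11).

    c_l   : gate capacitance of the final buffer in the repeater (F)
    c_i   : input gate capacitance of a minimum-size inverter (F)
    t_p0  : delay of a minimum-size inverter driving e * c_i (s)
    u_do  : saturation conductance set by the final buffer (S)
    r     : interconnect resistance driven by the repeater (ohm)
    c_min : input capacitance of the following minimum-size inverter (F)
    c_int : interconnect capacitance driven by the repeater (F)
    """
    t_p_opt = math.log(c_l / c_i) * t_p0  # tapered buffer, tapering factor e
    t_rep = (1.0 + u_do * r) * (c_min + c_int) / u_do * math.log(v_dd / v_out)
    return t_p_opt + t_rep


# Hypothetical values, not taken from the thesis.
C_I = 10e-15
delay = tapered_buffer_repeater_delay(
    c_l=20 * C_I, c_i=C_I, t_p0=0.1e-9,
    u_do=3e-3, r=200.0, c_min=C_I, c_int=0.2e-12,
    v_dd=5.0, v_out=0.5,
)
print(f"single tapered-buffer repeater: {delay * 1e9:.3f} ns")
```

Keeping the buffer term and the interconnect term separate makes it easy to see which of the two dominates for a given load.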

A comparison of the efficacy of tapered buffers and tapered-buffer repeater systems versus uniformly sized repeaters for various loads is shown in Table 4.4. Furthermore, the accuracy of the analytical models for both the uniform and tapered-buffer repeaters versus SPICE is also listed in the same table. The single tapered buffer has been optimized for the specified load capacitance. The results listed in column seven, part I and column five, part II of Table 4.4 as compared to column five, part I demonstrate the importance of interconnect resistance. Even small resistances have a large effect on the signal delay characteristics. RC loads in which the capacitance is the dominant component of the interconnect impedance are of primary interest when considering tapered-buffer repeaters. However, as shown in Table 4.4, even when driving distributed RC loads as large as 1 K$\Omega$ and 100 pF, uniform repeaters remain more delay efficient than both tapered buffers and tapered-buffer repeaters.

Table 4.4: The 90% output time for optimally sized uniform repeaters, tapered-buffer repeaters, and tapered buffers for various loads as compared with SPICE.

Part I: Uniform repeaters and single tapered buffer

| Total RC load | # of repeaters | $W_{opt}$ | Analytical $t_{90}$ (ns) | SPICE $t_{90}$ (ns) | Error (%) | Single tapered buffer, SPICE $t_{90}$ (ns) |
|-------------|----|-----|------|-------|----|------|
| 1 KΩ 1 pF   | 7  | 13  | 0.90 | 0.98  | 8  | 2.8  |
| 1 KΩ 5 pF   | 15 | 29  | 2.10 | 2.18  | 4  | 12.1 |
| 5 KΩ 2 pF   | 33 | 12  | 2.96 | 2.75  | 8  | 23.5 |
| 1 KΩ 20 pF  | 31 | 56  | 4.20 | 4.43  | 5  | 47   |
| 1 KΩ 100 pF | 67 | 124 | 9.46 | 11.15 | 15 | > 50 |

Part II: Tapered-buffer repeaters

| Total RC load | # of repeaters | # of stages | $W_{opt}$ ($\mu$m) | Analytical $t_{90}$ (ns) | SPICE $t_{90}$ (ns) | Error (%) |
|-------------|----|---|----|-------|------|----|
| 1 KΩ 1 pF   | 5  | 2 | 2  | 3.20  | 3.03 | 6  |
| 1 KΩ 5 pF   | 5  | 3 | 9  | 7.36  | 5.05 | 45 |
| 5 KΩ 2 pF   | 9  | 2 | 2  | 7.30  | 5.70 | 28 |
| 1 KΩ 20 pF  | 5  | 4 | 34 | 15.05 | 10.1 | 50 |
| 1 KΩ 100 pF | 11 | 5 | 75 | 36.00 | 18.5 | 50 |

# 4.5 Power Dissipation in Repeater Chains

As the input transition slows, more short-circuit power is dissipated within the repeater stage. The input signal transition time is dependent upon the number of repeaters in the chain. If additional repeaters are inserted into a line to drive a long resistive interconnect, each repeater drives a smaller RC load with a waveform exhibiting a faster transition time, permitting the input transition of the following repeater to be faster. However, these additional repeaters may increase the short-circuit power of the total repeater system. The peak short-circuit current, which is proportional to the device width, is the other primary factor that determines the magnitude of the short-circuit power [61–63]. An example of short-circuit current and power in a repeater chain is shown in Figure 4.8.



Figure 4.8: Short-circuit current and power dissipated in a four-stage repeater with  $W_N=5~\mu{\rm m}$  and  $W_P=15~\mu{\rm m}$ ,  $f=10~{\rm MHz}$ .

Simulations demonstrate that when device sizes are small, the contribution of short-circuit power is small in comparison to the dynamic power, typically ranging from 1% to 5%. As the geometric width of the repeaters is increased, the contribution of the short-circuit and dynamic power also increases. However, as the geometric width and the number of repeaters increase, dynamic power increases linearly, whereas short-circuit power changes non-linearly. A comparison of short-circuit power versus dynamic power of a repeater system driving an RC load of 1 K$\Omega$ and 1 pF is shown in Figure 4.9. Both the short-circuit power and the dynamic power dissipated within the repeater chain versus the number of repeaters are shown. For the larger sized repeater, the peak short-circuit power is about 30% of the dynamic power at two stages; at five stages the short-circuit power is 12% of the dynamic power; and at nine stages, about 5%. A five stage repeater system provides the minimum transition time for this RC load. Thus, reducing the repeater size to $W_N = 15\ \mu m$ and $W_P = 45\ \mu m$ from $W_N = 25\ \mu m$ and $W_P = 75\ \mu m$ saves 40% in area ($\approx 200\ \mu m^2$), reduces the short-circuit power by 60%, and reduces the dynamic power by 12% in return for a 5% increase in propagation delay. Note that the maximum short-circuit power savings occurs when the input transition time of each repeater is approximately equal to the repeater output transition time [14,61].
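As a quick consistency check on the quoted area saving, assume (for illustration only) that the repeater area scales with the total gate width $W_N + W_P$ at a fixed channel length. The relative reduction per repeater is then

$$\frac{(25 + 75) - (15 + 45)}{25 + 75} = \frac{100 - 60}{100} = 40\%,$$

which agrees with the 40% figure given for the change in repeater size.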



Figure 4.9: The short-circuit and dynamic power dissipation versus the number of stages in a repeater system. Note the small increase in short-circuit power from nine to ten stages due to the increase in peak current with negligible improvement in transition time.

# 4.6 Conclusions

A closed form timing model of a CMOS inverter driving a resistive-capacitive load based on the  $\alpha$ -power law device model has been presented. This analytical expression differs from previous work because the short-channel transistor effect of velocity saturation is considered. The timing model for a CMOS inverter has been expanded to determine the overall delay of a signal propagating through a uniform repeater chain driving a large distributed resistive-capacitive load. Analytical estimates of delay with these design equations are within 16% of SPICE for loads representative of long resistive interconnect.

The performance characteristics of uniform and tapered-buffer repeaters are compared for a variety of RC loads. The resistance in RC lines is found to have a larger than expected effect on the delay of a signal propagating along a long line. Uniform repeaters outperform tapered buffers and tapered-buffer repeaters when driving even relatively low resistive RC loads. It is thus more advantageous to use a number of small uniform repeaters rather than a few (or one) tapered-buffer repeaters.

Power dissipation in CMOS inverters and repeaters driving RC lines has also been investigated. It is shown that short-circuit power can represent up to 30% of the total dynamic power dissipation. An empirical comparison of power in repeater chains is presented. The application of the repeater expressions developed in this chapter to a specific repeater implementation demonstrates that a 4% increase in input-to-output propagation delay can be traded off for a 40% savings in area and a 15% savings in power.

## Chapter 5

# Repeater Insertion in RC Trees to Minimize Delay

The timing model of a CMOS inverter driving an RC impedance presented in the previous chapters has been applied to the development of a repeater design methodology and related algorithms for efficiently driving RC tree structures, such as a clock distribution network, so as to reduce both the signal delay and slew rate. In this methodology, the number and size of the repeaters to minimize the propagation delay and transition time from the root node to each leaf node are determined. The repeaters are restricted to the same geometric size and equal RC impedance per interconnect section within each branch. The equal size and section impedance conditions are known as uniform repeater insertion [32, 45], in which balancing the interconnect and repeater delay minimizes the total path delay along an RC line.

The algorithm and software implementation of two proposed methodologies, a local RC branch optimization methodology and a global RC tree methodology, are described in this chapter. The global optimization methodology is implemented in two parts, utilizing the downhill simplex method for global minimization and simulated annealing to increase the size of the searchable design space.

The efficacies of these two repeater insertion methodologies are compared to a standard cascaded buffer methodology [60, 64–66]. Furthermore, the analytical equations characterizing the CMOS repeaters are shown to be accurate, generally within 10% of SPICE. The application of these local and global algorithms is also discussed in terms of relative run time and global optimality.

The local repeater insertion algorithm for RC trees is discussed in Section 5.1. The global repeater insertion algorithms are discussed in Section 5.2. A comparison of the analytic model versus circuit simulation is presented in Section 5.3. A comparison of the efficiency of the local- and global-optimal repeater insertion methodologies versus using cascaded buffers to drive resistive tree-based interconnect is also described in Section 5.3. Power dissipation in RC trees is examined in Section 5.4. Finally, some concluding comments are offered in Section 5.5.

## 5.1 Local Branch Repeater Insertion Algorithm

The structure of an RC tree is composed of a primary trunk with branching points. Each branch is modeled as a lumped resistance and capacitance, exemplified by the circuit shown in Figure 5.1. The total path delay is from the signal input at the root of the trunk to each end point of the tree (or leaf node).

The time required to drive a single branch or line of an RC tree using uniform repeaters, as described in [59] and shown again in Figure 5.2, is

$$t_{branch} = t_{first \ stage} + (n-2)t_{int. \ stage} + t_{final \ stage} \tag{5.1}$$

Figure 5.1: An example of an RC tree. Ordered triplets (i, j, k) are used to identify specific branches (note that the downstream nodes are to the right of the upstream nodes).

Figure 5.2: n equal sized CMOS inverting repeaters driving a branch in an RC tree.

The components, $t_{first\ stage}$, $t_{int.\ stage}$, and $t_{final\ stage}$, utilize an expression derived from the Sakurai $\alpha$-power law [33] for the delay of a CMOS inverter reaching an output voltage $V_{out}$ given a step input signal [62],

$$t_{out} = \frac{(1 + U_{do}R)(C_{rep/branch} + C_{int})}{U_{do}} \ln\left(\frac{V_{DD}}{V_{out}}\right) . \tag{5.2}$$

 $U_{do}$  is the saturation conductance, a device parameter from the  $\alpha$ -power law model derived from  $\frac{I_{do}}{V_{do}}$ .  $I_{do}$  is the saturation current of the device when  $V_{DS} = V_{DD}$ .  $V_{do}$  is the voltage at which the device begins to operate in the saturation region [33,62].  $C_{rep/branch}$  and  $C_{int}$  are the capacitances of the following inverting repeater and the interstage load capacitance, respectively.

A local optimization methodology and algorithm for inserting uniform repeaters into RC trees is presented in this section. This methodology is particularly appropriate if specific branch delays are being targeted. With the assumption that each branch has a repeater at its source, the minimum delay of each branch is initially determined. The total path delay from the root to each leaf is then minimized according to the expressions summarized in Section 5.1. The method for optimization is depth first, in which the lowest level branches are optimized first followed by each upstream branch. Thus, the RC tree is optimized locally, terminating at the root of the RC tree.

The algorithm to perform this repeater insertion process utilizes a priori information describing the RC impedances and the number of sub-branches of each branch of the RC tree beginning at the root. The lowest level of the RC tree hierarchy is reached when all of the leaf nodes have zero branches. The RC tree is constructed in this top-down fashion with every branch identified by a triplet (i, j, k). In this notation, i is the depth of the branch within the tree, j is the branch number with respect to its parent branch, and k is the branch number of the parent branch with respect to its parent branch. In other words, k locates the parent of the current branch among the sub-branches of its grandparent. Thus, k of a branch at depth 3 is equal to j of its parent branch at depth 2. An example of this labeling is shown in Figure 5.1.
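The labeling rule can be made concrete with a small Python sketch. The `Branch` class and the `add_children` helper are hypothetical names introduced here for illustration, and the number of branches at depth 2 is an arbitrary choice; only the (i, j, k) convention and the labels (1, 1, 0), (2, 1, 1), and (3, 1, 1) through (3, 3, 1) come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Branch:
    """One RC branch; `label` is the (i, j, k) triplet described in the text."""
    label: Tuple[int, int, int]
    children: List["Branch"] = field(default_factory=list)


def add_children(parent: Branch, count: int) -> List[Branch]:
    """Attach `count` sub-branches to `parent` and label each one (i, j, k)."""
    i, j, _ = parent.label
    for child_number in range(1, count + 1):
        # The depth increases by one, and k is the parent's own branch number j.
        parent.children.append(Branch((i + 1, child_number, j)))
    return parent.children


root = Branch((1, 1, 0))              # trunk of the tree
depth2 = add_children(root, 2)        # (2, 1, 1) and (2, 2, 1)
depth3 = add_children(depth2[0], 3)   # sub-branches of (2, 1, 1)
print([b.label for b in depth3])      # [(3, 1, 1), (3, 2, 1), (3, 3, 1)]
```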

A plot of the delay of branch (1, 1, 0) (see Figure 4.4) derived from (5.1) versus the size and number of repeater stages n in a branch is shown again in Figure 5.3 for $C_{rep} = 0$. The optimal implementation of a repeater system for a specific RC load in terms of the number and geometric size of each repeater is represented by the minimum point on the graph. A similar graph can be drawn for each RC branch. The optimal number of repeaters inserted within a branch to minimize the total delay is determined from a numerical solution of the data illustrated in Figure 5.3.

Figure 5.3: The total delay for a branch as a function of the number of repeaters and repeater sizes. 0.8 $\mu$m CMOS technology, $C_{rep} = 0$, R = 1 K$\Omega$, and C = 1 pF.

Once the tree has been constructed, it is traversed in a depth-first manner to determine the optimal repeater insertion for the final leaf nodes. When all of the branches of a parent have been optimized, the immediate upstream branch (or parent) is optimized while considering the input capacitance of the repeaters of the downstream branches according to the method described in Section 5.1. In Figure 5.1, the branches (3, 1, 1), (3, 2, 1), and (3, 3, 1) are downstream from branch (2, 1, 1).

The pseudocode of the algorithm used to locally insert repeaters into each branch is shown in Figure 5.4. The first function, build\_RCtree, recursively builds each branch and its sub-branches, starting from the root, based on the specific branch resistances and capacitances. The second function, insert\_repeater, is a recursive function in which the minimum delay for inserting a uniform repeater system in a particular branch is determined. Note that the shape of the delay function describing a system of inserted repeaters in an RC branch is convex, so the local branch optimal repeater insertion system is reached quickly.

```
(1)
function build_RCtree(node);
begin
   get R;
   get C;
   get number_of_branches;
   while (number_of_branches > 0)
   begin
      build_RCtree(branch);
      number_of_branches--;
   end
end
(2)
function insert_repeater(tree);
begin
   while (number_of_branches > 0)
   begin
      insert_repeater(branch);
      number_of_branches--;
   end
   optimize_delay[width,number_of_repeaters];
end
```

Figure 5.4: The pseudocode of the local branch repeater insertion algorithm.
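A minimal runnable sketch of the depth-first local insertion is given below. It is not the timing model of (5.1): `branch_delay` is a stand-in Elmore-style estimate, the constants `R0` and `C0` are hypothetical unit-repeater parameters, and a small grid search replaces the numerical solution used in this work. The sketch only illustrates the order of operations, in which the downstream branches are optimized first and their repeater input capacitance then loads the parent.

```python
from dataclasses import dataclass, field

R0 = 10e3   # hypothetical output resistance of a unit-width repeater (ohms)
C0 = 2e-15  # hypothetical input capacitance of a unit-width repeater (farads)


@dataclass
class Branch:
    R: float                       # branch resistance (ohms)
    C: float                       # branch capacitance (farads)
    children: list = field(default_factory=list)
    n: int = 0                     # chosen number of repeaters
    w: float = 0.0                 # chosen repeater width (unit multiples)


def branch_delay(R, C, c_load, n, w):
    """Stand-in Elmore-style delay of n uniform stages (not the thesis model)."""
    r_drv, c_in = R0 / w, C0 * w
    inner = r_drv * (C / n + c_in) + (R / n) * (C / (2 * n) + c_in)
    last = r_drv * (C / n + c_load) + (R / n) * (C / (2 * n) + c_load)
    return (n - 1) * inner + last


def insert_repeaters(branch, max_n=12, widths=range(1, 101)):
    """Optimize all downstream branches first, then this branch."""
    for child in branch.children:
        insert_repeaters(child, max_n, widths)
    # Load seen by this branch: input capacitance of each child's first repeater.
    c_load = sum(C0 * child.w for child in branch.children)
    _, branch.n, branch.w = min(
        (branch_delay(branch.R, branch.C, c_load, n, w), n, w)
        for n in range(1, max_n + 1)
        for w in widths
    )
    return branch


root = Branch(R=1e3, C=1e-12,
              children=[Branch(400.0, 0.05e-12), Branch(700.0, 1e-12)])
insert_repeaters(root)
print(root.n, root.w, [(b.n, b.w) for b in root.children])
```

Because each branch is visited once and optimized independently, the work grows linearly with the number of branches.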

The performance improvement and accuracy are discussed more thoroughly in Section 5.3. All of the results presented in Section 5.3 are based on the 90% delay, which is defined as the time from the application of the input at the root node until the output reaches $0.9V_{DD}$ at the leaf nodes. As described in greater detail in Section 5.3, the path delay from the input of the RC tree to the final leaf nodes is improved by 25% to 50% by the application of the local repeater insertion algorithm as compared to a typical cascaded buffer insertion method. The local repeater methodology agrees with SPICE to within 10%, and typically to within 5%. An example of the RC tree shown in Figure 5.1 after the local repeater insertion process is applied is depicted in Figure 5.5. Note that the number of repeaters inserted in each branch is shown inside the last repeater of that branch.



Figure 5.5: The RC tree shown in Figure 5.1 synthesized by the local branch repeater insertion system. The transistor widths are shown below the first repeater of each branch, and the number of repeaters per branch is shown inside the last repeater of each branch.

## 5.2 Global Tree Repeater Insertion Algorithm

A global optimization algorithm to determine the size and number of uniform repeaters inserted within each branch of an RC tree is discussed in this section. The same timing model as described in Section 5.1 is used in the global optimization algorithm. The downhill simplex method of Nelder and Mead [67, 68] in conjunction with simulated annealing [68, 69] is used to implement the multidimensional optimization process. Practically, the implemented version of the simulated annealing technique is a superset of the downhill simplex method. The application of simulated annealing to repeater insertion is explained in more detail below.



Figure 5.6: A methodology for globally optimal repeater insertion.

The flow of the repeater insertion methodology for determining the optimal size and location of each repeater is schematically shown in Figure 5.6. In the downhill simplex method, each parameter variable being optimized is an element in an n-dimensional vector  $\mathbf{x}$ . To insert repeaters into an RC tree, the vector  $\mathbf{x}$  contains the width and number of the uniformly sized and spaced repeaters within each branch. For example, in the RC tree shown in Figure 5.1, x[1] is the width and x[2] is the number of repeaters to be inserted into branch (1,1,0). In this example, 18 elements are in  $\mathbf{x}$ , nine repeater widths and numbers, one pair for each of the nine branches.

The RC tree data is converted to a set of analytical expressions describing the delays from the root node to each leaf node. This set of analytical expressions, the initial set of vectors, and the objective function are the inputs to the optimization routine. To initialize the downhill simplex algorithm, not just one starting point but (n+1) different arbitrary vectors are required. The n-dimensional initialization vectors are not permitted to lie along a straight line. The other input, the objective function, is the single value being minimized. Two useful objective functions appropriate for a repeater insertion algorithm are 1) to minimize the delay from the trunk node to the leaf nodes, such as in data paths with multiple fanout points, and 2) to target the delay to each node, such as in a clock signal path within a clock distribution network [70]. The former objective is specified by minimizing the average of the delays at the leaf nodes, while the latter objective function minimizes the standard deviation of the predicted delay minus the target delay at each leaf node. In the example RC tree shown in Figure 5.1 and in the example RC trees listed in Tables 5.1 and 5.2, the chosen objective function minimizes the average of the delays from the root of the tree to each of the leaf nodes of the RC tree. This objective function tends to minimize the delay through the trunk of the RC tree.
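The inputs described above can be sketched as follows. The function names, the starting values, and the axis-offset construction of the initial simplex are assumptions chosen for illustration; the construction is one common way to obtain (n+1) vertices that do not lie along a line, and the text does not state how the initial vectors were actually chosen.

```python
def average_delay(leaf_delays):
    """First objective: mean of the root-to-leaf delays."""
    return sum(leaf_delays) / len(leaf_delays)


def initial_simplex(x0, step=1.0):
    """Return n+1 vertices: x0 itself plus x0 offset along each coordinate axis."""
    vertices = [list(x0)]
    for i in range(len(x0)):
        vertex = list(x0)
        vertex[i] += step
        vertices.append(vertex)
    return vertices


# Nine branches, one (width, number of repeaters) pair per branch -> 18 elements.
x0 = [10.0, 5.0] * 9          # hypothetical starting widths and repeater counts
simplex = initial_simplex(x0)
print(len(simplex), len(simplex[0]))
```

For the nine-branch example, the vector has 18 elements, so the initial simplex has 19 vertices.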

The results of the downhill simplex optimization method on uniform repeater insertion in an RC tree are summarized in Section 5.3. The downhill simplex optimization produces a repeater implementation that is 10% to 20% faster (with respect to the total path delay) than that produced by the locally optimal repeater insertion methodology. In addition, the accuracy of the system of inserted repeaters implemented by the downhill simplex method is generally within 10% of SPICE. The RC tree shown in Figure 5.1 is also shown in Figure 5.7 after the global insertion algorithm has been performed. Note the decrease in circuit area (i.e., the total number of repeaters) and an approximately 20% decrease in path delay as compared to the circuit implemented by the local repeater insertion methodology as shown in Figure 5.5.



Figure 5.7: The RC tree shown in Figure 5.1 synthesized by the global repeater insertion system. The transistor widths are shown below the first repeater of each branch, and the number of repeaters per branch is shown inside the last repeater of each branch.

The downhill simplex algorithm utilizes a "greedy" methodology. Therefore, the solution process can become trapped in a local minimum. Even a modestly sized RC tree may contain many minima, some of which may be quite distant from the globally optimal solution. However, many minima may be quite close to the global minimum. Regardless, one method of compensating for the greediness of the downhill simplex optimization algorithm is through the application of simulated annealing [69].

The downhill simplex method has been integrated with a simulated annealing algorithm, permitting the repeater insertion algorithm to search for alternative solutions to the nearest local minimum. To implement the simulated annealing algorithm, a random thermal excitation is added to or subtracted from the objective function. In this manner, a local minimum can be avoided since the added excitation moves the next possible choice to a different region within the design space. If the initial annealing temperature is set to zero, the simulated annealing method reduces to the original downhill simplex method as described above. For this reason, the simulated annealing method can be considered a superset of the downhill simplex method. Another issue in simulated annealing is that an optimal non-zero initial annealing temperature is difficult to determine. In these analyses, the initial annealing temperature is set to one-third of the value of the objective function evaluated at the first initialization vector.

A second important aspect of simulated annealing is the annealing schedule, the rate at which the thermal excitation is reduced to zero. A constant rate of decrease of the temperature is chosen as a simple annealing schedule. The annealing schedule is set to cool to zero degrees in 1000 uniform steps. In general, the simulated annealing method shows little to no delay improvement over the downhill simplex method. Simulated annealing, however, appears to be useful in those cases where several outstanding minima exist between many other ordinary minima. Examples of two contrasting minima configurations are shown in Figure 5.8. Specifically, a function in which the minima are similar is shown in Figure 5.8a. No great improvement can be achieved by applying simulated annealing to this type of function. However, for the function shown in Figure 5.8b, several outstanding minima among many ordinary minima are apparent. The application of simulated annealing to this function may be effective in reaching these minima. With several outstanding minima, a high probability exists that the solution determined from the application of simulated annealing may be better than that derived from the application of the downhill simplex method. However, the number and character of the minima characterizing a possible solution of a system of inserted repeaters are typically unknown beforehand [69].
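The temperature settings described above can be sketched in a few lines. The function names are hypothetical, and the logarithmically distributed fluctuation is an assumption borrowed from a common formulation of simplex-based annealing; the text states only that a random thermal excitation is added to or subtracted from the objective function.

```python
import math
import random


def initial_temperature(objective, first_vertex):
    """One-third of the objective value at the first initialization vector."""
    return objective(first_vertex) / 3.0


def linear_schedule(t0, steps=1000):
    """Temperatures falling from t0 to zero in `steps` uniform decrements."""
    return [t0 * (1 - k / steps) for k in range(steps + 1)]


def thermal_fluctuation(temperature, rng=random):
    """Non-negative random excitation proportional to the temperature (assumed form)."""
    # rng.random() lies in [0, 1), so the logarithm argument lies in (0, 1].
    return -temperature * math.log(1.0 - rng.random())


schedule = linear_schedule(initial_temperature(lambda x: 1.8, [0.0]))
print(schedule[0], schedule[500], schedule[-1])
```

At zero temperature the fluctuation vanishes, which is the sense in which the annealed search reduces to the plain downhill simplex method.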



Figure 5.8: Two possible solution spaces for a non-convex function. (a) An objective function with nearly equivalent minima. (b) Several outstanding minima among many ordinary minima.

## 5.3 Effectiveness, Accuracy, and Applications of Repeater Insertion Methodologies

A comparison of the local and global repeater insertion methodologies is presented in this section. The effectiveness of these repeater insertion algorithms is compared to both a classical cascaded buffer system and a completely passive RC tree (no buffers or repeaters). The system of inserted repeaters within the RC tree is also compared to SPICE to quantify the accuracy of the timing model. Circuit applications of the local and global optimization algorithms are also discussed.

### 5.3.1 Applications

As mentioned previously, achieving a specific target delay may be the desired goal rather than minimizing the path delay. The downhill simplex algorithm can be used to determine a repeater insertion implementation for targeting a specific final leaf node delay. The objective function for this case minimizes the sum of the squares of the difference between the analytically determined delay and the desired target delay. Alternatively, targeting individual branch delays may be desired. In this case, the local optimization algorithm is preferable because the individual branch delays cannot be controlled within the global optimization algorithms. However, the optimization criteria may be significantly more complex than targeting a global delay depending upon how many internal branches exist compared to the number of leaves.

An example of global targeting of delay is implemented on the RC tree shown in Figure 5.1. The target delay from the root node to each of the leaves as specified in the objective criteria is 2.0 ns. The results of the branch target delays are shown in the next section. In some cases, the analytic model is able to effectively satisfy the branch target delay (within 5%), and in other cases, the branch target delay is satisfied within 15%. Once again, this variation in how closely the target delay is satisfied is due to the greediness of the downhill simplex optimization method. A more restrictive objective criterion, a different starting point, or both are required to more closely approach the target delay of each branch.
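The target-delay objective described in this section, the sum of the squared differences between the predicted leaf delays and the target, can be written as a short sketch. The function names and the sample delays are hypothetical and are not results from this work.

```python
def target_objective(leaf_delays, target):
    """Sum of squared deviations of the predicted leaf delays from the target."""
    return sum((delay - target) ** 2 for delay in leaf_delays)


def worst_relative_error(leaf_delays, target):
    """Largest fractional deviation from the target across the leaves."""
    return max(abs(delay - target) / target for delay in leaf_delays)


TARGET_NS = 2.0                           # target delay from the text
sample_delays_ns = [2.05, 1.92, 2.20]     # hypothetical predicted leaf delays
print(target_objective(sample_delays_ns, TARGET_NS))
print(worst_relative_error(sample_delays_ns, TARGET_NS))
```

An objective of zero means that every leaf meets the target exactly, while the worst relative error gives the kind of per-leaf percentage quoted above.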

A comparison of run times or order of operations is important. In order to minimize the final branch delay using the local optimization method, a tree with n total branches requires n matrices of size $3 \times 3$, resulting in a complexity of O(n). For the downhill simplex method for global optimization, an $n \times n$ matrix is required, resulting in a complexity of $O(n^2)$. However, a limit on the rate of convergence of the simplex can be set to reduce the computational run time. The currently implemented version of the simulated annealing algorithm is impractical on very large RC tree topologies since the algorithm exhibits a complexity of $O(kn^2)$, where k is related to the annealing schedule.

### 5.3.2 Accuracy and Effectiveness

The path delay  $t_{PD}$  from the root node to the end of each branch for three different trees is listed in Tables 5.1 and 5.2. The depth and impedance characteristics of each branch of these three trees are listed in the first three columns. The topology of each tree is characterized by the branch naming convention and indentation in the first column. In the fourth column of Table 5.1, the path delay  $t_{passive}$  from the source to the end of each branch is listed. The RC impedances within the passive RC tree are modeled as a  $\pi 3$  distributed load. In the fifth column, the cascaded buffer delay  $t_{buffer}$  from the tree source to each branch is listed. The cascaded buffer system is a series of optimally tapered buffers placed at the input of each branch so as to drive the capacitive load of each branch (without considering the interconnect resistance) [66]. This delay assumes the cascaded buffer system uses a tapering factor of three [60, 65, 66].

The next group of columns in Table 5.1, under the heading Local Optimization, lists similar information for the local branch repeater insertion methodology described in Section 5.1. The results of the downhill simplex and simulated annealing methods described in Section 5.2 are shown in Table 5.2. For the local optimization, the predicted path delay is shown in column six, and the SPICE simulation and the associated error for the repeater insertion implementation are shown in columns seven and eight, respectively. The number and size of the repeaters are shown in columns nine and ten. Note that the maximum deviation of the analytic result from SPICE is 10% with a typical error of 5% or less.

The signal waveforms at the final branch output of the locally optimized repeater system and the optimally tapered buffer system are shown in Figure 5.9. The performance improvement of the repeater system over the tapered buffer system for this example RC tree is in the range of 25% to 33%. The buffer system does not drive the highly resistive lines effectively; hence, longer than expected propagation delays and slower rise times are generated, particularly for highly resistive branches such as branch (2, 2, 1).



Figure 5.9: The delay from the input of the *RC* tree to specific leaves of the tree based on the repeater insertion system as compared to applying optimally tapered buffers. Triplets indicate the leaf nodes as labeled in Figure 5.5.

For the downhill simplex method, similar information is listed in Table 5.2. SPICE simulations of the repeater systems produced by the downhill simplex method exhibit branch delay improvements of up to 25% over those produced by the local optimization method. Performance improvements derived from using the downhill simplex method over the local branch optimization algorithm are guaranteed if one of the points of the initial simplex is the final result of the local optimization method. This improvement can be attributed to the reduction in the size of the repeaters, which reduces the load capacitance at the branching nodes. Hence, not only is the delay decreased by globally optimizing the system, but the total area (and power) required by the repeater system is reduced when the downhill simplex method is applied as compared to the local branch repeater insertion algorithm.

Note that, on occasion, branches with similar impedance characteristics and parents can have different repeater implementations. This behavior is explained by the simplex solution falling into a nearby minimum, creating a slightly different repeater implementation.

Results of the simulated annealing method are listed in the final five columns of Table 5.2. Although the circuit delay is often smaller using simulated annealing versus the downhill simplex method, there is insufficient evidence to strictly recommend using simulated annealing as the primary repeater optimization method. Rather, simulated annealing is best used to evaluate a specific repeater implementation or to determine other possible repeater implementations. The computational run time required by the current implementation of the simulated annealing algorithm also far outweighs the delay improvements achieved. The run time of various implemented algorithms is discussed below.

The results of the global delay target implementation are shown in Table 5.3. Analytically, the repeater implementation shown in Table 5.3 comes within 10% of the target and under a 10% deviation from SPICE simulations. Simulated annealing is no more effective than the downhill simplex method. Different starting points of the downhill simplex method can be attempted to determine a preferable target implementation.

### 5.3.3 Comparison of Global Optimization to Exhaustive Search

A comparison of the downhill simplex method to an exhaustive search has been performed. The *RC* tree used for comparison is shown in Figure 5.10 and is a three branch section of the tree shown in Figure 5.1. A relatively small tree is used for comparison due to the number of possible repeater implementations.

A tree with b branches has $(n \times w)^b$ different possible implementations, where n is the number of different repeater counts that can be implemented within each branch and w is the number of different possible sizes of the uniformly sized repeaters. The number of possible implementations can therefore be enormous; thus, the comparison to an exhaustively evaluated solution has been restricted to a tree with three branches. In the exhaustive search, the number of repeaters in each branch ranges from one to ten, and the repeater size in each branch ranges from 1.0 $\mu$m to 25.0 $\mu$m in increments of 0.5 $\mu$m.
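As a worked instance of the expression $(n \times w)^b$, the size of the search space for the ranges quoted above can be computed directly. The variable names are chosen for illustration only.

```python
repeater_counts = range(1, 11)                 # one to ten repeaters per branch
num_sizes = round((25.0 - 1.0) / 0.5) + 1      # 1.0 um to 25.0 um in 0.5 um steps
branches = 3

per_branch = len(repeater_counts) * num_sizes  # n x w choices for a single branch
implementations = per_branch ** branches       # (n x w)^b for the whole tree
print(num_sizes, per_branch, implementations)
```

With 49 sizes and 10 repeater counts, each branch has 490 choices, so the three-branch tree has 117,649,000 candidate implementations. Each additional branch multiplies this count by another factor of 490.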

The results of the exhaustive search and the global repeater insertion using the downhill simplex method are shown in Table 5.4. The position of each branch within the tree and its RC characteristics are described in columns one through three. The results of applying repeater insertion based on the objective function for the global optimization are shown in columns four through six. The same results for the exhaustive search are shown in the last three columns. The objective function minimizes the average root-to-leaf delay. In this comparison, the results derived from the exhaustive search match almost exactly the results derived from the heuristic search given the restrictions of the repeater size applied during the exhaustive search.

## 5.4 Power Dissipation of Repeaters in RC Trees

Transient power dissipation in repeaters is composed of two components: the dynamic power dissipated by switching the capacitance of the interconnect and the repeaters and the short-circuit power dissipated when an input signal simultaneously turns on both the P-channel and N-channel transistors [14]. Both of these power components are examined in this section.

Figure 5.10: A section of the RC tree shown in Figure 5.1 used to compare the global optimization algorithms versus the exhaustive search.

The dynamic power dissipation is quantified by

$$CV^2f, \tag{5.3}$$

where V is the voltage to which the capacitance is switched, typically  $V_{DD}$ , f is the frequency of the switching activity, and C is the total capacitance being charged and discharged. In the case of a repeater system driving an RC tree, C is the sum of the capacitances of the RC tree plus the sum of the gate and active diffusion capacitances of the transistors within the repeater system.

An expression for the short-circuit power of a CMOS inverter can be approximated by [62]

$$P_{SC} = \frac{1}{2} I_{peak} t_{base} V_{DD} f . \tag{5.4}$$

 $I_{peak}$  is the maximum short-circuit current sourced by the inverter.  $t_{base}$  is the time that the input waveform is switching from the threshold voltage of the P-channel transistor to the threshold voltage of the N-channel transistor and is [62]

$$t_{base} = \left| \ln\left(\frac{V_{TN}}{V_{DD} + V_{TP}}\right) \right| \frac{C + \mathcal{U}_{do}RC}{\mathcal{U}_{do}}. \tag{5.5}$$

Therefore, the short-circuit power is

$$P_{SC} = \frac{1}{2} \left| \ln\left(\frac{V_{TN}}{V_{DD} + V_{TP}}\right) \right| \frac{C + \mathcal{U}_{do}RC}{\mathcal{U}_{do}} I_{peak} f V_{DD} . \tag{5.6}$$

The value of  $I_{peak}$  is based on (12) from [71] and is

$$I_{peak} = I_{DSAT} \left(2 - \frac{V_{DD} - V_O(t_{INV})}{V_{DSAT_p}}\right) \left(\frac{V_{DD} - V_O(t_{INV})}{V_{DSAT_p}}\right). \tag{5.7}$$

 $I_{DSAT}$  is the saturation current at the saturation voltage  $V_{DSAT}$ .  $V_O(t_{INV})$  is the output voltage when the input reaches the logic threshold voltage  $V_{INV}$ . In a uniform repeater structure, the short-circuit power in each repeater stage within a branch is the same because the transition times of the waveforms between each repeater and the geometric widths of the transistors making up the repeater are designed to be the same for each stage.

The total power dissipated by the *RC* tree with inserted repeaters as shown in Figure 5.5 is 30.3 mW when operating at a frequency of 100 MHz (as compared to a simulated 36.3 mW). The analytical model of the power dissipation is based on a total switched capacitance of 11.64 pF. A dynamic power of 29 mW and a short-circuit power of 1.3 mW make up the total dissipated power. In this example, the short-circuit power is 4.5% of the dynamic power. However, the relative contribution of short-circuit power to the total transient power is dependent on the number of repeaters in each *RC* branch.
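The quoted power figures can be cross-checked with a short calculation based on (5.3). The supply voltage is not stated in this section; 5 V is assumed below, and the short-circuit and SPICE values are taken directly from the text rather than computed.

```python
C_TOTAL = 11.64e-12   # total switched capacitance from the text (farads)
FREQ = 100e6          # operating frequency from the text (hertz)
V_DD = 5.0            # ASSUMED supply voltage; not stated in this section

p_dynamic = C_TOTAL * V_DD ** 2 * FREQ   # dynamic power from (5.3), in watts
p_short = 1.3e-3                         # short-circuit power quoted in the text
p_total = p_dynamic + p_short
p_spice = 36.3e-3                        # simulated total power quoted in the text

print(f"dynamic power      : {p_dynamic * 1e3:.1f} mW")
print(f"short-circuit share: {100 * p_short / p_dynamic:.1f}% of dynamic")
print(f"deviation vs SPICE : {100 * (p_spice - p_total) / p_spice:.1f}%")
```

Under the assumed 5 V supply, the dynamic term evaluates to about 29.1 mW, which is consistent with the 29 mW quoted above.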

## 5.5 Conclusions

A design system for determining the optimal number and size of uniform repeaters to insert into an RC tree has been described. An accurate timing model based on a short-channel I-V model which considers the shape of the signal waveform is used within this system to achieve a more accurate and efficient repeater implementation. Analytical estimates of the total propagation delay of example *RC* trees with inserted repeaters agree within 10% of SPICE. One local optimization method and two global optimization methods have been implemented. The global optimization method utilizes the downhill simplex algorithm in conjunction with simulated annealing to increase the size of the design space.

Depending upon the application, either delay targeting or delay minimization of the interconnect in RC trees may be appropriate goals. Both of these goals can be accomplished by the application of the repeater insertion methods presented in this chapter. The global repeater insertion algorithm is applied to the total path delay, or the root-to-leaf delays, while the local repeater insertion algorithm is applied to satisfy a specific branch delay objective. Simulated annealing for repeater insertion is discouraged except for small RC tree topologies due to the significant increase in computational run time.

Delay improvements of 25% to 60% over a typical cascaded buffer insertion methodology are achieved by inserting repeaters. The global repeater insertion methodology reduces the propagation delay, circuit area, and power dissipation as compared to the local optimization method. The power dissipation of the inserted repeaters is also examined. The analytically derived estimate of the sum of the dynamic and short-circuit power is within 16% of the total power dissipation derived from SPICE. Thus, an integrated design system for effectively and accurately inserting repeaters into an RC tree is presented in this chapter.

Table 5.1: The size and number of repeaters as determined by the local optimization algorithm for three different RC tree topologies. (The propagation delay is in nanoseconds, # is the number of repeaters in a branch, size is the geometric width of the N-channel device of the uniform repeater for that branch, and the P-channel to N-channel ratio is 3:1.)

|         |             |        |               |              | Local Optimization  |                |       |    |               |
|---------|-------------|--------|---------------|--------------|---------------------|----------------|-------|----|---------------|
| Branch  | R           | C      | $t_{passive}$ | $t_{buffer}$ | $t_{PD}$ Analytical | $t_{PD}$ SPICE | Error | #  | Size ($\mu$m) |
| (1,1,0) | 1 KΩ        | 1 pF   | 9.0           | 1.95         | 1.05               | 1.16     | 9%    | 11 | 19              |
| (2,1,1) | $400\Omega$ | .05 pF | 9.7           | 1.73         | 1.51               | 1.54     | 2%    | 5  | 23              |
| (3,1,1) | $200\Omega$ | .5 pF  | 9.75          | 2.41         | 1.77               | 1.70     | 4%    | 5  | 25              |
| (3,2,1) | $200\Omega$ | .5 pF  | 9.75          | 2.41         | 1.77               | 1.70     | 4%    | 5  | 25              |
| (3,3,1) | $200\Omega$ | .5 pF  | 9.75          | 2.41         | 1.77               | 1.70     | 4%    | 5  | 25              |
| (2,2,1) | $700\Omega$ | 1 pF   | 9.45          | 2.98         | 1.71               | 1.67     | 2%    | 7  | 18              |
| (2,3,1) | $500\Omega$ | .5 pF  | 9.28          | 2.46         | 1.51               | 1.48     | 2%    | 5  | 19              |
| (3,1,3) | $300\Omega$ | .1 pF  | 9.3           | 2.57         | 1.70               | 1.67     | 2%    | 5  | 9               |
| (3,2,3) | $300\Omega$ | .1 pF  | 9.3           | 2.57         | 1.70               | 1.67     | 2%    | 5  | 9               |
| (1,1,0) | 700Ω        | .8 pF  | 7.95          | 1.36         | .91                | 1.02     | 10%   | 9  | 24              |
| (2,1,1) | $100\Omega$ | .5 pF  | 7.98          | 1.70         | 1.12               | 1.09     | 3%    | 5  | 35              |
| (2,2,1) | $200\Omega$ | .7 pF  | 8.18          | 1.86         | 1.27               | 1.23     | 3%    | 5  | 36              |
| (3,1,2) | $700\Omega$ | .6 pF  | 8.44          | 3.02         | 1.77               | 1.66     | 7%    | 5  | 15              |
| (3,2,2) | $100\Omega$ | .1 pF  | 8.19          | 2.14         | 1.42               | 1.40     | 1%    | 5  | 16              |
| (2,3,1) | 1 KΩ        | 1.6 pF | 9.70          | 3.87         | 2.02               | 1.97     | 2%    | 11 | 20              |
| (3,1,3) | $300\Omega$ | .5 pF  | 9.79          | 3.72         | 2.33               | 2.22     | 5%    | 5  | 20              |
| (3,2,3) | 600Ω        | .1 pF  | 9.74          | 3.39         | 2.25               | 2.16     | 4%    | 5  | 6               |
| (1,1,0) | 200Ω        | 5 pF   | 5.73          | 2.14         | .85                | .88      | 3%    | 9  | 79              |
| (2,1,1) | 1 KΩ        | 1 pF   | 9.34          | 3.62         | 1.87               | 1.90     | 2%    | 9  | 19              |
| (3,1,1) | 400Ω        | .8 pF  | 9.54          | 3.95         | 2.30               | 2.17     | 6%    | 5  | 22              |
| (3,2,1) | 1.5 KΩ      |        | 9.43          | 3.45         | 2.18               | 2.08     | 5%    | 5  | 4               |
| (3,3,1) | 1.5 KΩ      | .1 pF  | 9.43          | 3.45         | 2.18               | 2.08     | 5%    | 5  | 4               |
| (3,4,1) | $400\Omega$ | .8 pF  | 9.54          | 3.95         | 2.30               | 2.17     | 6%    | 5  | 22              |
| (2,2,1) | 2 KΩ        | .5 pF  | 7.45          | 3.61         | 1.79               | 1.82     | 2%    | 9  | 9               |
| (3,1,2) | 800Ω        | .2 pF  | 7.55          | 3.57         | 2.10               | 2.05     | 2%    | 5  | 8               |
| (3,2,2) | 800Ω        | .2 pF  | 7.55          | 3.57         | 2.10               | 2.05     | 2%    | 5  | 8               |
| (2,3,1) |             |        |               | 3.61         | 1.79               | 1.82     | 2%    | 9  | 9               |
| (3,1,3) | 800Ω        | .2 pF  | 7.55          | 3.57         | 2.10               | 2.05     | 2%    | 5  | 8               |
| (3,2,3) | 800Ω        | .2 pF  | 7.55          | 3.57         | 2.10               | 2.05     | 2%    | 5  | 8               |
| (2,4,1) |             |        | 9.34          | 3.62         | 1.87               | 1.90     | 2%    | 9  | 19              |
| (3,1,4) | 400Ω        | .8 pF  | 9.54          | 3.95         | 2.30               | 2.17     | 6%    | 5  | 22              |
| (3,2,4) | 1.5 KΩ      | .1 pF  | 9.43          | 3.45         | 2.18               | 2.08     | 5%    | 5  | 4               |
| (3,3,4) | 1.5 KΩ      | .1 pF  | 9.43          | 3.45         | 2.18               | 2.08     | 5%    | 5  | 4               |
| (3,4,4) | $400\Omega$ | .8 pF  | 9.54          | 3.95         | 2.30               | 2.17     | 6%    | 5  | 22              |

Table 5.2: The size and number of repeaters as determined by the global optimization (downhill simplex and simulated annealing) algorithms for three different RC tree topologies. (The propagation delay is in nanoseconds, # is the number of repeaters in a branch, size is the geometric width of the N-channel device of the uniform repeater for that branch, and the P-channel to N-channel ratio is 3:1.)

|         |                      |        | Downhill Simplex    |                |       |    |               | Simulated Annealing |                |       |    |               |
|---------|----------------------|--------|---------------------|----------------|-------|----|---------------|---------------------|----------------|-------|----|---------------|
| Branch  | R                    | C      | $t_{PD}$ Analytical | $t_{PD}$ SPICE | Error | #  | Size ($\mu$m) | $t_{PD}$ Analytical | $t_{PD}$ SPICE | Error | #  | Size ($\mu$m) |
| (1,1,0) | 1 KΩ                 | 1 pF   | .88              | .93      | 5%    | 7  | 12.7            | .9                  | .97      | 7%    | 8  | 15.7            |
| (2,1,1) | $400\Omega$          | .05 pF | 1.14             | 1.15     | 1%    | 2  | 5.2             | 1.16                | 1.20     | 3%    | 3  | 8.5             |
| (3,1,1) | $200\Omega$          | .5 pF  | 1.53             | 1.61     | 5%    | 4  | 5.7             | 1.46                | 1.49     | 4%    | 2  | 8.7             |
| (3,2,1) | $200\Omega$          | .5 pF  | 1.51             | 1.57     | 4%    | 2  | 6.0             | 1.44                | 1.48     | 3%    | 4  | 10.6            |
| (3,3,1) | $200\Omega$          | .5 pF  | 1.51             | 1.58     | 4%    | 2  | 5.9             | 1.49                | 1.54     | 3%    | 2  | 7.4             |
| (2,2,1) | 700Ω                 | 1 pF   | 1.67             | 1.76     | 5%    | 6  | 7.3             | 1.63                | 1.69     | 4%    | 7  | 9.1             |
| (2,3,1) | $500\Omega$          | .5 pF  | 1.35             | 1.45     | 7%    | 3  | 7.2             | 1.35                | 1.40     | 4%    |    | 10.7            |
| (3,1,3) | $300\Omega$          | .1 pF  | 1.48             | 1.53     | 3%    | 2  | 4.1             | 1.46                | 1.46     | 0%    | 2  | 6.6             |
| (3,2,3) | $300\Omega$          | .1 pF  | 1.52             | 1.59     | 4%    | 2  | 2.6             | 1.46                | 1.45     | 1%    | 2  | 6.7             |
| (1,1,0) | $700\Omega$          | .8 pF  | .72              | .85      | 15%   | 7  | 16.4            | .71                 | .79      | 10%   | 7  | 18.7            |
| (2,1,1) | $100\Omega$          | .5 pF  | .98              | 1.08     | 9%    | 2  | 8.4             | .94                 | 1.01     | 7%    | 2  | 9.7             |
| (2,2,1) | $200\Omega$          | .7 pF  | 1.05             | 1.14     | 8%    | 3  | 17.1            | 1.08                | 1.14     | 5%    | 2  | 15.5            |
| (3,1,2) | $700\Omega$          | .6 pF  | 1.56             | 1.61     | 3%    | 5  | 9.6             | 1.58                | 1.57     | 1%    | 5  | 10.8            |
| (3,2,2) | $100\Omega$          | .1 pF  | 1.13             | 1.20     | 6%    | 2  | 6.4             | 1.17                | 1.15     | 2%    | 2  | 6.3             |
| (2,3,1) | 1 KΩ                 | 1.6 pF | 1.79             | 1.80     |       | 11 | 17.2            | 1.79                | 1.83     | 2%    |    | 15.3            |
| (3,1,3) | $300\Omega$          | .5 pF  | 2.10             | 2.11     | 0%    | 2  | 11.3            | 2.09                | 2.09     | 0%    |    | 13.0            |
| (3,2,3) | $600\Omega$          | .1 pF  | 1.94             | 1.93     | 1%    | 2  | 5.6             | 1.95                | 1.95     | 0%    | 2  | 5.0             |
| (1,1,0) | $200\Omega$          | 5 pF   | .82              | .86      | 5%    | 8  | 75.4            | .83                 | .90      | 8%    | 9  | 63.1            |
| (2,1,1) | 1 KΩ                 | 1 pF   | 1.72             | 1.77     | 3%    | 8  | 15.4            | 1.75                | 1.86     | 6%    | 8  | 12.6            |
| (3,1,1) | $400\Omega$          | .8 pF  | 2.19             | 2.22     | 1%    | 5  | 10.9            | 2.33                | 2.47     | 6%    | 4  | 7.1             |
| (3,2,1) | 1.5 KΩ               | .1 pF  | 1.9              | 1.95     | 3%    | 3  | 3.1             | 2.08                | 1.99     | 5%    | 4  | 13.1            |
| (3,3,1) | 1.5 KΩ               | .1 pF  | 1.9              | 1.95     | 3%    | 3  | 3.1             | 2.17                | 2.36     | 8%    | 8  | 1.8             |
| (3,4,1) | $400\Omega$          | .8 pF  | 2.19             | 2.22     | 1%    | 5  | 10.8            | 2.40                | 2.63     | 10%   | 3  | 5.9             |
| (2,2,1) | 2 KΩ                 | .5 pF  | 1.69             | 1.72     | 2%    | 8  | 7.5             | 1.9                 | 1.99     | 7%    | 7  | 8.0             |
| (3,1,2) | $800\Omega$          | .2 pF  | 1.99             | 2.01     | 1%    | 4  | 4.7             | 2.36                | 2.40     | 2%    | 13 |                 |
| (3,2,2) | 800Ω                 | .2 pF  | 1.99             | 1.96     | 2%    | 3  | 5.1             | 2.18                | 2.02     | 8%    | 2  | 21.6            |
| (2,3,1) | $2~\mathrm{K}\Omega$ | .5 pF  | 1.70             | 1.74     | 3%    | 8  | 7.2             | 1.70                | 1.79     | 5%    | 7  | 7.2             |
| (3,1,3) | 800Ω                 | .2 pF  | 1.99             | 1.97     | 1%    | 3  | 5.4             | 2.02                | 2.06     | 2%    | 5  | 5.3             |
| (3,2,3) | $800\Omega$          | .2 pF  | 1.99             | 1.97     | 1%    | 3  | 5.3             | 2.01                | 2.06     | 2%    | 2  | 4.6             |
| (2,4,1) | 1 KΩ                 | 1 pF   | 1.75             | 1.77     | 1%    | 8  | 14.8            | 1.84                | 2.04     | 10%   | 13 | 15.3            |
| (3,1,4) | $400\Omega$          | .8 pF  | 2.21             | 2.21     | 0%    | 4  | 11.2            | 2.32                | 2.50     | 7%    | 8  | 14.0            |
| (3,2,4) | 1.5 KΩ               | .1 pF  | 2.02             | 1.96     | 3%    | 3  | 3.0             | 2.23                | 2.19     | 2%    |    | 13.5            |
| (3,3,4) | 1.5 KΩ               | .1 pF  | 2.03             | 1.96     | 3%    | 3  | 3.0             | 2.25                | 2.26     | 0%    |    | 11.4            |
| (3,4,4) | 400Ω                 | .8 pF  | 2.19             | 2.42     | 10%   | 4  | 10.4            | 2.33                | 2.50     | 7%    | 8  | 14.1            |

Table 5.3: The size and number of repeaters as determined by the global optimization algorithm to meet a terminal branch target delay of 2.0 ns for the given RC tree topologies. (The propagation delay is in nanoseconds, # is the number of repeaters in a branch, size is the geometric width of the N-channel device of the uniform repeater for that branch, and the P-channel to N-channel ratio is 3:1.)

|         |             |        | Downhill Simplex |          |       |   |                 |
|---------|-------------|--------|------------------|----------|-------|---|-----------------|
| Branch  | R           | C      | $t_{PD}$         | $t_{PD}$ | Error | # | Size            |
|         |             |        | Analytical       | SPICE    |       |   | $\mu\mathrm{m}$ |
| (1,1,0) | 1 KΩ        | 1 pF   | 1.37             | 1.55     | 10%   | 3 | 5.31            |
| (2,1,1) | $400\Omega$ | .05 pF | 1.60             | 1.52     | 5%    | 3 | 3.76            |
| (3,1,1) | $200\Omega$ | .5 pF  | 2.05             | 2.05     | 0%    | 3 | 4.39            |
| (3,2,1) | $200\Omega$ | .5 pF  | 2.06             | 2.07     | 1%    | 3 | 4.21            |
| (3,3,1) | $200\Omega$ | .5 pF  | 2.07             | 2.09     | 1%    | 3 | 4.04            |
| (2,2,1) | $700\Omega$ | 1 pF   | 2.22             | 2.22     | 0%    | 5 | 6.34            |
| (2,3,1) | $500\Omega$ | .5 pF  | 1.95             | 1.92     | 2%    | 4 | 4.83            |
| (3,1,3) | $300\Omega$ | .1 pF  | 2.11             | 2.06     | 3%    | 3 | 3.63            |
| (3,2,3) | $300\Omega$ | .1 pF  | 2.11             | 2.06     | 3%    | 3 | 3.53            |

Table 5.4: Repeater insertion as determined by the downhill simplex method and an exhaustive search for the *RC* tree shown in Figure 5.10.

|         |             |       | Downhill Simplex |           |                 | Exhaustive    |           |                 |
|---------|-------------|-------|------------------|-----------|-----------------|---------------|-----------|-----------------|
| Branch  | R           | C     | $t_{PD}$ (ns)    | # of      | Size            | $t_{PD}$ (ns) | # of      | Size            |
|         |             |       | Analytical       | Repeaters | $\mu\mathrm{m}$ | Analytical    | Repeaters | $\mu\mathrm{m}$ |
| (1,1,0) | 1 KΩ        | 1 pF  | .88              | 8         | 15.9            | .88           | 8         | 16.0            |
| (2,1,1) | $700\Omega$ | 1 pF  | 1.55             | 7         | 13.5            | 1.55          | 7         | 13.5            |
| (2,2,1) | $500\Omega$ | .5 pF | 1.27             | 4         | 10.8            | 1.27          | 4         | 10.5            |

## Chapter 6

## Conclusions

The rapid improvement in integrated circuit technology has provided technological and design challenges at every turn. Transistors have become small and fast enough that complex functions can be implemented on a single chip. Because of these increasingly higher levels of integration, on-chip interconnections have become so long as to limit the operating frequency of circuits.

Although technological improvements such as copper interconnect [72, 73] may temporarily slow the growing effect of interconnect delay on integrated circuits, circuit-level solutions are required to control interconnect delay. In [32], Bakoglu presents several circuit-level methods to reduce interconnect delay. One example is the insertion of CMOS inverters along the interconnect as repeaters to reduce the quadratic increase in the resistive-capacitive delay of the interconnect with length.

In order to accurately describe the effect of a repeater driving an RC load, both the interconnect impedance and the repeater need to be properly modeled; in addition, these individual models should be correctly integrated to provide an overall repeater-interconnect delay model. Furthermore, the short-channel nature of the transistors requires a set of I-V equations that are more accurate than the classical Shichman-Hodges expressions [31]. The Sakurai $\alpha$-power law model [33] can be used to accurately describe short-channel transistors while providing tractable expressions suitable for developing useful design expressions. The expressions governing the operation of the repeater have been integrated with a lumped RC interconnect model to accurately represent a repeater driving a section of interconnect.

The average error of this model has been demonstrated in this dissertation to be less than 10% versus SPICE for a wide range of RC loads. This repeater-interconnect model forms the basis for the repeater insertion methodology and algorithms used in both RC lines and trees. The transient power dissipation of the repeater-interconnect model has also been investigated. While dynamic power dissipation is the straightforward product of the frequency, the square of the voltage swing, and the load capacitance, the short-circuit power is also strongly dependent upon the shape of the input signal. An expression describing the short-circuit power dissipation of a repeater driving an RC load is presented. The error of this expression is less than 15% for a wide variety of RC loads.
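As a rough numerical illustration of the dynamic power relation stated above, consider the following worked example. The values (a 100 MHz switching frequency, a 1 pF load capacitance, and a 3.3 V voltage swing) are hypothetical and are not taken from the results of this dissertation; the symbols $P_{dyn}$, $f$, $C_L$, and $V$ are likewise chosen here only for illustration.

$$P_{dyn} = f \, C_L \, V^2 = (100 \times 10^{6}\ \mathrm{Hz})(1 \times 10^{-12}\ \mathrm{F})(3.3\ \mathrm{V})^2 \approx 1.09\ \mathrm{mW}$$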

The original model of an inverter driving a section of RC interconnect has been expanded to describe the delay of a chain of repeaters. The deviation of the model from SPICE is typically under 10%. Short-circuit power in repeater chains has also been explored. It is shown that when the size of the repeater is small, the short-circuit power is between 1% and 5% of the dynamic power. Short-circuit power, however, can reach up to 30% of the dynamic power dissipation in repeater chains. While dynamic power dissipation scales linearly with repeater size, short-circuit power dissipation changes non-linearly with the size and number of repeaters in a repeater chain.

Finally, an expression for the delay of a repeater chain is an integral element when optimizing the process for inserting repeaters in an RC tree. Local and global optimization techniques have been developed, and both are described for minimizing the delay in an RC tree. The local optimization method is useful for minimizing or targeting the delay on a branch-by-branch basis, an important capability for building high performance RC trees such as clock distribution networks. On the other hand, a global optimization technique is useful for minimizing or targeting the delay from the source of an RC tree to the leaf nodes of the tree, which is important for circuit structures such as high fanout data networks. These algorithms exhibit a 25-60% improvement over a typical cascaded buffer system and a deviation from SPICE typically under 10%.

The research presented in this thesis describes a new method for driving high speed interconnect. Although there has been previous research in repeater insertion to reduce interconnect delay, to the author's knowledge, this is the first research to model repeater insertion with a short-channel transistor model representative of current technologies and to apply this model to both RC lines and trees. The research presented here provides the basis for faster, more energy efficient CMOS-based integrated circuits.

## Chapter 7

## **Future Research**

The possibilities for future research in repeater insertion in RC trees are many. In this chapter, possible future work is broken into four categories: improvements in the circuit model; investigation of different methods of optimization; development of cost criteria; and real world implementation issues. Each of these four categories is discussed below.

## 7.1 Model Improvements

The tradeoff between model simplicity and accuracy always exists, and the model presented in this dissertation maintains a good balance between these two characteristics. However, in order to develop a more general model, several improvements to the transistor-interconnect model are necessary. These improvements include: modeling a repeater driving an RC load with a slow ramp input; consideration of the saturation region; and a multi-segment RC interconnect model.

## 7.1.1 Modeling a Repeater Driving an RC Load with a Slow Ramp Input Signal

The repeater model described in Chapter 3 assumes a step input, or that the inputs are very fast, approaching a step input signal. This model permits some further simplifying assumptions to be made to the differential equations describing the inverter-interconnect interaction. These equations also work with a fast ramp input signal. Previous research has presented analytical solutions of a slow ramp input signal driving an inverter loaded by a capacitor [33, 74]. These solutions are complex and become analytically intractable with a resistive-capacitive load. In addition, the $\alpha$-power law model does not accurately model a slow input signal. This inaccuracy is caused by both transistors being on during a slow ramp input signal. Thus, either a different device model such as in [17] is necessary, or the resulting data would have to be determined numerically.

Although currently the repeater model described in Chapter 3 exhibits accuracy to within 10% of SPICE, future work in repeater models should include analytical or numerical expressions that describe the output of an RC loaded inverter with a slow ramp input signal to increase the overall generality of the repeater delay model.

#### 7.1.2 Consideration of Saturation Region

The Sakurai $\alpha$-power law I-V equations have been chosen to model the repeater system for their ability to accurately describe the large signal I-V behavior of short-channel transistors and because these equations remain relatively tractable in the analysis of more complex circuits. This capability permits the development of a model that characterizes the system of a repeater driving an RC load. It was shown in Chapter 3 that with a step input, repeaters driving an RC load operate predominantly in the linear region. With a ramp input, the repeater is expected to operate in the saturation region for a longer period of time. Therefore, the repeater operating in the saturation region may need additional examination.

Although the equation characterizing the saturation region portion of the $\alpha$-power law is relatively simple, there is difficulty in using this I-V equation to analyze the interconnect model presented in Chapter 3. The $\alpha$-power law equation that describes the saturation region is

$$I_D = I_{D0} \left( \frac{V_{GS} - V_T}{V_{DD} - V_T} \right)^{\alpha} \tag{7.1}$$

Note that $V_{DS}$ does not appear in this saturation I-V equation. The only bias voltage on which the output current depends is the gate-to-source input voltage $V_{GS}$. A strategy is therefore needed to include the effect of the load capacitance and resistance, which is reflected in $V_{DS}$, when developing a repeater model that considers the saturation region, as shown in Figure 7.1.



Figure 7.1: The N-channel transistor of a CMOS inverter driving a large RC load representative of a long interconnect.  $V_{DS}$  is the output voltage of the operating transistor of a repeater.

In order to determine the effect of $V_{DS}$ on $I_{DS}$ in the saturation region, the following equation may be useful [75],

$$I_{DS} = \frac{K}{2} (V_{GS} - V_T)^{\alpha} [1 + \lambda V_{DS}] \tag{7.2}$$

This equation describes channel length modulation in short-channel transistors. $\lambda$ is known as the channel length modulation factor and describes the degree to which channel length modulation affects the current drive capability of the transistor in the saturation region. Utilizing this expression for the drain current in the saturation region, the operational characteristics of the transistor can be better captured in the repeater model. The challenge of integrating the two expressions, one for linear and one for saturation, into an overall repeater delay model still remains.
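To make the difference between (7.1) and (7.2) concrete, the following minimal Python sketch evaluates both saturation expressions. All numerical values are hypothetical, the cutoff behavior below threshold is an assumption, and the helper `matched_k` is introduced here only so that the two expressions agree at $V_{GS} = V_{DD}$ and $V_{DS} = 0$; none of these come from the dissertation.

```python
def alpha_power_sat_current(v_gs, v_t, v_dd, i_d0, alpha):
    """Saturation drain current from Eq. (7.1); it does not depend on V_DS."""
    if v_gs <= v_t:
        return 0.0  # assumed: the device is treated as off below threshold
    return i_d0 * ((v_gs - v_t) / (v_dd - v_t)) ** alpha


def sat_current_with_clm(v_gs, v_ds, v_t, k, alpha, lam):
    """Saturation drain current from Eq. (7.2), with channel length modulation."""
    if v_gs <= v_t:
        return 0.0  # assumed: the device is treated as off below threshold
    return 0.5 * k * (v_gs - v_t) ** alpha * (1.0 + lam * v_ds)


def matched_k(i_d0, v_dd, v_t, alpha):
    """Hypothetical helper: pick K so (7.2) equals I_D0 at V_GS = V_DD, V_DS = 0."""
    return 2.0 * i_d0 / (v_dd - v_t) ** alpha


if __name__ == "__main__":
    # Hypothetical device parameters, chosen only for illustration.
    v_dd, v_t, alpha, i_d0, lam = 3.3, 0.7, 1.3, 1e-3, 0.05
    k = matched_k(i_d0, v_dd, v_t, alpha)
    print("Eq. (7.1):", alpha_power_sat_current(v_dd, v_t, v_dd, i_d0, alpha))
    for v_ds in (0.0, 1.0, 2.0, 3.3):
        i_ds = sat_current_with_clm(v_dd, v_ds, v_t, k, alpha, lam)
        print(f"Eq. (7.2) at V_DS = {v_ds:.1f} V: {i_ds:.6e} A")
```

With these assumed values, (7.1) returns the same current for every $V_{DS}$, while (7.2) grows linearly with $V_{DS}$ through the factor $1 + \lambda V_{DS}$.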

### 7.1.3 Improved RC Model

Sakurai presents in [26] interconnect models that provide a more accurate approximation of an interconnect than a lumped load interconnect model. These interconnect models are described in Chapter 2. Implementing multiple section RC models results in multiple pole solutions in the time domain. Thus, no direct analytical expression has been found for the output response of a repeater driving a multi-pole interconnect section for a given input signal. Possible solutions to this problem are to use the dominant pole technique used in analysis tools such as RICE [76] or AWE [77–82] or to determine an "effective capacitance" such as described in [83]. These methods yield fairly accurate solutions when doing analysis; however, these methods may be too complex to develop useful closed form design expressions.
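The effect of splitting a lumped RC load into multiple sections can be illustrated with a short Python sketch. It assumes a uniform ladder of identical sections (series resistance $R/n$ followed by shunt capacitance $C/n$) and uses the first moment (Elmore) delay as the metric; both choices are assumptions made here for illustration and are not necessarily the specific models of [26]. The function name and the RC values are hypothetical.

```python
def ladder_elmore_delay(r_total, c_total, n_sections):
    """Elmore delay at the far end of a uniform n-section RC ladder.

    Each section is a series resistance R/n followed by a shunt
    capacitance C/n (an assumed discretization of the line).
    """
    r_seg = r_total / n_sections
    c_seg = c_total / n_sections
    # The capacitor after the i-th resistor sees i * r_seg of upstream resistance.
    return sum(i * r_seg * c_seg for i in range(1, n_sections + 1))


if __name__ == "__main__":
    r_total, c_total = 1e3, 1e-12  # hypothetical 1 kOhm, 1 pF line
    for n in (1, 2, 5, 20, 100):
        delay = ladder_elmore_delay(r_total, c_total, n)
        print(f"{n:3d} sections: {delay * 1e9:.4f} ns")
```

For this ladder the sum reduces to $RC(n+1)/(2n)$, so the estimate moves from $RC$ for a single lumped section toward $RC/2$ as the number of sections grows, which is why a single lumped section overestimates the delay of a distributed line.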

## 7.2 Optimization Algorithms

Although optimization is not the primary focus of the research presented in this dissertation, an investigation of other optimization methods to accomplish the repeater insertion task is worthwhile future work. The computational complexity and robustness of an optimization algorithm, together with its ease of implementation, are the critical factors in choosing which algorithms are preferable for the repeater insertion task.

For example, in order to determine the minimum average delay from the root node to the leaf nodes of an RC tree, the optimal implementation could be derived directly from the gradient of the function. However, the gradient is difficult to determine and does not lend itself well to automatic implementation. In addition, if the cost criteria (or the objective function) become more complex than the average leaf delay, then the gradient may be impossible to determine. Thus, a more flexible optimization method may be necessary to determine the optimal repeater system. Although there are many optimization algorithms, only an improvement to simulated annealing and the dominant frontier method are discussed in this section.

### 7.2.1 Simulated Annealing

Improving the application of simulated annealing to the repeater insertion methodology is one area for future research. A possible variation on simulated annealing is Large-Step Markov Chains [84]. In this case, the optimization can be viewed as a zero or low temperature simulated annealing algorithm over the local minima determined by a greedy optimization algorithm. Again, the difficulty is that the size of the "kick move," *i.e.*, how large a step to take toward a different minimum, must be empirically determined.
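The structure described above (a kick move, a greedy descent to a local minimum, and zero temperature acceptance) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the one-dimensional `toy_cost` function stands in for a repeater delay objective, the fixed-step `greedy_descent` is a deliberately simple local search, and all names and parameter values are hypothetical rather than taken from [84] or from this dissertation.

```python
import math
import random


def greedy_descent(cost, x, step=0.01, max_iter=10_000):
    """Greedy local search: move to a neighbor only while it lowers the cost."""
    for _ in range(max_iter):
        best = min((x - step, x + step), key=cost)
        if cost(best) >= cost(x):
            break  # no neighbor improves the cost: local minimum reached
        x = best
    return x


def large_step_markov_chain(cost, x0, kick_size, n_kicks, rng):
    """Zero-temperature large-step Markov chain over local minima."""
    current = greedy_descent(cost, x0)
    for _ in range(n_kicks):
        kicked = current + rng.uniform(-kick_size, kick_size)  # the "kick move"
        candidate = greedy_descent(cost, kicked)
        if cost(candidate) < cost(current):  # zero temperature: accept only improvements
            current = candidate
    return current


def toy_cost(x):
    """Hypothetical cost with several local minima (not a real delay model)."""
    return 0.1 * x * x + math.sin(3.0 * x)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    start = greedy_descent(toy_cost, 4.0)
    result = large_step_markov_chain(toy_cost, 4.0, kick_size=2.0, n_kicks=50, rng=rng)
    print(f"greedy only : x = {start:.3f}, cost = {toy_cost(start):.4f}")
    print(f"with kicks  : x = {result:.3f}, cost = {toy_cost(result):.4f}")
```

The `kick_size` argument is the empirically tuned quantity discussed above: if it is too small, every kick falls back into the same local minimum, and if it is too large, the search degenerates into random restarts.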

#### 7.2.2 Dominant Frontier

The dominant frontier [84] is a pseudo-exhaustive method of optimization in which the problem is partitioned into discrete sections with each section being optimized for a condition that describes an interaction with another section. In the case of the dominant frontier method, each branch of an RC tree is characterized by a triplet: 1) an upstream capacitance determined by the repeater size inserted into the branch, 2) a downstream capacitance caused by inserting repeaters in the child branches, and 3) a delay determined by both the upstream and downstream capacitances and the optimal repeater insertion. Thus, the triplet is represented by  $(C_{L_{up}}, C_{L_{down}}, Delay)$ . The goal of this algorithm is to determine a set of triplets that will optimize the repeater insertion for a given objective function.
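One way to picture how a set of triplets might be reduced is sketched below in Python. The sketch assumes that a smaller value is preferable in all three fields and discards any triplet that another triplet matches or beats in every field; this dominance criterion, the `Triplet` type, and the sample values are assumptions made for illustration and are not taken from [84] or from this dissertation.

```python
from typing import List, NamedTuple


class Triplet(NamedTuple):
    c_up: float    # upstream capacitance set by the repeater size in the branch
    c_down: float  # downstream capacitance from repeaters in the child branches
    delay: float   # branch delay for this repeater choice


def dominates(a: Triplet, b: Triplet) -> bool:
    """True if a is no worse than b in every field and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def dominant_frontier(candidates: List[Triplet]) -> List[Triplet]:
    """Keep only the candidates that no other candidate dominates."""
    return [t for t in candidates
            if not any(dominates(other, t) for other in candidates)]


if __name__ == "__main__":
    # Hypothetical candidates for one branch (arbitrary units).
    options = [
        Triplet(1.0, 1.0, 5.0),
        Triplet(2.0, 1.0, 4.0),
        Triplet(2.0, 2.0, 6.0),  # worse than the first option in every field
    ]
    for t in dominant_frontier(options):
        print(t)
```

Pruning dominated candidates on each branch keeps the number of triplets carried up the tree manageable, which is what makes a pseudo-exhaustive search practical.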

## 7.3 Calculation of the Overall Cost of Inserting Repeaters

As mentioned previously, high performance VLSI circuits must satisfy not only high speed operation but also other criteria such as power and area. Expressions for determining both the dynamic and short-circuit power dissipation are presented in Chapter 3. Implementing concurrent optimization of area and dynamic power dissipation would be a useful improvement upon the existing work.

## 7.4 Development of a CAD Tool

This section describes several areas of research in which repeaters can be more effectively inserted to satisfy a target performance goal. In addition, there are areas of research which are required for implementation in industry. Wire sizing and simultaneous wire sizing and repeater insertion, placement information, and improved reliability are all issues that are applicable to a repeater insertion CAD tool.

#### 7.4.1 Simultaneous Wire Sizing

Wire sizing is a method to trade off the capacitance of a line with the resistance of a line. In particular, wire sizing becomes useful when reducing the fringing/coupling capacitance between adjacent interconnect lines. Wire sizing and simultaneous repeater insertion and wire sizing have both been discussed within the literature [51, 85–96]; however, the repeaters and wire sizes are typically restricted to discrete library elements or widths, respectively.

In order to implement wire sizing within the framework of a repeater insertion system, geometric information about the interconnect is required. In addition, extraction data describing the coupling/fringe capacitances are necessary. With this information, simultaneous wire sizing integrated with repeater insertion would be a worthwhile research result.

## 7.4.2 Including Placement Information

It may occur that a repeater, due to lack of physical space, cannot be inserted in a location recommended by the algorithms presented in this dissertation since a blockage caused by other circuitry may exist. Given information describing the blockage circuitry, the repeater could be moved to a different position and resized to retain acceptable delay characteristics. If a continuous optimization strategy is used, excessive degrees of freedom may result. Therefore, the repeater selection should preferably come from a restricted library.

A possible flow for developing the clock distribution network of an integrated circuit that includes repeater insertion is shown in Figure 7.2. The process would be iterative with the first pass providing an estimate of the size and number of repeaters based on the initial preliminary information provided by the design system. A second pass of the repeater insertion algorithm would provide more refined repeater specifications given detailed placement information.



Figure 7.2: The clock distribution network design flow of an integrated circuit modified to include repeater insertion.

#### 7.4.3 Clock Signal Variations

Reliable operation of a clock tree with repeaters must consider two important problems in clock signal distribution: clock skew and clock jitter. The effects of repeater systems on clock skew and clock jitter are explained below.

Clock Jitter: Reliable synchronous circuit operation depends upon a consistent clock signal frequency. The fluctuation of the clock frequency at a specific location within an integrated circuit is clock jitter. Formally, the clock signal is a periodic signal s(t) with period T, such that T is the smallest value for which the following equation holds,

$$s(t) = s(t + nT), \tag{7.3}$$

where n is an integer specifying a period number. Ideally, the clock period T is a constant value at all times, i.e.,  $\frac{\partial T}{\partial t} = 0$ . If the value of this derivative is non-zero, then the clock signal exhibits a variable period—this phenomenon is known as clock jitter [97]. A timing diagram of clock jitter is shown in Figure 7.3.



Figure 7.3: A variation in the period of the clock signal T is clock jitter.
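As an illustration of the definition above, the following Python sketch extracts the successive clock periods from a list of rising edge timestamps and reports the spread between the longest and the shortest period. The peak-to-peak period spread is one assumed way to quantify jitter, and the function names and timestamps are hypothetical.

```python
def clock_periods(edge_times):
    """Successive clock periods from a list of rising-edge timestamps."""
    return [t1 - t0 for t0, t1 in zip(edge_times, edge_times[1:])]


def period_jitter_peak_to_peak(edge_times):
    """Spread between the longest and the shortest observed period."""
    periods = clock_periods(edge_times)
    return max(periods) - min(periods)


if __name__ == "__main__":
    ideal = [0.0, 10.0, 20.0, 30.0]    # hypothetical edges in ns, constant period
    jittery = [0.0, 10.0, 20.5, 30.0]  # hypothetical edges in ns, one edge arrives late
    print("ideal periods  :", clock_periods(ideal))
    print("jittery periods:", clock_periods(jittery))
    print("peak-to-peak period jitter (ns):", period_jitter_peak_to_peak(jittery))
```

A clock with a constant period gives a spread of zero, which corresponds to the ideal case in which the period does not change with time.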

There are several sources of clock jitter. One source of clock jitter is the off-chip clock source. Another possible source of clock jitter is simultaneous switching noise [98–100]. In large CMOS ICs, thousands of transistors may switch almost simultaneously, drawing a large current from the power supply. This $L\frac{di}{dt}$ noise voltage can cause clock jitter in integrated circuits. Furthermore, by inserting repeaters into the clock distribution system, additional switching noise is added. The active transistors within the repeaters are also affected by the power supply noise, thus possibly further increasing jitter.
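To give a sense of scale for the $L\frac{di}{dt}$ noise voltage, consider the following worked example. The values (a 1 nH supply inductance and a 0.5 A current change over 1 ns) are hypothetical, the current ramp is approximated as linear, and the symbol $V_{noise}$ is introduced here only for illustration.

$$V_{noise} = L \frac{di}{dt} \approx (1\ \mathrm{nH}) \, \frac{0.5\ \mathrm{A}}{1\ \mathrm{ns}} = 0.5\ \mathrm{V}$$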

The repeater models described in this dissertation permit computation of the maximum current required by the repeater system from the power supply. This information permits the range of supply voltage fluctuation to be estimated so that the power distribution system can be designed accordingly. However, the effect of power supply noise on the switching characteristics of the repeaters is less clear. Future work could include studying the reliable operation of the active transistors in a repeater system with a noisy power supply in order to control clock jitter.

#### Clock Signal Distribution and Process Parameter Variations

Clock Skew: The variation in clock signal arrival times to different clocked elements is clock skew [57, 64]. Furthermore, clock skew can be broken up into two categories, global (chip-wide) and local. Local clock skew is the clock skew that occurs between two sequentially adjacent clocked elements [70]. Local clock skew can cause catastrophic circuit failure when

$$T_{skew} \ge T_{clock} - T_{PD} \tag{7.4}$$

or

$$|T_{skew}| \ge T_{PD}. \tag{7.5}$$

In these equations, $T_{skew}$ is the clock skew, $T_{clock}$ is the minimum clock period, and $T_{PD}$ is the propagation delay of the logic between the two clocked register elements [70]. A circuit schematic representing elements affected by global and local clock skew is shown in Figure 7.4a with their respective clock signals shown in Figure 7.4b.

Figure 7.4: Clock signal distribution in integrated circuits: (a) Schematic of the clock distribution network with three clocked elements x, y, z. (b) The variation in clock arrival times between sequentially non-adjacent registers (x and y) and (x and z) is global clock skew; between sequentially adjacent registers (y and z), it is local clock skew.
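Conditions (7.4) and (7.5) can be checked numerically for a local data path, as in the following minimal Python sketch. The sketch applies the two inequalities literally as written above; the function name, the sign convention for the skew, and the timing values are hypothetical.

```python
def skew_violations(t_skew, t_clock, t_pd):
    """Return which failure conditions, (7.4) and/or (7.5), the skew satisfies.

    t_skew  : clock skew between two sequentially adjacent registers
    t_clock : minimum clock period
    t_pd    : propagation delay of the logic between the two registers
    """
    violated = []
    if t_skew >= t_clock - t_pd:  # condition (7.4)
        violated.append("7.4")
    if abs(t_skew) >= t_pd:       # condition (7.5)
        violated.append("7.5")
    return violated


if __name__ == "__main__":
    t_clock, t_pd = 10.0, 6.0  # hypothetical values in ns
    for t_skew in (1.0, 4.5, -6.5):
        result = skew_violations(t_skew, t_clock, t_pd)
        status = ", ".join(result) if result else "none"
        print(f"skew = {t_skew:+.1f} ns -> conditions met: {status}")
```

An empty result means that neither failure condition is met for that path, so the skew is within the bounds given by the two inequalities.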

One cause of clock skew is process parameter variations. Process parameter variations are inevitable in any fabrication process. However, with the addition of repeaters, process parameter variations may pose a greater problem in clock signal distribution. With a clock tree composed solely of passive interconnect (i.e., no repeaters), process parameter variations will affect the clock tree wire widths, changing the value of the resistance and capacitance of the interconnect line. However, the effects of process parameter variations on the active repeaters may be more significant, causing the expected delay of a path in a clock tree to shift out of the operational bounds dictated by the acceptable clock skew of that local data path.

The effect of process parameter variations on clock skew in repeater systems could be examined through a variety of methods. One method is through Monte Carlo simulations to determine a typical range of signal delay through a clock tree utilizing repeaters [101]. Another method is to statistically analyze the results of a repeater insertion within an RC tree by the methodology presented in this dissertation. Since the performance of a repeater system can be estimated by an analytical expression, the parameters that define the performance of the repeater system can be arranged in a vector $\mathbf{x}$. Each parameter in the vector is composed of two components: the expected or designed component $\mathbf{d}$, and the process variable component $\mathbf{s}$. This vector expression can be written as

$$\mathbf{x} = \mathbf{d} + \mathbf{s} \tag{7.6}$$

The design space can be explored by setting $\mathbf{d}$ to the desired nominal values and varying $\mathbf{s}$ from the worst case to the best case values [102]. With this information, the margin of operation that needs to be satisfied to create a high yield circuit can be determined.
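The exploration described by (7.6) can be sketched in Python as a simple sweep of the process component from its worst case to its best case values. This is only a sketch under stated assumptions: all parameters are moved together along a straight line between the two corners, and the `toy_delay` function (a plain $RC$ product) is a hypothetical stand-in for the analytical delay expressions of this dissertation. The parameter values are likewise hypothetical.

```python
def sweep_process_corners(performance, d, s_worst, s_best, n_points=5):
    """Evaluate performance(x) for x = d + s, stepping s from worst to best case.

    n_points must be at least 2 so that both corners are included.
    """
    results = []
    for k in range(n_points):
        frac = k / (n_points - 1)
        # Process component s, interpolated between the two corners.
        s = [w + frac * (b - w) for w, b in zip(s_worst, s_best)]
        # Eq. (7.6): each parameter is its designed value plus its process variation.
        x = [d_i + s_i for d_i, s_i in zip(d, s)]
        results.append(performance(x))
    return results


def toy_delay(x):
    """Hypothetical stand-in for an analytical delay expression: R * C."""
    r, c = x
    return r * c


if __name__ == "__main__":
    d = [1000.0, 1e-12]        # nominal R (ohms) and C (farads), hypothetical
    s_worst = [100.0, 1e-13]   # assumed worst case: +10% on both parameters
    s_best = [-100.0, -1e-13]  # assumed best case: -10% on both parameters
    delays = sweep_process_corners(toy_delay, d, s_worst, s_best, n_points=5)
    for delay in delays:
        print(f"{delay * 1e9:.3f} ns")
    print(f"delay spread: {(max(delays) - min(delays)) * 1e9:.3f} ns")
```

The spread between the largest and smallest result gives a first estimate of the margin of operation; a Monte Carlo analysis would instead draw $\mathbf{s}$ at random from the process distributions.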

#### 7.5 Conclusions

There remain a number of research issues to be explored in the area of repeater insertion with respect to the research presented in this dissertation. This chapter summarizes future research in the development of repeater insertion for more general modeling. In addition, other possible optimization methods are presented. Finally, some of the hurdles of implementing repeater insertion in an industrial CAD tool are discussed and explored.

## **Bibliography**

- [1] R. H. Dennard, F. H. Gaensslen, H. N. Yu, V. L. Rideout, E. Bassous, and A. R. LeBlanc, "Design of Ion-implanted MOSFET's with Very Small Physical Dimensions," *IEEE Journal of Solid-State Circuits*, Vol. SC-9, pp. 256-268, May 1974.
- [2] B. Davari, R. H. Dennard, and G. G. Shahidi, "CMOS Scaling for High Performance and Low Power-The Next Ten Years," *Proceedings of the IEEE*, Vol. 83, pp. 595-606, April 1995.
- [3] H. B. Bakoglu and J. D. Meindl, "Optimal Interconnect Circuits for VLSI," Proceedings of the IEEE International Solid-State Circuits Conference, pp. 164-165, February 1984.
- [4] J. Bardeen and W. H. Brattain, "The Transistor, A Semiconductor Triode," *Physical Review*, Vol. 74, No. 2, p. 230, July 1948.
- [5] W. Shockley, *Electrons and Holes in Semiconductors*. D. Van Nostrand Co. Inc., 1950.
- [6] W. Shockley, "A Unipolar 'Field-Effect' Transistor," *Proceedings of the I.R.E.*, Vol. 40, pp. 1365-1376, November 1952.
- [7] S. M. Sze, *Physics of Semiconductor Devices*. John Wiley and Sons, Inc., 1969.
- [8] G. C. Dacey and I. M. Ross, "The Field-Effect Transistor," *Bell System Technical Journal*, Vol. 34, No. 6, pp. 1149-1189, November 1955.
- [9] M. M. Atalla, M. Tannenbaum, and E. J. Scheibner, "Stabilization of Silicon Surface by Thermally Growing Oxides," *Bell System Technical Journal*, Vol. 38, No. 3, pp. 749-783, May 1959.
- [10] J. A. Hoerni, "Planar Silicon Transistors and Diodes," Proceedings of the IRE International Electron Device Meeting, Vol. 14, p. 9, October 1961.
- [11] C.-T. Sah, "Evolution of the MOS Transistor From Conception to VLSI," Proceedings of the IEEE, Vol. 76, No. 10, pp. 1280–1326, October 1988.

- [12] J. S. Kilby, "Invention of the Integrated Circuit," *IEEE Transactions on Electron Devices*, Vol. ED-23, No. 7, pp. 648-654, July 1976.
- [13] J. R. Burns, "Switching Response of Complementary-Symmetry MOS Transistor Logic Circuits," RCA Review, Vol. 25, No. 4, pp. 627-661, December 1964.
- [14] H. J. M. Veendrick, "Short-Circuit Dissipation of Static CMOS Circuitry and Its Impact on the Design of Buffer Circuits," *IEEE Journal of Solid-State Circuits*, Vol. SC-19, No. 4, pp. 468-473, August 1984.
- [15] G. A. S. Halasz, M. R. Wordeman, D. P. Kern, E. Ganin, S. Rishton, D. S. Zicherman, H. Schmid, M. R. Polcari, H. Y. Ng, P. J. Restle, T. H. P. Chang, and R. H. Dennard, "Inverter Performance of Deep Submicron MOSFETs," *IEEE Electron Device Letters*, Vol. EDL-9, No. 12, pp. 633-635, December 1988.
- [16] H.-J. Park and M. Soma, "Analytical Model for Switching Transitions of Submicron CMOS Logics," *IEEE Journal of Solid-State Circuits*, Vol. SC-32, No. 6, pp. 880-889, June 1997.
- [17] S. Dutta, S. S. M. Shetti, and S. L. Lusky, "A Comprehensive Delay Model for CMOS Inverters," *IEEE Journal of Solid-State Circuits*, Vol. SC-30, No. 8, pp. 864-871, August 1995.
- [18] A. Bhavnagarwala, V. De, B. Austin, and J. Meindl, "Optimal Circuit Design for Low Power CMOS GSI," Proceedings of the IEEE International ASIC Conference and Exhibit, pp. 313-316, September 1996.
- [19] M. Hafed and N. Rumin, "CMOS Inverter Current and Delay Model Incorporating Interconnect Effects," Proceedings of the IEEE International Symposium on Circuits and Systems, pp. VI.86-89, May 1998.
- [20] J. Kong, S. Hussain, and D. Overhauser, "Improving Digital MOS Macro-model Accuracy," Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 1.578-581, April 1995.
- [21] J. Kong and D. Overhauser, "Combining RC-Interconnect Effects with Non-linear MOS Macromodeling," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 1.570-573, April 1995.
- [22] J.-T. Kong and D. Overhauser, "Methods to Improve Digital MOS Macromodel Accuracy," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. CAD-14, No. 7, pp. 868-881, July 1995.

- [23] H. B. Bakoglu and J. D. Meindl, "Optimal Interconnection Circuits for VLSI," *IEEE Transactions on Electron Devices*, Vol. ED-32, No. 5, pp. 903-909, May 1985.
- [24] W. W. Happ and S. C. Gupta, "Time-Domain Analysis and Measurement Techniques for Distributed RC Structures I. Analysis in the Reciprocal Time Domain," *Journal of Applied Physics*, Vol. 40, No. 1, pp. 109-122, January 1969.
- [25] R. J. Antinone and G. W. Brown, "The Modeling of Resistive Interconnects for Integrated Circuits," *IEEE Journal of Solid-State Circuits*, Vol. SC-18, No. 2, pp. 200-203, April 1983.
- [26] T. Sakurai, "Approximation of Wiring Delay in MOSFET LSI," *IEEE Journal of Solid-State Circuits*, Vol. SC-18, No. 4, pp. 418-426, August 1983.
- [27] M. Shoji, Theory of CMOS Digital Circuits and Circuit Failures. Princeton University Press, 1992.
- [28] J. T. Wallmark, "Noise Spikes in Digital VLSI Circuits," *IEEE Transactions on Electron Devices*, Vol. ED-29, No. 3, pp. 451-458, March 1982.
- [29] D. Li, A. Pua, P. Srivastava, and U. Ko, "A Repeater Optimization Methodology for Deep Sub-Micron, High-Performance Processor," *Proceedings of the IEEE Conference on Computer Design*, pp. 726-731, October 1996.
- [30] A. Vladimirescu and S. Liu, "The Simulation of MOS Integrated Circuits Using SPICE2," ERL Memo M80/7, University of California, Berkeley, October 1980.
- [31] H. Shichman and D. A. Hodges, "Modeling and Simulation of Insulated-Gate Field-Effect Transistor Switching Circuits," IEEE Journal of Solid-State Circuits, Vol. SC-3, No. 3, pp. 285-289, September 1968.
- [32] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI. Addison-Wesley Publishing Company, 1990.
- [33] T. Sakurai and A. R. Newton, "Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas," *IEEE Journal of Solid-State Circuits*, Vol. SC-25, No. 2, pp. 584-594, April 1990.
- [34] B. G. Streetman, Solid State Electronic Devices. Prentice Hall, Inc., 1995.
- [35] L. Bisdounis, S. Nikolaidis, O. Koufopavlou, and C. E. Goutis, "Modeling the CMOS Short-Circuit Power Dissipation," Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 4.469-4.472, May 1996.

- [36] A. M. Hill and S.-M. Kang, "Statistical Estimation of Short-Circuit Power in VLSI Circuits," Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 4.105-4.108, May 1996.
- [37] A. Hirata, H. Onodera, and K. Tamaru, "Estimation of Short-Circuit Power Dissipation and Its Influence on Propagation Delay for Static CMOS Gates," Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 4.751-4.754, May 1996.
- [38] V. Adler and E. G. Friedman, "Delay and Power Expressions for a CMOS Inverter Driving a Resistive-Capacitive Load," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 4.101-4.104, May 1996.
- [39] S. R. Vemuru and N. Scheinberg, "Short-Circuit Power Dissipation Estimation for CMOS Logic Gates," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, Vol. CAS I-41, No. 11, pp. 762-766, November 1994.
- [40] T. Sakurai and A. R. Newton, "Delay Analysis of Series-Connected MOSFET Circuits," *IEEE Journal of Solid-State Circuits*, Vol. SC-26, No. 2, pp. 122-131, February 1991.
- [41] C. Y. Wu and M. Shiau, "Accurate Speed Improvement Techniques for RC Line and Tree Interconnections in CMOS VLSI," Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 2.1648-2.1651, May 1990.
- [42] C. Y. Wu and M. Shiau, "Delay Models and Speed Improvement Techniques for RC Tree Interconnections Among Small-Geometry CMOS Inverters," *IEEE Journal of Solid-State Circuits*, Vol. SC-25, No. 5, pp. 1247-1256, October 1990.
- [43] M. Nekili and Y. Savaria, "Optimal Methods of Driving Interconnections in VLSI Circuits," Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 21-23, May 1992.
- [44] M. Nekili and Y. Savaria, "Parallel Regeneration of Interconnections in VLSI & ULSI Circuits," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 2023-2026, May 1993.
- [45] S. Dhar and M. A. Franklin, "Optimum Buffer Circuits for Driving Long Uniform Lines," *IEEE Journal of Solid-State Circuits*, Vol. SC-26, No. 1, pp. 32-40, January 1991.

- [46] C. Tretz and C. Zukowski, "CMOS Transistor Sizing for Minimization of Energy-Delay Product," Proceedings of the IEEE Great Lakes Symposium on VLSI, pp. 168-173, March 1996.
- [47] C. Zukowski and C. Tretz, "Transistor Sizing in CMOS Logic Chains to Minimize Energy-Delay Product," Proceedings of the Workshop on Academic Electronics in New York State, pp. 221-226, June 1996.
- [48] C. J. Alpert and A. Devgan, "Wire Segmenting for Improved Buffer Insertion," *Proceedings of the IEEE/ACM Design Automation Conference*, pp. 588-593, June 1997.
- [49] C. J. Alpert, A. Devgan, and S. T. Quay, "Buffer Insertion for Noise and Delay Optimization," *Proceedings of the IEEE/ACM Design Automation Conference*, pp. 362-367, June 1998.
- [50] J. Culetu, C. Amir, and J. MacDonald, "A Practical Repeater Insertion Method in High Speed VLSI Circuits," Proceedings of the IEEE/ACM Design Automation Conference, pp. 392-395, June 1998.
- [51] J. Lillis, C.-K. Cheng, and T.-T. Y. Lin, "Optimal Wire Sizing and Buffer Insertion for Low Power and a Generalized Delay Model," *IEEE Journal of Solid-State Circuits*, Vol. SC-31, No. 3, pp. 437-446, March 1996.
- [52] J. Lillis and C.-K. Cheng, "Timing Optimization for Multi-Source Nets: Characterization and Optimal Repeater Insertion," *Proceedings of the IEEE/ACM Design Automation Conference*, pp. 214-219, June 1997.
- [53] T. Sakurai and A. R. Newton, "A Simple Short-Channel MOSFET Model and its Application to Delay Analysis of Inverters and Series-Connected MOSFETs," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 105-108, May 1990.
- [54] C. C. N. Chu and D. F. Wong, "A New Approach to Simultaneous Buffer Insertion and Wire Sizing," Proceedings of the IEEE International Conference on Computer-Aided Design, pp. 614-621, November 1997.
- [55] J. Cong, L. He, C.-K. Koh, and P. H. Madden, "Performance Optimization of VLSI Interconnect Layout," *Integration: the VLSI Journal*, Vol. 21, pp. 1-94, 1996.
- [56] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-Power CMOS Digital Design," *IEEE Journal of Solid-State Circuits*, Vol. SC-27, No. 4, pp. 473-483, April 1992.

- [57] D. Dobberpuhl et al., "A 200 MHz 64-b Dual Issue CMOS Microprocessor," IEEE Journal of Solid-State Circuits, Vol. SC-27, No. 11, pp. 1555-1567, November 1992.
- [58] W. C. Elmore, "The Transient Response of Damped Linear Networks with Particular Regard to Wideband Amplifiers," Journal of Applied Physics, Vol. 19, No. 1, pp. 55-63, January 1948.
- [59] V. Adler and E. G. Friedman, "Repeater Design to Reduce Delay and Power in Resistive Interconnect," Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 2148-2151, June 1997.
- [60] R. C. Jaeger, "Comments on 'An Optimized Output Stage for MOS Integrated Circuits'," IEEE Journal of Solid-State Circuits, Vol. SC-10, No. 3, pp. 185-186, June 1975.
- [61] J. M. Rabaey, Digital Integrated Circuits. Prentice Hall, 1996.
- [62] V. Adler and E. G. Friedman, "Delay and Power Expressions for a CMOS Inverter Driving a Resistive-Capacitive Load," *Analog Integrated Circuits and Signal Processing*, Vol. 14, No. 1/2, pp. 29-40, September 1997.
- [63] V. Adler and E. G. Friedman, "Timing and Power Models for CMOS Repeaters Driving Resistive Interconnect," Proceedings of the IEEE ASIC Conference, pp. 201-204, September 1996.
- [64] C. Mead and L. Conway, Introduction to VLSI Systems. Addison-Wesley, 1980.
- [65] B. S. Cherkauer and E. G. Friedman, "A Unified Design Methodology for CMOS Tapered Buffers," *IEEE Transactions on VLSI Systems*, Vol. VLSI-3, No. 1, pp. 99-111, March 1995.
- [66] B. S. Cherkauer and E. G. Friedman, "Design of Tapered Buffers with Local Interconnect Capacitance," *IEEE Journal of Solid-State Circuits*, Vol. SC-30, No. 2, pp. 151-155, February 1995.
- [67] J. A. Nelder and R. Mead, "A Simplex Method for Function Minimization," Computer Journal, Vol. 7, No. 4, pp. 308-313, January 1965.
- [68] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, *Numerical Recipes in C, The Art of Scientific Computing*. Cambridge University Press, 1988.
- [69] M. P. Vecchi and S. Kirkpatrick, "Global Wiring by Simulated Annealing," IEEE Transactions on Computer-Aided Design, Vol. CAD-2, No. 4, pp. 215-222, October 1983.

- [70] E. G. Friedman, Clock Distribution Networks in VLSI Circuits and Systems. IEEE Press, 1995.
- [71] A. Hirata, H. Onodera, and K. Tamaru, "Estimation of Short-Circuit Power Dissipation for Static CMOS Gates," *IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences*, Vol. E79-A, No. 3, pp. 304-311, March 1996.
- [72] J. Ryan and T. Kikkawa, "Copper Interconnect Technology Challenges," Proceedings of the IEEE Symposium on VLSI Technology, pp. 143-145, June 1998.
- [73] T. Nogami and S. Lopatin, "Current Status and Challenges for Copper Interconnect Technology," *Proceedings of the Symposium on Semiconductors and Integrated Circuits Technology*, pp. 89-94, June 1998.
- [74] A. I. Kayssi, K. A. Sakallah, and T. M. Burks, "Analytical Transient Response of CMOS Inverters," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, Vol. CAS I-39, No. 1, pp. 42-45, January 1992.
- [75] T. Sakurai and A. R. Newton, "A Simple MOSFET Model for Circuit Analysis," *IEEE Transactions on Electron Devices*, Vol. ED-38, No. 4, pp. 887-893, April 1991.
- [76] C. L. Ratzlaff and L. T. Pillage, "RICE: Rapid Interconnect Circuit Evaluation Using AWE," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. CAD-13, No. 6, pp. 763-776, June 1994.
- [77] S.-Y. Kim, N. Gopal, and L. Pillage, "Time-Domain Macromodels for VLSI Interconnect Analysis," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. CAD-13, No. 10, pp. 1257-1270, October 1994.
- [78] J. E. Bracken, V. Raghavan, and R. A. Rohrer, "Interconnect Simulation with Asymptotic Waveform Evaluation (AWE)," *IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications*, Vol. CAS I-39, No. 11, pp. 869-878, November 1992.
- [79] J. Y. Lee, X. Huang, and R. A. Rohrer, "Pole and Zero Sensitivity Calculation in Asymptotic Waveform Evaluation," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. CAD-11, No. 5, pp. 586-597, May 1992.

- [80] R. Gupta, S.-Y. Kim, and L. T. Pileggi, "Domain Characterization of Transmission Line Models and Analyses," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. CAD-15, No. 2, pp. 184-193, February 1996.
- [81] F. Dartu, N. Menezes, J. Qian, and L. T. Pillage, "A Gate-Delay Model for High-Speed CMOS Circuits," Proceedings of the IEEE/ACM Design Automation Conference, pp. 576-580, June 1994.
- [82] F. Dartu, N. Menezes, and L. T. Pileggi, "Performance Computation for Precharacterized CMOS Gates with RC Loads," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. CAD-15, No. 5, pp. 544-553, May 1996.
- [83] J. Qian, S. Pullela, and L. Pillage, "Modeling the 'Effective Capacitance' for the RC Interconnect of CMOS Gates," *IEEE Transactions on Computer-Aided Design*, Vol. CAD-13, No. 12, pp. 1526-1535, December 1994.
- [84] A. Kahng, "Personal communication."
- [85] S. S. Sapatnekar, "Wire Sizing as a Convex Optimization Problem: Exploring the Area-Delay Tradeoff," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. CAD-15, No. 8, pp. 1001-1011, August 1996.
- [86] N. Menezes, R. Baldick, and L. T. Pileggi, "A Sequential Quadratic Programming Approach to Concurrent Gate and Wire Sizing," Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, pp. 144-151, November 1995.
- [87] N. Menezes, R. Baldick, and L. T. Pileggi, "A Sequential Quadratic Programming Approach to Concurrent Gate and Wire Sizing," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. CAD-16, No. 8, pp. 867-881, August 1997.
- [88] J. P. Fishburn, "Shaping a VLSI Wire to Minimize Elmore Delay," Proceedings of the European Design and Test Conference, pp. 244-251, May 1997.
- [89] J. Cong and K.-S. Leung, "Optimal Wiresizing Under the Distributed Elmore Delay Model," *Proceedings of the IEEE International Conference on Computer-Aided Design*, pp. 634-639, November 1993.
- [90] J. Cong and C.-K. Koh, "Simultaneous Buffer and Wire Sizing for Performance and Power Optimization," Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, pp. 206-212, November 1994.

- [91] J. Cong and L. He, "Optimal Wiresizing for Interconnects with Multiple Sources," Proceedings of the IEEE International Conference on Computer-Aided Design, pp. 568-574, November 1995.
- [92] J. J. Cong and K.-S. Leung, "Optimal Wiresizing under Elmore Delay Model," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. CAD-14, No. 3, pp. 321-336, March 1995.
- [93] J. Cong and L. He, "An Efficient Approach to Simultaneous Transistor and Interconnect Sizing," Proceedings of the IEEE International Conference on Computer-Aided Design, pp. 181-186, November 1996.
- [94] J. Cong, C. Koh, and K. Leung, "Simultaneous Buffer and Wire Sizing for Performance and Power Optimization," Proceedings of the IEEE International Symposium on Low Power Electronics and Design, pp. 271-276, August 1996.
- [95] J. Cong, Z. Pan, L. He, and C.-K. Koh, "Interconnect Design for Deep Submicron ICs," Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, pp. 478-487, November 1997.
- [96] J. Cong, L. He, C.-K. Koh, and Z. Pan, "Global Interconnect Sizing and Spacing with Consideration of Coupling Capacitance," *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, pp. 628-635, November 1997.
- [97] B. Razavi, Monolithic Phase-Locked Loops and Clock Recovery Circuits. New York: IEEE Press, 1996.
- [98] A. Efendovich, Y. Afek, C. Sella, and Z. Bikowsky, "Multifrequency Zero-Jitter Delay-Locked Loop," *IEEE Journal of Solid-State Circuits*, Vol. SC-29, No. 1, pp. 67-70, January 1994.
- [99] J. Dunning, G. Garcia, J. Lundberg, and E. Nuckolls, "An All-Digital Phase-Locked Loop with 50-Cycle Lock Time Suitable for High-Performance Microprocessors," *IEEE Journal of Solid-State Circuits*, Vol. SC-30, No. 4, pp. 412-422, April 1995.
- [100] T. H. Lee, K. S. Donnelly, J. T. Ho, M. G. Johnson, and T. Ishikawa, "A 2.5 V CMOS Delay-Locked Loop for an 18 Mbit, 500 Megabyte/s DRAM," IEEE Journal of Solid-State Circuits, Vol. SC-29, No. 12, pp. 1491-1496, December 1994.
- [101] G. U. Jensen, B. Lund, T. A. Fjeldly, and M. Shur, "Monte Carlo Simulation of Semiconductor Devices," Computer Physics Communications, Vol. 67, pp. 1-61, August 1991.

- [102] S.-M. Kang and Y. Leblebici, *CMOS Digital Integrated Circuits: Analysis and Design*. McGraw-Hill, 1996.

### **Publications**

#### Repeater Publications

#### Journal Articles:

- V. Adler and E. G. Friedman, "Uniform Repeater Insertion in RC Trees," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, (in submission).
- V. Adler and E. G. Friedman, "Repeater Design to Reduce Delay and Power in Resistive Interconnect," *IEEE Transactions on Circuits and Systems II:* Analog and Digital Signal Processing, Vol. CAS II-45, No. 5, pp. 607-616, May 1998.
- V. Adler and E. G. Friedman, "Delay and Power Expressions for a CMOS Inverter Driving a Resistive-Capacitive Load," *Analog Integrated Circuits and Signal Processing*, Vol. 14, No. 1/2, pp. 29-39, September 1997.

#### Conference Papers:

- V. Adler and E. G. Friedman, "Optimizing RC Tree Delay in High Speed ASICs Through Repeater Insertion," Proceedings of the IEEE ASIC Conference, pp. 375-379, September 1998.
- V. Adler and E. G. Friedman, "A Repeater Timing Model and Insertion Algorithm to Reduce Delay in RC Tree Structures," Proceedings of the IEEE International Conference on Electronics, Circuits and Systems, pp. 2.67-2.70, September 1998.
- V. Adler and E. G. Friedman, "Repeater Insertion to Reduce Delay and Power in RC Tree Structures," Proceedings of the Asilomar Conference on Signals, Systems, and Computers, pp. 749-752, November 1997.
- V. Adler and E. G. Friedman, "Repeater Design to Reduce Delay and Power in Resistive Interconnect," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 2148-2151, June 1997.

- V. Adler and E. G. Friedman, "Timing and Power Models for CMOS Repeaters Driving Resistive Interconnect," *Proceedings of the IEEE ASIC Conference*, pp. 201-204, September 1996.
- V. Adler and E. G. Friedman, "Delay and Power Expressions for Short-Channel CMOS Inverter Driving Resistive Interconnect," *Proceedings of the Workshop on Academic Electronics in New York State*, pp. 207-220, June 1996.
- V. Adler and E. G. Friedman, "Delay and Power Expressions for a CMOS Inverter Driving a Resistive-Capacitive Load," Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 4.101-4.104, May 1996.
- V. Adler and E. G. Friedman, "A Delay Expression for a Short-Channel CMOS Inverter Driving a Resistive-Capacitive Load," *Proceedings of the IEEE Electron Devices Activities in Western New York Conference*, p. 18, November 1995.

#### Superconductive Circuit Design Methodology Publications

#### Journal Articles:

- K. Gaj, Q. P. Herr, V. Adler, D. K. Brock, E. G. Friedman, and M. J. Feldman, "Towards a Systematic Design Methodology for Large Multi-Gigahertz Rapid Single Flux Quantum Circuits," *IEEE Transactions on Applied Superconductivity* (in submission).
- K. Gaj, Q. P. Herr, V. Adler, A. Krasniewski, E. G. Friedman, and M. J. Feldman, "Tools for the Computer-Aided Design of Multi-Gigahertz Superconducting Digital Circuits," *IEEE Transactions on Applied Superconductivity*, Vol. 9, 1999 (in press).
- Q. P. Herr, N. Vukovic, C. A. Mancini, K. Gaj, Q. Ke, V. Adler, E. G. Friedman, A. Krasniewski, M. F. Bocko, and M. J. Feldman, "Design and Low Speed Testing of a Four-Bit RSFQ Multiplier-Accumulator," *IEEE Transactions on Applied Superconductivity*, Vol. AS-7, No. 2, pp. 3168-3171, June 1997.
- V. Adler, C. H. Cheah, K. Gaj, D. K. Brock, and E. G. Friedman, "A Cadence-Based Design Environment for Single Flux Quantum Circuits," *IEEE Transactions on Applied Superconductivity*, Vol. AS-7, No. 2, pp. 3294-3297, June 1997.

#### Conference Papers:

- Q. P. Herr, N. Vukovic, C. A. Mancini, K. Gaj, Q. Ke, V. Adler, E. G. Friedman, A. Krasniewski, M. F. Bocko, and M. J. Feldman, "Development and Testing of a Four-Bit RSFQ Multiplier-Accumulator," *Proceedings of the Applied Superconductivity Conference*, p. 149 (abstract), August 1996.
- V. Adler, C. H. Cheah, K. Gaj, D. K. Brock, and E. G. Friedman, "A Cadence-Based Design Environment for Single Flux Quantum Circuits," *Proceedings of the Applied Superconductivity Conference*, p. 157 (abstract), August 1996.
- V. Adler and E. G. Friedman, "A Design Environment for Single Flux Quantum Circuits," Proceedings of the IEEE Electron Devices Activities in Western New York Conference, p. 10, November 1994.
