# Unified Logical Effort—A Method for Delay Evaluation and Minimization in Logic Paths With *RC* Interconnect

Arkadiy Morgenshtein, Eby G. Friedman, *Fellow, IEEE*, Ran Ginosar, *Senior Member, IEEE*, and Avinoam Kolodny, *Member, IEEE* 

Abstract—The unified logical effort (ULE) model for delay evaluation and minimization in paths composed of CMOS logic gates and resistive wires is presented. The method provides conditions for timing optimization while overcoming the limitations of standard logical effort (LE) in the presence of interconnects. The condition for optimal gate sizing in a logic path with long wires is also presented. This condition is achieved when the delay component due to the gate input capacitance is equal to the delay component due to the effective output resistance of the gate. The ULE delay model unifies the problems of gate sizing and repeater insertion: In the case of negligible interconnect, the ULE method converges to standard LE optimization, yielding tapered gate sizes. In the case of long wires, the solution converges toward uniform sizing of gates and repeaters. The technique is applied to various types of logic paths to demonstrate the influence of wire length, gate type, and technology.

*Index Terms*—Delay minimization, interconnect, logical effort (LE), power.

## I. INTRODUCTION

Timing modeling and optimization are fundamental tasks in digital circuit design. The method of logical effort (LE) was first proposed by Sutherland *et al.* [1], [2] for the fast evaluation and optimization of delay in CMOS logic paths [see Fig. 1(a)]. Thanks to the simplicity and elegance of the model, the technique has since been adopted as a basis for several computer-aided-design (CAD) tools. The optimization rule of LE, however, addresses only logic gates and does not consider on-chip wires. As VLSI circuits continue to scale, the contribution of wires to the delay increases and can no longer be neglected. The useful LE rule that path delay is minimum when the efforts of all stages are equal breaks down, because interconnects have fixed capacitances that are not correlated with the characteristics of the gates [see Fig. 1(b)]. The same issue arises when arbitrary fan-outs and fixed branch loads are present in the circuit structure. The authors of the LE method describe this behavior as "one of the most dissatisfying limitations of logical effort" [3].

Manuscript received June 20, 2007; revised December 27, 2007, June 12, 2008, and December 23, 2008. First published July 28, 2009; current version published April 23, 2010.

A. Morgenshtein was with the Department of Electrical Engineering, The Technion—Israel Institute of Technology, Haifa 32000, Israel. He is now with the Core CAD Technologies Group, Intel Corporation, Haifa 31015, Israel (e-mail: morgenshtein@gmail.com).

E. G. Friedman is with the Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY 14627 USA (e-mail: friedman@ece.rochester.edu).

R. Ginosar is with the VLSI Systems Research Center, Department of Electrical Engineering, The Technion—Israel Institute of Technology, Haifa 32000, Israel (e-mail: ran@ee.technion.ac.il).

A. Kolodny is with the Department of Electrical Engineering, The Technion—Israel Institute of Technology, Haifa 32000, Israel (e-mail: kolodny@ee.technion.ac.il).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2009.2014239

Fig. 1. Cascaded strings of logic gates. (a) LE optimization for gates without wires is based on equal stage efforts, e.g., $g_1h_1 = g_2h_2$. (b) In the case of gates with wires, the rule of equal effort breaks down due to fixed wire parameters.

The objective of this paper is to develop a simple method for minimizing delays in logic paths containing both gates and interconnect, including any fan-out loads. Currently, timing optimization is typically treated separately in two scenarios: 1) logic gates without wires (using the standard LE method) and 2) long wires without logic (using repeater insertion [5]). We introduce the *unified LE* (ULE) method for the delay evaluation and optimization of logic paths with general logic gates and RC wires. ULE treats a broad scope of design problems with a single analytic model, combining logic and interconnect delay optimization.

The remainder of this paper is organized as follows. Related work is surveyed in Section II. The ULE model is developed in Section III. Timing optimization based on the ULE model for paths with resistive and capacitive wires is presented in Section IV. Section IV also describes a condition for optimal gate sizing in logic paths with wires, which provides an intuitive view of the problem: at the optimum, the delay component due to the gate capacitance equals the delay component due to the effective resistance of the gate. Examples of ULE optimization are presented in Section V, where the convergence of the model to existing optimization techniques is shown for specific cases. Gate sizing by ULE for long wires is analyzed in Section VI. Simulation results of benchmark circuits are presented in Section VII, comparing ULE optimization with the results of an industrial CAD tool optimizer. The extension of ULE to paths with side branches and multiple fan-outs is presented in Section VIII. Finally, a summary of the paper and topics for future research are provided in Section IX.

## II. RELATED WORK

Previous research has increased the accuracy of the LE model by considering I/O coupling and ramp input effects [9], as well as internodal charge and deep-submicrometer effects [10]. While these works improve the accuracy of the LE method for logic gate delays, they do not address interconnects. In [11], the LE model is extended to relate transistor size to the speed and energy consumption of circuits without considering the RC wires among the gates. An optimization methodology using LE is proposed in [12] for logic blocks driving interconnects with uniform and nonuniform repeaters. That work, however, does not address gate sizing in the presence of interconnects between the logic gates.

Traditional timing optimization procedures have been developed assuming capacitive interconnects [13]-[15], focusing on optimally tapered buffers. In [16] and [17], the wire capacitance between the gates is assumed to be correlated to the gate size, resulting in a fixed tapering factor similar to the LE model. In [15], local interconnect capacitances are considered to be independent of the gate size, and the optimization process is based on constant capacitance-to-current ratio tapering. To accurately consider resistive interconnects, post-routing design steps have been added, involving wire segmentation and repeater insertion [5]-[8], [12]. These optimization techniques include equal sizing and spacing of the repeaters [5], as well as tapering of the repeater sizes and wire segments [12]. Most of these techniques for timing optimization in interconnects have been developed independently of the LE model, focusing on inverters as repeaters (or buffers) driving long wires rather than on general logic paths with wire segments.

The LE delay expression has been combined with the Elmore delay model [21] in [18], [19], and [24]. The combined model is used in [18] and [19] for optimal wire segmentation with general logic gates rather than repeaters. The work described in these publications, however, does not consider optimal gate sizing. The authors of [24] use the combined delay model to derive the optimal number and size of equally spaced uniform buffers for insertion into long wires. None of these previous publications, however, provides a general method for logic gate size optimization for circuit speed in the presence of an interconnect. This topic in circuit optimization is addressed in this paper, covering logic circuits with both capacitive and resistive interconnect segments including arbitrary branch fan-out.

## III. DELAY MODEL OF LOGIC GATES WITH WIRES

The LE model is modified here to include the interconnect delay. This is achieved by adding the wire delay to the LE gate delay, establishing the ULE model.

Fig. 2. Cascaded logic gates with resistive-capacitive interconnect.

A circuit comprising logic gates and wires is shown in Fig. 2. The interconnect is represented by a $\pi$-model. Following [20], the Elmore delay model [21] is used to describe the wire delay. The total combined delay expression is

$$D_i = R_i \cdot (C_{p_i} + C_{w_i} + C_{i+1}) + R_{w_i} \cdot (0.5 \cdot C_{w_i} + C_{i+1}) \tag{1}$$

where $R_i$ is the effective output resistance of gate i, $C_{p_i}$ is the parasitic output capacitance of gate i, $C_{w_i}$ and $R_{w_i}$ are the wire capacitance and resistance of segment i, respectively, and $C_{i+1}$ is the input capacitance of gate i + 1.

This expression is rewritten, similar to [18], [19], and [24], by introducing the delay of a minimum-sized inverter as a technology constant $\tau = R_0 \cdot C_0$, where $R_0$ and $C_0$ are the output resistance and input capacitance of a minimum-sized inverter, respectively:

$$D_{i} = \tau \cdot d_{i} = \tau \cdot \left[ \frac{R_{i}}{R_{0}} \cdot \frac{(C_{w_{i}} + C_{i+1} + C_{p_{i}})}{C_{0}} + \frac{R_{w_{i}}}{R_{0} \cdot C_{0}} \cdot (0.5 \cdot C_{w_{i}} + C_{i+1}) \right]. \tag{2}$$

The stage delay, normalized with respect to the minimum inverter delay $\tau$, is expressed in LE terms as

$$d_{i} = g_{i} \cdot \left(h_{i} + \frac{C_{w_{i}}}{C_{i}}\right) + \frac{R_{w_{i}} \cdot (0.5 \cdot C_{w_{i}} + C_{i+1})}{\tau} + p_{i} \tag{3}$$

where  $g_i = (R_i \cdot C_i)/(R_0 \cdot C_0)$  is the LE related to the gate topology,  $h_i = C_{i+1}/C_i$  is the electrical effort describing the drive capability, and  $p_i = (R_i \cdot C_{p_i})/(R_0 \cdot C_0)$  is the delay factor of the parasitic impedance. The capacitance and resistance of the gate are related to the scaling factor  $x_i$  as  $C_i = C_0 \cdot g_i \cdot x_i$ , and  $R_i = R_0/x_i$ , respectively.

The capacitive interconnect effort  $h_w$  and resistive interconnect effort  $p_w$  are, respectively

$$h_{w_i} = \frac{C_{w_i}}{C_i} \tag{4}$$

$$p_{w_i} = \frac{R_{w_i} \cdot (0.5 \cdot C_{w_i} + C_{i+1})}{\tau}. \tag{5}$$

As shown in (4),  $h_w$  expresses the influence of the wire capacitance on the electrical effort of the gate. The component  $p_w$  in (5) is the delay of the loaded wire in terms of the gate delay ( $\tau$ ). The component  $R_w \cdot 0.5 \cdot C_w / \tau$  is technology specific.

The final expression of the ULE delay for a single stage is

$$d = g \cdot (h + h_w) + (p + p_w). \tag{6}$$

The ULE delay expression for an N-stage logic path with RC wires is

$$d = \sum_{i=1}^{N} \left[ g_i \cdot (h_i + h_{w_i}) + (p_i + p_{w_i}) \right]. \tag{7}$$



Fig. 3. Delay components in characterizing ULE for long wires.

Note that in the case of short wires, the resistance  $R_w$  of the wire may be neglected, eliminating  $p_w$  and leaving only the capacitive interconnect effort  $h_w$  in the expression. When the wire impedance along the logic path is negligible, the extended delay expression reduces to the standard LE delay equation.
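The ULE path delay (7) can be evaluated directly from (3)-(6). The following is a minimal sketch, not the authors' implementation: the `Stage` record and the `ule_path_delay` function are hypothetical names, capacitances are in farads and resistances in ohms, and the usage values reuse the 65-nm constants $R_0 = 8800\ \Omega$ and $C_0 = 0.74$ fF quoted in the caption of Fig. 5. With all wire parameters set to zero, the result is the ordinary LE path delay.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One path stage: gate i plus the RC wire it drives (hypothetical record)."""
    g: float     # logical effort g_i
    p: float     # parasitic delay p_i, normalized to tau
    c_in: float  # input capacitance C_i of gate i [F]
    r_w: float   # resistance R_{w_i} of the wire driven by gate i [ohm]
    c_w: float   # capacitance C_{w_i} of that wire [F]


def ule_path_delay(stages, c_load, tau):
    """Normalized ULE path delay of eq. (7).

    c_load is the capacitance at the end of the path; tau = R0 * C0.
    """
    d = 0.0
    for i, s in enumerate(stages):
        # C_{i+1}: input capacitance of the next gate, or the final load
        c_next = stages[i + 1].c_in if i + 1 < len(stages) else c_load
        h = c_next / s.c_in                          # electrical effort h_i
        h_w = s.c_w / s.c_in                         # capacitive wire effort, eq. (4)
        p_w = s.r_w * (0.5 * s.c_w + c_next) / tau   # resistive wire effort, eq. (5)
        d += s.g * (h + h_w) + (s.p + p_w)           # stage delay, eq. (6)
    return d


# Usage with the 65-nm constants from the caption of Fig. 5
R0, C0 = 8800.0, 0.74e-15
tau = R0 * C0
path = [Stage(1.0, 1.0, C0, 0.0, 0.0), Stage(1.0, 1.0, 4 * C0, 0.0, 0.0)]
print(ule_path_delay(path, 16 * C0, tau))  # no wires: reduces to the LE delay
```

Multiplying the normalized result by $\tau$ gives the absolute delay in seconds, which should match (1) summed over the stages.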

## IV. DELAY MINIMIZATION USING ULE

As a first step in the path delay optimization process, consider a two-stage portion of a logic path with wires (as shown in Fig. 2). The ULE expression of the total delay is

$$d = g_i \cdot (h_i + h_{w_i}) + (p_i + p_{w_i}) + g_{i+1} \cdot (h_{i+1} + h_{w_{i+1}}) + (p_{i+1} + p_{w_{i+1}}). \tag{8}$$

Substituting  $C_{i+1} = h_i \cdot C_i$  into (8) in the presence of a resistive interconnect, the delay can be expressed in terms of  $h_i$  as

$$d = g_i \cdot \left(h_i + \frac{C_{w_i}}{C_i}\right) + p_i + \frac{R_{w_i} \cdot (0.5 \cdot C_{w_i} + h_i \cdot C_i)}{R_0 \cdot C_0} + g_{i+1} \cdot \left(\frac{C_{i+2} + C_{w_{i+1}}}{h_i \cdot C_i}\right) + p_{i+1} + p_{w_{i+1}}. \tag{9}$$

The condition for optimal gate sizing is determined by equating the derivative of the delay with respect to the gate size to zero (see [4] for derivation details)

$$\left(g_i + \frac{R_{w_i} \cdot C_i}{R_0 \cdot C_0}\right) \cdot h_i = g_{i+1} \cdot \left(h_{i+1} + h_{w_{i+1}}\right). \tag{10}$$

For a logic path without wires  $(h_{w_i} = 0, R_{w_i} = 0)$ , the optimum condition of ULE (10) converges to the optimum condition of LE [1]:  $g_i \cdot h_i = g_{i+1} \cdot h_{i+1}$ .

To provide an intuitive interpretation, (10) can be rewritten by multiplying both sides by $R_0 \cdot C_0$ and using the relationships $h_i = C_{i+1}/C_i$, $C_i = C_0 \cdot g_i \cdot x_i$, and $R_i = R_0/x_i$ (so that $g_i \cdot R_0 \cdot C_0 = R_i \cdot C_i$). The resulting optimum condition is

$$(R_i + R_{w_i}) \cdot C_{i+1} = R_{i+1} \cdot (C_{i+2} + C_{w_{i+1}}). \tag{11}$$

The meaning of (11) is that the optimum size of gate i + 1 is achieved when the delay component  $(R_i + R_{w_i}) \cdot C_{i+1}$  due to the gate capacitance is equal to the delay component  $R_{i+1} \cdot (C_{i+2} + C_{w_{i+1}})$  due to the effective resistance of the gate. Note that the wire parameters  $R_w$  and  $C_w$  are considered fixed when deriving this intuition for gate sizing.

A schematic model describing the related delay components is shown in Fig. 3. Note that the other delay components ($R_i \cdot C_{w_i}$, $0.5 \cdot R_{w_i} \cdot C_{w_i}$, and $R_{w_{i+1}} \cdot (0.5 \cdot C_{w_{i+1}} + C_{i+2})$) are independent of the size of gate i + 1 and do not influence the optimum size. Also note that, in the presence of wires, the condition for minimum path delay does not correspond to equal delay or equal effort at every stage along the path.

The optimum condition (11) can be generalized to any gate i by noting that the size-dependent part of the total delay is the sum of two components: $D_{C_i}$, due to the input capacitance of gate i, and $D_{R_i}$, due to its effective output resistance:

$$
\begin{aligned}
D_{C_i} &= (R_{i-1} + R_{w_{i-1}}) \cdot C_i = (R_{i-1} + R_{w_{i-1}}) \cdot C_0 \cdot g_i \cdot x_i\\
D_{R_i} &= R_i \cdot (C_{i+1} + C_{w_i}) = \frac{R_0}{x_i} \cdot (C_{i+1} + C_{w_i})\\
D_i &= D_{C_i} + D_{R_i} + \text{const.}
\end{aligned} \tag{12}
$$

Thus, at minimum total delay, the sum of the derivatives of the delay components with respect to the sizing factor $x_i$ is zero:

$$\frac{\partial D_{C_i}}{\partial x_i} = \left(R_{i-1} + R_{w_{i-1}}\right) \cdot C_0 \cdot g_i, \qquad \frac{\partial D_{R_i}}{\partial x_i} = -\frac{R_0}{x_i^2} \cdot \left(C_{i+1} + C_{w_i}\right) \tag{13}$$

$$\frac{\partial D_i}{\partial x_i} = \frac{\partial D_{C_i}}{\partial x_i} + \frac{\partial D_{R_i}}{\partial x_i} = 0. \tag{14}$$

The solution of (14) provides an expression for the optimal sizing factor  $x_{i_{opt}}$ 

$$x_{i_{\text{opt}}} = \sqrt{\frac{R_0}{\left(R_{i-1} + R_{w_{i-1}}\right)} \cdot \frac{\left(C_{i+1} + C_{w_i}\right)}{C_0 \cdot g_i}}. \tag{15}$$

When  $x_{i_{opt}}$  is substituted into the expression in (11), a general optimum condition can be determined

$$(R_{i-1} + R_{w_{i-1}}) \cdot C_i = R_i \cdot (C_{i+1} + C_{w_i}) = \sqrt{[(R_{i-1} + R_{w_{i-1}}) \cdot C_0 \cdot g_i] \cdot [R_0 \cdot (C_{i+1} + C_{w_i})]}. \tag{16}$$

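An intuitive interpretation of (16) is that the minimum delay is achieved when the downstream delay component (due to $C_i$) and the upstream delay component (due to $R_i$) of an optimally sized gate are both equal to the geometric mean of the upstream and downstream delays obtained for an arbitrary size of the gate (with LE $g_i$). This geometric mean does not depend on the chosen size, because the product $D_{C_i} \cdot D_{R_i} = [(R_{i-1} + R_{w_{i-1}}) \cdot C_0 \cdot g_i] \cdot [R_0 \cdot (C_{i+1} + C_{w_i})]$ is independent of $x_i$: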

$$D_{R_{i_{\text{opt}}}} = D_{C_{i_{\text{opt}}}} = GM[D_{R_i}, D_{C_i}]. \tag{17}$$
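The relations (12)-(17) can be checked numerically. The sketch below is illustrative only; `delay_components` and `x_opt` are hypothetical helper names. The example numbers loosely follow the setup of Fig. 4 ($L_{i-1} = 1$ mm, $L_i = 100\ \mu$m, $C_{i+1} = 10C_0$, NAND with $g = 4/3$) under two assumptions of ours: the upstream gate is a minimum inverter ($R_{i-1} = R_0$), and the wires use the intermediate-layer parameters from the caption of Fig. 5.

```python
import math


def delay_components(x, r_up, c_down, g, r0, c0):
    """Size-dependent delay components of gate i, eq. (12).

    x      : sizing factor x_i (C_i = C0*g*x, R_i = R0/x)
    r_up   : R_{i-1} + R_{w_{i-1}}, resistance driving gate i [ohm]
    c_down : C_{i+1} + C_{w_i}, capacitance driven by gate i [F]
    """
    d_c = r_up * c0 * g * x   # D_{C_i}: upstream resistance charging C_i
    d_r = (r0 / x) * c_down   # D_{R_i}: R_i driving the downstream load
    return d_c, d_r


def x_opt(r_up, c_down, g, r0, c0):
    """Optimal sizing factor of eq. (15)."""
    return math.sqrt(r0 * c_down / (r_up * c0 * g))


R0, C0 = 8800.0, 0.74e-15
r_up = R0 + 1.0 * 1000                  # minimum inverter + 1 mm of 1 ohm/um wire
c_down = 10 * C0 + 0.15e-15 * 100       # 10*C0 + 100 um of 0.15 fF/um wire
x = x_opt(r_up, c_down, 4 / 3, R0, C0)
d_c, d_r = delay_components(x, r_up, c_down, 4 / 3, R0, C0)
print(f"x_opt = {x:.2f}, D_C = {d_c:.3e} s, D_R = {d_r:.3e} s")
```

At $x_{opt}$ the two components coincide, and both equal the geometric mean of the components evaluated at any other size, which is the content of (17).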

The dependence of the delay on the sizing factor is exemplified in Fig. 4. Observe that choosing a sizing factor other than $x_{opt}$ increases the delay. The total delay $D_i$ comprises four components: the constant delays $0.5 \cdot R_{w_{i-1}}C_{w_{i-1}}$ and $0.5 \cdot R_{w_i}C_{w_i}$, and the variable delays $D_{C_i} = (R_{i-1} + R_{w_{i-1}}) \cdot C_i$ and $D_{R_i} = R_i \cdot (C_{i+1} + C_{w_i})$ that depend on the sizing factor $x_i$. The value of the sizing factor $x_{opt}$ is determined by the intersection of the three curves $D_{R_i}$, $D_{C_i}$, and $D^* = GM[D_{R_{i_{\min}}}, D_{C_{i_{\min}}}]$, as described in (17) and shown in Fig. 4.

Fig. 4. Dependence of delay on the sizing factor (for a NAND gate with $L_i = 100\,\mu\text{m}$, $L_{i-1} = 1\,\text{mm}$, $C_{i-1} = C_0$, and $C_{i+1} = 10C_0$).

The drive ability of a gate is related to the size of the gate and can be represented by a ratio of input capacitances [1]. The optimum condition in (10) can be rewritten to develop an expression for the input capacitance of each gate based on the ULE model

$$
\begin{aligned}
C_{i_{\text{opt}}} &= \sqrt{\frac{g_i}{g_{i-1} + \frac{R_{w_{i-1}} \cdot C_{i-1}}{R_0 \cdot C_0}} \cdot C_{i-1} \cdot (C_{i+1} + C_{w_i})}\\
&= \underbrace{\sqrt{C_{i-1} \cdot C_{i+1}}}_{\text{LE}} \cdot \underbrace{\sqrt{1 + \frac{C_{w_i}}{C_{i+1}}}}_{\text{wire capacitance}} \cdot \underbrace{\sqrt{\frac{g_i}{g_{i-1} + \frac{R_{w_{i-1}} \cdot C_{i-1}}{R_0 \cdot C_0}}}}_{\text{logical efforts and wire resistance}}
\end{aligned} \tag{18}
$$

Note that the first factor of the resulting expression is the condition given by the LE model for a path of identical gates. The second factor expresses the influence of the interconnect capacitance. The last factor is related to the resistance of the wire and to the differences among the individual LEs (types of logic gates) along the path. The expression in (18) illustrates the quadratic relationship between the sizes of neighboring gates. The gate sizes based on ULE can be determined by solving a set of N polynomial equations for the N gates along the path. The expressions of optimal ULE sizing are extended to include fixed side branches and multiple fan-outs in Section VIII.

To simplify the solution, a relaxation method can be used. The technique is based on an iterative calculation along the path while applying the optimum conditions [4]. Each capacitance along the path is iteratively replaced by the value obtained from the optimum expression (18), using the current sizes of its two neighboring gates.
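A relaxation procedure of this kind might look as follows. This is a minimal sketch of a Gauss-Seidel-style sweep that applies (18) to every interior gate until the sizes stop changing; it is not the procedure of [4], and the function name `ule_relaxation`, the geometric initial guess, and the stopping tolerance are our assumptions. The first gate's input capacitance and the final load are held fixed; the example (nine NAND gates, $g = 4/3$, 500-$\mu$m intermediate wires, $10C_0$ at the input and $100C_0$ at the end) is hypothetical and only loosely modeled on Fig. 5.

```python
import math


def ule_relaxation(g, r_w, c_w, c_first, c_load, tau, max_sweeps=10000, tol=1e-12):
    """Size a gate chain by repeatedly applying eq. (18) (relaxation sketch).

    g[i], r_w[i], c_w[i] : logical effort of gate i and R/C of the wire it drives
    c_first              : fixed input capacitance of the first gate [F]
    c_load               : fixed load capacitance at the end of the path [F]
    Returns caps, where caps[i] is C_i of gate i and caps[-1] is c_load.
    """
    n = len(g)
    # Initial guess: geometric taper between the fixed end capacitances (plain LE)
    caps = [c_first * (c_load / c_first) ** (k / n) for k in range(n + 1)]
    for _ in range(max_sweeps):
        worst = 0.0
        for i in range(1, n):  # the first gate and the load stay fixed
            num = g[i] * caps[i - 1] * (caps[i + 1] + c_w[i])
            den = g[i - 1] + r_w[i - 1] * caps[i - 1] / tau
            new = math.sqrt(num / den)                   # eq. (18)
            worst = max(worst, abs(new - caps[i]) / caps[i])
            caps[i] = new                                # in-place (Gauss-Seidel) update
        if worst < tol:
            break
    return caps


R0, C0 = 8800.0, 0.74e-15
tau = R0 * C0
N, L = 9, 500.0
caps = ule_relaxation([4 / 3] * N, [1.0 * L] * N, [0.15e-15 * L] * N,
                      10 * C0, 100 * C0, tau)
print([round(c / C0, 1) for c in caps])
```

With all wire parameters set to zero, every update reduces to the geometric mean of the neighboring capacitances, so the sweep returns the familiar LE taper.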



Fig. 5. Optimization of ULE sizing (normalized with respect to  $C_0$ ) for a chain of nine NAND gates with equal wire segments for a variety of lengths. For zero wire length, the solution converges to LE optimization. For long wires, the solution converges to a fixed size  $x_{opt}$ . The parameters of a 65-nm CMOS process include  $R_0 = 8800 \ \Omega$  and  $C_0 = 0.74$  fF. Intermediate wires:  $r_w = 1.0 \ \Omega/\mu m$  and  $c_w = 0.15$  fF/ $\mu m$ . Global wires:  $r_w = 0.04 \ \Omega/\mu m$  and  $c_w = 0.23$  fF/ $\mu m$ .

## V. EXAMPLE LOGIC PATHS

The ULE technique is applied to two example logic paths to demonstrate the properties of gate sizing. Parameters from [22] are used for a 65-nm CMOS technology. The first example logic path is shown in Fig. 5 and consists of nine identical stages. The input capacitances of the first and last gates are $10 \cdot C_0$ and $100 \cdot C_0$, respectively. The sizes of the logic gates along the path are shown in Fig. 5 for several values of the wire length L between stages. The solutions range between two limits (bold lines in the plot): 1) for zero wire length, the solution converges to LE optimization [1], and 2) for long wires, the gate size in the middle stages of the path converges to a fixed value $x_{opt} \cong 50$ (the dashed line), similar to repeater insertion methods [5], [19]. The concept of equal optimal sizing $x_{opt}$ for long wires is explained in the following section.

A second example is shown in Fig. 6. The logic chain is similar to the previous case, but the input and output gate capacitances are both equal to $10 \cdot C_0$; hence, the total electrical effort is H = 1. In this case, LE performs no gate scaling in the absence of wires. Note that the ULE optimization process provides a sizing solution for a variety of wire lengths: it reproduces the LE result (no scaling) for zero wire length and converges to a fixed size for long wires.

## VI. ULE GATE SIZING FOR LONG WIRES

As described in the previous section, in the case of long wire segments, the gate sizing optimization process converges to the scale factor $x_{opt}$. This scale factor is independent of wire length in the case of equal interconnect segments. In this section, the delay model of a logic gate with long wires is investigated in terms of the optimal size.

Fig. 6. Optimization of ULE sizing (normalized to $C_0$) for a chain of NAND gates with total electrical effort H = 1 and with equal wire segments for a variety of lengths.

Fig. 7. Delay components of optimum ULE for long wires.

When long wires are assumed, the impedances  $C_{w_i}$  and  $R_{w_{i-1}}$  of (18) dominate the gate impedances. A schematic model of this case is shown in Fig. 7.

The scale factor of a general gate can be derived from (15) for the case of long wires

$$x_{\text{opt}_{i}} \cong \sqrt{\frac{R_{0} \cdot C_{w_{i}}}{R_{w_{i-1}} \cdot C_{0} \cdot g_{i}}} = \underbrace{\sqrt{\frac{c_{w} \cdot R_{0}}{r_{w} \cdot C_{0} \cdot g_{i}}}}_{\text{constant}} \cdot \sqrt{\frac{L_{i}}{L_{i-1}}} \tag{19}$$

using the relationships $C_{w_i} = c_w \cdot L_i$ and $R_{w_i} = r_w \cdot L_i$, where $r_w$ and $c_w$ are the resistance and capacitance of the wire per unit length, and $L_{i-1}$ and $L_i$ are the lengths of the wires before and after the logic gate $g_i$, respectively. Note that, for a given gate type and technology, the scale factor of the gate in the case of long wires depends only on the ratio of the lengths of the adjacent wires.

A general optimum condition can be derived, similar to (16)

$$R_{w_{i-1}} \cdot C_i = R_i \cdot C_{w_i} = \sqrt{\left[R_{w_{i-1}} \cdot C_0 \cdot g_i\right] \cdot \left[R_0 \cdot C_{w_i}\right]}. \tag{20}$$

The meaning of (20) is that the minimum delay is achieved when the downstream and upstream delay components of an optimally sized gate are both equal to the geometric mean of the upstream and downstream delays that would be obtained for an arbitrarily sized gate.

In the special case of equal wire segments, the capacitance and resistance of all the segments are equal to  $C_w$  and  $R_w$ , respectively. In this case, the scaling factor  $x_{opt}$  is independent of the wire length, and (19) reduces to

$$x_{\text{opt}_i} = \sqrt{\frac{R_0 \cdot c_w}{r_w \cdot C_0 \cdot g_i}}. \tag{21}$$

This expression extends the basic repeater sizing equation to an arbitrary logic gate, whose size is scaled according to its LE. For the special case of inverter-based repeater insertion (LE $g = 1$), (21) reduces to

$$x_{\rm opt} = \sqrt{\frac{R_0 \cdot c_w}{r_w \cdot C_0}}. \tag{22}$$

This optimal sizing factor is the same as for optimal repeater scaling [5]. In addition, similar to (20), the optimal sizing condition for a repeater is

$$R_{\rm rep} \cdot C_w = C_{\rm rep} \cdot R_w. \tag{23}$$

The best sizing of a repeater is achieved when the delay component  $R_w \cdot C_{\text{rep}}$  due to the repeater capacitance is equal to the delay component  $R_{\text{rep}} \cdot C_w$  due to the effective resistance of the repeater.
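To give a sense of scale, (21) and (22) can be evaluated with the 65-nm parameters listed in the caption of Fig. 5. The sketch below assumes $g = 4/3$ for a two-input NAND (the usual LE value, not stated in this section); `x_opt_long_wire` is a hypothetical helper. Because Fig. 5 is normalized to $C_0$, the plotted quantity corresponds to $C_i/C_0 = g \cdot x$. For intermediate wires this gives roughly 49 for the NAND gate, which appears consistent with the $x_{opt} \cong 50$ level quoted in Section V, although which wire class Fig. 5 uses is our assumption.

```python
import math

R0 = 8800.0    # ohm, minimum-inverter output resistance (Fig. 5 caption)
C0 = 0.74e-15  # F, minimum-inverter input capacitance (Fig. 5 caption)


def x_opt_long_wire(r_w, c_w, g=1.0):
    """Eq. (21) for equal long segments; g = 1 gives the repeater size of eq. (22).

    r_w [ohm/um] and c_w [F/um] are per-unit-length wire parameters.
    """
    return math.sqrt(R0 * c_w / (r_w * C0 * g))


wires = {"intermediate": (1.0, 0.15e-15), "global": (0.04, 0.23e-15)}
for name, (r_w, c_w) in wires.items():
    x_inv = x_opt_long_wire(r_w, c_w)             # inverter repeater, g = 1
    x_nand = x_opt_long_wire(r_w, c_w, g=4 / 3)   # two-input NAND, g = 4/3 assumed
    # Fig. 5 plots capacitance normalized to C0, i.e., C_i / C0 = g * x
    print(f"{name}: x_inv = {x_inv:.1f}, x_nand = {x_nand:.1f}, "
          f"C_nand/C0 = {4 / 3 * x_nand:.1f}")
```

Note that (21) does not depend on the segment length, only on the per-unit-length wire parameters and the gate's LE.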

The application of ULE to repeater insertion provides a solution to some specific design problems. Two examples are presented here.

*Wire layout constraint*: Given a wire of total length L comprising two unequal segments of lengths $L_1$ and $L_2$, the optimal size of the repeater located between the segments is

$$x_{\rm rep_{opt}} = \sqrt{\frac{c_w \cdot R_0}{r_w \cdot C_0}} \cdot \sqrt{\frac{L_2}{L_1}}. \tag{24}$$

*Cell size constraint*: Given a repeater of size  $x_{rep}$  dividing a wire of total length L into two segments, the optimal segment lengths  $L_{1_{opt}}$  and  $L_{2_{opt}} = L - L_{1_{opt}}$  are related by

$$\frac{L_{2_{\text{opt}}}}{L_{1_{\text{opt}}}} = x_{\text{rep}}^2 / \left(\frac{c_w \cdot R_0}{r_w \cdot C_0}\right). \tag{25}$$
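Both repeater problems can be solved in a few lines. The sketch below assumes, as (19) does, that the wire impedances dominate the gate impedances; the helper names are hypothetical and the example uses the intermediate-wire parameters from the caption of Fig. 5. The second helper solves (25) together with $L_1 + L_2 = L$.

```python
import math


def repeater_size_for_split(l1, l2, r_w, c_w, r0, c0):
    """Eq. (24): optimal repeater size between segments of lengths l1 and l2."""
    return math.sqrt(c_w * r0 / (r_w * c0)) * math.sqrt(l2 / l1)


def split_for_repeater(x_rep, total_len, r_w, c_w, r0, c0):
    """Eq. (25): optimal lengths (l1, l2) of the two segments for a given x_rep."""
    ratio = x_rep ** 2 / (c_w * r0 / (r_w * c0))   # L2_opt / L1_opt
    l1 = total_len / (1.0 + ratio)
    return l1, total_len - l1


R0, C0 = 8800.0, 0.74e-15   # Fig. 5 constants
r_w, c_w = 1.0, 0.15e-15    # intermediate wire, per um
x = repeater_size_for_split(300.0, 1200.0, r_w, c_w, R0, C0)   # layout constraint
l1, l2 = split_for_repeater(30.0, 1500.0, r_w, c_w, R0, C0)    # cell-size constraint
print(f"x_rep_opt = {x:.1f}; for x_rep = 30: L1 = {l1:.0f} um, L2 = {l2:.0f} um")
```

The two helpers are inverses of each other: feeding the size returned by (24) into (25) recovers the original split.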

## VII. COMPARISON WITH BENCHMARK CIRCUITS

ULE optimization is verified by comparison with the results of the Cadence Virtuoso Analog Optimizer [23], a commercial numerical optimizer that uses a circuit simulator for delay modeling. The Analog Optimizer uses the least-squares and the C-version Feasible Sequential Quadratic Programming (CFSQP) numerical algorithms to determine the values of the design variables that satisfy specific design objectives. The optimal solution is achieved by detecting the sensitivity of the objective to each design variable, iteratively changing the variables, and performing circuit simulations. The numerical methods in the Analog Optimizer can be used to satisfy a variety of design specifications; in this paper, minimum delay is the design goal. The design variables used by the Analog Optimizer are the sizes of the gates along the critical path. Two circuits are considered: 1) a 4-bit carry-lookahead adder and 2) a 4-bit ripple-carry adder, both designed in a 65-nm CMOS technology [22]. The critical paths in both circuits are optimized according to (18) for different interstage wire lengths, and the ULE results are compared with those of the Analog Optimizer.

Fig. 8. Delay of a carry-lookahead adder for various wire segment lengths after gate size optimization by LE, ULE, and Analog Optimizer (AO). Each pair of adder stages is interconnected by a wire segment in a 65-nm CMOS technology. For short wires, all methods yield the same results. For longer wires, LE becomes increasingly inaccurate while ULE optimization is comparable to the numerical results obtained by the Analog Optimizer.

TABLE I COMPARISON OF COMPUTATIONAL RUN TIME OF ANALOG OPTIMIZER AND ULE FOR VARIOUS NUMBERS OF STAGES IN A RIPPLE-CARRY ADDER

| Run time [minutes]   | 2 stages | 4 stages | 6 stages | 8 stages |
|----------------------|----------|----------|----------|----------|
| AO (1% precision)    | 25       | 43       | 60       | 82       |
| AO (5% precision)    | 18       | 25       | 32       | 39       |
| ULE (0.1% precision) | < 1 sec  | < 1 sec  | < 1 sec  | < 1 sec  |

A comparison of the resulting delay, evaluated by circuit simulation, is shown in Fig. 8. The delay after ULE optimization is close to the results achieved by the Analog Optimizer tool (within 9%), while the standard LE technique becomes increasingly inaccurate as the wire lengths grow.

The low complexity and short computational time of ULE make the method a competitive alternative for integration into electronic design automation (EDA) toolsets that optimize complex logic structures with interconnect. The ULE and Analog Optimizer are also compared in terms of computational run time as a function of the length of the logic path. Both techniques are used to optimize the critical path of a ripple-carry adder with a varying number of full-adder stages. As listed in Table I, the run time of the Analog Optimizer is orders of magnitude longer than that of ULE.



Fig. 9. Logic path segment including RC interconnect and two branches.  $R_b$  and  $C_b$  are the resistance and capacitance of the branch wires, respectively, and  $C_f$  is the fan-out load capacitance.

## VIII. ULE OPTIMIZATION IN PATHS WITH BRANCHES

ULE optimization can be extended to address the general design case where the logic path may include branches or gates with multiple fan-out. The extended delay model is exemplified by the circuit shown in Fig. 9, defining a theoretical framework for delay minimization in circuits with side branches and multiple fan-out. The circuit shows the general structure containing a side branch with RC interconnect and/or a fan-out load with arbitrary capacitance. A similar circuit can be used to extend the LE model [1], [2] using only a capacitive load at the branch.

The ULE expression of the total delay of stages i and i + 1 containing branches and fan-outs can be written, similarly to (9), as

$$
\begin{aligned}
d ={}& g_{i} \cdot \left[ h_{i} + h_{w_{i}} + \frac{C_{b1_{i}} + C_{f1_{i}}}{C_{i}} + \frac{C_{b2_{i}} + C_{f2_{i}}}{C_{i}} \right] + \frac{R_{w_{i}}}{\tau} \cdot \left[ 0.5 \cdot C_{w_{i}} + h_{i} \cdot C_{i} + C_{b2_{i}} + C_{f2_{i}} \right]\\
&+ g_{i+1} \cdot \left[ \frac{C_{w_{i+1}} + C_{i+2} + C_{b1_{i+1}} + C_{f1_{i+1}} + C_{b2_{i+1}} + C_{f2_{i+1}}}{h_{i} \cdot C_{i}} \right]\\
&+ \frac{R_{w_{i+1}}}{\tau} \cdot \left[ 0.5 \cdot C_{w_{i+1}} + C_{i+2} + C_{b2_{i+1}} + C_{f2_{i+1}} \right]
\end{aligned} \tag{26}
$$

where  $\tau = R_0 \cdot C_0$  is the minimum inverter delay.

The ULE condition for gate sizing is determined by equating the derivative of the delay with respect to the gate size to zero

$$\left(g_{i} + \frac{R_{w_{i}} \cdot C_{i}}{\tau}\right) \cdot h_{i} = g_{i+1} \cdot \left(h_{i+1} + h_{w_{i+1}} + \underbrace{\frac{C_{b1_{i+1}} + C_{f1_{i+1}} + C_{b2_{i+1}} + C_{f2_{i+1}}}{C_{i+1}}}_{\text{branches and fan-outs}}\right). \tag{27}$$

The branch wire resistance  $R_{b_i}$  is not a part of the optimum condition since the resistance is not along the path where the Elmore delay is calculated. Note that in those circuits without multiple fan-out or branch interconnects, this general ULE condition for gate sizing converges to (10).



Fig. 10. Equivalent circuit with the effective branch and fan-out capacitances  $C_{bf1}$  and  $C_{bf2}$  in parallel with the path capacitances.

By applying (27) to each gate on the path in an iterative procedure, (18) can be replaced by

$$
\begin{aligned}
C_{i} &= \sqrt{\frac{g_{i} \cdot C_{i-1} \cdot (C_{w_{i}} + C_{i+1} + C_{b1_{i}} + C_{f1_{i}} + C_{b2_{i}} + C_{f2_{i}})}{g_{i-1} + \frac{R_{w_{i-1}} \cdot C_{i-1}}{\tau}}}\\
&= \sqrt{C_{i-1}C_{i+1}} \cdot \sqrt{1 + \frac{C_{w_{i}}}{C_{i+1}} + \underbrace{\frac{C_{b1_{i}} + C_{f1_{i}} + C_{b2_{i}} + C_{f2_{i}}}{C_{i+1}}}_{\text{branches and fan-outs}}} \cdot \sqrt{\frac{g_{i}}{g_{i-1} + \frac{R_{w_{i-1}} \cdot C_{i-1}}{\tau}}}
\end{aligned} \tag{28}
$$

From the relationship $R_i = (g_i \cdot \tau)/C_i$, an intuitive interpretation of the optimum condition can be derived, similar to (11):

$$(R_{i-1}+R_{w_{i-1}}) \cdot C_i = R_i \cdot \Big( C_{w_i} + C_{i+1} + \underbrace{C_{bf1} + C_{bf2}}_{\text{branches and fan-outs}} \Big), \quad C_{bf1} = C_{b1_i} + C_{f1_i}, \; C_{bf2} = C_{b2_i} + C_{f2_i}. \tag{29}$$

The load of the side branches is represented by  $C_{bf1}$  and  $C_{bf2}$ . These capacitances are the effective capacitive load of the branch wires and fan-out gates shown in Fig. 10. Note that the resistances  $R_{b1}$  and  $R_{b2}$  of the wires on the fan-out branches do not affect the Elmore delay of the path.

These ULE optimum expressions can be generalized for any combination of side branch wires and fan-out gates by determining the total effective capacitance of the fan-out branches for each stage of the path

$$C_{\rm BF} = \sum_{k=1}^{n} C_{b_k} + \sum_{l=1}^{m} C_{f_l} \tag{30}$$

where n and m are the numbers of branch wires and fan-out gates in a path stage, respectively. Using (30), the general ULE conditions for gate sizing are obtained in the same way as (27), (28), and (29):

$$\left(g_i + \frac{R_{w_i} \cdot C_i}{\tau}\right) \cdot h_i = g_{i+1} \cdot \left(h_{i+1} + h_{w_{i+1}} + \frac{C_{\mathrm{BF}_{i+1}}}{C_{i+1}}\right) \tag{31}$$

$$C_{i} = \sqrt{C_{i-1}C_{i+1}} \cdot \sqrt{1 + \frac{C_{w_{i}}}{C_{i+1}} + \frac{C_{\mathrm{BF}_{i}}}{C_{i+1}}} \cdot \sqrt{\frac{g_{i}}{g_{i-1} + \frac{R_{w_{i-1}}C_{i-1}}{\tau}}} \tag{32}$$

$$(R_{i-1} + R_{w_{i-1}}) \cdot C_i = R_i \cdot (C_{w_i} + C_{i+1} + C_{\mathrm{BF}_i}). \tag{33}$$

Note that in circuits without multiple fan-out gates or branch interconnects ($C_{\mathrm{BF}} = 0$), these general ULE conditions for gate sizing reduce to (10), (18), and (11), respectively.
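The branch-aware conditions can be exercised numerically as well. This sketch evaluates (30) and (32) for one hypothetical stage (a NAND gate after an inverter, with one branch wire and two fan-out gates; all values are invented for illustration) and checks that the result satisfies (33). The helper names `branch_load` and `c_opt_with_branches` are ours. As stated for (29), branch-wire resistances are ignored because they are off the Elmore path.

```python
import math


def branch_load(branch_wire_caps, fanout_caps):
    """Eq. (30): total side-branch load C_BF of one path stage [F].

    Branch-wire resistances are ignored, since they are off the Elmore path.
    """
    return sum(branch_wire_caps) + sum(fanout_caps)


def c_opt_with_branches(g_prev, g_i, c_prev, c_next, c_w, r_w_prev, c_bf, tau):
    """Eq. (32): optimal input capacitance C_i of gate i with side-branch load c_bf."""
    return (math.sqrt(c_prev * c_next)
            * math.sqrt(1.0 + c_w / c_next + c_bf / c_next)
            * math.sqrt(g_i / (g_prev + r_w_prev * c_prev / tau)))


R0, C0 = 8800.0, 0.74e-15
tau = R0 * C0
c_bf = branch_load([30e-15], [4 * C0, 4 * C0])   # one branch wire, two fan-out gates
c_i = c_opt_with_branches(1.0, 4 / 3, 5 * C0, 20 * C0, 15e-15, 100.0, c_bf, tau)
print(f"C_BF = {c_bf / C0:.1f} C0, C_i = {c_i / C0:.1f} C0")
```

Passing `c_bf = 0.0` recovers the no-branch expression (18), which is a quick consistency check on any implementation of (32).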

## IX. SUMMARY AND FUTURE WORK

Delay minimization in logic paths with wires is an important issue in the design of high-complexity integrated circuits. The interconnect is a dominant factor in performance-driven circuits and must be explicitly considered throughout the design process. The characteristics of the wires are not correlated with those of the gates, which makes the standard LE model inaccurate for paths with significant interconnect. In fact, optimal gate sizing in the presence of interconnects does not correspond to equal effort of all of the stages along a path.

The ULE method is proposed for the delay evaluation and minimization of logic paths with general gates and *RC* wires. The ULE method provides conditions to achieve minimum delay. Optimal gate sizing in logic paths with wires is achieved when the delay component due to the gate input capacitance is equal to the delay component due to the effective output resistance of the gate. The ULE method converges to standard LE when the wire resistance and capacitance are negligible. The simplicity of the resulting gate sizing conditions makes ULE suitable for both manual calculation and integration into existing EDA tools.

ULE optimization is compared with the industrial Analog Optimizer tool, showing close agreement in terms of delay. Owing to the simplicity of the delay model, the computational run time of ULE optimization is several orders of magnitude lower than that of the industrial tool. This improved efficiency at similar accuracy demonstrates the high potential of ULE for integration into EDA tools.

The ULE method can be combined with known heuristics for buffer and repeater insertion. This combination is effective because many design flows dictate fixed wire lengths. Further research is required to develop solutions that combine simultaneous optimal gate sizing with wire segmentation.

#### ACKNOWLEDGMENT

The authors would like to thank N. Gibrat-Wormser and D. Pedahel for contributing to the ULE evaluation and the reviewers for their helpful suggestions.

#### REFERENCES

- [1] I. Sutherland, B. Sproull, and D. Harris, *Logical Effort—Designing Fast CMOS Circuits*. San Mateo, CA: Morgan Kaufmann, 1999.
- [2] I. E. Sutherland and R. F. Sproull, "Logical effort: Designing for speed on the back of an envelope," in *Proc. Univ. California/Santa Cruz Conf. ARVLSI*, 1991, pp. 1–16.
- [3] I. Sutherland, B. Sproull, and D. Harris, *Logical Effort—Designing Fast CMOS Circuits*. San Mateo, CA: Morgan Kaufmann, 1999, sec. 10.4, Interconnect, p. 175.
- [4] A. Morgenshtein, E. G. Friedman, R. Ginosar, and A. Kolodny, "Unified logical effort—A method for delay evaluation and minimization in logic paths with RC interconnect," Technion, Haifa, Israel, CCIT Tech. Rep. #612, 2007. [Online]. Available: http://www.ee.technion.ac.il/matrics/papers/UnifiedLogicalEffort-tr.pdf, EE Pub. no. 1569
- [5] H. B. Bakoglu, *Circuits, Interconnections and Packaging for VLSI*. Reading, MA: Addison-Wesley, 1990, pp. 194–219.
- [6] H. B. Bakoglu and J. D. Meindl, "Optimal interconnection circuits for VLSI," *IEEE Trans. Electron Devices*, vol. ED-32, no. 5, pp. 903–909, May 1985.
- [7] A. Nalamalpu and W. Burleson, "Repeater insertion in deep submicron CMOS: Ramp-based analytical model and placement sensitivity analysis," in *Proc. IEEE Int. Symp. Circuits Syst.*, May 2000, pp. 766–769.
- [8] V. Adler and E. G. Friedman, "Repeater design to reduce delay and power in resistive interconnect," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 45, no. 5, pp. 607–616, May 1998.

- [9] B. Lasbouygues, S. Engels, R. Wilson, P. Maurine, N. Azemard, and D. Auvergne, "Logical effort model extension to propagation delay representation," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 25, no. 9, pp. 1677–1684, Sep. 2006.
- [10] A. Kabbani, D. Al-Khalili, and A. J. Al-Khalili, "Delay analysis of CMOS gates using modified logical effort model," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 24, no. 6, pp. 937–947, Jun. 2005.
- [11] J. Ebergen, J. Gainsley, and P. Cunningham, "Transistor sizing—How to control the speed and energy consumption of a circuit," in *Proc. IEEE Int. Symp. Asynchronous Circuits Syst.*, Apr. 2004, pp. 51–61.
- [12] S. Srinivasaraghavan and W. Burleson, "Interconnect effort—A unification of repeater insertion and logical effort," in *Proc. IEEE Comput. Soc. Annu. Symp. VLSI*, Feb. 2003, pp. 55–61.
- [13] H. C. Lin and L. W. Linholm, "An optimized output stage for MOS integrated circuits," *IEEE J. Solid-State Circuits*, vol. SSC-10, no. 2, pp. 106–109, Apr. 1975.
- [14] R. C. Jaeger, "Comments on 'An optimized output stage for MOS integrated circuits'," *IEEE J. Solid-State Circuits*, vol. SSC-10, no. 2, pp. 185–186, Jun. 1975.
- [15] B. S. Cherkauer and E. G. Friedman, "Design of tapered buffers with local interconnect capacitance," *IEEE J. Solid-State Circuits*, vol. 30, no. 2, pp. 151–155, Feb. 1995.
- [16] B. S. Cherkauer and E. G. Friedman, "A unified design methodology for CMOS tapered buffers," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 3, no. 1, pp. 99–111, Mar. 1995.
- [17] S. R. Vemuru and A. R. Thorbjornsen, "Variable-taper CMOS buffer," *IEEE J. Solid-State Circuits*, vol. 26, no. 9, pp. 1265–1269, Sep. 1991.
- [18] K. Venkat, "Generalized delay optimization of resistive interconnections through an extension of logical effort," in *Proc. IEEE Int. Symp. Circuits Syst.*, May 1993, pp. 2106–2109.
- [19] M. Moreinis, A. Morgenshtein, I. Wagner, and A. Kolodny, "Logic gates as repeaters (LGR) for area-efficient timing optimization," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 14, no. 11, pp. 1276–1281, Nov. 2006.
- [20] C. Chu and D. F. Wong, "Closed form solution to simultaneous buffer insertion/sizing and wire sizing," *ACM Trans. Design Autom. Electron. Syst.*, vol. 6, no. 3, pp. 343–371, Jul. 2001.
- [21] W. C. Elmore, "The transient response of damped linear networks with particular regard to wide band amplifiers," *J. Appl. Phys.*, vol. 19, no. 1, pp. 55–63, Jan. 1948.
- [22] "Predictive Technology Model (PTM)," Nanoscale Integration and Modeling (NIMO) Group, Arizona State Univ., Phoenix, AZ, 2007. [Online]. Available: http://www.eas.asu.edu/~ptm/
- [23] "Virtuoso Analog Design Environment User Guide," 2007. [Online]. Available: http://www.d.umn.edu/~htang/Cadence\_doc/anasimhelp.pdf
- [24] A. Cao, R. Lu, and C. K. Koh, "Post-layout logic duplication for synthesis of domino circuits with complex gates," in *Proc. ASP-DAC*, Jan. 2005, pp. 260–265.
- [25] P. V. Buch, H. Savoj, and L. P. P. Van Ginneken, "Timing optimization in presence of interconnect delays," U.S. Patent 6 553 338, Apr. 22, 2003.



**Arkadiy Morgenshtein** received the B.S.E.E. degree, M.S. degree in biomedical engineering, M.B.A. degree, and Ph.D. degree in electrical engineering from The Technion—Israel Institute of Technology, Haifa, Israel, in 1999, 2003, 2006, and 2008, respectively.

From 1999 to 2008, he was a Teaching and Research Assistant with the Department of Electrical Engineering, The Technion. From 2001 to 2004, he was a Research Engineer with Rafael, a national research and development organization. Since 2008, he has been with Core CAD Technologies Group, Intel Corporation, Haifa, where he is engaged in the research and development of power optimization tools. His current research interests include low-power VLSI design and interconnect optimization.



**Eby G. Friedman** (F'00) received the B.S. degree from Lafayette College in 1979, and the M.S. and Ph.D. degrees from the University of California, Irvine, in 1981 and 1989, respectively, all in electrical engineering.

From 1979 to 1991, he was with Hughes Aircraft Company, rising to the position of manager of the Signal Processing Design and Test Department, responsible for the design and test of high performance digital and analog ICs. He has been with the Department of Electrical and Computer Engineering at the University of Rochester since 1991, where he is a Distinguished Professor. He is also a Visiting Professor at the Technion—Israel Institute of Technology. His current research and teaching interests include high performance synchronous digital and mixed-signal microelectronic design and analysis with application to high speed portable processors and low power wireless communications. He is the author of about 350 papers and book chapters, several patents, and the author or editor of ten books in the fields of high speed and low power CMOS design techniques, high speed interconnect, and the theory and application of synchronous clock and power distribution networks.

Dr. Friedman is the Regional Editor of the Journal of Circuits, Systems and Computers, Chair of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS steering committee, and a Member of several editorial boards and conference technical program committees. He previously was the Editor-in-Chief of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, a Member of the editorial board of the *Proceedings of the IEEE*, a Member of the Circuits and Systems (CAS) Society Board of Governors, Program and Technical chair of several IEEE conferences, and a recipient of the University of Rochester Graduate Teaching Award, and the College of Engineering Teaching Excellence Award. He is a Senior Fulbright Fellow.



**Ran Ginosar** (S'79–M'82–SM'07) received the B.Sc. degree (*summa cum laude*) in electrical and computer engineering from The Technion–Israel Institute of Technology, Haifa, Israel, in 1978 and the Ph.D. degree in electrical engineering and computer science from Princeton University, Princeton, NJ, in 1982.

After working with AT&T Bell Laboratories for one year, he joined The Technion faculty in 1983, where he is currently the Head of the VLSI Systems Research Center, Department of Electrical Engineering. He was a Visiting Associate Professor at the University of Utah, Salt Lake City, from 1989 to 1990 and a Visiting Faculty Member at the Strategic CAD Laboratory, Intel Corporation, from 1997 to 1999. His research interests include asynchronous circuits and systems, synchronization, networks-on-chip, many-core architectures, neuroprocessors, and electronic imaging.



**Avinoam Kolodny** (M'81) received the Ph.D. degree in microelectronics from The Technion—Israel Institute of Technology, Haifa, Israel, in 1980.

He joined Intel Corporation, where he was engaged in research and development in the areas of device physics, VLSI circuits, electronic design automation, and organizational development. Since 2000, he has been a Member of the Department of Electrical Engineering, The Technion. His current research is focused primarily on interconnects in VLSI systems, at both physical and architectural levels.