# **Energy Metrics for Power Efficient Crosslink and Mesh Topologies**

Inna Vaisband, Eby G. Friedman, Department of *Electrical and Computer Engineering*, University of Rochester, Rochester, NY, 14627, USA, vaisband@ece.rochester.edu, friedman@ece.rochester.edu

Ran Ginosar, Avinoam Kolodny, Department of *Electrical Engineering*, Technion–Israel Institute of Technology, Haifa, 32000, Israel, ran@ee.technion.ac.il, kolodny@ee.technion.ac.il

Abstract - Clock distribution networks are an essential element of synchronous digital circuits, consume significant power, and are highly sensitive to process, voltage, and temperature variations. Mesh- and crosslink-based topologies reliably compensate for skew variations in these networks, albeit with a significant increase in dissipated power as compared to variation-sensitive low power clock trees. Existing crosslink-based methods, however, only address skew from an algorithmic perspective at the network topology level. Guidelines for inserting crosslinks within a buffered low power clock tree are provided in this paper. Physical constraints, such as the size of the crosslink and the exact location between the driving and load buffers, are analytically described. Metrics to determine the most energy efficient non-tree topology are provided based on closed-form expressions, and verified with simulation.

# I. INTRODUCTION

Modern on-chip clock networks distribute the global clock signal to the sequential elements at frequencies of up to several gigahertz, dissipating significant power. Accurate circuit operation is therefore highly dependent on the clock signal characteristics [1]. The clock signal, however, is subject to process, voltage, and temperature (PVT) variations that affect the clock skew schedule [1], limiting performance and functionality.

Non-tree topologies [2-14] have been introduced for variation-tolerant design of high performance clock distribution networks. The density of the non-tree elements in these topologies may vary from a few additional connections (or crosslinks) [12-14] to a completely dense mesh structure [2-11] covering the network with crosslinks. The crosslink connections between the clock tree segments provide alternative paths for the clock signal, maintaining temporal balance while mitigating skew variations between the connected segments. Thus, tolerance to variations increases with the number of crosslinks. The dynamic and short-circuit power dissipated by the inserted crosslinks, however, also increases with the number of connections [13]. In some integrated circuits, an efficient tradeoff between power and clock skew can be achieved with a mesh-based topology, while in other circuits, a crosslink-based network is preferable to produce a variation-tolerant, low power clock distribution network.

In this paper, skew variations and power consumption in crosslink-based clock distribution networks are analyzed based on a simplified clock tree model. These concepts are generalized and guidelines for inserting crosslinks within a buffered clock tree are provided. Closed-form metrics for crosslink-based topologies are also provided to compare the energy characteristics of non-tree topologies. The metrics and design expressions are verified with simulation.

The rest of the paper is organized as follows. Skew and power tradeoffs for different non-tree clock distribution networks are presented in Section II, including guidelines for inserting crosslinks. Metrics to determine the most power efficient non-tree topology are provided in Section III and discussed in Section IV based on simulation results. The paper is summarized in Section V.

# II. SKEW MITIGATION TECHNIQUES

Skew variations in clock distribution networks limit performance and may cause circuit malfunctions. Existing skew variation mitigation techniques include non-tree clock distribution networks [2-14] such as crosslink- and mesh-based topologies. A crosslink-based topology [12-14] is an asymmetric tree-based structure with a varying density of non-tree wire segments, each connecting two segments within a clock tree. The design of a crosslink-based clock network depends on three characteristics: the location of the crosslinks within a clock tree (in terms of the segments connected by the crosslink), the specific crosslink location between the connected segments, and the size of the crosslink. Alternatively, crosslinks may connect a group of adjacent segments within a specific level of a clock tree, forming a symmetric mesh-based [2-11] topology (see Figure 1). Mesh- and crosslink-based topologies are discussed, respectively, in Sections II.A and II.B.



Figure 1. A clock network composed of the source, trunk, segments, and sinks, (a) clock tree, (b) crosslink, and (c) intermediate- and sink-level mesh topologies

## A. Mesh-Based Clock Topology

Mesh structures balance the delay and lower the skew between nearby segments, mitigating skew variations [2-11] while consuming high power. Mesh-based clock distribution networks are utilized in a variety of commercial high performance microprocessors [8-11], controlling the effects of skew variations. Both uniform and nonuniform mesh topologies have recently been investigated, demonstrating a tolerance to variations in dense grids. The number of crosslinks and the total mesh wirelength, however, increase with mesh density, resulting in high power consumption [2-11]. Mesh reduction [6], sizing of the buffers driving the crosslinks [5], and cost function-based algorithms to reduce power consumption [4-6] have been suggested as possible solutions. High power consumption, however, remains the primary disadvantage of mesh-based clock distribution networks. Several techniques, such as the Skew Bound method in [4] and the Sliding Window Scheme in [5], have recently been proposed to estimate the skew and power of mesh-based clock networks.

This research is supported in part by the National Science Foundation under Grant Nos. CCF-0811317 and CCF-0829915, grants from the New York State Office of Science, Technology & Academic Research to the Center for Advanced Technology in Electronic Imaging Systems, and by grants from Intel Corporation, Qualcomm Corporation, and Samsung Electronics.

Connecting the nodes within a clock mesh affects the local clock delays within all of the connected segments. The crosslinks that connect paths with non-sequentially-adjacent sinks, however, do not affect circuit operation [1] and waste power. The regularity of mesh-based topologies prevents these crosslinks from being removed. Crosslink-based topologies offer additional degrees of design freedom while potentially dissipating less power.

## B. Crosslink-Based Clock Topology

The sensitivity of clock distribution trees to PVT variations increases with circuit speed and technology scaling, resulting in large skew variations. Given a clock tree that satisfies useful skew constraints, crosslinks can be inserted that maintain a useful skew schedule while lowering variations. To design variation tolerant, low power crosslink-based topologies, guidelines should be established regarding 1) the selection of which clock tree segments should be connected by a crosslink, 2) the crosslink location between the selected segments, and 3) the physical characteristics of the crosslink. This topic is considered in this section.

Power and skew tradeoffs are illustrated in a simplified clock network (see Figure 2(a)), where two clock tree segments with inputs $Clk_{In1}$ and $Clk_{In2}$, and outputs $Clk_{Out1}$ and $Clk_{Out2}$ are connected with a crosslink $X$, modeled as a lumped $RC$ wire. An ideal step input signal driving each CMOS inverter is assumed in the analytic expressions, permitting the driver to be modeled as a linear resistor $R_{ON}$ [15]. A model of the section impedance is depicted in Figure 2(b). The input resistance of segment 1 (2), represented by $R_1$ ($R_2$) in Figure 2(b), is composed of the wire resistance connected in series with the transistor. The load capacitance, represented by $C_1$ ($C_2$) in Figure 2(b), is composed of the wire capacitance connected in parallel with the input gate capacitance. The skew at the output of the section, shown in Figure 2(b), is due to the skew $T$ between the inputs $Clk_{In1}$ and $Clk_{In2}$ of the section plus the difference in the propagation delay between $Clk_{In1}$ and $Clk_{Out1}$, and $Clk_{In2}$ and $Clk_{Out2}$ (due to different $RC$ loads).

The skew and power consumed by two clock tree segments with a crosslink are determined by solving a set of differential equations for the voltage at nodes $Clk_{Out1}$ and $Clk_{Out2}$ under the assumption $\tau = R_1(\frac{1}{2}C_{X} + C_1) \approx R_2(\frac{1}{2}C_{X} + C_2)$.



Figure 2. Two clock tree segments with a crosslink (a) gate level representation, and (b) impedance model

A closed-form expression for the total energy  $E_X^{(i)}$  consumed by the *i*<sup>th</sup> section once the first input switches and until the output capacitors are charged is

$$E_{X}^{(i)} = \left[ \frac{T}{R_{1} + R_{2} + R_{X}} + \tau \left( \frac{R_{1} + R_{2}}{R_{1} R_{2}} - \frac{1 - e^{-\frac{T}{\tau}}}{R_{1} + R_{2}} + \frac{1 - e^{-\left(\frac{R_{1} + R_{2} + R_{X}}{R_{X}}\right)\frac{T}{\tau}}}{\left(R_{1} + R_{2}\right) \left(R_{1} + R_{2} + R_{X}\right)^{2}} \right) \right]_{i} V_{DD}^{2} .$$
(1)

The first term in (1) describes the short-circuit energy, which increases linearly with $T$. The second term is the dynamic energy required to charge the output capacitance. The derivative of this term with respect to $T$ is negative, yielding the maximum dynamic energy consumption at $T = 0$. The theoretical upper bound of the total energy $E_{X,MAX}^{(i)}$ is

$$E_{X}^{(i)} \leq E_{X,MAX}^{(i)} = \left[\frac{T}{R_{1} + R_{2} + R_{X}} + \tau \left(\frac{R_{1} + R_{2}}{R_{1} R_{2}}\right)\right]_{i} V_{DD}^{2} .$$
(2)
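As a rough numerical illustration of (2), the bound can be evaluated directly. The sketch below is not part of the original analysis: the function name and the component values are hypothetical, and $\tau$ is taken as the mean of the two nearly equal products $R_1(\frac{1}{2}C_X + C_1)$ and $R_2(\frac{1}{2}C_X + C_2)$, which is an assumption.

```python
def energy_upper_bound(T, R1, R2, RX, C1, C2, CX, VDD):
    """Upper bound (2) on the energy of one crosslinked section, in joules.

    T is the input skew in seconds, resistances are in ohms,
    capacitances in farads, and VDD in volts.
    """
    # Assumption: tau is the mean of R1*(CX/2 + C1) and R2*(CX/2 + C2),
    # which the text treats as approximately equal.
    tau = 0.5 * (R1 * (0.5 * CX + C1) + R2 * (0.5 * CX + C2))
    short_circuit = T / (R1 + R2 + RX)       # first term of (2), grows with T
    dynamic = tau * (R1 + R2) / (R1 * R2)    # second term of (2), value at T = 0
    return (short_circuit + dynamic) * VDD ** 2


# Illustrative values only (not taken from the paper).
bound = energy_upper_bound(T=20e-12, R1=100.0, R2=100.0, RX=200.0,
                           C1=50e-15, C2=50e-15, CX=20e-15, VDD=1.8)
print(f"E_X,MAX = {bound:.3e} J")
```

With these values the short-circuit term contributes less than the dynamic term. A larger $T$ or a smaller $R_X$ increases the short-circuit share.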

Similar to the expression $T_X = T \cdot 2^{-2R/R_X}$ in [7], which assumes $R_1 = R_2 = R$ and $C_1 = C_2 = C$, the skew with a crosslink is

$$T_X = \frac{V_1(t = t_{50\%}) - V_2(t = t_{50\%})}{V_2(t = t_{50\%})} = T \cdot 2^{-(R_1 + R_2)/R_X} ,$$
(3)

where $t_{50\%}$ is the time at which $V_1$ reaches half of the supply voltage, $V_1(t = t_{50\%}) = \frac{1}{2}V_{DD}$. Guiding principles for crosslink insertion are provided in this section based on (1) - (3).
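The closed-form result in (3) is simple to evaluate. The following sketch, with a hypothetical function name and illustrative values, shows how the residual skew falls as the crosslink resistance decreases relative to $R_1 + R_2$.

```python
def skew_with_crosslink(T, R1, R2, RX):
    """Residual skew T_X from (3): T * 2^(-(R1 + R2) / RX)."""
    return T * 2.0 ** (-(R1 + R2) / RX)


# Illustrative values only: 100 ps input skew, 100 ohm segments.
for rx in (800.0, 400.0, 200.0, 100.0):
    t_x = skew_with_crosslink(100e-12, 100.0, 100.0, rx)
    print(f"R_X = {rx:6.1f} ohm -> T_X = {t_x * 1e12:.1f} ps")
```

When $R_X = R_1 + R_2$ the skew is halved. With $R_1 = R_2 = R$ the expression reduces to the form $T \cdot 2^{-2R/R_X}$ quoted from [7].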

*Rule 1. Location of crosslinks within a clock tree:* The first design issue is to determine where to insert a crosslink to reduce skew variations between sequentially-adjacent registers, while preserving useful skew in balanced clock trees. Inserting a crosslink between two segments lowers both the skew and delay variations within the clock signal paths from the connected segments continuing downstream to the sequentially-adjacent registers. Those segments close to the sinks should be chosen to tolerate higher variations. Inserting a crosslink between two non-zero skew segments (segments with skew above the allowed skew threshold $T_{TH}$) may, however, affect the skew schedule in a balanced clock tree, as described by (3). Thus, only zero skew segments (segments with skew below the allowed skew threshold $T_{TH}$) at the upper clock tree levels should be considered for sequentially-adjacent registers with useful non-zero skew. A heuristic for inserting crosslinks is therefore employed in a balanced clock tree:

- To preserve useful skews within a balanced clock tree, crosslinks are inserted between the zero skew segments.
- To minimize skew variations while preserving the useful skews between the sinks, crosslinks are inserted as close as possible to the sinks between the zero skew segments.
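The two-step heuristic above can be sketched as a filter followed by a depth selection. The data structure below is hypothetical (the paper does not specify one): each candidate is a pair of segments annotated with its tree level and nominal skew, and a larger level number is assumed to mean closer to the sinks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentPair:
    name: str     # hypothetical label for the two candidate segments
    level: int    # tree level; larger is assumed to be closer to the sinks
    skew: float   # nominal skew between the two segments, in seconds


def select_crosslink_pairs(pairs, t_th):
    """Keep zero skew pairs, then return those closest to the sinks."""
    # Step 1: only zero skew segments (skew within the threshold) qualify.
    zero_skew = [p for p in pairs if abs(p.skew) <= t_th]
    if not zero_skew:
        return []
    # Step 2: among these, prefer the level closest to the sinks.
    deepest = max(p.level for p in zero_skew)
    return [p for p in zero_skew if p.level == deepest]


candidates = [
    SegmentPair("a-b", level=1, skew=0.0),
    SegmentPair("c-d", level=3, skew=10e-12),
    SegmentPair("e-f", level=3, skew=80e-12),   # useful skew, excluded
    SegmentPair("g-h", level=2, skew=5e-12),
]
print([p.name for p in select_crosslink_pairs(candidates, t_th=50e-12)])
```

A full implementation would apply this selection per group of sequentially-adjacent registers. The single global selection here is a simplification.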

*Rule 2. Location of crosslink within a clock tree section:* The second design issue is determining the location of the crosslink between two zero skew segments. As shown in (2) and (3), both the skew and the short-circuit energy decrease as $R_1 + R_2$ increases. Inserting a crosslink far from the input drivers of a section increases $R_1 + R_2$, reducing skew variations while consuming less energy.

*Rule 3. Crosslink parameters:* The third design issue is the size of the crosslink to place between the segments. A crosslink X of specific length l, width w, thickness t, and resistivity  $\rho$  determines the capacitance  $C_X$  and resistance  $R_X$ . A higher  $R_X$  and lower  $C_X$  should preferably be used to reduce both the short-circuit and total power consumption. Thus, crosslinks with a small width and thickness, and therefore higher resistance, should be inserted in low power circuits. Alternatively, a lower  $R_X$  and therefore a higher  $C_X$  should be used to reduce skew at the expense of higher power. The characteristics for efficient crosslink-based networks are described quantitatively in Section III under specific skew and power constraints.

# III. ENERGY METRIC FOR CROSSLINK TOPOLOGY

Several techniques [4-5] have recently been proposed to estimate the additional energy ($E_{MESH}$) consumed by a mesh-based clock network. An energy metric for a crosslink-based topology is provided in this section and utilized to compare the power efficiency of crosslink- and mesh-based topologies.

To maintain zero skew ( $T_X \leq T_{TH}$ ) while minimizing the dissipated power between two segments with a crosslink (*Rule 1*), the largest possible  $R_X$  ( $R_{X,OPT}^{T_X \leq T_{TH}}$ ) and smallest  $C_X$  ( $C_{X,OPT}^{T_X \leq T_{TH}}$ ) should be used, yielding, based on (3),

$$R_{X} = \frac{R_{1} + R_{2}}{\log_{2} T - \log_{2} T_{X}} \leq \frac{R_{1} + R_{2}}{\log_{2} T - \log_{2} T_{TH}} = R_{X,OPT}^{T_{X} \leq T_{TH}} .$$
(4)
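A minimal sketch of (4) follows, assuming an input skew $T$ larger than the threshold $T_{TH}$ (otherwise no crosslink is needed). The function name and the numerical values are illustrative and not taken from the paper.

```python
import math


def rx_opt(T, T_TH, R1, R2):
    """Largest crosslink resistance from (4) that still yields T_X <= T_TH."""
    if not 0.0 < T_TH < T:
        raise ValueError("expected 0 < T_TH < T")
    return (R1 + R2) / (math.log2(T) - math.log2(T_TH))


# Illustrative values only: 100 ps skew, 50 ps threshold, 100 ohm segments.
rx = rx_opt(T=100e-12, T_TH=50e-12, R1=100.0, R2=100.0)
print(f"R_X,OPT = {rx:.1f} ohm")

# Substituting back into (3) recovers the threshold.
print(f"T_X = {100e-12 * 2.0 ** (-(100.0 + 100.0) / rx) * 1e12:.1f} ps")
```

Halving the skew requires $R_{X,OPT} = R_1 + R_2$. Tighter thresholds require a proportionally lower crosslink resistance.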

Given a crosslink $X$ of specific length $l$ and resistivity $\rho$, the width $w$ and thickness $t$ are the only factors that affect the crosslink resistance $R_X$. The minimum crosslink capacitance $C_{X,OPT}^{T_X \leq T_{TH}}$ can be further determined based on the constraint $w \cdot t = \rho l / R_{X,OPT}^{T_X \leq T_{TH}}$, which follows from $R_X = \rho l / (w t)$, and a model of the interconnect capacitance (e.g., [16]).
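The cross-section constraint can be illustrated with the standard uniform-wire relation $R = \rho l / (w t)$. In the sketch below, the function names, the resistivity, and the dimensions are assumptions chosen for illustration. The capacitance model of [16] is not reproduced.

```python
def crosslink_cross_section(rx, rho, length):
    """Cross-sectional area w*t (m^2) of a uniform wire with resistance rx."""
    return rho * length / rx


def crosslink_width(rx, rho, length, thickness):
    """Width (m) that gives resistance rx for a fixed metal thickness."""
    return crosslink_cross_section(rx, rho, length) / thickness


# Illustrative values only: rho ~ 1.7e-8 ohm*m (roughly bulk copper),
# a 500 um long crosslink, a 20 ohm target, and 0.5 um thick metal.
w = crosslink_width(rx=20.0, rho=1.7e-8, length=500e-6, thickness=0.5e-6)
print(f"w = {w * 1e6:.2f} um")
```

A larger target resistance yields a narrower wire, which is the direction Rule 3 recommends for low power circuits.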

The upper bound on the total additional energy  $E_{X,MAX}$  consumed in a clock tree with N crosslinks is determined by substituting  $R_{X,OPT}^{T_X \leq T_{TH}}$  and  $C_{X,OPT}^{T_X \leq T_{TH}}$  into (2), and subtracting the energy consumed by the clock tree section without a crosslink  $E_{Tree}^{(i)} = (C_1 + C_2)_i V_{DD}^2$ , yielding

$$E_{X,MAX} = \sum_{i=1}^{N} \left[ \frac{T}{R_{1} + R_{2} + R_{X,OPT}^{T_{X} \leq T_{TH}}} + \frac{1}{2} \left( \frac{R_{1} - R_{2}}{R_{2}} C_{1} + \frac{R_{2} - R_{1}}{R_{1}} C_{2} + \frac{(R_{1} + R_{2})^{2}}{2R_{1}R_{2}} C_{X,OPT}^{T_{X} \leq T_{TH}} \right) \right]_{i} V_{DD}^{2} .$$
(5)
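One way to see how the capacitive terms in (5) arise from (2) is sketched below. The derivation assumes that $\tau$ is taken as the mean of the two nearly equal products in the assumption $\tau = R_1(\frac{1}{2}C_X + C_1) \approx R_2(\frac{1}{2}C_X + C_2)$. This choice is an interpretation that reproduces the coefficients in (5), not a step stated in the text.

$$\begin{aligned}
\tau &= \tfrac{1}{2}\left[ R_1\left(\tfrac{1}{2}C_X + C_1\right) + R_2\left(\tfrac{1}{2}C_X + C_2\right) \right], \\
\tau\,\frac{R_1 + R_2}{R_1 R_2} &= \frac{1}{2}\left( \frac{R_1 + R_2}{R_2} C_1 + \frac{R_1 + R_2}{R_1} C_2 + \frac{(R_1 + R_2)^2}{2 R_1 R_2} C_X \right), \\
\tau\,\frac{R_1 + R_2}{R_1 R_2} - (C_1 + C_2) &= \frac{1}{2}\left( \frac{R_1 - R_2}{R_2} C_1 + \frac{R_2 - R_1}{R_1} C_2 + \frac{(R_1 + R_2)^2}{2 R_1 R_2} C_X \right).
\end{aligned}$$

As a sanity check, for $R_1 = R_2$ the right-hand side reduces to $C_X$: the only additional dynamic energy is the energy required to charge the crosslink capacitance.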

Summarizing, a crosslink-based topology should be used to mitigate skew variations when  $E_{X,MAX} < E_{Mesh}$ . Otherwise, a mesh-based clock distribution network is preferable.
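The decision rule can be written as a short comparison. In this sketch the function name and the energy values are hypothetical. The per-crosslink bounds would come from the bracketed term in (5), and $E_{Mesh}$ from one of the mesh estimation techniques in [4-5].

```python
def choose_topology(crosslink_energy_bounds, e_mesh):
    """Return the preferred topology and the total crosslink energy bound.

    crosslink_energy_bounds holds one upper bound per inserted crosslink
    (joules); e_mesh is the additional energy of the competing mesh (joules).
    """
    e_x_max = sum(crosslink_energy_bounds)
    topology = "crosslink" if e_x_max < e_mesh else "mesh"
    return topology, e_x_max


# Illustrative values only.
print(choose_topology([1e-13, 2e-13], e_mesh=5e-13))
print(choose_topology([3e-13, 3e-13], e_mesh=5e-13))
```

Because the comparison uses an upper bound on the crosslink energy, a "crosslink" result is conservative: the actual crosslink energy is no larger than the value compared against the mesh.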

# IV. SIMULATION RESULTS

A portion of a clock tree with four levels of buffers and sixteen sequentially-adjacent registers in a 180 nm CMOS technology is considered. The source of the clock distribution network is driven by a 1 GHz clock signal. The wires at the upper and lower clock tree levels are modeled, respectively, by the global and local interconnect parameters [16]. The threshold $T_{TH}$ for the allowed skew variations is set to 5% of the clock period $T_P$. The transistor and wire widths within the clock tree are varied between 20% and 50% of the nominal value, resulting in skew variations at the registers of up to $0.1 \cdot T_P > T_{TH}$. To mitigate skew variations between sequentially-adjacent registers, crosslinks and sparse and dense meshes [4] are compared. The crosslinks are inserted according to the guidelines presented in Sections II.B and III. The number of inserted crosslinks $N$ is based on (5) to comply with the proposed energy metric, $E_{X,MAX} < E_{Mesh}$. Thus, only power efficient crosslink-based solutions are considered.

The largest skew and the additional energy due to the inserted crosslinks or mesh connections are listed in Table I for, respectively, moderate (up to 20%) and large (up to 50%) skew variations within a clock tree with different skew schedules. In each example, locally and globally routed crosslinks are considered, respectively, for close and distant crosslink connected segments. The intermediate- and sink-level sparse and dense meshes are designed based on densities from [4]. Whenever skew variations can be mitigated with crosslinks or intermediate-level meshes, sink-level meshes are not considered due to the low energy efficiency. In all of the examples, the choice of a preferable non-tree topology based on the analytic expressions is consistent with the simulation results.

Based on SPICE simulations, for the case of moderate variations, skew mitigation with crosslinks and a mesh is similar: the 52 (78) ps skew that exceeds the 50 ps $T_{TH}$ in a clock tree with a zero (useful) skew schedule is mitigated to ~ 34 (44) ps with either crosslinks or a mesh. In a crosslink-based topology, however, the same variation tolerance is achieved at a lower energy ($E_{X,MAX} < E_{MESH}$). Furthermore, in the zero skew clock tree with larger skew variations (71 ps > $T_{TH}$), the crosslink-based topology is preferred since the target skew can be achieved only with crosslinks or with an energy expensive, dense sink-level mesh. However, in the clock tree with a useful skew schedule and larger skew variations (97 ps > $T_{TH}$), the skew cannot be mitigated with crosslinks. Hence, in this case, an intermediate-level sparse mesh is preferable.

| Configuration | Topology | Moderate (up to 20%) variations: maximum skew [ps] | Moderate: maximum skew [% of $T_P$] | Moderate: added energy, SPICE ($E_X$ and $E_{MESH}$) [%] | Moderate: added energy, analytic upper bound ($E_{X,MAX} > E_X$) [%] | Larger (up to 50%) variations: maximum skew [ps] | Larger: maximum skew [% of $T_P$] | Larger: added energy, SPICE ($E_X$ and $E_{MESH}$) [%] | Larger: added energy, analytic upper bound ($E_{X,MAX} > E_X$) [%] |
|---|---|---|---|---|---|---|---|---|---|
| Skew variations within a zero skew clock tree | Clock tree | 52 | 5.2 (> $T_{TH}$) | 0.00 | 0.00 | 71 | 7.1 (> $T_{TH}$) | 0.00 | 0.00 |
| | With local crosslinks | 31 | 3.1 | 0.07 | 0.23 (> 0.07) | 36 | 3.6 | 0.08 | 0.24 (> 0.08) |
| | With global crosslinks | 32 | 3.2 | 1.20 | 2.53 (> 1.20) | 35 | 3.5 | 1.34 | 2.80 (> 1.34) |
| | With intermediate-level sparse mesh | 35 | 3.5 | 3.76 | N/A | 67 | 6.7 (> $T_{TH}$) | 3.75 | N/A |
| | With intermediate-level dense mesh | 36 | 3.6 | 5.97 | N/A | 66 | 6.7 (> $T_{TH}$) | 5.91 | N/A |
| | With sink-level sparse mesh | N/A† | N/A† | N/A† | N/A† | 53 | 5.3 (> $T_{TH}$) | 4.07 | N/A |
| | With sink-level dense mesh | N/A† | N/A† | N/A† | N/A† | 46 | 4.6 | 6.28 | N/A |
| Skew variations within a clock tree with a useful skew schedule | Clock tree | 78 | 7.8 (> $T_{TH}$) | 0.00 | 0.00 | 97 | 9.7 (> $T_{TH}$) | 0.00 | 0.00 |
| | With local crosslinks | 45 | 4.5 | 0.80 | 0.82 (> 0.80) | 62 | 6.2 (> $T_{TH}$) | 5.53 | 9.73 (> 5.53) |
| | With global crosslinks | 44 | 4.4 | 0.98 | 2.64 (> 0.98) | 61 | 6.1 (> $T_{TH}$) | 5.35 | 16.55 (> 5.35) |
| | With intermediate-level sparse mesh | 43 | 4.3 | 3.45 | N/A | 48 | 4.8 | 3.43 | N/A |
| | With intermediate-level dense mesh | 43 | 4.3 | 5.48 | N/A | 39 | 3.9 | 5.44 | N/A |
| | With sink-level sparse mesh | N/A† | N/A† | N/A† | N/A† | N/A† | N/A† | N/A† | N/A† |
| | With sink-level dense mesh | N/A† | N/A† | N/A† | N/A† | N/A† | N/A† | N/A† | N/A† |

† N/A. Variations can be mitigated with crosslinks or intermediate-level mesh.

TABLE I. COMPARISON OF DIFFERENT NON-TREE APPROACHES TO MITIGATE SKEW VARIATIONS WITHIN A CLOCK TREE WITH A ZERO AND USEFUL SKEW SCHEDULE
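The topology choices discussed above can be reproduced from the SPICE columns of Table I with a simple filter: keep the topologies whose maximum skew does not exceed $T_{TH}$ = 50 ps, then take the one with the lowest added energy. The sketch below copies the skew and energy values from Table I. The data layout and function name are hypothetical, and treating local and global crosslinks as competing candidates is a simplification, since the paper uses them for close and distant segments, respectively.

```python
T_TH_PS = 50.0  # 5% of the 1 ns clock period

# (topology, maximum skew [ps], SPICE added energy [%]) copied from Table I.
TABLE_I = {
    ("zero skew", "moderate"): [
        ("local crosslinks", 31, 0.07),
        ("global crosslinks", 32, 1.20),
        ("intermediate-level sparse mesh", 35, 3.76),
        ("intermediate-level dense mesh", 36, 5.97),
    ],
    ("zero skew", "larger"): [
        ("local crosslinks", 36, 0.08),
        ("global crosslinks", 35, 1.34),
        ("intermediate-level sparse mesh", 67, 3.75),
        ("intermediate-level dense mesh", 66, 5.91),
        ("sink-level sparse mesh", 53, 4.07),
        ("sink-level dense mesh", 46, 6.28),
    ],
    ("useful skew", "moderate"): [
        ("local crosslinks", 45, 0.80),
        ("global crosslinks", 44, 0.98),
        ("intermediate-level sparse mesh", 43, 3.45),
        ("intermediate-level dense mesh", 43, 5.48),
    ],
    ("useful skew", "larger"): [
        ("local crosslinks", 62, 5.53),
        ("global crosslinks", 61, 5.35),
        ("intermediate-level sparse mesh", 48, 3.43),
        ("intermediate-level dense mesh", 39, 5.44),
    ],
}


def preferred_topology(rows, t_th_ps=T_TH_PS):
    """Lowest-energy topology whose skew meets the threshold, else None."""
    feasible = [row for row in rows if row[1] <= t_th_ps]
    if not feasible:
        return None
    return min(feasible, key=lambda row: row[2])[0]


for scenario, rows in TABLE_I.items():
    print(scenario, "->", preferred_topology(rows))
```

Crosslinks are selected in three of the four scenarios. The intermediate-level sparse mesh is selected only for the useful skew schedule with larger variations, in agreement with the discussion of the simulation results.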

# V. SUMMARY

In modern circuits with aggressive timing requirements, non-tree topologies should be considered to cope with skew variations. Mesh-based solutions have been shown to reliably mitigate skew variations, albeit at significantly higher power. Alternatively, mesh redundancy can be avoided in crosslink-based topologies to mitigate skew variations at potentially lower power. Different techniques to evaluate power consumption in a mesh-based clock network exist. Thus, to compare the efficiency of crosslink- and mesh-based topologies, an energy metric for a clock tree with crosslinks is required.

Guidelines for inserting crosslinks within a skew scheduled clock tree are presented in this paper. To maintain a target skew between sequentially-adjacent registers, a heuristic is proposed for inserting crosslinks between zero skew segments upstream of those sequentially-adjacent registers that violate timing constraints. In addition, the crosslink should be inserted far from the segment driver for enhanced tolerance to variations at lower power. The optimum crosslink parameters under zero skew constraints are also presented. An energy metric is provided to determine the most power efficient clock network topology under specific timing constraints. Simulation results confirm the analytic results regarding the choice of topology for low power, variation-tolerant clock distribution networks.

# References

- [1] I. S. Kourtev and E. G. Friedman, *Timing Optimization Through Clock Skew Scheduling*, *Second Edition*, Springer Science + Business Media, 2009.
- [2] E. G. Friedman, "Clock Distribution Networks in Synchronous Digital Integrated Circuits," *Proceedings of the IEEE*, Vol. 89, No. 5, pp. 665-692, May 2001.
- [3] A. Abdelhadi, R. Ginosar, A. Kolodny, and E. G. Friedman, "Timing-Driven Variation-Aware Nonuniform Clock Mesh Synthesis," *Proceedings of the ACM Great Lakes Symposium on VLSI*, pp. 250-257, May 2010.
- [4] A. Rajaram and D. Z. Pan, "MeshWorks: A Comprehensive Framework for Optimized Clock Mesh Networks Synthesis," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. 29, No. 12, pp. 1945-1958, November 2010.
- [5] G. R. Wilke, *Analysis and Optimization of Mesh-Based Clock Distribution Architectures*, Ph.D. Thesis, Federal University of Rio Grande do Sul, Porto Alegre, Brazil, September 2008.
- [6] G. Venkataraman, Z. Feng, J. Hu, and P. Li, "Combinatorial Algorithms for Fast Clock Mesh Optimization," *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, pp. 563-567, November 2006.
- [7] M. Mori, H. Chen, B. Yao, and C. K. Cheng, "A Multiple Level Network Approach for Clock Skew Minimization with Process Variations," *Proceedings of the IEEE Asia and South Pacific Design Automation Conference*, pp. 263-268, January 2004.
- [8] P. J. Restle *et al.*, "The Clock Distribution of the Power4 Microprocessor," *Proceedings of the IEEE International Solid-State Circuits Conference*, pp. 1.144-1.145, February 2002.
- [9] T. Xanthopoulos *et al.*, "The Design and Analysis of the Clock Distribution Network for a 1.2 GHz Alpha Microprocessor," *Proceedings of the IEEE International Solid-State Circuits Conference*, pp. 402-403, February 2001.
- [10] N. A. Kurd *et al.*, "A Multigigahertz Clocking Scheme for the Pentium 4 Microprocessor," *IEEE Journal of Solid-State Circuits*, Vol. 36, No. 11, pp. 1647-1653, November 2001.
- [11] S. Tam *et al.*, "Clock Generation and Distribution of a Dual-Core Xeon Processor with 16MB L3 Cache," *Proceedings of the IEEE International Solid-State Circuits Conference*, pp. 1512-1521, February 2006.
- [12] A. Rajaram and D. Z. Pan, "Variation Tolerant Buffered Clock Network Synthesis with Cross Links," *Proceedings of the ACM International Symposium on Physical Design*, pp. 157-164, April 2006.
- [13] I. Vaisband, E. G. Friedman, R. Ginosar, and A. Kolodny, "Low Power Clock Network Design," *Journal of Low Power Electronics and Applications*, Vol. 1, No. 1, pp. 219-246, May 2011.
- [14] G. Venkataraman *et al.*, "Practical Techniques to Reduce Skew and its Variations in Buffered Clock Networks," *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, pp. 592-596, November 2005.
- [15] V. Adler and E. G. Friedman, "Delay and Power Expressions for a CMOS Inverter Driving a Resistive-Capacitive Load," *Analog Integrated Circuits and Signal Processing*, Vol. 14, No. 1/2, pp. 29-39, September 1997.
- [16] Predictive Technology Model. Available online: http://ptm.asu.edu