# Computationally Efficient Clustering of Power Supplies in Heterogeneous Real Time Systems

Inna Vaisband and Eby G. Friedman

Department of Electrical and Computer Engineering, University of Rochester

Rochester, New York 14627

{vaisband, friedman}@ece.rochester.edu

Abstract—High quality power delivery for on-chip high performance integrated circuits is a significant design challenge in modern functionally diverse systems with multiple power domains. To provide a high quality power delivery system with dynamically changing voltages and transient currents, the on-chip power needs to be regulated in real time within each power domain. To exploit the advantages of existing switching and linear power supplies, a heterogeneous power delivery system has recently been proposed based on the principle of separation of power conversion and regulation, lowering the overall energy loss while requiring small on-chip area. The power efficiency of the system is shown to be a strong function of the clustering of the power supplies, i.e., the specific configuration in which power converters and regulators are co-designed. A recursive clustering algorithm with polynomial computational complexity is proposed for an optimal real time power distribution system with minimum power losses. The proposed algorithm is evaluated on IBM power grid benchmark circuits, yielding up to a 21% increase in power efficiency and orders of magnitude speedup in runtime as compared with exhaustive clustering.

## I. INTRODUCTION

The delivery of high quality power to the on-chip circuitry with minimum energy loss is a fundamental requirement of all integrated circuits (ICs). The supply voltage, current density, and parasitic impedances, however, do not scale well with each technology generation, degrading the quality of the power delivered from the off-chip power supplies to the on-chip load circuitry. The challenge becomes even more significant as the diversity of modern multi-voltage multi-core systems increases and dynamic voltage scaling (DVS) becomes an integrated part of the power management system.

To maintain a high quality power supply despite increasing off- and on-chip parasitic impedances, hundreds of power converters should ultimately be integrated on-chip, close to the loads within the individual power domains [1]. Recently, a principle of separation of power conversion and regulation has been introduced [2] that exploits the high power efficiency of switching power supplies [3], [4] and the relatively small area of linear power supplies [5], [6]. Consistent with this separation principle, power should be primarily converted with a few power efficient switching supplies, delivered to on-chip voltage clusters, and regulated with linear low dropout (LDO) regulators within the individual power domains.

Several schemes for heterogeneous power delivery and optimization in terms of the number of on-chip voltage regulators, and the on-chip co-design of the voltage regulators, decoupling capacitors, and current loads have been proposed in [1], [7]–[9]. The co-design of hundreds to thousands of on-chip regulators with multiple switching converters is a new design objective that needs to be considered [2]. In energy efficient systems, the voltage and current supply is dynamically scaled within the individual power domains, affecting the voltage drop within the LDOs and the efficiency of the power delivery system. Optimal real time clustering of the power supplies decreases the voltage drop within the LDOs, increasing overall power efficiency. Exhaustive approaches for clustering power supplies are computationally impractical in DVS systems with hundreds to thousands of power domains. A computationally efficient algorithm to co-design switching converters and on-chip LDO regulators in real time in a heterogeneous system is proposed in this paper, targeting high quality power and efficiency within limited on-chip area. The power savings with the proposed approach are evaluated with IBM power grid benchmarks, demonstrating up to a 21% increase in power efficiency with the proposed voltage clusters. Polynomial computational complexity is exhibited with the proposed recursive clustering algorithm, yielding significant speedup.

The rest of the paper is organized as follows. The challenges of power supply clustering in a heterogeneous power delivery system are reviewed in Section II. A computationally efficient algorithm for optimally clustering power supplies is demonstrated in Section III. The separation of power conversion and regulation in benchmark circuits is evaluated in Section IV. The paper is concluded in Section V.

## II. BACKGROUND

To exploit the different advantages of both switching and linear converters, a heterogeneous power delivery system is considered. The power is converted in the off-chip or in-package switching mode power supplies (SMPS), and regulated on-chip with compact linear power supplies, minimizing LDO voltage drops and on-chip power losses. With tens of off-chip or in-package SMPS converters and up to thousands of power domains regulated by individual on-chip LDOs [2], the design complexity of the power delivery system increases significantly.

Determining the optimal clustering of the on-chip power supplies is an important challenge in a heterogeneous power delivery system. Intuitively, in a power delivery system with thousands of on-chip voltage regulators driven by multiple power converters, several options exist to determine the LDO input and SMPS output voltages by connecting LDO to different SMPS converters. Alternatively, the voltage at the output of an LDO is determined by the power requirements of the regulated power domain. The voltage drop within the on-chip voltage regulators is therefore based on the power supply clustering and power domain specifications, affecting the overall power efficiency of a distributed heterogeneous system. For a finite number of power supply clusters, an optimal clustering exists that minimizes the voltage drop across the distributed LDO while maximizing the efficiency of the power delivery system. Efficiently determining the optimum power clustering is the primary objective of this work.

This research is supported in part by the Binational Science Foundation under Grant No. 2012139, the National Science Foundation under Grant No. CCF-1329374, and by grants from Qualcomm, Cisco Systems, and Samsung.

To explore the efficiency of a heterogeneous power delivery system, consider a system with S off-chip or in-package SMPS converters and L on-chip LDO, delivering power to N power domains with different supply voltages $\{(V_{DD}^{(i)}, I_{DD}^{(i)})\}_{i=1}^{N}$. Intuitively, LDO that regulate power domains with similar supply voltages should be assigned to the same voltage cluster. Thus, to explore the power efficiency of a heterogeneous power delivery system, $L = N \ge S$ is assumed. The $i^{th}$ SMPS supplies power to $l_i$ LDO ($\Sigma l_i = L = N$), forming the $i^{th}$ voltage cluster. Note that the SMPS output voltages ($V_{SMPS}^{(i)}$, where $i$ is the cluster id), LDO output voltages ($V_{LDO}^{(i,m)}$, where $m$ is the LDO id in the $i^{th}$ cluster), and supply voltages ($V_{DD}^{(j)}$, where $j$ is the power domain id) are assumed to be ordered such that lower (higher) voltages are assigned into clusters with lower (higher) indices. Thus, the cluster topology K of the power supplies is determined by the distribution of the LDO within the SMPS clusters, $K = \{l_i\}_{i=1}^S$. To determine the optimal power supply clustering, it is sufficient to determine the number of voltage regulators in each SMPS cluster that minimizes the voltage drops. This observation suggests that the optimal power clustering can be recursively determined based on the optimal clusters in systems with fewer power supplies.

To illustrate the effect of the clustering topology on the power efficiency of a power delivery system, a heterogeneous system is considered with two switching converters and three linear regulators, supplying equal current $I_{DD}$ to three power domains $\{V_{DD}^{(i)}\} = \{1.8 \text{ volts}, 1.1 \text{ volts}, 1.0 \text{ volt}\}$. Assume $V_{Drop} = 0.1$ volts. The power supply clusterings $K_1 = \{1, 2\}$ and $K_2 = \{2, 1\}$ for a heterogeneous system with S = 2 and L = N = 3 are shown in Figure 1. The voltage at the output of each switching converter is $V_{Drop}$ higher than the maximum supply voltage within the relevant cluster [2], yielding a power efficiency of $\varphi(K_1) = 91\%$ and $\varphi(K_2) = 80\%$.
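The two efficiency figures can be reproduced with a short calculation. The sketch below is illustrative only: it assumes ideal (lossless) switching converters, negligible LDO quiescent current, and equal load currents, so that efficiency reduces to the ratio of the summed supply voltages to the summed LDO input voltages. The function name `efficiency` and the list-of-lists cluster representation are hypothetical and not taken from the paper.

```python
def efficiency(clusters, v_drop=0.1):
    """Power efficiency of a clustering with equal load currents.

    clusters: one list of domain supply voltages [V] per SMPS cluster.
    Assumption (not from the paper): ideal SMPS and no LDO quiescent
    current, so the current cancels out of the power ratio.
    """
    # Power delivered to the loads, per unit current.
    delivered = sum(v for cluster in clusters for v in cluster)
    # Each LDO input sits V_drop above the highest supply voltage in its cluster.
    drawn = sum((max(cluster) + v_drop) * len(cluster) for cluster in clusters)
    return delivered / drawn


# The 1.8 V domain alone, with 1.1 V and 1.0 V sharing a converter.
grouping_a = [[1.8], [1.1, 1.0]]
# The 1.8 V and 1.1 V domains sharing a converter, with 1.0 V alone.
grouping_b = [[1.8, 1.1], [1.0]]

print(round(100 * efficiency(grouping_a)))  # 91
print(round(100 * efficiency(grouping_b)))  # 80
```

Grouping domains with similar supply voltages keeps every LDO close to its dropout voltage, which is why the first grouping reaches the higher efficiency.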

## III. COMPUTATIONALLY EFFICIENT POWER SUPPLY CLUSTERING

Fig. 1. Power supply clusters for a heterogeneous power delivery system with S = 2 and L = N = 3, (a) $K_1=\{1,2\}$, and (b) $K_2=\{2,1\}$.

The optimal clustering topology with minimum power losses can be obtained by exhaustively comparing the power efficiency $\varphi$ for all possible clusterings, and choosing the configuration with the maximum efficiency. The number of possible clusterings, however, grows exponentially with S, making the exhaustive search computationally infeasible. A computationally efficient alternative is therefore required to determine the preferable power supply clusters. A power supply clustering algorithm with $\mathcal{O}(N^2 \cdot S)$ computational complexity is described in this section. A recursive analytic expression is provided that relates power supply clustering with L LDO and S SMPS to clustering in smaller power delivery systems (l < L LDO and s = S - 1 SMPS). The power supply clusters determined by the recursive approach are identical to those obtained by exhaustive clustering, which maximizes the efficiency of a power delivery system, and are therefore optimal.
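As a point of reference, the exhaustive baseline can be sketched as an enumeration of every way to cut the sorted supply voltages into S contiguous clusters. This is an illustration under stated assumptions, not the authors' implementation: clusters are taken to be contiguous in sorted voltage order (following the ordering assumption of Section II), load currents are equal, the SMPS are ideal, and $V_{Drop}$ is 0.1 volts. The name `exhaustive_clustering` is hypothetical.

```python
from itertools import combinations


def exhaustive_clustering(v_dd, s, v_drop=0.1):
    """Try every split of the sorted voltages into s contiguous clusters.

    Returns (sizes, efficiency), where sizes[i] is the number of LDO in
    cluster i, ordered from the lowest to the highest voltage cluster.
    """
    v = sorted(v_dd)
    n = len(v)
    delivered = sum(v)
    best = None
    # Choosing s - 1 cut points among the n - 1 gaps fixes one clustering.
    for cuts in combinations(range(1, n), s - 1):
        bounds = (0,) + cuts + (n,)
        spans = list(zip(bounds, bounds[1:]))
        # v[b - 1] is the highest supply voltage in the span [a, b).
        drawn = sum((v[b - 1] + v_drop) * (b - a) for a, b in spans)
        eff = delivered / drawn
        if best is None or eff > best[1]:
            best = ([b - a for a, b in spans], eff)
    return best


sizes, eff = exhaustive_clustering([3.3, 2.6, 1.8, 1.6, 1.0], s=3)
print(sizes, round(100 * eff, 1))
```

The loop body is cheap, but the number of cut-point combinations grows combinatorially with the number of domains and converters, which motivates the recursive formulation.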

The key idea behind the proposed algorithm is determining the number of voltage regulators in the high voltage SMPS cluster. Given the optimal clusterings of lower order, this step has $\mathcal{O}(N)$ complexity. Once the number of LDO in a high voltage SMPS cluster is determined, the problem of power supply clustering is reformulated for the remaining LDO and a smaller number of SMPS clusters. To exemplify the proposed solution, consider a heterogeneous system with three switching converters (S = 3) and five linear regulators (L = 5), supplying equal current $I_{DD}$ to five power domains (N = 5), $\{V_{DD}^{(i)}\} = \{3.3 \text{ volts}, 2.6 \text{ volts}, 1.8 \text{ volts}, 1.6 \text{ volts}, 1.0 \text{ volt}\}$. The optimum power supply clustering $K_{OPT}(5,3) = \{l_1, l_2, l_3\}$, $\Sigma l_i = 5$, is determined recursively based on the number of LDO in the high voltage cluster $l_3$, and the lower order optimal supply clusterings $K_{OPT}(4,2)$, $K_{OPT}(3,2)$, and $K_{OPT}(2,2)$. A single recursive step is illustrated in Figure 2, demonstrating three possible alternatives for clustering with one ($l_3 = 1$), two ($l_3 = 2$), and three ($l_3 = 3$) LDO within the high voltage SMPS cluster. Given the lower order clusterings $K_{OPT}(4,2)$, $K_{OPT}(3,2)$, and $K_{OPT}(2,2)$, the optimum clustering $K_{OPT}(5,3)$ is determined with linear computational complexity by comparing the power efficiencies $\varphi(\{K_{OPT}(4,2),1\})$, $\varphi(\{K_{OPT}(3,2),2\})$, and $\varphi(\{K_{OPT}(2,2),3\})$, and choosing the clustering topology that minimizes power losses.
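The three candidate efficiencies can be worked out by hand. The numbers below are an illustration that assumes $V_{Drop} = 0.1$ volts (as in the example of Section II), ideal switching converters, and an efficiency equal to the ratio of the summed supply voltages to the summed LDO input voltages. Under these assumptions, the lower order optima are $K_{OPT}(4,2) = \{3,1\}$, $K_{OPT}(3,2) = \{1,2\}$, and $K_{OPT}(2,2) = \{1,1\}$, with summed LDO input voltages of 8.4, 4.9, and 2.8 volts, respectively. The high voltage cluster operates at $3.3 + 0.1 = 3.4$ volts, and the summed supply voltage is 10.3 volts.

$$\begin{aligned}
\varphi(\{K_{OPT}(4,2),1\}) &= \frac{10.3}{8.4 + 1 \cdot 3.4} = \frac{10.3}{11.8} \approx 87.3\%, \\
\varphi(\{K_{OPT}(3,2),2\}) &= \frac{10.3}{4.9 + 2 \cdot 3.4} = \frac{10.3}{11.7} \approx 88.0\%, \\
\varphi(\{K_{OPT}(2,2),3\}) &= \frac{10.3}{2.8 + 3 \cdot 3.4} = \frac{10.3}{13.0} \approx 79.2\%.
\end{aligned}$$

Under these assumptions, the second alternative is the most efficient, giving $K_{OPT}(5,3) = \{1, 2, 2\}$.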

Fig. 2. A single step of the recursive power supply clustering algorithm for a heterogeneous power delivery system with S = 3 and L = N = 5, (a) a single LDO ($l_3 = 1$), (b) two LDO ($l_3 = 2$), and (c) three LDO ($l_3 = 3$) in a high voltage SMPS cluster.

For a general clustering algorithm, consider clustering L on-chip LDO within S SMPS clusters to deliver power to N = L power domains. The optimal clustering topology of a system with N different supply voltages and S SMPS, $K_{OPT}(N, S) = \{l_i\}_{i=1}^S$, where $\sum l_i = N$, is determined recursively by

$$K_{OPT}(N,S) = \{K_{OPT}(N-l_S, S-1), l_S\}, \quad (1)$$

with the initial conditions,

$$K_{OPT}(N,2) = \{N - l_S, l_S\}, \quad (2)$$

$$K_{OPT}(N, S = N) = \{1, 2, \ldots, N\}, \quad (3)$$

where $1 \leq l_S \leq (N-S+1)$ is the number of LDO in the high voltage SMPS cluster, leaving at least one LDO for each of the remaining $S-1$ clusters. To maximize the overall power efficiency of the system, the number of LDO in the last SMPS cluster is chosen such that

$$\varphi(K_{OPT}(N,S)) = \max_{l_S} \varphi\left(\{K_{OPT}(N-l_S,S-1),l_S\}\right). \quad (4)$$
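The recursion in (1) and (4) maps naturally onto a table filled from small systems to large ones. The sketch below is one possible realization rather than the authors' code: it assumes equal load currents, ideal SMPS, and $V_{Drop} = 0.1$ volts, and it starts from the single converter case (S = 1) as the base of the recursion. Since the delivered power is fixed, maximizing $\varphi$ is implemented as minimizing the summed LDO input voltage. The name `optimal_clustering` and the table `best` are hypothetical.

```python
def optimal_clustering(v_dd, s_max, v_drop=0.1):
    """Recursive (dynamic programming) clustering sketch.

    Returns (sizes, efficiency), where sizes[i] is the number of LDO in
    cluster i, ordered from the lowest to the highest voltage cluster.
    """
    v = sorted(v_dd)
    n_max = len(v)
    if not 1 <= s_max <= n_max:
        raise ValueError("need 1 <= S <= N")
    inf = float("inf")
    # best[s][n] = (summed LDO input voltage, sizes) for the n lowest
    # voltage domains served by s converters.
    best = [[(inf, None)] * (n_max + 1) for _ in range(s_max + 1)]
    for n in range(1, n_max + 1):
        best[1][n] = ((v[n - 1] + v_drop) * n, [n])
    for s in range(2, s_max + 1):
        for n in range(s, n_max + 1):
            v_smps = v[n - 1] + v_drop  # output of the high voltage SMPS
            # l_s LDO go to the high voltage cluster; the remaining
            # n - l_s LDO reuse the stored optimum for s - 1 converters.
            for l_s in range(1, n - s + 2):
                prev_drawn, prev_sizes = best[s - 1][n - l_s]
                drawn = prev_drawn + v_smps * l_s
                if drawn < best[s][n][0]:
                    best[s][n] = (drawn, prev_sizes + [l_s])
    drawn, sizes = best[s_max][n_max]
    return sizes, sum(v) / drawn


sizes, eff = optimal_clustering([3.3, 2.6, 1.8, 1.6, 1.0], s_max=3)
print(sizes, round(100 * eff, 1))
```

Storing every lower order optimum means each entry is computed once, so the innermost loop over `l_s` is the only work needed to extend the solution by one converter.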

Once the power supply clusters are recursively determined, the maximum voltage level within each SMPS cluster determines the SMPS output and LDO input voltage [2]. Pseudo-code of the algorithm for determining the LDO input voltages based on the proposed clustering is shown in Figure 3.
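The mapping from a clustering to the LDO input voltages can be written compactly. The sketch below is illustrative and assumes that each SMPS output is $V_{Drop}$ above the highest supply voltage in its cluster, with $V_{Drop} = 0.1$ volts and cluster sizes listed from the lowest to the highest voltage cluster. The name `ldo_input_voltages` is hypothetical.

```python
def ldo_input_voltages(v_dd, sizes, v_drop=0.1):
    """Return (supply voltage, LDO input voltage) pairs for a clustering.

    sizes[i] is the number of LDO in cluster i, lowest voltages first.
    """
    v = sorted(v_dd)
    if sum(sizes) != len(v) or min(sizes) < 1:
        raise ValueError("sizes must be positive and sum to the number of domains")
    pairs = []
    start = 0
    for size in sizes:
        cluster = v[start:start + size]
        # Every LDO in the cluster shares the same SMPS output voltage.
        v_smps = max(cluster) + v_drop
        pairs.extend((v_out, v_smps) for v_out in cluster)
        start += size
    return pairs


for v_out, v_in in ldo_input_voltages([1.8, 1.1, 1.0], sizes=[2, 1]):
    print(f"V_DD = {v_out:.1f} V, LDO input = {v_in:.1f} V")
```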

Fig. 3. Algorithm to determine LDO input voltages for power efficient clustering.

The LDO input voltages in a system with a single (S = 1) switching converter and the maximum number of SMPS (S = N) are determined, respectively, at lines 2 to 4 and 5 to 7. To determine the optimal clustering of a general system with (1 < S < N) switching converters, lines 8 through 41 are executed. The LDO input voltages for all of the systems with $s \leq S$ SMPS and $l \leq L$ LDO are determined progressively and stored in matrix $all_{VLDO}$. The matrix is allocated and initialized based on (2) and (3) at lines 9 to 18. The LDO input voltages are determined in a loop (see lines 20 to 21) for systems with a progressively increasing number of power supplies. All of the high voltage cluster configurations with a different number of LDO are determined at lines 27 to 28. The power efficiency of different configurations is compared at lines 30 to 34, determining the most power efficient system. The number of comparisons to determine the optimal clustering $K_{OPT}(N, S)$ given all of the optimal clusterings of lower order $K_{OPT}(n < N, s < S)$ is (N - S). The computational complexity to determine the most power efficient clusters with N = L LDO regulators and S SMPS converters is therefore

$$\sum_{s=1}^{S} \left( \sum_{n=s}^{N} \mathcal{O}(n-s) \right) = \mathcal{O}(N^2 \cdot S), \quad N \ge S. \quad (5)$$
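The gap between the two approaches can be made concrete by counting the configurations each one evaluates. The sketch below is a counting exercise only, under the assumption that clusters are contiguous in sorted voltage order, so that an exhaustive search visits one configuration per choice of S - 1 cut points. The counts say nothing about measured runtimes. Both function names are hypothetical.

```python
from math import comb


def recursive_candidates(n_max, s_max):
    """Candidate high voltage cluster sizes examined over the whole recursion."""
    return sum(
        n - s + 1                      # choices of l_s for a system (n, s)
        for s in range(2, s_max + 1)
        for n in range(s, n_max + 1)
    )


def exhaustive_candidates(n_max, s_max):
    """Ways to cut n_max sorted voltages into s_max contiguous clusters."""
    return comb(n_max - 1, s_max - 1)


for n, s in [(5, 3), (21, 5), (49, 10)]:
    print(n, s, recursive_candidates(n, s), exhaustive_candidates(n, s))
```

The recursive count grows quadratically in N and linearly in S, consistent with (5), while the exhaustive count is a binomial coefficient.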

Power supply clustering with the proposed algorithm is orders of magnitude faster than exhaustively clustering large power delivery systems.

## IV. CO-DESIGN OF POWER SUPPLIES IN CIRCUIT BENCHMARKS

Five test cases have been analyzed based on IBM power grid benchmarks [10] to evaluate the efficiency of the power separation principle in circuits with tens of power domains and different supply voltages. Each of the selected benchmarks is partitioned into voltage domains with voltage levels ranging from 0.5 volts to 1.8 volts in 0.02 volt steps, $\{V_{DD}^{(i)} = 0.5V + i \cdot 0.02V\}$, and the area of each domain is determined. The current within a benchmark circuit is assumed to be uniformly distributed. The current load within a domain is therefore proportional to the area of the domain. The voltage within each domain is regulated by an LDO, ensuring that the total number of on-chip LDO is equal to the number of voltage domains.
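The setup described above can be sketched as follows. The example is illustrative: the domain areas are made up, the SMPS are assumed ideal, and $V_{Drop}$ is taken as 0.1 volts. Because the load current is proportional to area, the area can stand in for the current when computing the efficiency ratio. The names `voltage_levels` and `weighted_efficiency` are hypothetical.

```python
def voltage_levels(v_min=0.5, v_max=1.8, step=0.02):
    """Candidate domain voltages V_DD(i) = v_min + i * step, up to v_max."""
    count = round((v_max - v_min) / step) + 1
    return [round(v_min + i * step, 2) for i in range(count)]


def weighted_efficiency(clusters, v_drop=0.1):
    """Efficiency when the load current is proportional to domain area.

    clusters: one list of (supply voltage [V], area) pairs per SMPS cluster.
    The proportionality constant between area and current cancels out.
    """
    delivered = 0.0
    drawn = 0.0
    for cluster in clusters:
        v_smps = max(v for v, _ in cluster) + v_drop
        for v, area in cluster:
            delivered += v * area
            drawn += v_smps * area
    return delivered / drawn


levels = voltage_levels()
print(len(levels), levels[0], levels[-1])

# Hypothetical areas in arbitrary units, for illustration only.
example = [[(1.0, 4.0), (1.1, 1.0)], [(1.8, 2.0)]]
print(round(100 * weighted_efficiency(example), 1))
```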

TABLE I

Power efficiency in circuits with and without separation of power conversion and regulation.

| Benchmark | Number of voltage domains/LDO | Voltage range, MAX - MIN [V] | Efficiency without power separation, S = 1 [%] | Efficiency with power separation, S = 2 [%] | S = 3 [%] | S = 5 [%] | S = 10 [%] | CPU time, proposed clustering [s] | CPU time, exhaustive clustering [s] |
|-----------|----|--------------------|--------|--------|--------|--------|--------|--------|-----------|
| ibmpg1    | 49 | 1.46 - 0.50 = 0.96 | 68.093 | 78.301 | 82.421 | 86.484 | 89.231 | 18.268 | > 100,000 |
| ibmpg2    | 21 | 1.52 - 1.12 = 0.40 | 77.222 | 86.953 | 89.193 | 90.955 | 92.109 | 0.641  | > 100,000 |
| ibmpg3    | 15 | 1.70 - 1.44 = 0.26 | 88.603 | 91.905 | 92.819 | 93.578 | 94.116 | 0.195  | > 100,000 |
| ibmpgnew1 | 11 | 1.72 - 1.52 = 0.20 | 90.342 | 92.859 | 93.471 | 94.070 | 94.267 | 0.083  | 113.893   |
| ibmpgnew2 | 11 | 1.72 - 1.52 = 0.20 | 90.388 | 92.852 | 93.487 | 94.070 | 94.269 | 0.080  | 108.238   |

The proposed power supply clustering algorithm is implemented in MATLAB and applied to all of the test cases on a multi-core system with four Intel(R) Core(TM) i3-2120 CPU @ 3.30 GHz processors and 2,498 MB memory. A voltage drop of 0.1 volts within each LDO is assumed. The power grid specifications and simulation results with and without power supply clustering are listed in Table I.

Those power grids with a range of LDO output voltages up to 0.20 volts (*ibmpgnew1* and *ibmpgnew2*) exhibit a high power efficiency of 93% despite only two SMPS clusters. The power efficiency of these grids increases by 2.5% as compared to a power delivery system with a single switching converter (without power separation). Increasing the power efficiency in these power grids with a large number of SMPS clusters (94% with ten switching converters) requires excessive area and is not cost effective. Alternatively, the *ibmpg1* benchmark exhibits a wider range of LDO output voltages, 0.5 volts to 1.5 volts, and therefore a low power efficiency of 68% without power supply clustering. The effectiveness of power separation is significant in *ibmpg1*, with a 10.2% and 21.1% increase in power efficiency with, respectively, S = 2 and S = 10 SMPS clusters as compared to S = 1. Separation of power conversion and regulation is therefore particularly important in those systems with a wide range of on-chip supply voltages and voltage drops. To provide high quality power in dynamically scaled multi-voltage circuits, the efficiency of the power supply clustering is evaluated within short control time slots. The proposed power supply clustering algorithm exhibits orders of magnitude smaller computational runtime as compared with the exhaustive method (see Table I), while providing identical results. With the proposed algorithm, the switching converters and linear regulators can be co-designed in real time for power and area efficient management of the energy budget.

## V. CONCLUSIONS

On-chip power regulation and delivery are necessary for delivering high quality power within modern high performance integrated circuits. To address the issues of power quality and power efficiency in complex power delivery systems, the power conversion and regulation operations should be separated. In compliance with this separation principle, power should be primarily converted off-chip, in-package, and/or on-chip with power efficient switching supplies, and regulated with ultra-small linear low dropout regulators at the point-of-load [1]. To dynamically co-design tens of power converters with hundreds to thousands of on-chip regulators, the clustering of the on-chip LDO within the SMPS voltage clusters that maximizes the power efficiency of the overall power delivery system should be determined in real time. Computationally efficient power supply clustering is critical for real time heterogeneous power delivery.

An algorithm to recursively cluster a heterogeneous power supply system with polynomial computational complexity is presented. Orders of magnitude speedup is exhibited with the proposed algorithm as compared with exhaustive clustering, providing a computationally efficient solution for dynamic clustering of on-chip power supplies. A power efficiency above 82% is demonstrated on IBM benchmarks with more than two switching converters, and up to 94% with a larger number of converters.

## REFERENCES

- [1] S. Kose and E. G. Friedman, "Distributed On-Chip Power Delivery," *IEEE Journal on Emerging and Selected Topics in Circuits and Systems*, Vol. 2, No. 4, pp. 704–713, December 2012.
- [2] I. Vaisband and E. G. Friedman, "Heterogeneous Methodology for Energy Efficient Distribution of On-Chip Power Supplies," *IEEE Transactions on Power Electronics*, Vol. 28, No. 9, pp. 4267–4280, September 2013.
- [3] C.-H. Wu, L.-R. Chang-Chien, and L.-Y. Chiou, "Active Filter Based On-Chip Step-Down DC-DC Switching Voltage Regulator," *Proceedings of the IEEE TENCON Conference*, pp. 1–6, November 2005.
- [4] Y.-H. Lee, S.-C. Huang, S.-W. Wang, W.-C. Wu, P.-C. Huang, H.-H. Ho, Y.-T. Lai, and K.-H. Chen, "Power-Tracking Embedded Buck-Boost Converter with Fast Dynamic Voltage Scaling for the SoC System," *IEEE Transactions on Power Electronics*, Vol. 27, No. 3, pp. 1271–1282, March 2012.
- [5] P. Hazucha et al., "Area-Efficient Linear Regulator with Ultra-Fast Load Regulation," *IEEE Journal of Solid-State Circuits*, Vol. 40, No. 4, pp. 933–940, April 2005.
- [6] M. Al-Shyoukh, H. Lee, and R. Perez, "A Transient-Enhanced Low-Quiescent Current Low-Dropout Regulator with Buffer Impedance Attenuation," *IEEE Journal of Solid-State Circuits*, Vol. 42, No. 8, pp. 1732–1742, August 2007.
- [7] B. Amelifard and M. Pedram, "Optimal Design of the Power-Delivery Network for Multiple Voltage-Island System-on-Chips," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. 28, No. 6, pp. 888–900, June 2009.
- [8] Z. Zeng, X. Ye, Z. Feng, and P. Li, "Tradeoff Analysis and Optimization of Power Delivery Networks with On-Chip Voltage Regulation," *Proceedings of the IEEE/ACM Design Automation Conference*, pp. 831–836, June 2010.
- [9] J. Gjanci and M. H. Chowdhury, "A Hybrid Scheme for On-Chip Voltage Regulation in System-On-a-Chip (SOC)," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 19, No. 11, pp. 1949–1959, November 2011.
- [10] S. R. Nassif, "Power Grid Analysis Benchmarks," *Proceedings of the IEEE/ACM Asia and South Pacific Design Automation Conference*, pp. 376–381, January 2008.