ECE Seminar Lecture Series

Architecting Next-generation SIMT-based Systems for Scalability, Deep Learning, and Data Center Microservices

Mahmoud Khairy, Ph.D. candidate in the Department of Computer Engineering at Purdue University in the Accelerator Architecture Lab at Purdue (AALP)

Wednesday, March 2, 2022
Noon

1400 Wegmans Hall


Moore’s law is dead. The physical and economic principles that enabled an exponential rise in transistors per chip have reached their breaking point. However, all is not lost. The death of single-chip performance scaling, even with the extreme parallelism available to Graphics Processing Units (GPUs), will usher in a renaissance in multi-chip Non-Uniform Memory Access (NUMA) scaling. Advances in silicon interposers and other inter-chip signaling technology will enable single-package systems composed of multiple chiplets that continue to scale even as per-chip transistor counts do not. In fact, the extreme scaling demanded by modern Machine Learning (ML) and exascale computing workloads already requires multi-GPU systems that have NUMA characteristics. Given this evolving, massively parallel NUMA landscape, the placement of data on each chiplet or discrete GPU card and the scheduling of the threads that use that data are critical factors in system performance and power consumption.
While improving ML inference has received significant attention, general-purpose compute units are still the main driver of a data center’s total cost of ownership (TCO). CPUs consume 40% of the total data center power budget, half of which comes from the CPU pipeline’s frontend. Coupled with this hardware efficiency crisis is an increased desire for programmer productivity, flexible scalability, and nimble software updates, which has led to the rise of software microservices.
In this talk, I will discuss these paradigm shifts, addressing the following questions: (1) how do we overcome the non-uniform memory access overhead of next-generation multi-chiplet GPUs in the era of ML-driven workloads; (2) how do we improve the energy efficiency of data center CPUs in light of the evolution of microservices; and (3) how do we study such rapidly evolving systems with accurate and extensible performance modeling?

Bio: Mahmoud Khairy is a sixth-year Ph.D. candidate in the Department of Computer Engineering at Purdue University. He is a research assistant in the Accelerator Architecture Lab at Purdue (AALP), advised by Professor Tim Rogers. His research interests include all aspects of computer architecture, compilers, and systems. His current focus is on overcoming the slowdown of Moore's law by building scalable and efficient hardware and compiler techniques for exascale computing, data center, and deep learning applications. His article on deep learning hardware evolution has been recognized in the press as well as by multiple venture capital firms and industry leaders. He has several research publications in top venues such as ISCA, MICRO, and SIGMETRICS. Prior to entering graduate school, he worked as a software engineer at Microsoft and Mentor Graphics.

Homepage: https://mkhairy.github.io/