Department of Electrical and Computer Engineering Ph.D. Public Defense

Architectures for Accelerated Computation Beyond the von Neumann Hardware

Richard Afoakwa

Supervised by Professor Michael C. Huang

Friday, March 5, 2021
1 p.m.

The von Neumann model of computing systems has shaped modern computer architecture and driven general-purpose computing for the last 75 years. Over that time, architects have leveraged Moore’s law and Dennard scaling to develop micro-architectural techniques that delivered remarkable performance gains. However, both trends have slowed over the last decade. Because the complexity and scale of the problems a computational system must solve keep growing, architects need fresh inspiration for overall system performance improvements, with a specific focus on energy efficiency. One such approach is to accelerate data movement and, perhaps, to rethink the von Neumann model entirely in favor of more specialized accelerated computation.

The first section of this work explores interconnect-based approaches to efficiently accelerate data movement by utilizing a near-speed-of-light communication substrate. Thanks to smaller, faster, and more energy-efficient transistor devices, it is possible to architect energy-efficient, high-speed circuitry as the backbone of intra- and inter-chip communication. Such a system is composed of carefully designed high-speed transceiver circuitry and a supporting transmission-line fabric. Compared to conventional repeated wires, high-speed communication offers low latency, low energy, and high throughput.

Firstly, we explore an approach to increase link bandwidth through multi-bit transmission using pulse-amplitude modulation (PAM). We show that a higher swing voltage makes such multi-bit transmission possible. The high-speed circuitry can be effectively adapted to the transmission distance, thereby expending energy more judiciously. Secondly, we further improve link bandwidth by architecting dense, fine-pitch, high-speed transmission lines routed through an interposer substrate. We show that such a system can support a more modular architecture platform that integrates multiple processors, memories, and I/O for data acceleration. Thirdly, we propose utilizing multipoint-to-multipoint high-speed transmission. The design enables splitting the physical links into segments, which allows multicast and broadcast capability and improves overall system concurrency. Improved concurrency translates to higher raw throughput and performance. Finally, we propose applying the optimizations mentioned above to a 2.5D system architecture. We show that state-of-the-art PAM signaling, coupled with dense interposer-based links, can deliver performance benefits similar to those of a fully integrated 3D system, with the added benefit of higher energy efficiency.
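As a rough illustration of the multi-bit idea only, the sketch below assumes four-level PAM (PAM-4) with Gray coding, so that each transmitted symbol carries two bits. The function names, the Gray-code table, and the `v_swing` parameter are hypothetical and are not taken from the thesis; the actual transceiver design is a circuit, not software.

```python
# Hypothetical sketch: Gray-coded PAM-4, two bits per symbol.
# Adjacent amplitude levels differ in exactly one bit, so a one-level
# decision error corrupts only a single bit.
GRAY_PAM4 = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}


def pam4_encode(bits, v_swing=1.0):
    """Map a bit sequence to amplitudes spread evenly over [0, v_swing]."""
    if len(bits) % 2 != 0:
        raise ValueError("PAM-4 needs an even number of bits")
    step = v_swing / 3  # four levels leave three gaps across the swing
    return [
        GRAY_PAM4[(bits[i], bits[i + 1])] * step
        for i in range(0, len(bits), 2)
    ]


def pam4_decode(levels, v_swing=1.0):
    """Recover bits by snapping each received amplitude to the nearest level."""
    step = v_swing / 3
    inverse = {level: pair for pair, level in GRAY_PAM4.items()}
    bits = []
    for v in levels:
        index = min(3, max(0, round(v / step)))  # clamp to a valid level
        bits.extend(inverse[index])
    return bits


bits = [0, 0, 0, 1, 1, 1, 1, 0]
symbols = pam4_encode(bits)       # 8 bits travel as 4 symbols
recovered = pam4_decode(symbols)  # round trip returns the original bits
```

For a fixed swing, the spacing between adjacent levels shrinks as levels are added, which is one way to see why a higher swing voltage helps multi-bit transmission.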

In the second part of this work, we explore a non-von Neumann approach to accelerated computing. There exist large classes of problems, such as combinatorial optimization problems, that cannot be mapped efficiently to a conventional von Neumann machine. Therefore, we explore CMOS-compatible Ising machines as a means of specialized accelerated computation. Abstract problems can be mapped onto such hardware, and physics naturally guides the dynamics toward an optimal solution. The idea of Ising machines has been explored in the physics domain over the years, but the architecture of such hardware is yet to be fully exploited in a conventional CMOS system. Recently, physicists have proposed quantum, optical, and oscillator-based approaches to architecting Ising machines. We propose chip-scale, integrated-circuit-based designs for more immediate applications. Our design utilizes bistable, resistively coupled networks, and we show that such an Ising machine outperforms the room-sized quantum and optical approaches in all metrics: speed, area, energy, and quality of solution.
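To make the notion of mapping a problem onto an Ising machine concrete, the sketch below assumes the standard Ising energy H(s) = -sum over i<j of J_ij * s_i * s_j with spins s_i in {-1, +1}, and uses the textbook reduction of max-cut to that form (J = -W). The brute-force search is only a software stand-in for checking the mapping on a tiny graph; it does not model the bistable, resistively coupled hardware, and all function names are hypothetical.

```python
from itertools import product


def ising_energy(spins, J):
    """Ising energy H = -sum_{i<j} J[i][j] * s_i * s_j (no external field)."""
    n = len(spins)
    return -sum(
        J[i][j] * spins[i] * spins[j]
        for i in range(n)
        for j in range(i + 1, n)
    )


def cut_value(spins, W):
    """Total weight of edges whose endpoints carry opposite spins."""
    n = len(spins)
    return sum(
        W[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if spins[i] != spins[j]
    )


def maxcut_to_ising(W):
    """Max-cut reduction: with J = -W, minimizing H maximizes the cut."""
    return [[-w for w in row] for row in W]


def brute_force_ground_state(J):
    """Exhaustive search over all 2^n spin assignments (tiny n only)."""
    n = len(J)
    return min(product((-1, 1), repeat=n), key=lambda s: ising_energy(s, J))


# Example graph: a 4-cycle 0-1-2-3-0 with unit edge weights.
W = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
J = maxcut_to_ising(W)
best = brute_force_ground_state(J)
```

Under this mapping the energy equals the total edge weight minus twice the cut value, so the lowest-energy spin assignment corresponds to the largest cut. An Ising machine aims to reach such a low-energy state through its physical dynamics rather than by enumeration.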