Department of Electrical and Computer Engineering Ph.D. Public Defense

Reducing Data Movement Energy on Dense and Sparse Linear Algebra Workloads: From Machine Learning to High Performance Scientific Computing

Ben Feinberg

Supervised by Professor Engin Ipek

Wednesday, July 17, 2019
2 p.m.

Computer Studies Building, Room 601

Linear algebra is ubiquitous across virtually every field of science and engineering, from climate modeling to machine learning. This ubiquity makes linear algebra a prime candidate for hardware optimizations, which can improve both the run time and the energy efficiency of a wide range of applications. Historically, these optimizations have leveraged Moore’s law and Dennard scaling for smaller, faster, and more efficient devices. Over the past decade, however, these two trends have slowed or stopped, and the problem of data movement energy has become a major limiting factor. To achieve further improvements in linear algebra requires new computing paradigms and optimizations to existing memory systems.

One promising technique to enable further improvements to linear algebra is to design in-situ accelerators that leverage emerging resistive memory technologies. These accelerators exploit the electrical properties of a resistive mesh to perform matrix vector multiplication (MVM)—a fundamental linear algebra kernel—within the memory array itself. This allows the result to be computed in, and read from, the memory arrays directly (rather than reading all of the operands individually), which reduces costly data movement. Regrettably, this prior work has a number of significant shortcomings that restricts the applicability of in-situ MVM to a limited set of machine learning    applications  rather  than  the  entire  range  of  MVM-dominated  linear algebra workloads.

This thesis first presents an error correction scheme for in-situ MVM that addresses both hard and soft errors. Notably, the same electrical properties that provide the performance and energy improvements for in-situ accelerators increase the susceptibility to errors. Even in highly error tolerant neural network applications, the errors induced through noise have a significant impact on the overall accuracy. Complicating these reliability issues is the fact that in-situ computation cannot use traditional error correction schemes such as Hamming codes. Instead, this thesis proposes error correction based on arithmetic codes, where data is encoded through multiplication by an integer. This basic scheme is further improved by data-aware encoding to exploit the state dependence of the errors, and by knowledge of how critical each portion of the computation is to overall system accuracy. By leveraging these optimizations, the proposed error correction scheme can increase the accuracy of machine learning workloads on in-situ accelerators, and enables in-situ accelerators to become applicable to applications without high levels of tolerance to error.

Building on these error correction capabilities, this thesis presents a set of optimizations to enable scientific computing on in-situ MVM accelerators. Unlike neural network applications where 8- to 16-bit fixed-point computation is acceptable, scientific computing requires high precision floating-point. To enable efficient high-precision floating point computation without the probative costs of naive  floating point emulation, three techniques are proposed: reducing overheads by exploiting exponent range locality, early termination of fixed-point computation, and static operation scheduling.  Additionally, the matrices found in scientific computing are typically sparse, resulting in significant overheads when using in-situ accelerators designed for dense matrices. To remedy this limitation, a heterogeneous collection of crossbars with varying sizes is proposed that efficiently handles sparse matrices, and an algorithm for mapping  the  dense  subblocks  of  a sparse matrix to an appropriate set of crossbars is developed. Taken together, these techniques enable an even broader range of applications on in-situ accelerators while still preserving the performance and energy efficiency advantages over conventional, von-Neumann systems.

Despite its promise, in-situ computation is unlikely to entirely replace von-Neumann systems for linear algebra. Even the aforementioned in-situ accelerators for scientific computing rely on a conventional processor to augment the memristive crossbars, with attendant data movement costs. Therefore, further optimizations to data movement energy are still needed. This thesis presents work on commutative data reordering (CDR), a new technique that leverages the commutative property in linear algebra to strategically select the lowest energy order in which data can be transmitted. This work shows how ordering the operands can be reduced to an instance of the traveling salesman problem, enabling the application of numerous well-studied optimization techniques. Unlike prior work that exploits data similarity to reduce data movement energy, CDR takes full advantage of application-level characteristics to improve data similarity with zero metadata or bandwidth overhead to support the reordering.