
Open access
Date: 2021
Type: Doctoral Thesis
ETH Bibliography: yes
Abstract
In recent years, deep neural networks (DNNs) have revolutionized the field of machine learning. DNNs have brought AI into day-to-day life by delivering unprecedented, human-like performance on applications including natural language processing, image recognition, commercial recommendation systems, and autonomous driving.
Their ability to perform such tasks with high accuracy and low latency has made DNNs ubiquitous in both industrial and domestic settings, such as in wearable devices, household appliances, home and commercial automation, and healthcare systems.
While DNNs have historically been executed on general-purpose processors, their presence in everyday life, in both consumer and industrial applications, has necessitated custom hardware platforms such as application-specific integrated circuits (ASICs) for low-power, efficient acceleration of DNNs.
Although custom hardware accelerators have become mainstream, they still suffer from the inherent limitations of the von Neumann computing paradigm: the synaptic weights of a DNN must be shuttled back and forth between the memory and the compute units many times during execution. As the synaptic weights in state-of-the-art DNNs typically number in the millions, this bottleneck can severely compromise performance.
In-memory computing (IMC) is a non-von Neumann paradigm that has recently established itself as an energy-efficient approach to high-throughput hardware for deep learning applications. Fundamentally, IMC revolves around a crossbar array of memory devices. By mapping the synaptic weights of a DNN layer to the charge or conductance states of the memory devices at the crosspoints, IMC performs matrix-vector multiplications with O(1) time complexity. As the computation occurs within the memory devices themselves, IMC obviates the von Neumann bottleneck, removing the need to transfer the synaptic weights.
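To illustrate the principle, the following minimal sketch (a hypothetical idealized model, not code from the thesis) simulates a differential conductance crossbar in Python: the weight matrix W is encoded as device conductances, the input vector is applied as row voltages, and Kirchhoff's current law delivers the column currents, i.e. the full matrix-vector product, in a single read.

    import numpy as np

    def crossbar_mvm(W, v, g_max=1e-4):
        """Idealized analog MVM on a differential conductance crossbar.

        Each weight W[i, j] is split across a positive and a negative
        device; with the input applied as row voltages v, the column
        currents I[j] = sum_i (Gp[i, j] - Gn[i, j]) * v[i] realize the
        whole matrix-vector product in one step. Non-idealities such as
        device noise, drift, and wire resistance are ignored here.
        """
        scale = g_max / np.abs(W).max()        # weight -> conductance
        Gp = np.where(W > 0,  W, 0.0) * scale  # positive weight parts
        Gn = np.where(W < 0, -W, 0.0) * scale  # negative weight parts
        currents = (Gp - Gn).T @ v             # Kirchhoff column sums
        return currents / scale                # back to weight domain

    # Sanity check against the digital result.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((128, 64))
    v = rng.standard_normal(128)
    assert np.allclose(crossbar_mvm(W, v), W.T @ v)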
The full advantage of employing IMC for DNN applications is realized only by rethinking architectural and computational concepts to suit this new computing paradigm. Specifically, the presence of synaptic weights that are physically stationary and ready for computation calls for a new IMC core architecture and new methods for steering the dataflow through such cores.
In this work, we explore the design space opened up by IMC and present new architectural, communication, and algorithmic concepts and methodologies that yield low-power, high-throughput acceleration of Convolutional Neural Network (CNN) inference.
We present a new IMC core comprising a crossbar array and lightweight digital logic. Such IMC cores can be interconnected to form a dataflow engine for large-scale execution.
Moreover, we investigate in depth how to optimize the mapping of weights onto the IMC memory devices and of activations onto the local memories.
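As a concrete point of reference, a common textbook mapping (shown here only for illustration; the thesis develops its own optimized variants) unrolls a convolutional layer's 4D kernel tensor into a 2D matrix that fits a crossbar: each output channel becomes one column of devices, with one row per kernel element.

    import numpy as np

    def conv_kernel_to_crossbar(kernels):
        """Flatten kernels of shape (C_out, C_in, k, k) into a crossbar
        matrix of shape (C_in * k * k, C_out): one device column per
        output channel, one row per kernel element. Input patches are
        unrolled (im2col-style) into matching vectors of row voltages."""
        c_out = kernels.shape[0]
        return kernels.reshape(c_out, -1).T

    # A 3x3 convolution with 64 input and 128 output channels occupies
    # a 576 x 128 region of crossbar devices.
    K = np.zeros((128, 64, 3, 3))
    print(conv_kernel_to_crossbar(K).shape)  # (576, 128)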
A key challenge lies in designing an efficient communication fabric for interconnecting the IMC cores. To this end, we present a new methodology, based on graph homomorphism verification, for designing communication fabrics for IMC-based hardware. Building on this approach, we present one such fabric that supports the pipelined execution of all state-of-the-art CNNs.
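The core idea can be sketched in a few lines (a toy check under assumed data structures; the verification methodology in the thesis is far more elaborate): a placement of CNN layers onto cores is valid only if it is a graph homomorphism, i.e. every producer-consumer edge of the network's dataflow graph must land on a link that the fabric actually provides.

    def is_homomorphism(dataflow_edges, fabric_links, placement):
        """Check that `placement` (layer -> core) maps every dataflow
        edge onto an existing fabric link, i.e. that it is a graph
        homomorphism from the CNN graph into the fabric graph."""
        return all(
            (placement[src], placement[dst]) in fabric_links
            for src, dst in dataflow_edges
        )

    # Toy example: a three-layer chain placed on two cores whose
    # fabric offers links in both directions plus self-loops.
    dataflow = [("conv1", "conv2"), ("conv2", "conv3")]
    fabric = {(0, 1), (1, 0), (0, 0), (1, 1)}
    place = {"conv1": 0, "conv2": 1, "conv3": 0}
    print(is_homomorphism(dataflow, fabric, place))  # True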
While IMC obviates the need to transfer synaptic weights, activations must still be communicated from one compute unit to another. To that end, we present a novel approach to optimizing neural networks that reduces the power dissipated when communicating activations over bit-serial interfaces.
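To make the cost model concrete: on a single-wire bit-serial link, dynamic power roughly tracks the number of line transitions, so one plausible optimization target is the toggle count of the serialized activation stream. The sketch below assumes an 8-bit, LSB-first serialization; the exact interface and objective in the thesis may differ.

    def serial_toggle_count(activations, bits=8):
        """Count line transitions when quantized activations are
        streamed LSB-first over a one-wire bit-serial interface;
        fewer transitions means less dynamic switching power."""
        toggles, prev = 0, 0
        for a in activations:
            for i in range(bits):
                bit = (a >> i) & 1
                toggles += bit ^ prev
                prev = bit
        return toggles

    print(serial_toggle_count([0b10101010] * 4))  # alternating bits: 31 toggles
    print(serial_toggle_count([0] * 4))           # idle line: 0 toggles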
This thesis presents a comprehensive exploration of the IMC design space. On our proposed dataflow engine, CNN inference achieves state-of-the-art energy efficiency and surpasses state-of-the-art throughput.
Permanent link: https://doi.org/10.3929/ethz-b-000540786
Publication status: published
External links: Search print copy at ETH Library
Publisher: ETH Zurich
Organisational unit: 03996 - Benini, Luca / Benini, Luca