Abstract:
Convolutional neural networks (CNNs) have achieved significant success in various applications. Numerous hardware accelerators have been introduced to speed up CNN execution with better energy efficiency than traditional software implementations. Despite this success, deploying traditional hardware accelerators for bulky CNNs on current and emerging smart devices is impeded by limited resources, including memory, power, area, and computational capabilities. Recent works introduced processing-in-memory (PIM), a non-von-Neumann architecture, as a promising approach to the problem of data movement between logic and memory blocks. However, as observed from the literature, existing PIM architectures cannot accommodate all the computational operations because of their limited programmability and flexibility. Furthermore, the capabilities of PIM are constrained by the limited on-chip memory available. To enable faster computations and address the on-chip memory constraint, this work introduces a novel reconfigurable approximate computing (AC)-based PIM, termed reconfigurable approximate PIM (ReApprox-PIM). The proposed ReApprox-PIM addresses the two challenges mentioned above as follows: 1) it uses a programmable lookup-table (LUT)-based processing architecture that can support different AC techniques via programmability, and 2) it performs resource-efficient, fast CNN computing by implementing highly optimized AC techniques. Compared to prior PIMs that rely on exact computations for CNN inference acceleration, this improves the computing footprint and operational parallelism and reduces computational latency and power consumption, at a minimal sacrifice of accuracy. We evaluated the proposed ReApprox-PIM for inference on various CNN architectures, including the standard LeNet, AlexNet, and ResNet-18, -34, and -50.
Our experimental results show that the ReApprox-PIM achieves a speedup of $...
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (Volume: 43, Issue: 8, August 2024)
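
As a rough illustration of the general idea of a programmable LUT used for approximate arithmetic, the sketch below multiplies two 8-bit unsigned values by looking up the product of their upper bits in a small table. It is not the ReApprox-PIM design: the operand-truncation scheme, the function names `build_lut` and `approx_multiply`, and the 4-bit split are all assumptions chosen for illustration.

```python
def build_lut(bits, fn):
    """'Program' a 2^bits x 2^bits table with fn(a, b) (hypothetical helper)."""
    size = 1 << bits
    return [[fn(a, b) for b in range(size)] for a in range(size)]


def approx_multiply(a, b, lut, drop):
    """Approximate a * b for unsigned ints by truncating `drop` low bits.

    Assumed scheme: index the LUT with the upper bits of each operand,
    then shift the looked-up product back to the original scale.
    """
    hi_a, hi_b = a >> drop, b >> drop
    return lut[hi_a][hi_b] << (2 * drop)


DROP = 4                                   # low bits discarded per operand
LUT = build_lut(8 - DROP, lambda x, y: x * y)  # 16 x 16 entries instead of 256 x 256

exact = 200 * 100
approx = approx_multiply(200, 100, LUT, DROP)
rel_error = (exact - approx) / exact
print(f"exact={exact} approx={approx} relative error={rel_error:.2%}")
```

The table shrinks from 65,536 entries to 256 at the cost of a truncation error, which mirrors the accuracy-versus-footprint trade-off the abstract describes. Passing a different function to `build_lut` reprograms the same lookup path for another approximation.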