Abstract:
It is critical to continuously improve the hardware efficiency of deep neural network accelerators for their deployment on resource-constrained platforms. This brief proposes a lane-shared bit-pragmatic architecture that addresses the synchronization-induced performance bottleneck and thereby further improves the performance and efficiency of bit-serial computing architectures. The effectiveness and efficiency of the proposed architecture are demonstrated by extensive evaluation results.
Published in: IEEE Transactions on Circuits and Systems II: Express Briefs (Volume: 68, Issue: 1, January 2021)