Abstract:
The use of a wide Single-Instruction-Multiple-Data (SIMD) architecture is a promising approach to build energy-efficient high performance embedded processors. In this pap...Show MoreMetadata
Abstract:
The use of a wide Single-Instruction-Multiple-Data (SIMD) architecture is a promising approach to build energy-efficient high performance embedded processors. In this paper, based on our design framework for low-power SIMD processors, we propose a multiply-accumulate (MAC) unit with variable number of accumulator registers. The proposed MAC unit exploits both the merits of merged operation and register tiling. A Convolutional Neural Network (CNN) is a popular learning based algorithm due to its flexibility and high accuracy. However, a CNN-based application is often computationally intensive as it applies convolution operations extensively on a large data set. In this work, a CNN-based intelligent learning application is analyzed and mapped in the context of SIMD architectures. Experimental results show that the proposed architecture is efficient. In a 64-PE instance, the proposed SIMD processor with MAC4reg achieves an effective performance of 63.2 GOPS. Compared to the two baseline SIMD processors without MAC4reg, the proposed design brings 54.0% and 32.1% reduction in execution time, and 20.5% and 35.1% reduction in energy consumption respectively.
Published in: 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS)
Date of Conference: 17-21 July 2016
Date Added to IEEE Xplore: 19 January 2017
ISBN Information: