Convolution engine: balancing efficiency and flexibility in specialized computing

Published: 23 March 2015

Abstract

General-purpose processors, while tremendously versatile, pay a huge cost for their flexibility by wasting over 99% of the energy in programmability overheads. We observe that reducing this waste requires tuning data storage and compute structures and their connectivity to the data-flow and data-locality patterns in the algorithms. Hence, by backing off from full programmability and instead targeting key data-flow patterns used in a domain, we can create efficient engines that can be programmed and reused across a wide range of applications within that domain.
We present the Convolution Engine (CE), a programmable processor specialized for the convolution-like data-flow prevalent in computational photography, computer vision, and video processing. The CE achieves energy efficiency by capturing data-reuse patterns, eliminating data transfer overheads, and enabling a large number of operations per memory access. We demonstrate that the CE is within a factor of 2–3 of the energy and area efficiency of custom units optimized for a single kernel, and that it improves energy and area efficiency by 8–15× over data-parallel Single Instruction Multiple Data (SIMD) engines for most image processing applications.
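
To make the phrase "a large number of operations per memory access" concrete, the following is a minimal software sketch of a convolution-like sliding-window filter. It is an illustration only and does not describe the CE's actual datapath or instruction set: the function name `sliding_window_convolve`, the use of a `deque` as a stand-in for a shift register, and the load and multiply-accumulate counters are all assumptions introduced here. Each input sample is fetched once and then reused across several overlapping output windows.

```python
from collections import deque


def sliding_window_convolve(signal, taps):
    """Convolution-like 1D filter (correlation form, fully overlapping windows only).

    Each input sample is loaded exactly once into a small window buffer and
    then reused by every output whose window covers it.
    Returns (outputs, loads, macs) so the reuse ratio can be inspected.
    """
    k = len(taps)
    window = deque(maxlen=k)  # hypothetical stand-in for a shift register
    outputs = []
    loads = 0  # number of input samples fetched from "memory"
    macs = 0   # number of multiply-accumulate operations performed

    for sample in signal:
        window.append(sample)  # one memory access per input sample
        loads += 1
        if len(window) == k:
            acc = 0
            for value, tap in zip(window, taps):
                acc += value * tap
                macs += 1
            outputs.append(acc)

    return outputs, loads, macs


outputs, loads, macs = sliding_window_convolve([1, 2, 3, 4, 5, 6], [1, 0, -1])
print(outputs, loads, macs)
```

In this toy case, six loads feed twelve multiply-accumulates. For long inputs the ratio approaches the number of taps, which is the kind of data reuse the abstract refers to.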

Published In

Communications of the ACM  Volume 58, Issue 4
April 2015
86 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/2749359
Editor: Moshe Y. Vardi
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed
