research-article

Continuous real-world inputs can open up alternative accelerator designs

Authors: Antoine Joubert, Rodolphe Héliot, Olivier Temam

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture

Pages 1-12

https://doi.org/10.1145/2485922.2485923

Published: 23 June 2013

Abstract

Motivated by energy constraints, future heterogeneous multi-cores may contain a variety of accelerators, each targeting a subset of the application spectrum. Beyond energy, the growing number of faults steers accelerator research towards fault-tolerant accelerators.

In this article, we investigate a fault-tolerant and energy-efficient accelerator for signal processing applications. We depart from traditional designs by introducing an accelerator that relies on unary coding, a concept well adapted to the continuous real-world inputs of signal processing applications. Unary coding enables a number of atypical micro-architecture choices that reduce area cost and energy; moreover, unary coding provides graceful output degradation as the number of transient faults increases.
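The graceful-degradation property can be illustrated with a minimal Python sketch. It assumes that unary coding means representing an integer v as a thermometer word of v ones followed by zeros, decoded by counting the ones; the function names and the 8-bit binary baseline are illustrative choices, not the paper's hybrid digital/analog micro-architecture.

```python
"""Illustrative comparison of unary vs. binary coding under a single bit flip.

Assumption (not from the paper): a value v in [0, N] is unary-coded as a
length-N thermometer word whose first v bits are 1; decoding counts the 1s.
"""

N_BITS_BINARY = 8
N_UNARY = 2 ** N_BITS_BINARY - 1  # 255 unary positions cover the same range 0..255


def unary_encode(v: int, n: int = N_UNARY) -> list[int]:
    """Thermometer code: v ones followed by n - v zeros."""
    return [1] * v + [0] * (n - v)


def unary_decode(bits: list[int]) -> int:
    """The decoded value is the number of ones, so every position has equal weight."""
    return sum(bits)


def binary_encode(v: int, width: int = N_BITS_BINARY) -> list[int]:
    """Unsigned binary, most significant bit first."""
    return [(v >> (width - 1 - i)) & 1 for i in range(width)]


def binary_decode(bits: list[int]) -> int:
    """Rebuild the integer from MSB-first bits."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def flip(bits: list[int], i: int) -> list[int]:
    """Return a copy of bits with position i inverted (one transient fault)."""
    out = list(bits)
    out[i] ^= 1
    return out


def worst_case_error(v: int) -> tuple[int, int]:
    """Largest absolute decoding error over every possible single-bit flip."""
    u = unary_encode(v)
    b = binary_encode(v)
    unary_err = max(abs(unary_decode(flip(u, i)) - v) for i in range(len(u)))
    binary_err = max(abs(binary_decode(flip(b, i)) - v) for i in range(len(b)))
    return unary_err, binary_err


if __name__ == "__main__":
    for v in (0, 100, 200, 255):
        u_err, b_err = worst_case_error(v)
        print(f"v={v:3d}  unary worst error={u_err}  binary worst error={b_err}")
```

Under this assumption, any single flip changes the unary value by exactly 1, whereas flipping the most significant binary bit changes the value by 128; since each flip moves the count by one, k flips can move the unary value by at most k. The price is code length: 255 unary positions versus 8 binary bits for the same range.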

We introduce a configurable hybrid digital/analog micro-architecture capable of implementing a broad set of signal processing applications based on these concepts, together with a back-end optimizer that takes advantage of the special nature of these applications. For a set of five signal processing applications, we explore the different design tradeoffs and obtain an accelerator with an area cost of 1.63 mm². On average, this accelerator requires only 2.3% of the energy of an Atom-like core to implement similar tasks. We then evaluate the accelerator's resilience to transient faults and its ability to trade accuracy for energy savings.
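The accuracy-for-energy trade-off can be sketched in the same spirit. This hypothetical model assumes that a real input x in [0, 1] is carried by round(x·n) pulses in a window of n slots, and that the energy spent per value scales with n; both the encoding and the cost model are assumptions for illustration, not the accelerator's measured behavior.

```python
"""Sketch: shortening a unary code trades precision for energy.

Assumptions (hypothetical, not the paper's model): a real input x in [0, 1]
is represented by round(x * n) pulses within a window of n time slots, and
the energy spent on one value is proportional to the number of slots n.
"""


def quantize_unary(x: float, n: int) -> float:
    """Encode x with n unary slots, then decode it back to [0, 1]."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    pulses = round(x * n)  # number of 1s in the unary word
    return pulses / n      # decoded value


def max_error(n: int, samples: int = 10_001) -> float:
    """Largest reconstruction error over a dense, evenly spaced grid of inputs."""
    grid = (i / (samples - 1) for i in range(samples))
    return max(abs(quantize_unary(x, n) - x) for x in grid)


def relative_energy(n: int, reference_n: int) -> float:
    """Energy relative to a reference window under the slot-count cost model."""
    return n / reference_n


if __name__ == "__main__":
    ref = 256
    for n in (256, 64, 16, 4):
        print(f"n={n:3d}  energy={relative_energy(n, ref):.3f}  "
              f"max error={max_error(n):.4f}  bound 1/(2n)={1 / (2 * n):.4f}")
```

Under this model, halving n halves the energy per value while the bound on reconstruction error, 1/(2n), doubles, which is the kind of knob an accuracy-for-energy trade exploits.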



Information

Published In


ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture

June 2013

686 pages

ISBN:9781450320795

DOI:10.1145/2485922

General Chair: Avi Mendelson, Technion

ACM SIGARCH Computer Architecture News, Volume 41, Issue 3 (ISCA '13)

June 2013

666 pages

ISSN: 0163-5964

DOI: 10.1145/2508148

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

IEEE CS

In-Cooperation

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States



Funding Sources

Agence Nationale de la Recherche

Conference

ISCA '13: The 40th Annual International Symposium on Computer Architecture

June 23-27, 2013

Tel Aviv, Israel

Acceptance Rates

ISCA '13 paper acceptance rate: 56 of 288 submissions (19%)

Overall acceptance rate: 543 of 3,203 submissions (17%)
