A low power architecture for embedded perception

Published: 22 September 2004

Abstract

Recognizing speech, gestures, and visual features is an important interface capability for future embedded mobile systems. Unfortunately, the real-time performance requirements of complex perception applications cannot be met by current embedded processors, and they often exceed even the performance of high-performance microprocessors, whose energy consumption far exceeds embedded energy budgets. Custom ASICs solve this problem, but they incur expensive and lengthy design cycles and are inflexible. This paper introduces a VLIW perception processor that combines clustered function units, compiler-controlled dataflow, and compiler-controlled clock gating with a scratch-pad memory system to achieve high performance for perceptual algorithms at low energy consumption. The architecture is evaluated using ten benchmark applications taken from the complex speech and visual feature recognition, security, and signal processing domains. The energy-delay product of a 0.13 μm implementation of this architecture is compared against ASICs and general-purpose processors. Using a combination of SPICE simulations and real processor power measurements, we show that the cluster running at a 1 GHz clock frequency outperforms a 2.4 GHz Pentium 4 by a factor of 1.75 while simultaneously achieving a 159 times better energy-delay product than a low-power Intel XScale embedded processor.
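
The abstract's headline comparison rests on the energy-delay product (EDP), which is energy consumed multiplied by execution time, so lower is better. The following is a minimal sketch of how such a ratio could be computed, assuming average power and runtime are already known for each processor. The function names and the input values are hypothetical placeholders for illustration; they are not measurements from the paper.

```python
def energy_delay_product(avg_power_watts: float, runtime_seconds: float) -> float:
    """Return the energy-delay product in joule-seconds.

    Energy is average power times runtime; EDP multiplies that energy
    by the runtime again, so slow designs are penalized twice.
    """
    if avg_power_watts < 0 or runtime_seconds < 0:
        raise ValueError("power and runtime must be non-negative")
    energy_joules = avg_power_watts * runtime_seconds
    return energy_joules * runtime_seconds


def edp_improvement(baseline_edp: float, candidate_edp: float) -> float:
    """Return how many times better (lower) the candidate's EDP is."""
    if candidate_edp <= 0:
        raise ValueError("candidate EDP must be positive")
    return baseline_edp / candidate_edp


# Hypothetical placeholder inputs, NOT results from the paper.
baseline = energy_delay_product(avg_power_watts=2.0, runtime_seconds=4.0)
candidate = energy_delay_product(avg_power_watts=1.0, runtime_seconds=2.0)
ratio = edp_improvement(baseline, candidate)
print(f"candidate EDP is {ratio:.1f}x better than baseline")
```

Because runtime enters the product twice, a design that halves both power and runtime improves EDP by a factor of eight in this sketch, which is why a fast, low-power design can show a much larger EDP advantage than its raw speedup suggests.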

Published In

CASES '04: Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
September 2004
324 pages
ISBN: 1581138903
DOI: 10.1145/1023833
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. VLIW
  2. computer vision
  3. embedded systems
  4. low power design
  5. perception
  6. speech recognition
  7. stream processor

Qualifiers

  • Article

Conference

CASES04

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%

Cited By

  • (2018) Performance Analysis and Optimization of Automatic Speech Recognition. IEEE Transactions on Multi-Scale Computing Systems 4:4 (847-860). DOI: 10.1109/TMSCS.2017.2739158. Online publication date: 1-Oct-2018
  • (2015) Memory Considerations for Low Energy Ray Tracing. Computer Graphics Forum 34:1 (47-59). DOI: 10.1111/cgf.12458. Online publication date: 1-Feb-2015
  • (2013) An energy and bandwidth efficient ray tracing architecture. Proceedings of the 5th High-Performance Graphics Conference (121-128). DOI: 10.1145/2492045.2492058. Online publication date: 19-Jul-2013
  • (2010) Parallel scalable hardware implementation of asynchronous discrete particle swarm optimization. Engineering Applications of Artificial Intelligence 23:2 (177-187). DOI: 10.1016/j.engappai.2009.12.001. Online publication date: 1-Mar-2010
  • (2009) StreamRay. ACM SIGARCH Computer Architecture News 37:1 (325-336). DOI: 10.1145/2528521.1508282. Online publication date: 7-Mar-2009
  • (2009) StreamRay. ACM SIGPLAN Notices 44:3 (325-336). DOI: 10.1145/1508284.1508282. Online publication date: 7-Mar-2009
  • (2009) StreamRay. Proceedings of the 14th international conference on Architectural support for programming languages and operating systems (325-336). DOI: 10.1145/1508244.1508282. Online publication date: 7-Mar-2009
  • (2008) Coherent ray tracing via stream filtering. 2008 IEEE Symposium on Interactive Ray Tracing (59-66). DOI: 10.1109/RT.2008.4634622. Online publication date: Aug-2008
  • (2008) Sensor Fusion on an Embedded System for Traffic Data Analysis - ETRADA-V System. 2008 11th International IEEE Conference on Intelligent Transportation Systems (894-899). DOI: 10.1109/ITSC.2008.4732560. Online publication date: Oct-2008
  • (2008) Video based Traffic Congestion Prediction on an Embedded System. 2008 11th International IEEE Conference on Intelligent Transportation Systems (950-955). DOI: 10.1109/ITSC.2008.4732555. Online publication date: Oct-2008
