DOI: 10.1145/1878921.1878925
research-article

Resource recycling: putting idle resources to work on a composable accelerator

Published: 24 October 2010

Abstract

Mobile computing platforms in the form of smart phones, netbooks, and personal digital assistants have become an integral part of our everyday lives. Looking ahead, mobile multimedia support will become a key differentiating factor for customers. Features such as high-definition audio and video, video conferencing, 3D graphics, and image projection will lead to the adoption of one phone over another. However, in contrast to wireless signal processing, which is dominated by vectorizable computation, mobile multimedia applications often contain complex control flow and variable computational requirements. Moreover, data access is more complex: media applications typically operate on multi-dimensional vectors of data rather than single-dimensional vectors with simple strides. To handle these complexities, composable accelerators such as the Polymorphic Pipeline Array (PPA) present an appealing hardware platform by adding a degree of hardware configurability over existing accelerators. Hardware resources can be partitioned both statically and dynamically among executing tasks to maximize execution efficiency. However, an effective compilation framework is essential to partition and assign resources so that the available hardware is used intelligently. This paper introduces a compilation framework that maximizes application throughput with hybrid resource partitioning of a PPA system. Static partitioning handles part of the resource assignment; dynamic partitioning then identifies idle resources and puts them to use, a technique termed resource recycling. Experimental results show that real-time media applications can take advantage of the static and dynamic configurability of the PPA for increased throughput.
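
The hybrid idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's compilation framework: the function names (static_partition, run_cycles), the task names, the assumption that one core retires one unit of work per cycle, and the greedy heuristics are all hypothetical, chosen only to show why handing idle resources to a still-busy task can shorten total execution when actual work differs from the static estimate.

```python
def static_partition(estimates, total_cores):
    """Toy static step: give every task one core, then hand out the rest
    greedily to the task with the most estimated work per core.
    (Hypothetical heuristic, not the paper's algorithm.)"""
    assert total_cores >= len(estimates), "need at least one core per task"
    shares = {task: 1 for task in estimates}
    for _ in range(total_cores - len(estimates)):
        neediest = max(estimates, key=lambda t: estimates[t] / shares[t])
        shares[neediest] += 1
    return shares


def run_cycles(actual_work, shares, recycle):
    """Toy execution model: each core retires one unit of work per cycle.
    With recycle=True, cores owned by finished tasks are reassigned to the
    busy task with the most remaining work per core."""
    remaining = dict(actual_work)
    cores = dict(shares)
    cycles = 0
    while any(w > 0 for w in remaining.values()):
        if recycle:
            busy = [t for t in remaining if remaining[t] > 0]
            for task in cores:
                if remaining[task] <= 0 and cores[task] > 0:
                    idle, cores[task] = cores[task], 0
                    for _ in range(idle):
                        target = max(busy, key=lambda t: remaining[t] / cores[t])
                        cores[target] += 1
        for task in remaining:
            if remaining[task] > 0:
                remaining[task] -= cores[task]
        cycles += 1
    return cycles


# Estimated work is balanced, but the actual work turns out skewed.
estimates = {"decode": 40, "filter": 40}
actual = {"decode": 60, "filter": 20}
shares = static_partition(estimates, total_cores=4)

static_only = run_cycles(actual, shares, recycle=False)
with_recycling = run_cycles(actual, shares, recycle=True)
print(shares, static_only, with_recycling)
```

In this toy run both tasks start with two cores each. Static partitioning alone finishes in 30 cycles, because "filter" completes after 10 cycles and its cores then sit idle; with recycling, those two cores move to "decode" and the run finishes in 20 cycles.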

Cited By

  • (2018) Coarse-Grained Reconfigurable Array Architectures. Handbook of Signal Processing Systems, pages 427-472. DOI: 10.1007/978-3-319-91734-4_12. Online publication date: 14-Oct-2018
  • (2013) Coarse-Grained Reconfigurable Array Architectures. Handbook of Signal Processing Systems, pages 553-592. DOI: 10.1007/978-1-4614-6859-2_18. Online publication date: 10-May-2013

Published In

CASES '10: Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
October 2010
276 pages
ISBN: 9781605589039
DOI: 10.1145/1878921
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

  • CEDA
  • IEEE CAS
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. coarse-grained reconfigurable architecture
  2. composable accelerator
  3. dynamic partitioning
  4. modulo scheduling
  5. workload balancing

Conference

ESWeek '10: Sixth Embedded Systems Week
October 24 - 29, 2010
Scottsdale, Arizona, USA

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%
