DOI:10.1145/2463209.2488894
research-article

Optimizations for configuring and mapping software pipelines in many core systems

Published: 29 May 2013

Abstract

Efficiently utilizing the computational resources of many-core systems is a prominent challenge, and it becomes harder when resource requirements vary unpredictably and applications may be started or stopped at any time. To address this challenge, we propose two schemes that calculate and adapt task mappings at runtime: a centralized, optimal mapping scheme and a distributed, hierarchical mapping scheme that trades optimality for a high degree of scalability. Experiments on Intel's 48-core Single-Chip Cloud Computer and in a many-core simulator show a significant improvement in system performance over the current state of the art.

Published In

DAC '13: Proceedings of the 50th Annual Design Automation Conference
May 2013
1285 pages
ISBN:9781450320719
DOI:10.1145/2463209
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

DAC '13

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%


Cited By

  • (2018) Real-Time and Energy-Efficient Face Detection on CPU-GPU Heterogeneous Embedded Platforms. IEICE Transactions on Information and Systems, E101.D:12 (2878-2888). DOI:10.1587/transinf.2018PAP0004. Online publication date: 1-Dec-2018
  • (2017) Design Space Exploration and Run-Time Adaptation for Multi-core Resource Management Under Performance and Power Constraints. Handbook of Hardware/Software Codesign (1-32). DOI:10.1007/978-94-017-7358-4_11-1. Online publication date: 8-Apr-2017
  • (2017) Design Space Exploration and Run-Time Adaptation for Multicore Resource Management Under Performance and Power Constraints. Handbook of Hardware/Software Codesign (301-332). DOI:10.1007/978-94-017-7267-9_11. Online publication date: 27-Sep-2017
  • (2016) Thread Assignment in Multicore/Multithreaded Processors: A Statistical Approach. IEEE Transactions on Computers, 65:1 (256-269). DOI:10.1109/TC.2015.2417533. Online publication date: 1-Jan-2016
  • (2016) Elastipipe: On Providing Cloud Elasticity for Pipeline-structured Applications. Advances on P2P, Parallel, Grid, Cloud and Internet Computing (293-304). DOI:10.1007/978-3-319-49109-7_28. Online publication date: 22-Oct-2016
  • (2015) E-pipeline. Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition (363-368). DOI:10.5555/2755753.2755835. Online publication date: 9-Mar-2015
  • (2015) Thermal constrained resource management for mixed ILP-TLP workloads in dark silicon chips. Proceedings of the 52nd Annual Design Automation Conference (1-6). DOI:10.1145/2744769.2744916. Online publication date: 7-Jun-2015
  • (2015) Runtime Resource Allocation for Software Pipelines. ACM Transactions on Parallel Computing, 2:1 (1-23). DOI:10.1145/2742347. Online publication date: 21-May-2015
  • (2015) An Efficient Application Processor Architecture for Multicore Software Video Decoding. IEEE Transactions on Circuits and Systems for Video Technology, 25:2 (325-338). DOI:10.1109/TCSVT.2014.2329365. Online publication date: Feb-2015
  • (2015) ADAPT: An adaptive manycore methodology for software pipelined applications. The 20th Asia and South Pacific Design Automation Conference (701-706). DOI:10.1109/ASPDAC.2015.7059092. Online publication date: Jan-2015
