DOI: 10.1145/2024724.2024753
research-article

Compilation of stream programs onto scratchpad memory based embedded multicore processors through retiming

Published: 05 June 2011

Abstract

The prevalence of stream applications in the signal processing, multimedia, and network processing domains has driven a new trend in programming and architecture design. Several languages and multicore architectures have been developed to support streaming applications. In many of these multicore architectures, scratchpad memories (SPMs) have replaced caches because of their lower power consumption. Performance optimization on SPM-based architectures requires the programmer or compiler to manage the limited local memory efficiently. This paper addresses the problem of compiling stream programs onto multicore architectures that incorporate SPMs. We propose a retiming technique that maximizes throughput under a memory constraint, given a user-specified number of software pipeline stages. Our technique thoroughly explores the trade-offs between double buffering and code overlay to achieve the best performance. We evaluated the technique by compiling several stream applications for the IBM Cell BE and comparing the results against existing approaches.
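The trade-off between double buffering and code overlay can be illustrated with a toy cost model. The sketch below is not the paper's retiming formulation: the `Actor` fields, the timing rules, and the example numbers are all assumptions made for illustration. It assumes that double buffering doubles an actor's buffer footprint but overlaps DMA with computation, and that overlaid actors share one code region but pay a reload cost on every firing. It then enumerates all choices for a single core and keeps the fastest one that fits in the SPM.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Actor:
    """Hypothetical actor description; every field is an illustrative assumption."""
    name: str
    code_bytes: int      # size of the actor's code
    buffer_bytes: int    # size of one copy of its I/O buffer
    compute_us: float    # compute time per firing
    dma_us: float        # time to transfer one buffer
    code_load_us: float  # time to reload the code when it is overlaid


def evaluate(actors, choices, spm_bytes):
    """Return the per-iteration time on one core, or None if the SPM overflows.

    choices[i] is a (double_buffer, overlay) pair of booleans for actors[i].
    """
    memory = 0
    overlay_region = 0  # overlaid actors share one region sized for the largest
    total_us = 0.0
    for actor, (double_buffer, overlay) in zip(actors, choices):
        memory += actor.buffer_bytes * (2 if double_buffer else 1)
        if overlay:
            overlay_region = max(overlay_region, actor.code_bytes)
        else:
            memory += actor.code_bytes
        if double_buffer:
            # Assumed: DMA for the next firing overlaps the current computation.
            step = max(actor.compute_us, actor.dma_us)
        else:
            # Assumed: transfer and computation are serialized.
            step = actor.compute_us + actor.dma_us
        if overlay:
            step += actor.code_load_us  # assumed reload before each firing
        total_us += step
    memory += overlay_region
    return total_us if memory <= spm_bytes else None


def best_configuration(actors, spm_bytes):
    """Exhaustively search all (double_buffer, overlay) choices; toy-sized inputs only."""
    best = None
    per_actor_options = list(product((False, True), repeat=2))
    for choices in product(per_actor_options, repeat=len(actors)):
        total_us = evaluate(actors, choices, spm_bytes)
        if total_us is not None and (best is None or total_us < best[0]):
            best = (total_us, choices)
    return best


# Made-up example data, not measurements from the paper.
EXAMPLE_ACTORS = (
    Actor("A", code_bytes=100, buffer_bytes=50, compute_us=10, dma_us=4, code_load_us=3),
    Actor("B", code_bytes=80, buffer_bytes=40, compute_us=6, dma_us=8, code_load_us=2),
)

if __name__ == "__main__":
    for capacity in (1000, 300, 100):
        print(capacity, best_configuration(EXAMPLE_ACTORS, capacity))
```

In this model, a generous SPM lets every actor keep its code resident and double buffer its data. As the capacity shrinks, the search gives up resident code or the second buffer copy, whichever costs less time. The exhaustive search is exponential in the number of actors and only serves to make the trade-off concrete.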

    Published In

    DAC '11: Proceedings of the 48th Design Automation Conference
    June 2011
    1055 pages
    ISBN:9781450306362
    DOI:10.1145/2024724
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. compiler
    2. embedded
    3. multicore
    4. processors
    5. retiming
    6. scratchpad memory
    7. stream

    Qualifiers

    • Research-article

    Conference

    DAC '11

    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

    Cited By

    • (2019) Design-Time Memory Subsystem Optimization for Low-Power Multi-Core Embedded Systems. 2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pages 347-353. DOI: 10.1109/MCSoC.2019.00056. Online publication date: Oct-2019.
    • (2018) A Non-Stop Double Buffering Mechanism for Dataflow Architecture. Journal of Computer Science and Technology, 33:1, pages 145-157. DOI: 10.1007/s11390-017-1747-6. Online publication date: 26-Jan-2018.
    • (2015) Temperature-Aware Data Allocation for Embedded Systems with Cache and Scratchpad Memory. ACM Transactions on Embedded Computing Systems, 14:2, pages 1-24. DOI: 10.1145/2629650. Online publication date: 9-Mar-2015.
    • (2013) SPM-Sieve. Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, pages 1-10. DOI: 10.5555/2555729.2555750. Online publication date: 29-Sep-2013.
    • (2013) A software-only scheme for managing heap data on limited local memory (LLM) multicore processors. ACM Transactions on Embedded Computing Systems, 13:1, pages 1-18. DOI: 10.1145/2501626.2501632. Online publication date: 5-Sep-2013.
    • (2013) A lifetime aware buffer assignment method for streaming applications on DRAM/PRAM hybrid memory. ACM Transactions on Embedded Computing Systems, 12:1s, pages 1-17. DOI: 10.1145/2435227.2435232. Online publication date: 21-Mar-2013.
    • (2013) SPM-Sieve: A framework for assisting data partitioning in scratch pad memory based systems. 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), pages 1-10. DOI: 10.1109/CASES.2013.6662527. Online publication date: Sep-2013.
    • (2012) Dynamic scheduling of stream programs on embedded multi-core processors. Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, pages 93-102. DOI: 10.1145/2380445.2380465. Online publication date: 7-Oct-2012.
    • (2012) Integrating software caches with scratch pad memory. Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems, pages 201-210. DOI: 10.1145/2380403.2380440. Online publication date: 7-Oct-2012.
    • (2012) Unrolling and retiming of stream applications onto embedded multicore processors. Proceedings of the 49th Annual Design Automation Conference, pages 1272-1277. DOI: 10.1145/2228360.2228598. Online publication date: 3-Jun-2012.
