Variable Partitioning and Scheduling for MPSoC with Virtually Shared Scratch Pad Memory

Zhang, Lei; Qiu, Meikang; Tseng, Wei-Che; Sha, Edwin H.-M.

doi:10.1007/s11265-009-0362-3

Published: 24 April 2009

Volume 58, pages 247–265 (2010)

Journal of Signal Processing Systems

Lei Zhang¹,
Meikang Qiu²,
Wei-Che Tseng³ &
Edwin H.-M. Sha³

Abstract

On-chip memory is one of the most critical components determining the success of an MPSoC-based architecture. Scratch Pad Memory (SPM) is increasingly used in place of cache as the on-chip memory of embedded MPSoCs because of its advantages in chip area, power consumption, and timing predictability. SPM can be organized as a Virtually Shared SPM (VS-SPM) architecture, which takes advantage of both shared and private SPM. However, making effective use of the VS-SPM architecture depends strongly on two interdependent problems: variable partitioning and task scheduling. In this paper, we decouple these two problems and solve them in a phase-ordered manner. We propose two variable partitioning heuristics based on an initial schedule: High Access Frequency First (HAFF) variable partitioning and Global View Prediction (GVP) variable partitioning. We then present a loop pipeline scheduling algorithm, Rotation Scheduling with Variable Partitioning (RSVP), to improve overall throughput. Experimental results on MiBench show that the average performance improvements over IDAS (Integrated Data Assignment with Scheduling) are 23.74% for HAFF and 31.91% for GVP on a four-core MPSoC. The average schedule length generated by RSVP is 25.96% shorter than that of list scheduling with an optimal variable partition.
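
To give a rough feel for frequency-driven variable partitioning, the sketch below greedily places the most frequently accessed variables first, preferring the SPM of the core that accesses each variable most and falling back to another core's SPM when the preferred one is full. This is only an illustration of the general idea suggested by the name "High Access Frequency First"; it is not the HAFF algorithm from the article, whose details are not given in the abstract. The function name `greedy_partition`, the input layout, the uniform per-core capacity, and the example numbers are all assumptions made for this sketch.

```python
def greedy_partition(access_counts, sizes, capacity, num_cores):
    """Hypothetical greedy variable placement (not the paper's HAFF).

    access_counts: {variable: {core_id: number_of_accesses}}
    sizes:         {variable: size in arbitrary units}
    capacity:      SPM capacity per core (assumed equal for all cores)
    Returns {variable: core_id}, or None for a variable left off-chip.
    """
    free = [capacity] * num_cores
    placement = {}

    # Visit variables from most to least accessed overall.
    order = sorted(
        access_counts,
        key=lambda v: sum(access_counts[v].values()),
        reverse=True,
    )

    for var in order:
        # Prefer the core that touches this variable most often.
        cores = sorted(
            range(num_cores),
            key=lambda c: access_counts[var].get(c, 0),
            reverse=True,
        )
        placement[var] = None  # stays off-chip unless some SPM has room
        for core in cores:
            if sizes[var] <= free[core]:
                free[core] -= sizes[var]
                placement[var] = core
                break
    return placement


# Made-up example: two cores, each with 4 units of SPM.
access = {"a": {0: 10, 1: 2}, "b": {0: 8}, "c": {1: 5}, "d": {1: 1}}
sizes = {"a": 4, "b": 4, "c": 4, "d": 4}
placement = greedy_partition(access, sizes, capacity=4, num_cores=2)
print(placement)
```

In this made-up example, variable `b` is used only by core 0 but ends up in core 1's SPM because core 0's SPM is already full. That kind of remote placement is what a virtually shared organization permits, and it also hints at why partitioning and scheduling interact: the placement changes which accesses are local for each task.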

Author information

Authors and Affiliations

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, 610054, P. R. China
Lei Zhang
Department of Electrical Engineering, University of New Orleans, New Orleans, LA, 70148, USA
Meikang Qiu
Department of Computer Science, University of Texas at Dallas, Richardson, TX, 75083, USA
Wei-Che Tseng & Edwin H.-M. Sha

Corresponding author

Correspondence to Lei Zhang.

Additional information

This work is partially supported by NSF CCR-0309461, NSF IIS-0513669, HK CERG B-Q60B, NSFC 60728206, and China Scholarship Council [2007]3020.

About this article

Cite this article

Zhang, L., Qiu, M., Tseng, WC. et al. Variable Partitioning and Scheduling for MPSoC with Virtually Shared Scratch Pad Memory. J Sign Process Syst Sign Image Video Technol 58, 247–265 (2010). https://doi.org/10.1007/s11265-009-0362-3

Received: 02 November 2008
Revised: 14 March 2009
Accepted: 16 March 2009
Published: 24 April 2009
Issue Date: February 2010
DOI: https://doi.org/10.1007/s11265-009-0362-3
