
Variable Partitioning and Scheduling for MPSoC with Virtually Shared Scratch Pad Memory

Journal of Signal Processing Systems

Abstract

One of the most critical components determining the success of an MPSoC-based architecture is its on-chip memory. Scratch Pad Memory (SPM) is increasingly used in place of cache as the on-chip memory of embedded MPSoCs because of its smaller chip area, lower power consumption, and better timing predictability. SPM can be organized as a Virtually Shared SPM (VS-SPM) architecture, which combines the advantages of shared and private SPM. However, making effective use of the VS-SPM architecture depends strongly on two inter-dependent problems: variable partitioning and task scheduling. In this paper, we decouple these two problems and solve them in a phase-ordered manner. We propose two variable partitioning heuristics based on an initial schedule: High Access Frequency First (HAFF) variable partitioning and Global View Prediction (GVP) variable partitioning. We then present a loop pipeline scheduling algorithm, Rotation Scheduling with Variable Partitioning (RSVP), to improve overall throughput. Our experimental results on MiBench show that the average performance improvements over IDAS (Integrated Data Assignment with Scheduling) are 23.74% for HAFF and 31.91% for GVP on a four-core MPSoC. The average schedule length generated by RSVP is 25.96% shorter than that of list scheduling with an optimal variable partition.
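The abstract names the HAFF heuristic but does not describe its steps. As a rough illustration of what a "high access frequency first" placement could look like, the sketch below sorts variables by total access count and greedily places each one in the SPM of the core that accesses it most, falling back to another core's SPM and finally to off-chip memory. Everything here is an assumption made for illustration, not the paper's algorithm: the `Variable` type, the `greedy_partition` function, the per-core access profile, and the single fixed SPM capacity per core are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str
    size: int        # bytes occupied in an SPM (hypothetical unit)
    accesses: dict   # core id -> access count (hypothetical profile data)


def greedy_partition(variables, num_cores, spm_capacity):
    """Illustrative greedy placement, NOT the paper's HAFF definition.

    Variables are visited in decreasing order of total access count.
    Each is placed in the SPM of the core that accesses it most if it
    fits, otherwise in the next-best core's SPM, otherwise off-chip.
    """
    free = [spm_capacity] * num_cores   # remaining bytes per core SPM
    placement = {}
    ordered = sorted(variables,
                     key=lambda v: sum(v.accesses.values()),
                     reverse=True)
    for v in ordered:
        # Rank cores by how often they access this variable.
        ranked = sorted(range(num_cores),
                        key=lambda c: v.accesses.get(c, 0),
                        reverse=True)
        placement[v.name] = "main"      # default: off-chip main memory
        for core in ranked:
            if free[core] >= v.size:
                free[core] -= v.size
                placement[v.name] = core
                break
    return placement


variables = [
    Variable("a", 4, {0: 10, 1: 2}),
    Variable("b", 4, {0: 8}),
    Variable("c", 4, {1: 1}),
    Variable("d", 8, {0: 1}),
]
result = greedy_partition(variables, num_cores=2, spm_capacity=4)
print(result)
```

In this toy run, `a` takes core 0's SPM, `b` spills to core 1's SPM because core 0 is full, and `c` and `d` stay in main memory. The spill to a remote SPM reflects the virtually shared organization the abstract describes, under the assumption that a remote SPM access is cheaper than an off-chip access.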



Author information

Corresponding author

Correspondence to Lei Zhang.

Additional information

This work is partially supported by NSF CCR-0309461, NSF IIS-0513669, HK CERG B-Q60B, NSFC 60728206, and China Scholarship Council [2007]3020.


About this article

Cite this article

Zhang, L., Qiu, M., Tseng, WC. et al. Variable Partitioning and Scheduling for MPSoC with Virtually Shared Scratch Pad Memory. J Sign Process Syst Sign Image Video Technol 58, 247–265 (2010). https://doi.org/10.1007/s11265-009-0362-3


  • DOI: https://doi.org/10.1007/s11265-009-0362-3
