
Multiprocessor, Multithreading and Memory Optimization for On-Chip Multimedia Applications

Journal of Signal Processing Systems

Abstract

Multiprocessor System-on-Chip is one of the main drivers of the semiconductor industry revolution, enabling the integration of complex functionality on a single chip. Techniques for processor design and application optimization can be combined to design these systems more efficiently. In particular, memory optimization techniques that improve data locality can be combined with multithreading technology, which improves overall processor efficiency. The main challenge in combining them is adapting memory optimization techniques to the high parallelism offered by multithreading environments. This paper presents an in-depth analysis of the impact of multiprocessor and multithreading environments on memory optimization techniques. It discusses the different types of parallelization (fine and coarse grain) and their influence on memory optimization techniques. Some improvements to existing memory optimization techniques are presented, as well as some adaptations necessary to use them in this type of environment.



Author information

Corresponding author

Correspondence to B. Girodias.

About this article

Cite this article

Girodias, B., Bouchebaba, Y., Nicolescu, G. et al. Multiprocessor, Multithreading and Memory Optimization for On-Chip Multimedia Applications. J Sign Process Syst Sign Image Video Technol 57, 263–283 (2009). https://doi.org/10.1007/s11265-008-0293-4

  • DOI: https://doi.org/10.1007/s11265-008-0293-4
