Abstract
Multiprocessor System-on-Chip (MPSoC) technology is one of the main drivers of the semiconductor industry revolution, because it enables the integration of complex functionality on a single chip. Techniques for processor design and for application optimization can be combined to design these systems more efficiently. In particular, memory optimization techniques that improve data locality can be combined with multithreading technology, which improves overall processor efficiency. The main challenge in combining them is adapting memory optimization techniques to the high degree of parallelism offered by multithreaded environments. This paper presents an in-depth analysis of the impact of multiprocessor and multithreading environments on memory optimization techniques. It discusses the different types of parallelization (fine grain and coarse grain) and their influence on memory optimization techniques. It also presents some improvements to existing memory optimization techniques, as well as the adaptations necessary to use them in this type of environment.
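To make the combination described above more concrete, the following Python sketch applies one classic locality-oriented memory optimization, loop fusion, to a two-stage producer/consumer computation, and then distributes the fused iterations over threads in two ways. All names (`unfused`, `fused_range`, `coarse_grain_chunks`, `fine_grain_chunks`, `run_parallel`) are hypothetical. Mapping "coarse grain" to contiguous iteration blocks and "fine grain" to cyclic interleaving is an illustrative assumption, not the paper's own definition or method. In standard CPython builds, threads do not execute Python bytecode in parallel because of the global interpreter lock, so the sketch shows only how iterations are assigned, not real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def unfused(a):
    """Two separate loop nests: the intermediate array b is fully
    materialized, so len(a) extra elements must be stored between nests."""
    b = [0] * len(a)
    for i in range(len(a)):          # nest 1: producer
        b[i] = 2 * a[i]
    c = [0] * len(a)
    for i in range(len(a)):          # nest 2: consumer
        c[i] = b[i] + 1
    return c


def fused_range(a, c, indices):
    """Fused loop body: each intermediate value is consumed right after it
    is produced, so the intermediate array shrinks to a single scalar."""
    for i in indices:
        tmp = 2 * a[i]               # former nest 1
        c[i] = tmp + 1               # former nest 2


def coarse_grain_chunks(n, threads):
    """Assumed coarse-grain split: one contiguous block of iterations per thread."""
    size = (n + threads - 1) // threads
    return [range(t * size, min((t + 1) * size, n)) for t in range(threads)]


def fine_grain_chunks(n, threads):
    """Assumed fine-grain split: iterations dealt out cyclically to threads."""
    return [range(t, n, threads) for t in range(threads)]


def run_parallel(a, threads, partition):
    """Run the fused loop on `threads` workers using the given partitioning."""
    c = [0] * len(a)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(fused_range, a, c, idx)
                   for idx in partition(len(a), threads)]
        for f in futures:
            f.result()               # re-raise any exception from a worker
    return c


a = list(range(10))
print(run_parallel(a, 3, coarse_grain_chunks) == unfused(a))
print(run_parallel(a, 3, fine_grain_chunks) == unfused(a))
```

With contiguous blocks, each thread works on one compact region of `a` and `c`. With cyclic interleaving, adjacent elements belong to different threads, which on a cache-based MPSoC can place data used by several processors in the same cache line. Fusion is legal here only because every iteration is independent; loops with cross-iteration dependences would need further analysis before being fused or split across threads.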
Cite this article
Girodias, B., Bouchebaba, Y., Nicolescu, G. et al. Multiprocessor, Multithreading and Memory Optimization for On-Chip Multimedia Applications. J Sign Process Syst Sign Image Video Technol 57, 263–283 (2009). https://doi.org/10.1007/s11265-008-0293-4