Abstract
Multicore processors can provide sufficient computing power and flexibility for complex streaming applications, such as high-definition video processing. To reduce hardware complexity and power consumption, a distributed scratchpad memory architecture is considered instead of a cache memory architecture. However, the distributed design poses new programming challenges: the combined complexity of inter-processor communication, synchronization, and workload balancing makes it difficult to exploit all available capabilities and achieve maximal throughput. In this study, we developed an efficient design flow for parallelizing multimedia applications on a distributed scratchpad memory multicore architecture. An application is first partitioned into streaming components and then mapped onto multicore processors. Various hardware-dependent factors and application-specific characteristics are taken into account to generate efficient task partitions and allocate resources appropriately. To test and verify the proposed design flow, three popular multimedia applications were implemented: a full-HD motion JPEG decoder, an object detector, and a full-HD H.264/AVC decoder. For demonstration purposes, the SONY PlayStation\(^{\circledR }\)3 (PS3) was selected as the target platform. Simulation results show that, on the PS3, the full-HD motion JPEG decoder built with the proposed design flow can decode about 108.9 frames per second (fps) in the 1080p format. The object detection application can perform real-time object detection at 2.84 fps at \(1280 \times 960\) resolution, 11.75 fps at \(640 \times 480\) resolution, and 62.52 fps at \(320 \times 240\) resolution. The full-HD H.264/AVC decoder application can achieve nearly 50 fps.
Acknowledgments
This work was supported in part by the National Science Council, Taiwan, under Grant NSC-102-2220-E-009-013- and by the Ministry of Economic Affairs, Taiwan, under Grant MOEA-101-EC-17-A-02-S1-202.
Cite this article
Chen, SK., Hung, CY., Chen, CC. et al. Parallelizing Complex Streaming Applications on Distributed Scratchpad Memory Multicore Architecture. Int J Parallel Prog 42, 875–899 (2014). https://doi.org/10.1007/s10766-013-0256-7
DOI: https://doi.org/10.1007/s10766-013-0256-7