Multicore processors have been utilized in embedded systems and general computing applications for some time. However, these multicore chips execute multiple applications concurrently, with each core carrying out a particular task in the system. Such systems can be found in gaming, automotive real-time systems and video/image encoding devices. These systems are commonly deployed to overcome deadline misses, which are primarily due to the overloading of a single multitasking core. In this paper, we explore the use of multiple cores for a single application, as opposed to multiple applications executing in parallel. A single application is parallelized using two different methods: a master-slave model and a sequential pipeline model. The systems were implemented using Tensilica’s Xtensa LX processors, with queues as the means of communication between cores. In the master-slave model, we used a coarse-grained approach whereby a main core distributes the workload to the remaining cores and reads back the processed data before writing the results to file. In the pipeline model, a finer granularity is used: the application is partitioned into multiple sequential blocks, and each block represents a stage in a sequential pipeline. For both models we applied a number of different configurations, ranging from a single core to a nine-core system. We found that, for the seven-core system without any optimization, the sequential pipeline approach uses area more efficiently, with an area-increase-to-speedup ratio of 1.83 compared with 4.34 for the master-slave approach. With selective optimization in the pipeline approach, we obtained speedups of up to 4.6× with an area increase of only 3.1× (an area-increase-to-speedup ratio of just 0.68).
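The two partitioning models described above can be illustrated with a minimal sketch in Python, where threads stand in for cores and `queue.Queue` objects stand in for the inter-core queues. This is not the authors' Xtensa LX implementation: the function names (`run_pipeline`, `run_master_slave`, `stage_worker`, `slave_worker`), the round-robin distribution policy, and the arithmetic stand-in stages are all assumptions made for illustration. Python threads also do not run CPU-bound work in parallel in the standard interpreter, so the sketch shows only the dataflow structure, not any speedup.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker passed along each queue


def stage_worker(func, inbox, outbox):
    """One pipeline stage: apply func to every item and forward the result."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(stages, blocks):
    """Sequential pipeline: one worker per stage, FIFO queues between neighbours."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=stage_worker, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for block in blocks:
        queues[0].put(block)
    queues[0].put(SENTINEL)
    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


def slave_worker(func, inbox, outbox):
    """One slave: run the whole computation on each block it is handed."""
    while True:
        msg = inbox.get()
        if msg is SENTINEL:
            return
        index, block = msg
        outbox.put((index, func(block)))


def run_master_slave(func, blocks, n_slaves):
    """Master-slave: the master deals blocks out round-robin and collects results."""
    blocks = list(blocks)
    inboxes = [queue.Queue() for _ in range(n_slaves)]
    outbox = queue.Queue()
    threads = [
        threading.Thread(target=slave_worker, args=(func, q, outbox))
        for q in inboxes
    ]
    for t in threads:
        t.start()
    for i, block in enumerate(blocks):
        inboxes[i % n_slaves].put((i, block))
    for q in inboxes:
        q.put(SENTINEL)
    results = [None] * len(blocks)
    for _ in range(len(blocks)):
        index, value = outbox.get()
        results[index] = value  # restore input order; slaves may finish out of order
    for t in threads:
        t.join()
    return results


# Arithmetic stand-ins for real processing stages (purely illustrative).
def add_one(x):
    return x + 1


def double(x):
    return x * 2


pipeline_out = run_pipeline([add_one, double], range(5))
master_slave_out = run_master_slave(lambda x: double(add_one(x)), range(5), 3)
```

In the pipeline variant each worker owns one stage and every block passes through all workers in order, whereas in the master-slave variant each worker runs the complete computation on its share of the blocks and the master reassembles the results.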
Additional information
National ICT Australia is funded through the Australian Government’s Backing Australia’s Ability initiative, in part through the Australian Research Council.
Cite this article
Shee, S.L., Erdos, A. & Parameswaran, S. Architectural Exploration of Heterogeneous Multiprocessor Systems for JPEG. Int J Parallel Prog 36, 140–162 (2008). https://doi.org/10.1007/s10766-007-0040-7
DOI: https://doi.org/10.1007/s10766-007-0040-7