Architectural Exploration of Heterogeneous Multiprocessor Systems for JPEG

International Journal of Parallel Programming

Multicore processors have been used in embedded systems and general computing applications for some time. However, these multicore chips execute multiple applications concurrently, with each core carrying out a particular task in the system. Such systems can be found in gaming, automotive real-time systems and video/image encoding devices. These systems are commonly deployed to overcome deadline misses, which are primarily due to overloading of a single multitasking core. In this paper, we explore the use of multiple cores for a single application, as opposed to multiple applications executing in parallel. A single application is parallelized using two different methods: a master-slave model and a sequential pipeline model. The systems were implemented using Tensilica’s Xtensa LX processors, with queues as the means of communication between two cores. In the master-slave model, we used a coarse-grained approach whereby a main core distributes the workload to the remaining cores and reads the processed data before writing the results back to file. In the pipeline model, a finer granularity is used: the application is partitioned into multiple sequential blocks, each block representing a stage in a sequential pipeline. For both models we applied a number of different configurations, ranging from a single core to a nine-core system. We found that, without any optimization, the sequential pipeline approach on the seven-core system uses area more efficiently, with an area increase to speedup ratio of 1.83 compared with 4.34 for the master-slave approach. With selective optimization in the pipeline approach, we obtained speedups of up to 4.6× with an area increase of only 3.1× (an area increase to speedup ratio of just 0.68).
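The two parallelization models described above can be illustrated with a minimal software sketch. This is an analogy under stated assumptions, not the authors' implementation: Python threads stand in for cores, `queue.Queue` stands in for the hardware queues, and the three stage functions are hypothetical placeholders rather than real JPEG stages. The names `run_pipeline`, `run_master_slave`, `stage` and `SENTINEL` are all introduced here for illustration.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def stage(func, inbox, outbox):
    """One pipeline stage: read from inbox, apply func, write to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # pass the shutdown signal downstream
            return
        outbox.put(func(item))


def run_pipeline(funcs, items):
    """Sequential pipeline: one thread per stage, one queue between neighbours."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(SENTINEL)
    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


def run_master_slave(func, items, n_workers):
    """Master-slave: the master hands out whole items and collects the results."""
    tasks, done = queue.Queue(), queue.Queue()

    def worker():
        while True:
            job = tasks.get()
            if job is SENTINEL:
                return
            index, item = job
            done.put((index, func(item)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for job in enumerate(items):
        tasks.put(job)
    for _ in threads:
        tasks.put(SENTINEL)  # one shutdown marker per worker
    results = [None] * len(items)
    for _ in items:
        index, value = done.get()
        results[index] = value  # restore input order
    for t in threads:
        t.join()
    return results


# Placeholder stages (assumptions, not the paper's JPEG partitioning).
stages = [lambda x: x - 128, lambda x: x * 2, lambda x: max(x, 0)]
blocks = [100, 128, 200]


def whole_job(x):
    """Coarse-grained unit of work: all stages applied by a single worker."""
    for f in stages:
        x = f(x)
    return x


pipeline_result = run_pipeline(stages, blocks)
master_slave_result = run_master_slave(whole_job, blocks, n_workers=2)
```

Both functions produce the same output for the same input. The difference is in how the work is divided: the pipeline splits the application by stage, while the master-slave version splits the data and runs the whole application on each piece.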

Author information

Corresponding author

Correspondence to Seng Lin Shee.

Additional information

National ICT Australia is funded through the Australian Government’s Backing Australia’s Ability initiative, in part through the Australian Research Council.

About this article

Cite this article

Shee, S.L., Erdos, A. & Parameswaran, S. Architectural Exploration of Heterogeneous Multiprocessor Systems for JPEG. Int J Parallel Prog 36, 140–162 (2008). https://doi.org/10.1007/s10766-007-0040-7

  • DOI: https://doi.org/10.1007/s10766-007-0040-7
