Modeling and Evaluating Non-shared Memory CELL/BE Type Multi-core Architectures for Local Image and Video Processing

Journal of Signal Processing Systems

Abstract

Local processing, a dominant type of processing in image and video applications, requires substantial computational power to run in real time. However, processing locality, in space and/or in time, makes it possible to exploit data parallelism and data reuse. Although these properties can be exploited to achieve high-performance image and video processing on multi-core processors, suitable models and parallel algorithms must be developed, in particular for non-shared memory architectures. This paper proposes an efficient and simple model for local image and video processing on non-shared memory multi-core architectures. The model adopts a single program multiple data approach, in which data is distributed, processed and reused in an optimal way with respect to the data size, the number of cores and the local memory capacity. The model was experimentally evaluated by developing local video processing algorithms and programming the Cell Broadband Engine multi-core processor, namely for advanced video motion estimation and in-loop deblocking filtering. Furthermore, based on this experience, the main challenges of vectorization and of reducing branch mispredictions and computational load imbalances are also addressed. The limits and advantages of the regular and adaptive algorithms are also discussed. Experimental results show that the proposed model is adequate for local video processing, and that real-time operation is achieved even for the most demanding parts of advanced video coding. Full-pixel motion estimation is performed over high resolution video (720×576 pixels) at a rate of 30 frames per second, considering large search areas and five reference frames.
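
As a rough illustration of the single program multiple data distribution described above, the following sketch splits the rows of a frame into near-equal strips, one per core, and extends each strip with halo rows so that a local operator can be applied without exchanging data between cores. The function name `partition_rows`, its parameters, the row-wise strip layout, and the choice of six cores and a two-row radius are assumptions made for illustration only. The paper's model also accounts for local memory capacity and data reuse, which this sketch does not attempt to reproduce.

```python
def partition_rows(height, num_cores, radius):
    """Split `height` rows into near-equal strips, one per core.

    Returns a list of (start, stop, load_start, load_stop) tuples.
    Rows [start, stop) are the rows a core produces; rows
    [load_start, load_stop) are the rows it must load so that a
    neighbourhood of `radius` rows is available around each output row.
    """
    base, extra = divmod(height, num_cores)
    strips = []
    start = 0
    for core in range(num_cores):
        # The first `extra` cores take one additional row to balance load.
        rows = base + (1 if core < extra else 0)
        stop = start + rows
        # Halo rows are clamped at the frame borders.
        load_start = max(0, start - radius)
        load_stop = min(height, stop + radius)
        strips.append((start, stop, load_start, load_stop))
        start = stop
    return strips


# Hypothetical setup: a 576-row frame, six cores, a 5x5 local operator.
strips = partition_rows(height=576, num_cores=6, radius=2)
for core, strip in enumerate(strips):
    print(core, strip)
```

Each core runs the same program on its own strip. The halo rows are the only data duplicated between neighbouring cores, and their number grows with the operator radius rather than with the frame size.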

Acknowledgements

This work was partially supported by the Portuguese Foundation for Science and Technology (FCT). The authors also acknowledge the Georgia Institute of Technology, its Sony-Toshiba-IBM Center of Competence, and the National Science Foundation for the use of the Cell Broadband Engine resources that contributed to this research.

Author information

Corresponding author

Correspondence to Svetislav Momcilovic.

About this article

Cite this article

Momcilovic, S., Sousa, L. Modeling and Evaluating Non-shared Memory CELL/BE Type Multi-core Architectures for Local Image and Video Processing. J Sign Process Syst 62, 301–318 (2011). https://doi.org/10.1007/s11265-010-0463-z

  • DOI: https://doi.org/10.1007/s11265-010-0463-z
