DOI: 10.1145/3078155.3078177 (IWOCL Conference Proceedings)
extended-abstract

Wavefront Parallel Processing on GPUs with an Application to Video Encoding Algorithms

Published: 16 May 2017

ABSTRACT

In this paper, we present our experiences in designing, implementing and evaluating efficient applications of the wavefront pattern for block-level motion estimation in video encoding algorithms using OpenCL™ kernels on Intel® Processor Graphics™. We implement multiple solutions exploring different performance considerations, evaluate their pros and cons, present performance data, and provide our recommendations.
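
As a rough illustration of the wavefront pattern named in the abstract, the sketch below groups blocks of a frame into waves that could be processed in parallel. It is not the paper's OpenCL implementation. It assumes the dependency pattern commonly used for H.264 macroblocks (each block waits for its left, top-left, top, and top-right neighbours), and the function names `wavefront_order` and `dependencies` are hypothetical.

```python
def wavefront_order(width_mb, height_mb):
    """Group block coordinates (x, y) into waves that can run in parallel.

    Assumption: each block depends on its left, top-left, top and
    top-right neighbours, so block (x, y) belongs to wave x + 2 * y.
    """
    num_waves = (width_mb - 1) + 2 * (height_mb - 1) + 1
    waves = [[] for _ in range(num_waves)]
    for y in range(height_mb):
        for x in range(width_mb):
            waves[x + 2 * y].append((x, y))
    return waves


def dependencies(x, y, width_mb):
    """Return the in-frame neighbours that block (x, y) must wait for."""
    candidates = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
    return [(cx, cy) for cx, cy in candidates if 0 <= cx < width_mb and cy >= 0]


if __name__ == "__main__":
    # Print the waves for a small 4x3 grid of blocks.
    for index, wave in enumerate(wavefront_order(4, 3)):
        print(index, wave)
```

Under this assumption, every dependency of a block falls in a strictly earlier wave, so all blocks within one wave are independent of each other. A 4x3 grid yields 8 waves, and the first and last waves each contain a single block, which is one reason the available parallelism varies over the course of a frame.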
