Software pipelining with CGA and proposed intrinsics on a reconfigurable processor for HEVC decoders

  • Original Research Paper
Journal of Real-Time Image Processing

Abstract

This work proposes several intrinsics for a reconfigurable processor intended for HEVC decoding, together with software pipelining algorithms that use a coarse-grained array (CGA) architecture and the proposed intrinsic instructions. The software pipelining algorithms are developed for CGA acceleration of the inverse transform, pixel reconstruction, de-blocking filter, and sample adaptive offset modules. To enable efficient software pipelining, several very long instruction word (VLIW)-based intrinsics are designed to maximize parallelization rather than computational acceleration. The HEVC decoder with the proposed intrinsics runs 2.3 times faster, in terms of clock cycles, than a decoder that does not use them. In addition, the HEVC decoder with the CGA pipelining algorithms executes 10.9 times faster than one without the CGA mode.
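
As a rough illustration of one module named in the abstract, pixel reconstruction adds the decoded residual to the prediction and clips each sample to the valid range. The sketch below is not the authors' code and does not model the proposed intrinsics or the CGA; the function names `clip_sample` and `reconstruct_block` and the default 8-bit depth are assumptions made for this example. It only shows that each sample is computed independently and without branching, which is the kind of loop body that software pipelining can overlap across iterations.

```python
def clip_sample(value, bit_depth=8):
    """Clamp a sample to [0, 2**bit_depth - 1] using min/max, with no if/else."""
    max_value = (1 << bit_depth) - 1
    return min(max(value, 0), max_value)


def reconstruct_block(prediction, residual, bit_depth=8):
    """Pixel reconstruction: prediction + residual, clipped per sample.

    Each output sample depends only on the co-located prediction and
    residual samples, so no iteration depends on another.
    """
    return [
        [clip_sample(p + r, bit_depth) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


# Hypothetical 2x2 block: two samples overflow or underflow and are clipped.
prediction = [[250, 10], [128, 0]]
residual = [[10, -20], [5, -1]]
print(reconstruct_block(prediction, residual))  # [[255, 0], [133, 0]]
```

On a VLIW or CGA target, the add and the clip would typically map to separate functional units working on different iterations in the same cycle; the Python version only captures the arithmetic.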

Acknowledgements

This research was partly supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (NRF-2014R1A2A1A11052210), and by the Ministry of Science, ICT and Future Planning (MSIP), Korea, under the Information Technology Research Center (ITRC) support program (IITP-2017-2016-0-00288) supervised by the Institute for Information & Communications Technology Promotion (IITP).

Author information

Corresponding author

Correspondence to Donggyu Sim.

About this article

Cite this article

Ahn, YJ., Yoo, J., Jo, HH. et al. Software pipelining with CGA and proposed intrinsics on a reconfigurable processor for HEVC decoders. J Real-Time Image Proc 16, 2173–2187 (2019). https://doi.org/10.1007/s11554-017-0729-9

  • DOI: https://doi.org/10.1007/s11554-017-0729-9
