Abstract
This work proposes several intrinsic instructions for a reconfigurable processor intended for HEVC decoding, together with software pipelining algorithms that exploit a coarse-grained array (CGA) architecture and the proposed intrinsics. Software pipelining algorithms are developed for CGA acceleration of the inverse transform, pixel reconstruction, de-blocking filter, and sample adaptive offset modules. To enable efficient software pipelining, several very long instruction word (VLIW)-based intrinsics are designed to maximize parallelization rather than computational acceleration. We found that the HEVC decoder with the proposed intrinsics runs 2.3 times faster, measured in clock cycles, than a decoder that does not use the intrinsics. In addition, the HEVC decoder with the CGA pipelining algorithms executes 10.9 times faster than the decoder without the CGA mode.











Acknowledgements
This research was partly supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (NRF-2014R1A2A1A11052210), and by the Ministry of Science, ICT and Future Planning (MSIP), Korea, under the Information Technology Research Center (ITRC) support program (IITP-2017-2016-0-00288) supervised by the Institute for Information & Communications Technology Promotion (IITP).
Cite this article
Ahn, YJ., Yoo, J., Jo, HH. et al. Software pipelining with CGA and proposed intrinsics on a reconfigurable processor for HEVC decoders. J Real-Time Image Proc 16, 2173–2187 (2019). https://doi.org/10.1007/s11554-017-0729-9
DOI: https://doi.org/10.1007/s11554-017-0729-9