Abstract
In the High Efficiency Video Coding (HEVC) standard, multiple decoding modules have been designed to take advantage of parallel processing. In particular, the HEVC in-loop filters (i.e., the deblocking filter and sample adaptive offset) were conceived to be exploited by parallel architectures. However, the type of parallelism offered mostly suits the capabilities of multi-core CPUs, which makes it a real challenge to efficiently exploit massively parallel architectures such as Graphics Processing Units (GPUs), mainly because of the data dependencies between the HEVC decoding procedures. Accordingly, this paper presents a novel strategy to increase the amount of parallelism and the resulting performance of the HEVC in-loop filters on GPU devices. For this purpose, the proposed algorithm performs the HEVC filtering at frame level and employs intrinsic GPU vector instructions. Compared to state-of-the-art HEVC in-loop filter implementations, the proposed approach also reduces the amount of required memory transfers, further boosting performance. Experimental results show that the proposed GPU in-loop filters deliver a significant improvement in decoding performance. For example, average frame rates of 76 frames per second (FPS) and 125 FPS for Ultra HD 4K are achieved on an embedded NVIDIA GPU for the All Intra and Random Access configurations, respectively.
Acknowledgements
This work was supported by national funds through FCT, under projects PTDC/EEI-ELC/3152/2012 and UID/CEC/50021/2013. Diego F. de Souza also acknowledges FCT for the Ph.D. scholarship SFRH/BD/76285/2011.
Cite this article
Wang, B., de Souza, D.F., Alvarez-Mesa, M. et al. GPU Parallelization of HEVC In-Loop Filters. Int J Parallel Prog 45, 1515–1535 (2017). https://doi.org/10.1007/s10766-017-0488-z
DOI: https://doi.org/10.1007/s10766-017-0488-z