Abstract
Video has become the major type of data produced and consumed every day. As screens grow larger, ultra high definition (UHD) videos are becoming more popular because they provide a better visual experience. However, UHD video content is still scarce. High-performance video super-resolution (SR) techniques, which obtain high resolution (HR) videos from low resolution (LR) sources, have recently been adopted in UHD video production. Deep learning (DL)-based SR methods can provide HR videos with appreciable objective and subjective quality, but their massive computational complexity makes processing far slower than real time, even on GPU servers, when producing UHD videos. Moreover, transcoding and the other video processing algorithms executed during enhancement are also time and resource consuming, and run relatively slowly on ordinary CPU and GPU servers. Hardware such as GPUs, field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) has proven to offer outstanding capability on image and video processing tasks in different respects, and there are also dedicated hardware accelerators for specific video processing tasks. In this paper, we focus on accelerating a UHD video enhancement workflow on a heterogeneous system with multiple hardware accelerators. First, we optimize the most time-consuming task, video SR, with the cuDNN and CUDA libraries to achieve real-time processing speed for a single UHD output frame on an ordinary GPU. Second, we design a GPU-friendly multi-thread scheduling algorithm for data and computation to better utilize GPU resources and achieve real-time performance when outputting UHD video clips. Third, targeting production environments, we build a UHD video enhancement application on selected heterogeneous hardware, with an integrated command line tool for our proposed algorithm, and achieve real-time end-to-end processing at 60 fps. Experiments show the high efficiency, robustness and compatibility of our approach.
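To make the real-time target and the idea of overlapping stages concrete, the following is a minimal sketch in Python, not the authors' scheduling algorithm. It computes the per-frame time budget at a given frame rate and chains stages through bounded queues with one worker thread per stage, so that decoding, SR and encoding of different frames can overlap. The names `frame_budget_ms`, `run_pipeline`, `decode`, `super_resolve` and `encode` are hypothetical, and the three stage functions are stubs standing in for real hardware-backed operations.

```python
import queue
import threading

SENTINEL = None  # marks the end of the frame stream


def frame_budget_ms(fps):
    """Time available per output frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate end-of-stream downstream
            break
        outbox.put(fn(item))


def run_pipeline(frames, stages, depth=4):
    """Chain stages with bounded queues, one worker thread per stage.

    Each stage handles frames in FIFO order, so output order matches input order.
    """
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for w in workers:
        w.start()

    def feed():
        for f in frames:
            queues[0].put(f)
        queues[0].put(SENTINEL)

    feeder = threading.Thread(target=feed)
    feeder.start()

    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)

    feeder.join()
    for w in workers:
        w.join()
    return results


# Stub stages (assumptions for illustration only).
def decode(index):
    return ("lr", index)


def super_resolve(frame):
    return ("hr", frame[1])


def encode(frame):
    return "pkt%d" % frame[1]


if __name__ == "__main__":
    print("budget per frame at 60 fps: %.2f ms" % frame_budget_ms(60))
    print(run_pipeline(range(5), [decode, super_resolve, encode]))
```

The bounded queues apply back-pressure: a fast stage blocks once it is `depth` frames ahead of a slower one, which keeps the number of frames in flight, and therefore memory use, limited.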












Acknowledgements
This work was supported by NSFC (61521062, U1611461, 61671296), MoE-China Mobile Research Fund Project (MCM20180702), the 111 Project (B07022 and Sheitc No. 150633) and the Shanghai Key Laboratory of Digital Media Processing and Transmissions.
About this article
Cite this article
Dong, Y., Song, L., Xie, R. et al. Real-time UHD video super-resolution and transcoding on heterogeneous hardware. J Real-Time Image Proc 17, 2029–2045 (2020). https://doi.org/10.1007/s11554-019-00913-7
DOI: https://doi.org/10.1007/s11554-019-00913-7