Abstract
The Video Super-Resolution (VSR) task aims to reconstruct the high-frequency information lost during degradation. Researchers have proposed many excellent models, but these models require large amounts of memory and incur high computational cost. In this paper, we propose a novel VSR model called StudentVSR (StuVSR), a unidirectional recurrent network. To ensure that StuVSR generates sufficient high-frequency information, we propose the Inceptual Attention (IA) mechanism. To compress the model, we adopt the idea of knowledge distillation: we take an auto-encoder network as the teacher and redesign the knowledge distillation mode. StuVSR uses very few parameters and performs the VSR task quickly; it can generate 30-frames-per-second (FPS) 1080p-2K video in real time. Comparison experiments show that StuVSR achieves the highest Peak Signal-to-Noise Ratio (PSNR) score among 16 state-of-the-art methods. We also explore the roles of inceptual attention and the knowledge distillation mode through ablation experiments. We will publish the code at https://github.com/Dawn3474/StuVSR.
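The abstract reports results in terms of PSNR. For readers unfamiliar with the metric, the following is a minimal sketch of the standard PSNR definition, 10 * log10(MAX^2 / MSE), for a single pair of frames. The function name `psnr`, the list-of-rows frame representation, and the default peak value of 255 (8-bit pixels) are assumptions made for illustration; this is not the authors' evaluation code, and the paper's exact protocol (color space, border cropping, averaging over frames) is not stated in the abstract.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Return the PSNR in dB between two equally sized frames.

    Each frame is a list of rows of pixel intensities (illustrative layout).
    """
    if len(reference) != len(estimate):
        raise ValueError("frames must have the same height")

    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        if len(ref_row) != len(est_row):
            raise ValueError("frames must have the same width")
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1

    if count == 0:
        raise ValueError("frames must not be empty")

    mse = squared_error / count  # mean squared error over all pixels
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value means the reconstructed frame is closer to the ground-truth high-resolution frame. For a video, per-frame scores like this are commonly averaged.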
L. Wang—Supported by Institute of Information Engineering.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, G., Wang, X., Zha, D., Wang, L., Zhao, L. (2021). Efficient, Low-Cost, Real-Time Video Super-Resolution Network. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13111. Springer, Cham. https://doi.org/10.1007/978-3-030-92273-3_17
DOI: https://doi.org/10.1007/978-3-030-92273-3_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92272-6
Online ISBN: 978-3-030-92273-3
eBook Packages: Computer Science (R0)