DOI: 10.1145/3394171.3413899
research-article

MRS-Net: Multi-Scale Recurrent Scalable Network for Face Quality Enhancement of Compressed Videos

Published: 12 October 2020

Abstract

The past decade has witnessed explosive growth of face content in video multimedia systems, e.g., videoconferencing and live shows. However, because video transmission is bandwidth-hungry, these videos are normally compressed at low bit-rates, leading to heavy quality degradation in face regions. This paper addresses the problem of face quality enhancement in compressed videos. Specifically, we establish a compressed face video (CFV) database, which includes 87,607 faces in 113 raw video sequences and their 904 corresponding compressed sequences. We find that faces in compressed videos exhibit tremendous scale variation and quality fluctuation. Motivated by scalable video coding, we propose a multi-scale recurrent scalable network (MRS-Net) to enhance the quality of multi-scale faces in compressed videos. MRS-Net comprises one base level and two refined enhancement levels, corresponding to the quality enhancement of small-, medium- and large-scale faces, respectively. In this multi-level architecture, small-/medium-scale face quality enhancement serves as the basis for enhancing medium-/large-scale faces. Finally, experimental results show that MRS-Net is effective in enhancing the quality of multi-scale faces in compressed videos, significantly outperforming other state-of-the-art methods.
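The scalable structure described in the abstract can be illustrated with a short Python sketch of its control flow. This is not the authors' implementation: the scale cut-offs (`SMALL_MAX_SIDE`, `MEDIUM_MAX_SIDE`), the helper names (`face_scale`, `enhance_face`, `make_placeholder_level`), and the placeholder levels are all hypothetical, and the CNN layers and the recurrent (temporal) part of MRS-Net are omitted. It assumes that a small face runs only the base level, a medium face runs the base level plus the first refined level, and a large face runs all three, with each refined level building on the output of the level below.

```python
from typing import Callable, List, Optional

# Hypothetical cut-offs (longer side of the face box, in pixels). The abstract
# only says faces are grouped into small, medium and large scales; these values
# are assumptions chosen for illustration.
SMALL_MAX_SIDE = 64
MEDIUM_MAX_SIDE = 128

# Number of levels run per scale (assumed): base only; base + refined level 1;
# base + refined levels 1 and 2.
LEVELS_PER_SCALE = {"small": 1, "medium": 2, "large": 3}

# A level maps (face, output of the previous level or None) to a new output.
# Real levels would be CNN sub-networks; here they are string placeholders.
Level = Callable[[str, Optional[str]], str]


def face_scale(width: int, height: int) -> str:
    """Assign a detected face box to a scale group by its longer side."""
    side = max(width, height)
    if side <= SMALL_MAX_SIDE:
        return "small"
    if side <= MEDIUM_MAX_SIDE:
        return "medium"
    return "large"


def enhance_face(face: str, width: int, height: int, levels: List[Level]) -> List[str]:
    """Run the base level, then as many refined levels as the face scale needs.

    Each refined level receives the previous level's output, mirroring the
    idea that lower-scale enhancement serves as the basis for higher scales.
    Returns the output of every level that was run, in order.
    """
    n = LEVELS_PER_SCALE[face_scale(width, height)]
    outputs: List[str] = []
    prev: Optional[str] = None
    for level in levels[:n]:
        prev = level(face, prev)
        outputs.append(prev)
    return outputs


def make_placeholder_level(name: str) -> Level:
    """Hypothetical stand-in level that records what it was built on."""
    def level(face: str, prev: Optional[str]) -> str:
        # The base level works on the face itself; refined levels build on prev.
        return f"{name}({face})" if prev is None else f"{name}({prev})"
    return level


LEVELS = [make_placeholder_level(n) for n in ("base", "refine1", "refine2")]

if __name__ == "__main__":
    print(enhance_face("f", 48, 40, LEVELS))    # -> ['base(f)']
    print(enhance_face("f", 200, 180, LEVELS))  # -> ['base(f)', 'refine1(base(f))', 'refine2(refine1(base(f)))']
```

Chaining the levels this way mirrors scalable video coding, where enhancement layers are decoded on top of a base layer. In an actual network, the strings passed between levels would be replaced by feature maps exchanged between sub-networks.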

Supplementary Material

MP4 File (3394171.3413899.mp4)

Cited By

  • (2023) Progressive Motion Boosting for Video Frame Interpolation. IEEE Transactions on Multimedia, 25, 8076-8090. DOI: 10.1109/TMM.2022.3233310. Online publication date: 2023.
  • (2022) MRS-Net+ for Enhancing Face Quality of Compressed Videos. IEEE Transactions on Circuits and Systems for Video Technology, 32(5), 2881-2894. DOI: 10.1109/TCSVT.2021.3103519. Online publication date: May 2022.
  • (2021) Spatial Attention-Based Non-Reference Perceptual Quality Prediction Network for Omnidirectional Images. 2021 IEEE International Conference on Multimedia and Expo (ICME), 1-6. DOI: 10.1109/ICME51207.2021.9428390. Online publication date: 5 July 2021.

    Information

    Published In

    MM '20: Proceedings of the 28th ACM International Conference on Multimedia
    October 2020
    4889 pages
    ISBN:9781450379885
    DOI:10.1145/3394171
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. database
    2. face quality enhancement
    3. scalable structure

    Funding Sources

    • NSFC

    Conference

    MM '20

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (last 12 months): 11
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 14 Feb 2025.
