DOI: 10.1145/3581783.3613799

Peering into The Sketch: Ultra-Low Bitrate Face Compression for Joint Human and Machine Perception

Published: 27 October 2023

Abstract

We propose a novel face compression framework that leverages external priors for joint human and machine perception under ultra-low bitrate scenarios. The framework exploits the semantic richness of face images by representing faces as sketches and thumbnails, which improves bitrate utility for both human and machine vision. At the decoder side, the framework introduces a two-stage generative reconstruction that faithfully enhances the reconstructed image via semi-parametric modeling and guidance retrieved from an external database. This coarse-to-fine strategy also improves the identity consistency and analysis performance of the reconstructed image. We evaluate the proposed method extensively on a public face dataset, comparing it with end-to-end image compression techniques as well as traditional image compression standards. The experimental results demonstrate its effectiveness through superior perceptual and analytical performance under ultra-low bitrate conditions.
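
As a rough illustration of why a sketch-plus-thumbnail representation can be cheap to transmit, the following sketch works through the bitrate accounting in bits per pixel (bpp). It is not the paper's codec: the abstract does not say how the sketch or thumbnail is coded, so the box-filter thumbnail, the 1-bit packed sketch, the use of `zlib`, the helper names, and the synthetic 256x256 inputs are all assumptions made for illustration only.

```python
import math
import zlib


def bits_per_pixel(payload_bytes, width, height):
    """Bits spent per pixel of the full-resolution image."""
    return 8.0 * payload_bytes / (width * height)


def box_thumbnail(gray, factor):
    """Downsample a 2-D list of 8-bit values by averaging factor x factor blocks.

    Assumes both dimensions are divisible by `factor`.
    """
    height, width = len(gray), len(gray[0])
    thumb = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [gray[y + dy][x + dx] for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        thumb.append(row)
    return thumb


def pack_sketch(sketch):
    """Pack a 2-D list of 0/1 edge flags at 8 pixels per byte, then deflate."""
    bits = [bit for row in sketch for bit in row]
    bits += [0] * (-len(bits) % 8)  # pad to a whole number of bytes
    packed = bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[k:k + 8]))
        for k in range(0, len(bits), 8)
    )
    return zlib.compress(packed, 9)


# Synthetic 256x256 stand-ins (NOT face data): a smooth gradient and a circle outline.
SIZE = 256
gray = [[(x + y) // 2 for x in range(SIZE)] for y in range(SIZE)]
sketch = [
    [1 if abs(math.hypot(x - SIZE / 2, y - SIZE / 2) - 80) < 1 else 0 for x in range(SIZE)]
    for y in range(SIZE)
]

thumb = box_thumbnail(gray, 16)                      # 16x16 thumbnail
thumb_payload = bytes(v for row in thumb for v in row)  # raw 8-bit samples, uncompressed
sketch_payload = pack_sketch(sketch)
total_bpp = bits_per_pixel(len(thumb_payload) + len(sketch_payload), SIZE, SIZE)
print(f"thumbnail: {len(thumb_payload)} B, sketch: {len(sketch_payload)} B, "
      f"total: {total_bpp:.4f} bpp")
```

The bitrate is always normalized by the original resolution, not the thumbnail's, which is why a tiny thumbnail and a sparse binary sketch together can stay well below 1 bpp in this toy setting. A learned codec, as the abstract implies, would presumably spend these bits very differently.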

Cited By

  • (2024) Generative Visual Compression: A Review. 2024 IEEE International Conference on Image Processing (ICIP), 3709-3715. DOI: 10.1109/ICIP51287.2024.10647820. Online publication date: 27-Oct-2024

      Published In

      MM '23: Proceedings of the 31st ACM International Conference on Multimedia
      October 2023
      9913 pages
      ISBN: 9798400701085
      DOI: 10.1145/3581783

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. face image compression
      2. generative compression
      3. ultra-rate image compression

      Qualifiers

      • Research-article

      Funding Sources

      • the Science Technology and Innovation Committee of Shenzhen Municipality
      • the National Natural Science Foundation of China
      • the Hong Kong Research Grants Council General Research Fund (GRF)

      Conference

      MM '23
      MM '23: The 31st ACM International Conference on Multimedia
      October 29 - November 3, 2023
      Ottawa ON, Canada

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
