research-article

Peering into The Sketch: Ultra-Low Bitrate Face Compression for Joint Human and Machine Perception

Authors:

Dapeng Wu

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 2564 - 2572

https://doi.org/10.1145/3581783.3613799

Published: 27 October 2023

Abstract

We propose a novel face compression framework that leverages external priors for joint human and machine perception in ultra-low bitrate scenarios. The framework exploits the semantic richness of face images by representing faces as sketches and thumbnails, which improves bitrate utility for both human and machine vision. At the decoder side, the framework introduces a two-stage generative reconstruction that faithfully enhances the reconstructed image via semi-parametric modeling and retrieved guidance from an external database. This coarse-to-fine strategy also improves the identity consistency and analysis performance of the reconstructed image. We evaluate the proposed method extensively on a public face dataset, comparing it with end-to-end image compression techniques as well as traditional image compression standards. The experimental results demonstrate the effectiveness of the proposed method through superior perceptual and analytical performance under ultra-low bitrate conditions.
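As a rough illustration of the sketch-plus-thumbnail representation described in the abstract, the toy Python example below splits a grayscale image into a coarse thumbnail and a binary edge map, then counts the raw bits of the two layers. It is not the authors' codec. The block-average thumbnail, the gradient-threshold sketch, the function names, and the uncompressed bit accounting are all assumptions made for illustration, and the generative decoder and database retrieval are left out entirely.

```python
# Toy illustration only. Every function here is hypothetical and is NOT
# taken from the paper; it only mimics the idea of a two-layer
# (thumbnail + sketch) representation and its bits-per-pixel cost.
from typing import List

Image = List[List[int]]  # grayscale image, pixel values in 0..255


def thumbnail(img: Image, factor: int) -> Image:
    """Downsample by averaging non-overlapping factor x factor blocks."""
    h, w = len(img), len(img[0])
    out = []
    for by in range(0, h - h % factor, factor):
        row = []
        for bx in range(0, w - w % factor, factor):
            block = [img[y][x]
                     for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def sketch(img: Image, threshold: int) -> Image:
    """Binary edge map: 1 where the jump to the left or upper neighbour
    exceeds the threshold, 0 elsewhere."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(img[y][x] - img[y][x - 1]) if x > 0 else 0
            dy = abs(img[y][x] - img[y - 1][x]) if y > 0 else 0
            if max(dx, dy) > threshold:
                out[y][x] = 1
    return out


def raw_bits(thumb: Image, sk: Image) -> int:
    """Payload size without entropy coding: 8 bits per thumbnail pixel
    plus 1 bit per sketch pixel (an assumed, pessimistic accounting)."""
    return 8 * sum(len(r) for r in thumb) + sum(len(r) for r in sk)


def bits_per_pixel(total_bits: int, height: int, width: int) -> float:
    """Bitrate relative to the size of the original image."""
    return total_bits / (height * width)


if __name__ == "__main__":
    # 8x8 test image: dark left half, bright right half.
    img = [[0] * 4 + [200] * 4 for _ in range(8)]
    thumb = thumbnail(img, 4)
    sk = sketch(img, 50)
    print(bits_per_pixel(raw_bits(thumb, sk), 8, 8))
```

Even this uncompressed toy layout costs fewer bits than the 8 bits per pixel of the raw image. A practical system would additionally entropy-code both layers, which is where most of the remaining savings would be expected to come from.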

Cited By

Chen B, Yin S, Chen P, Wang S, Ye Y. (2024). Generative Visual Compression: A Review. 2024 IEEE International Conference on Image Processing (ICIP), 3709-3715. Online publication date: 27-Oct-2024. https://doi.org/10.1109/ICIP51287.2024.10647820

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Reconstruction
      2. Computer vision representations
        Image representations

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN:9798400701085

DOI:10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik (University of Ottawa, Canada & MBZUAI, UAE)
Tao Mei (HiDream.ai, China)
Rita Cucchiara (University of Modena and Reggio Emilia, Italy)

Program Chairs:
Marco Bertini (University of Florence, Italy)
Diana Patricia Tobon Vallejo (Universidad de Medellin, Colombia)
Pradeep K. Atrey (University at Albany, State University of New York, USA)
M. Shamim Hossain (King Saud University, KSA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

the Science Technology and Innovation Committee of Shenzhen Municipality
the National Natural Science Foundation of China
the Hong Kong Research Grants Council General Research Fund (GRF)

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

Total Citations: 1
Total Downloads: 235
Downloads (Last 12 months): 146
Downloads (Last 6 weeks): 10

Reflects downloads up to 17 Jan 2025
