Research Article
DOI: 10.1145/3607541.3616809

Taming Vector-Wise Quantization for Wide-Range Image Blending with Smooth Transition

Published: 29 October 2023

ABSTRACT

Wide-range image blending is a novel image processing technique that merges two different images into a single panorama with a transition region. Conventional image inpainting and outpainting methods have been applied to this task, but they often produce severely distorted and blurry structures. The state-of-the-art (SOTA) method uses a U-Net-like model with a feature prediction module for content inference; however, it fails to generate panoramas with smooth transitions and visual realism, particularly when the two input images depict distinct scenery. This suggests that its predicted features deviate from the natural latent distribution of authentic images. In this paper, we propose an effective deep-learning model that integrates vector-wise quantization into feature prediction: the model searches a discrete codebook for the latent features most similar to its predictions, yielding high-quality wide-range image blending. In addition, we adopt a global-local discriminator for adversarial training to improve the quality of the predicted content and further smooth the transition. Experiments demonstrate that our method generates visually appealing panoramic images and outperforms baseline approaches on the Scenery6000 dataset.
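
To make the feature-quantization step concrete, here is a minimal PyTorch sketch of a VQ-VAE-style vector quantizer, assuming the standard formulation of van den Oord et al.: each predicted feature vector is replaced by its nearest entry in a learned discrete codebook. The class name, codebook size, and the straight-through gradient trick are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """VQ-VAE-style nearest-neighbour codebook lookup (illustrative sketch)."""

    def __init__(self, num_codes: int = 1024, code_dim: int = 256):
        super().__init__()
        # Learned discrete codebook of latent feature vectors
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: predicted features of shape (B, C, H, W)
        B, C, H, W = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, C)
        # Squared L2 distance from every feature vector to every code
        dist = (flat.pow(2).sum(1, keepdim=True)
                - 2.0 * flat @ self.codebook.weight.t()
                + self.codebook.weight.pow(2).sum(1))
        # Snap each feature vector to its most similar codebook entry
        idx = dist.argmin(dim=1)
        quantized = self.codebook(idx).reshape(B, H, W, C).permute(0, 3, 1, 2)
        # Straight-through estimator so gradients flow back to the predictor
        return z + (quantized - z).detach()

Snapping predicted features onto a fixed set of codes learned from real images is what would keep the inferred transition content on the latent manifold of authentic scenery.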
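The global-local adversarial training can likewise be sketched, assuming a two-branch discriminator in the spirit of Iizuka et al.'s globally and locally consistent image completion: one branch scores the full blended panorama, the other scores a crop around the transition region, and their features are fused into a single real/fake logit. The layer configuration below is a hypothetical minimal version, since the abstract does not specify the actual architecture.

import torch
import torch.nn as nn

def conv_stack() -> nn.Sequential:
    """Shared downsampling feature extractor for either branch."""
    return nn.Sequential(
        nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class GlobalLocalDiscriminator(nn.Module):
    """Judges realism of the whole panorama and of the transition crop."""

    def __init__(self):
        super().__init__()
        self.global_branch = conv_stack()   # sees the full panorama
        self.local_branch = conv_stack()    # sees the transition region
        self.fc = nn.Linear(256 + 256, 1)   # fused real/fake logit

    def forward(self, panorama: torch.Tensor, crop: torch.Tensor):
        g = self.global_branch(panorama)
        l = self.local_branch(crop)
        return self.fc(torch.cat([g, l], dim=1))

A least-squares GAN objective applied to these logits, as in Mao et al.'s LSGAN, would be one natural choice of adversarial loss, though the abstract leaves the exact objective unspecified.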

Published in

McGE '23: Proceedings of the 1st International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice
October 2023, 151 pages
ISBN: 9798400702785
DOI: 10.1145/3607541
General Chairs: Cheng Jin, Liang He, Mingli Song, Rui Wang

Copyright © 2023 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
