
Image Inpainting with Semantic U-Transformer

  • Conference paper

Neural Information Processing (ICONIP 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1968)

Abstract

Driven by powerful convolutional neural networks, image inpainting has made tremendous progress. Recently, transformers have demonstrated their effectiveness in various vision tasks, mainly owing to their capacity to model long-range dependencies. In image inpainting, however, a plain transformer tends to fall short at modeling local information, and interference from damaged regions poses additional challenges. To tackle these issues, we introduce a novel Semantic U-shaped Transformer (SUT). The SUT uses spectral transformer blocks in its shallow layers to capture local information effectively, while its deeper layers use BRA transformer blocks to model global information. A key feature of the SUT is its attention mechanism, bi-level routing attention, which significantly reduces the interference of damaged regions with the overall information and makes the SUT well suited to image inpainting. Experiments on several datasets indicate that the proposed method outperforms current state-of-the-art (SOTA) inpainting approaches: on average, its PSNR is 0.93 dB higher than SOTA and its SSIM is higher by 0.026.
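
The abstract only describes the two block types at a high level. The following PyTorch sketch illustrates one plausible reading of them: an FFT-based spectral block for the shallow layers and a simplified, single-head bi-level routing attention for the deep layers. All module names, tensor shapes, and hyper-parameters (window size, routing top-k, MLP ratio) are illustrative assumptions for exposition, not the authors' released implementation.

```python
# Illustrative sketch only: module names, shapes, and hyper-parameters are
# assumptions made for exposition, not the authors' released implementation.
import torch
import torch.nn as nn


class SpectralBlock(nn.Module):
    """Shallow-layer block: filters features in the frequency domain (2-D FFT),
    a cheap way to mix local texture information across the whole feature map."""

    def __init__(self, dim, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # Learnable complex filter applied per channel to the spectrum (simplified).
        self.filter = nn.Parameter(torch.randn(dim, dtype=torch.cfloat) * 0.02)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(), nn.Linear(dim * mlp_ratio, dim)
        )

    def forward(self, x):                                   # x: (B, H, W, C)
        y = torch.fft.fft2(self.norm1(x), dim=(1, 2))       # to frequency domain
        y = torch.fft.ifft2(y * self.filter, dim=(1, 2)).real
        x = x + y
        return x + self.mlp(self.norm2(x))


class BiLevelRoutingAttention(nn.Module):
    """Deep-layer attention (single head for brevity): each window first routes to
    its top-k most relevant windows (coarse level), then attends only over tokens
    of those routed windows (fine level), so largely damaged windows can be skipped."""

    def __init__(self, dim, window=8, topk=4):
        super().__init__()
        self.window, self.topk = window, topk
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (B, H, W, C), H and W divisible by window
        B, H, W, C = x.shape
        w = self.window
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def to_windows(t):                       # (B, H, W, C) -> (B, nWin, w*w, C)
            t = t.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
            return t.reshape(B, (H // w) * (W // w), w * w, C)

        qw, kw, vw = to_windows(q), to_windows(k), to_windows(v)
        n = qw.size(1)                                      # number of windows
        # Coarse level: window descriptors and top-k window-to-window routing.
        route = qw.mean(2) @ kw.mean(2).transpose(-1, -2)   # (B, nWin, nWin)
        topk_idx = route.topk(self.topk, dim=-1).indices    # (B, nWin, k)
        # Gather keys/values only from the routed windows.
        idx = topk_idx[..., None, None].expand(-1, -1, -1, w * w, C)
        k_sel = torch.gather(kw[:, None].expand(-1, n, -1, -1, -1), 2, idx).flatten(2, 3)
        v_sel = torch.gather(vw[:, None].expand(-1, n, -1, -1, -1), 2, idx).flatten(2, 3)
        # Fine level: token attention restricted to the routed token set.
        attn = (qw @ k_sel.transpose(-1, -2)) * C ** -0.5   # (B, nWin, w*w, k*w*w)
        out = attn.softmax(-1) @ v_sel                      # (B, nWin, w*w, C)
        out = out.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return self.proj(out)
```

In the U-shaped layout described above, blocks of these two kinds would sit inside an encoder-decoder with skip connections (as in U-Net), with spectral blocks in the shallow stages and routing-attention blocks in the deep stages, mirroring the abstract's local-versus-global division of labour.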

This research is supported by the National Key Research and Development Program of the Ministry of Science and Technology of the PRC (No. 2018AAA0101801, No. 2021ZD0110600), the Sichuan Science and Technology Program (No. 2022ZYD0116), the Sichuan Provincial M. C. Integration Office Program, and the IEDA Laboratory of SWUST.

Author information

Correspondence to Wenxin Yu or Lu Che.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Yuan, L. et al. (2024). Image Inpainting with Semantic U-Transformer. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1968. Springer, Singapore. https://doi.org/10.1007/978-981-99-8181-6_7

  • DOI: https://doi.org/10.1007/978-981-99-8181-6_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8180-9

  • Online ISBN: 978-981-99-8181-6

  • eBook Packages: Computer Science, Computer Science (R0)
