L-CoDer: Language-Based Colorization with Color-Object Decoupling Transformer

Chang, Zheng; Weng, Shuchen; Li, Yu; Li, Si; Shi, Boxin

doi:10.1007/978-3-031-19797-0_21

Zheng Chang¹²,
Shuchen Weng¹³,
Yu Li¹⁴,
Si Li¹² &
…
Boxin Shi¹³

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13678))

Included in the following conference series:

European Conference on Computer Vision

2980 Accesses
4 Citations

Abstract

Language-based colorization requires the colorized image to be consistent with the user-provided language caption. A most recent work proposes to decouple the language into color and object conditions in solving the problem. Though decent progress has been made, its performance is limited by three key issues. (i) The large gap between vision and language modalities using independent feature extractors makes it difficult to fully understand the language. (ii) The inaccurate language features are never refined by the image features such that the language may fail to colorize the image precisely. (iii) The local region does not perceive the whole image, producing global inconsistent colors. In this work, we introduce transformer into language-based colorization to tackle the aforementioned issues while keeping the language decoupling property. Our method unifies the modalities of image and language, and further performs color conditions evolving with image features in a coarse-to-fine manner. In addition, thanks to the global receptive field, our method is robust to the strong local variation. Extensive experiments demonstrate our method is able to produce realistic colorization and outperforms prior arts in terms of consistency with the caption.

Z. Chang and S. Weng—Equal contributions.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 89.00; Price excludes VAT (USA)

Softcover Book: USD 119.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Antic, J.: A deep learning based project for colorizing and restoring old images (and video!). https://github.com/jantic/DeOldify
Ardizzone, L., Lüth, C., Kruse, J., Rother, C., Köthe, U.: Guided image generation with conditional invertible neural networks. arXiv preprint arXiv:1907.02392 (2019)
Chen, J., Shen, Y., Gao, J., Liu, J., Liu, X.: Language-based image editing with recurrent attentive models. In: CVPR (2018)
Google Scholar
Cheng, Z., Yang, Q., Sheng, B.: Deep colorization. In: ICCV (2015)
Google Scholar
Deshpande, A., Lu, J., Yeh, M.C., Jin Chong, M., Forsyth, D.: Learning diverse image colorization. In: CVPR (2017)
Google Scholar
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL (2019)
Google Scholar
Ding, H., Liu, C., Wang, S., Jiang, X.: Vision-language transformer and query generation for referring segmentation. In: ICCV (2021)
Google Scholar
Ding, M., et al.: CogView: mastering text-to-image generation via transformers. In: NIPS (2021)
Google Scholar
Dosovitskiy, A., et al.: An image is worth 16 x 16 words: transformers for image recognition at scale. In: ICLR (2021)
Google Scholar
He, M., Chen, D., Liao, J., Sander, P.V., Yuan, L.: Deep exemplar-based colorization. ACM TOG 37(4), 1–16 (2018)
Google Scholar
Huynh-Thu, Q., Ghanbari, M.: Scope of validity of PSNR in image/video quality assessment. Electron. Lett. 44(13), 800–801 (2008)
Article Google Scholar
Iizuka, S., Simo-Serra, E., Ishikawa, H.: Let there be color!: joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification. ACM ToG 35(4), 1–11 (2016)
Article Google Scholar
Kim, W., Son, B., Kim, I.: ViLT: vision-and-language transformer without convolution or region supervision. In: ICML (2021)
Google Scholar
Kumar, M., Weissenborn, D., Kalchbrenner, N.: Colorization transformer. In: ICLR (2021)
Google Scholar
Larsson, G., Maire, M., Shakhnarovich, G.: Learning representations for automatic colorization. In: ECCV (2016)
Google Scholar
Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: Swinir: image restoration using swin transformer. In: ICCV (2021)
Google Scholar
Liu, R., ET AL.: FuseFormer: fusing fine-grained information in transformers for video inpainting. In: ICCV (2021)
Google Scholar
Lu, P., Yu, J., Peng, X., Zhao, Z., Wang, X.: Gray2colornet: transfer more colors from reference image. In: ACM MM (2020)
Google Scholar
Manjunatha, V., Iyyer, M., Boyd-Graber, J., Davis, L.: Learning to color from language. In: NAACL (2018)
Google Scholar
Sangkloy, P., Lu, J., Fang, C., Yu, F., Hays, J.: Scribbler: controlling deep image synthesis with sketch and color. In: CVPR (2017)
Google Scholar
Su, J.W., Chu, H.K., Huang, J.B.: Instance-aware image colorization. In: CVPR (2020)
Google Scholar
Vaswani, A., et al.: Attention is all you need. In: NIPS (2017)
Google Scholar
Vitoria, P., Raad, L., Ballester, C.: ChromaGAN: adversarial picture colorization with semantic class distribution. In: WACV (2020)
Google Scholar
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. TIP 13(4), 600–612 (2004)
Google Scholar
Weng, S., Wu, H., Chang, Z.C., Tang, J., Li, S., Shi, B.: L-code: language-based colorization using color-object decoupled conditions. In: AAAI (2022)
Google Scholar
Wu, Y., Wang, X., Li, Y., Zhang, H., Zhao, X., Shan, Y.: Towards vivid and diverse image colorization with generative color prior. In: ICCV (2021)
Google Scholar
Xie, Y.: Language-guided image colorization. Master’s thesis, ETH Zurich, Departement of Computer Science (2018)
Google Scholar
Xu, Z., Wang, T., Fang, F., Sheng, Y., Zhang, G.: Stylization-based architecture for fast deep exemplar colorization. In: CVPR (2020)
Google Scholar
Zhang, H., Koh, J.Y., Baldridge, J., Lee, H., Yang, Y.: Cross-modal contrastive learning for text-to-image generation. In: CVPR (2021)
Google Scholar
Zhang, R., Isola, P., Efros, A.A.: Colorful image colorization. In: ECCV (2016)
Google Scholar
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Google Scholar
Zhang, R., et al.: Real-time user-guided image colorization with learned deep priors. ACM TOG (2017)
Google Scholar
Zhao, J., Liu, L., Snoek, C.G., Han, J., Shao, L.: Pixel-level semantics guided image colorization. In: BMVC (2018)
Google Scholar
Zheng, S., et al.: Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: CVPR (2021)
Google Scholar
Zhou, Y., et al.: TRAR: routing the attention spans in transformer for visual question answering. In: ICCV (2021)
Google Scholar
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: deformable transformers for end-to-end object detection. In: ICLR (2020)
Google Scholar

Download references

Acknowledgements

This project is supported by National Natural Science Foundation of China under Grant No. 62136001.

Author information

Authors and Affiliations

School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
Zheng Chang & Si Li
NERCVT, School of Computer Science, Peking University, Beijing, China
Shuchen Weng & Boxin Shi
International Digital Economy Academy, Shenzhen, China
Yu Li

Authors

Zheng Chang
View author publications
You can also search for this author in PubMed Google Scholar
Shuchen Weng
View author publications
You can also search for this author in PubMed Google Scholar
Yu Li
View author publications
You can also search for this author in PubMed Google Scholar
Si Li
View author publications
You can also search for this author in PubMed Google Scholar
Boxin Shi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Si Li .

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1999 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Chang, Z., Weng, S., Li, Y., Li, S., Shi, B. (2022). L-CoDer: Language-Based Colorization with Color-Object Decoupling Transformer. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13678. Springer, Cham. https://doi.org/10.1007/978-3-031-19797-0_21

Download citation

DOI: https://doi.org/10.1007/978-3-031-19797-0_21
Published: 03 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19796-3
Online ISBN: 978-3-031-19797-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

L-CoDer: Language-Based Colorization with Color-Object Decoupling Transformer