L-CoDer: Language-Based Colorization with Color-Object Decoupling Transformer

  • Conference paper
  • Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13678)

Abstract

Language-based colorization requires the colorized image to be consistent with a user-provided language caption. A recent work proposes to decouple the language into color and object conditions to solve this problem. Although it makes decent progress, its performance is limited by three key issues. (i) The large gap between the vision and language modalities, which are processed by independent feature extractors, makes it difficult to fully understand the language. (ii) The inaccurate language features are never refined by the image features, so the language may fail to colorize the image precisely. (iii) Local regions do not perceive the whole image, producing globally inconsistent colors. In this work, we introduce a transformer into language-based colorization to tackle these issues while keeping the language-decoupling property. Our method unifies the image and language modalities, and further evolves the color conditions together with the image features in a coarse-to-fine manner. In addition, thanks to the global receptive field, our method is robust to strong local variations. Extensive experiments demonstrate that our method produces realistic colorization and outperforms prior art in terms of consistency with the caption.

Z. Chang and S. Weng contributed equally.
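
To make the three ideas in the abstract concrete, below is a minimal PyTorch sketch: a single transformer over concatenated image-patch and word tokens (unified modalities), separate embeddings for the decoupled color and object words, and joint layer-by-layer refinement so the color conditions evolve with the image features under a global receptive field. Everything here (class name, token ids, dimensions) is an assumption for illustration only; this is not the authors' implementation.

    # Toy illustration of the abstract's ideas, NOT the L-CoDer model.
    import torch
    import torch.nn as nn

    class ToyColorObjectTransformer(nn.Module):
        def __init__(self, dim=256, depth=6, heads=8, patch=16, vocab=1000):
            super().__init__()
            self.patch = patch
            # (i) Unify modalities: grayscale (L-channel) patches become
            # tokens in the same embedding space as the word tokens.
            self.to_patches = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
            # Decoupled language conditions: separate embeddings for the
            # color words and the object words of the caption.
            self.color_emb = nn.Embedding(vocab, dim)
            self.object_emb = nn.Embedding(vocab, dim)
            # (ii) One encoder stack refines color and image tokens jointly,
            # layer by layer (a coarse-to-fine evolution); (iii) self-attention
            # gives every patch a global receptive field.
            layer = nn.TransformerEncoderLayer(
                d_model=dim, nhead=heads, dim_feedforward=4 * dim,
                batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
            # Predict the two chrominance (ab) channels for each patch.
            self.to_ab = nn.Linear(dim, 2 * patch * patch)

        def forward(self, gray, color_ids, object_ids):
            B, _, H, W = gray.shape
            img = self.to_patches(gray).flatten(2).transpose(1, 2)  # B x N x dim
            col = self.color_emb(color_ids)                         # B x Tc x dim
            obj = self.object_emb(object_ids)                       # B x To x dim
            n = img.size(1)
            tokens = torch.cat([img, col, obj], dim=1)  # one joint sequence
            tokens = self.encoder(tokens)               # joint refinement
            ab = self.to_ab(tokens[:, :n])              # image tokens only
            p = self.patch
            ab = ab.reshape(B, H // p, W // p, 2, p, p)
            ab = ab.permute(0, 3, 1, 4, 2, 5).reshape(B, 2, H, W)
            return ab  # combine with the input L channel to form a Lab image

    # Usage: colorize a 224x224 grayscale image given caption tokens,
    # e.g. a color word ("red", id 17) and an object word ("car", id 42);
    # the ids are placeholders for a real tokenizer's output.
    model = ToyColorObjectTransformer()
    gray = torch.rand(1, 1, 224, 224)
    ab = model(gray, torch.tensor([[17]]), torch.tensor([[42]]))
    print(ab.shape)  # torch.Size([1, 2, 224, 224])

Feeding both token types through one encoder is the simplest way to realize the "unified modalities" idea; the actual paper's coarse-to-fine color-condition evolution is more elaborate than this single shared stack.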

Acknowledgements

This project is supported by the National Natural Science Foundation of China under Grant No. 62136001.

Author information

Corresponding author

Correspondence to Si Li.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 1999 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chang, Z., Weng, S., Li, Y., Li, S., Shi, B. (2022). L-CoDer: Language-Based Colorization with Color-Object Decoupling Transformer. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13678. Springer, Cham. https://doi.org/10.1007/978-3-031-19797-0_21

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-19797-0_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19796-3

  • Online ISBN: 978-3-031-19797-0

  • eBook Packages: Computer Science, Computer Science (R0)
