
MMC: Multi-modal colorization of images using textual description

  • Original Paper
  • Published:
Signal, Image and Video Processing

Abstract

Handling multiple objects with different colours is a significant challenge for image colourisation techniques; for complex real-world scenes, existing colourisation algorithms often fail to maintain colour consistency. In this work, we integrate textual descriptions as an auxiliary condition, alongside the greyscale image to be colourised, to improve the fidelity of the colourisation process. To this end, we propose a deep network that takes two inputs (the greyscale image and the corresponding encoded text description) and predicts the relevant colour components. In addition, we detect each object in the image and colourise it with its individual description, incorporating object-specific attributes into the colourisation process. A fusion model then merges all the image objects (segments) to generate the final colourised image. Because the textual descriptions contain colour information about the objects in the image, the text encoding helps improve the overall quality of the predicted colours. The proposed method outperforms existing colourisation techniques on the LPIPS, PSNR and SSIM metrics.
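The per-object pipeline described above (segment the image, colourise each segment conditioned on its own text description, then fuse the segments) can be sketched as follows. This is a minimal, purely illustrative sketch: the functions `encode_text`, `colorize_segment`, and `fuse` are toy stand-ins for the learned components, not the authors' actual network.

```python
import numpy as np

def encode_text(description, dim=16):
    # Stand-in for a learned text encoder: hash words of the
    # description into a fixed-length unit vector.
    vec = np.zeros(dim)
    for word in description.split():
        vec[hash(word) % dim] += 1.0
    return vec / max(np.linalg.norm(vec), 1e-8)

def colorize_segment(gray, mask, text_vec):
    # Stand-in for the two-input colourisation network: predict the
    # two chrominance (ab) channels for one object from its greyscale
    # pixels and its encoded description.
    h, w = gray.shape
    ab = np.zeros((h, w, 2))
    ab[mask, 0] = text_vec[0] * gray[mask]  # toy "a" channel
    ab[mask, 1] = text_vec[1] * gray[mask]  # toy "b" channel
    return ab

def fuse(segments_ab):
    # Stand-in for the fusion model: merge the non-overlapping
    # per-object predictions by summation.
    return np.sum(segments_ab, axis=0)

# Usage: a 4x4 greyscale image with two object masks, each paired
# with its own (hypothetical) textual description.
gray = np.linspace(0.0, 1.0, 16).reshape(4, 4)
masks = [np.zeros((4, 4), bool), np.zeros((4, 4), bool)]
masks[0][:2] = True   # object 1 occupies the top half
masks[1][2:] = True   # object 2 occupies the bottom half
texts = ["a red ball", "green grass"]

segments = [colorize_segment(gray, m, encode_text(t))
            for m, t in zip(masks, texts)]
ab = fuse(segments)                 # fused chrominance map
image_lab = np.dstack([gray, ab])   # final Lab image: L plus predicted ab
print(image_lab.shape)              # (4, 4, 3)
```

The sketch keeps the key structural idea of the method: colour (ab) is predicted per object under a text condition, while the input greyscale image supplies the luminance (L) channel of the final Lab output.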



Data availability

Data will be made available upon reasonable request.

Code availability

The code will be made available upon reasonable request.


Author information

Authors and Affiliations

Authors

Contributions

Subhankar Ghosh wrote the main manuscript text. Saumik Bhattacharya helped with the logical understanding and contributed to writing the manuscript. Prasun Roy, Umapada Pal, and Michael Blumenstein reviewed the manuscript.

Corresponding author

Correspondence to Subhankar Ghosh.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics approval and consent to participate

Not applicable

Consent for publication

Yes

Materials availability

Materials will be made available upon reasonable request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ghosh, S., Bhattacharya, S., Roy, P. et al. MMC: Multi-modal colorization of images using textual description. SIViP 19, 107 (2025). https://doi.org/10.1007/s11760-024-03650-y

