Abstract
Thermal-to-visible image translation is essential for night-vision applications because images acquired at night with a visible camera depend on the amount of illumination around the observed objects. Poor lighting at night leaves the visible camera with inadequate scene detail, rendering such images unusable for high-end applications. Current research on image-to-image translation for daytime imagery has achieved remarkable performance using deep learning methods. However, attaining the same performance on night-time images is very challenging, especially when few or no light sources are available. Existing state-of-the-art image-to-image translation methods fail to preserve fine details and produce incorrect mappings for night-time images, owing to the unavailability of well-corresponding visible images. Therefore, a novel architecture trained in an unsupervised manner is proposed here to provide better visual information in night-time scenarios. It combines generative adversarial networks (GANs) and autoencoders with a newly proposed residual block that extracts versatile features from thermal and visible images. To learn a better visualization of night-time images, we also introduce a gradient-based loss function alongside the standard GAN and cycle-consistency losses. A weight-sharing scheme is further employed to relate features across the thermal and visible domains. Experimental validation of the proposed method shows consistent qualitative improvement and better quantitative performance, in terms of no-reference quality metrics such as NIQE, BRISQUE, BIQAA and BLIINDS, over existing methods. Such work can benefit many vision-based applications in night-time situations, including border surveillance systems.
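The abstract describes a generator objective that augments the standard adversarial and cycle-consistency terms with a gradient-based loss, which encourages the translated image to preserve the edge structure of the source thermal image. The following is a minimal numpy sketch of such a combined objective; the finite-difference gradient operator, the L1 distances, and the weighting factors `lambda_grad` and `lambda_cyc` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def image_gradients(img):
    """Horizontal and vertical forward differences (zero-padded at far edges).

    This is one simple choice of gradient operator; the paper may use a
    different one (e.g. Sobel filters).
    """
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]   # difference along columns
    gy[:-1, :] = img[1:, :] - img[:-1, :]   # difference along rows
    return gx, gy

def gradient_loss(generated, source):
    """L1 distance between the gradient fields of generated and source images."""
    gx_g, gy_g = image_gradients(generated)
    gx_s, gy_s = image_gradients(source)
    return np.mean(np.abs(gx_g - gx_s)) + np.mean(np.abs(gy_g - gy_s))

def cycle_consistency_loss(original, reconstructed):
    """L1 penalty on the round-trip reconstruction (thermal -> visible -> thermal)."""
    return np.mean(np.abs(original - reconstructed))

def total_generator_loss(adversarial, grad, cyc, lambda_grad=10.0, lambda_cyc=10.0):
    """Weighted sum of the three terms; the weights here are placeholders."""
    return adversarial + lambda_grad * grad + lambda_cyc * cyc
```

A training step would evaluate `gradient_loss` between the generator output and the input thermal image, and `cycle_consistency_loss` between the input and its round-trip reconstruction, then minimize `total_generator_loss` together with the usual GAN discriminator objective.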
Acknowledgements
The authors are thankful to the Science and Engineering Research Board (SERB), a statutory body of the Department of Science and Technology (DST), Government of India, for providing the GPU computing system used in the experiments (ECR/2017/003268).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Patel, H., Upla, K.P. An unsupervised approach for thermal to visible image translation using autoencoder and generative adversarial network. Machine Vision and Applications 32, 99 (2021). https://doi.org/10.1007/s00138-021-01223-4