An unsupervised approach for thermal to visible image translation using autoencoder and generative adversarial network

  • Original Paper
  • Published in: Machine Vision and Applications (2021)

Abstract

Thermal-to-visible image translation is essential for night-vision applications, since images acquired at night with a visible camera depend on the amount of illumination around the observed objects. Poor lighting at night results in inadequate detail in the acquired scene, making visible-camera images unsuitable for high-end applications. Current research on image-to-image translation for day-time imagery has achieved remarkable performance using deep learning methods. However, it is very challenging to obtain the same performance for night-time images, especially when little or no light is available. Existing state-of-the-art image-to-image translation methods fail to preserve fine details and produce incorrect mappings for night-time images because well-corresponding visible images are unavailable. Therefore, a novel architecture is proposed here to provide better visual information in night-time scenarios using unsupervised training. It consists of generative adversarial networks (GANs) and autoencoders with a newly proposed residual block to extract versatile features from thermal and visible images. To learn better visualization of night-time images, we also introduce a gradient-based loss function alongside the standard GAN and cycle-consistency losses. A weight-sharing scheme is further employed to relate the features of the thermal and visible domains. Experimental validation shows consistent qualitative improvement and better quantitative performance, in terms of the no-reference quality metrics NIQE, BRISQUE, BIQAA and BLIINDS, over existing methods. Such work could be useful for many vision-based applications in night-time situations, including border surveillance systems.
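The objective described in the abstract can be made concrete with a short sketch. The minimal PyTorch-style code below combines a standard adversarial term, a cycle-consistency term and a gradient-based term into one generator loss; the network interfaces (G_t2v, G_v2t, D_v), the loss weights and the exact form of the gradient term are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def gradient_loss(fake_visible, thermal):
    """Penalise differences between the spatial gradients of the translated
    image and those of the thermal input so that edge structure is kept.
    (Illustrative form; the paper's exact formulation may differ.)"""
    # Reduce the translated RGB output to a single luminance channel so it
    # can be compared with the single-channel thermal input (assumption).
    fake_gray = fake_visible.mean(dim=1, keepdim=True)

    def spatial_grads(img):
        dx = img[..., :, 1:] - img[..., :, :-1]   # horizontal finite differences
        dy = img[..., 1:, :] - img[..., :-1, :]   # vertical finite differences
        return dx, dy

    fdx, fdy = spatial_grads(fake_gray)
    tdx, tdy = spatial_grads(thermal)
    return F.l1_loss(fdx, tdx) + F.l1_loss(fdy, tdy)

def generator_objective(G_t2v, G_v2t, D_v, thermal,
                        lambda_cyc=10.0, lambda_grad=1.0):
    """Composite generator objective: adversarial + cycle-consistency +
    gradient terms. G_t2v, G_v2t, D_v and the lambda weights are
    illustrative placeholders, not the authors' settings."""
    fake_visible = G_t2v(thermal)

    # Least-squares adversarial term (one common choice of GAN loss).
    adv = torch.mean((D_v(fake_visible) - 1.0) ** 2)

    # Cycle consistency: translating back to the thermal domain should
    # recover the original input.
    cyc = F.l1_loss(G_v2t(fake_visible), thermal)

    return adv + lambda_cyc * cyc + lambda_grad * gradient_loss(fake_visible, thermal)
```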



Acknowledgements

The authors are thankful to the Science and Engineering Research Board (SERB), a statutory body of the Department of Science and Technology (DST), Government of India, for providing a GPU computer system for the experiments (ECR/2017/003268).

Author information

Corresponding author

Correspondence to Kishor P. Upla.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Patel, H., Upla, K.P. An unsupervised approach for thermal to visible image translation using autoencoder and generative adversarial network. Machine Vision and Applications 32, 99 (2021). https://doi.org/10.1007/s00138-021-01223-4
