Abstract
Thermal-to-visible image translation is essential for night-vision applications because images acquired at night with a visible camera depend on the amount of illumination around the observed objects. Poor lighting at night leaves the visible camera with inadequate scene detail, rendering such images unusable for high-end applications. Current research on image-to-image translation for daytime imagery has achieved remarkable performance using deep learning methods. However, attaining the same performance on night-time images is very challenging, especially when few or no light sources are available. Existing state-of-the-art image-to-image translation methods fail to preserve fine details and produce incorrect mappings for night-time images, owing to the unavailability of well-corresponding visible images. Therefore, a novel architecture trained in an unsupervised manner is proposed here to provide better visual information in night-time scenarios. It combines generative adversarial networks (GANs) and autoencoders with a newly proposed residual block that extracts versatile features from thermal and visible images. To learn a better visualization of night-time images, we also introduce a gradient-based loss function alongside the standard GAN and cycle-consistency losses. A weight-sharing scheme is further employed to relate features across the thermal and visible domains. Experimental validation of the proposed method shows consistent qualitative improvement and better quantitative performance, in terms of no-reference quality metrics such as NIQE, BRISQUE, BIQAA and BLIINDS, over existing methods. Such work can benefit many vision-based applications in night-time situations, including border surveillance systems.
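The abstract describes a generator objective that augments the standard adversarial and cycle-consistency terms with a gradient-based loss, which encourages the translated image to preserve the edge structure of the source thermal image. The following is a minimal numpy sketch of such a combined objective; the finite-difference gradient operator, the L1 distances, and the weighting factors `lambda_grad` and `lambda_cyc` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def image_gradients(img):
    """Horizontal and vertical forward differences (zero-padded at far edges).

    This is one simple choice of gradient operator; the paper may use a
    different one (e.g. Sobel filters).
    """
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]   # difference along columns
    gy[:-1, :] = img[1:, :] - img[:-1, :]   # difference along rows
    return gx, gy

def gradient_loss(generated, source):
    """L1 distance between the gradient fields of generated and source images."""
    gx_g, gy_g = image_gradients(generated)
    gx_s, gy_s = image_gradients(source)
    return np.mean(np.abs(gx_g - gx_s)) + np.mean(np.abs(gy_g - gy_s))

def cycle_consistency_loss(original, reconstructed):
    """L1 penalty on the round-trip reconstruction (thermal -> visible -> thermal)."""
    return np.mean(np.abs(original - reconstructed))

def total_generator_loss(adversarial, grad, cyc, lambda_grad=10.0, lambda_cyc=10.0):
    """Weighted sum of the three terms; the weights here are placeholders."""
    return adversarial + lambda_grad * grad + lambda_cyc * cyc
```

A training step would evaluate `gradient_loss` between the generator output and the input thermal image, and `cycle_consistency_loss` between the input and its round-trip reconstruction, then minimize `total_generator_loss` together with the usual GAN discriminator objective.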
Acknowledgements
The authors are thankful to the Science and Engineering Research Board (SERB), a statutory body of the Department of Science and Technology (DST), Government of India, for providing the GPU computing system used in the experiments (ECR/2017/003268).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Patel, H., Upla, K.P. An unsupervised approach for thermal to visible image translation using autoencoder and generative adversarial network. Machine Vision and Applications 32, 99 (2021). https://doi.org/10.1007/s00138-021-01223-4