Subspace Enhancement and Colorization Network for Infrared Video Action Recognition

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13033)

Abstract

Human action recognition is an essential research area in computer vision. However, existing methods ignore the essence of infrared spectral imaging. Compared with the visible modality, which carries all three color channels, the approximately single-channel infrared modality emphasizes lightness contrast and loses channel information. We therefore explore channel duplication and investigate more appropriate feature representations. We propose a subspace enhancement and colorization network (S\(^2\)ECNet) for infrared video action recognition. Specifically, we apply a subspace enhancement (S\(^2\)E) module to promote edge-contour extraction within subspaces, while a subspace colorization (S\(^2\)C) module completes missing semantic information. Moreover, optical flow provides an effective supplement for temporal information. Experiments conducted on the infrared action recognition dataset InfAR demonstrate the competitiveness of the proposed method against state-of-the-art approaches.
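The channel duplication mentioned above can be sketched as follows. This is only an illustrative baseline for making a near single-channel infrared frame match the three-channel input shape that visible-spectrum backbones expect; it is not the paper's S\(^2\)E or S\(^2\)C module, whose details are not given in this abstract.

```python
import numpy as np

def duplicate_channels(ir_frame: np.ndarray) -> np.ndarray:
    """Replicate a single-channel infrared frame (H, W) into a
    three-channel array (H, W, 3). Purely a shape-matching baseline;
    the paper's modules additionally enhance edges and restore
    semantic (color) information in subspaces."""
    if ir_frame.ndim != 2:
        raise ValueError("expected a single-channel (H, W) frame")
    # Insert a channel axis, then repeat it three times.
    return np.repeat(ir_frame[:, :, None], 3, axis=2)

frame = np.random.randint(0, 256, size=(224, 224), dtype=np.uint8)
rgb_like = duplicate_channels(frame)
print(rgb_like.shape)  # (224, 224, 3)
```

Duplication preserves lightness contrast but, as the abstract notes, cannot recover the lost channel information, which motivates the colorization module.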



Acknowledgements

This work was supported in part by the Fundamental Research Funds for the Central Universities of China under Grant 191010001, in part by the Hubei Key Laboratory of Transportation Internet of Things under Grant 2020III026GX.

Author information

Correspondence to Xian Zhong.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, L., Zhong, X., Liu, W., Zhao, S., Yang, Z., Zhong, L. (2021). Subspace Enhancement and Colorization Network for Infrared Video Action Recognition. In: Pham, D.N., Theeramunkong, T., Governatori, G., Liu, F. (eds) PRICAI 2021: Trends in Artificial Intelligence. PRICAI 2021. Lecture Notes in Computer Science, vol 13033. Springer, Cham. https://doi.org/10.1007/978-3-030-89370-5_24

  • DOI: https://doi.org/10.1007/978-3-030-89370-5_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89369-9

  • Online ISBN: 978-3-030-89370-5

  • eBook Packages: Computer Science (R0)
