Abstract
Action recognition in video sequences is an active research problem in computer vision, yet little effort has been devoted to recognizing actions in hazy videos. This paper proposes a novel unified model for action recognition in hazy videos that combines a Convolutional Neural Network (CNN), which first dehazes the video and then extracts spatial features from each frame, with a deep bidirectional LSTM (DB-LSTM) network that extracts the temporal features of the action. First, each frame of the hazy video is fed into the AOD-Net (All-in-One Dehazing Network) model to obtain a clear representation of the frame. Next, spatial features are extracted from every sampled dehazed frame using a pre-trained VGG-16 architecture, which helps reduce redundancy and complexity. Finally, the temporal information across the frames is learnt using a DB-LSTM network, in which multiple LSTM layers are stacked in both the forward and backward passes. The proposed unified model is the first attempt to recognize human actions in hazy videos. Experimental results on a synthetic hazy video dataset show state-of-the-art performance in action recognition.
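As a rough sketch of how the three stages described above could be wired together, the following PyTorch code dehazes each frame with an AOD-Net-style CNN, pools VGG-16 convolutional features into one descriptor per frame, and classifies the sequence with a stacked bidirectional LSTM. The AODNet layer sizes follow the K-estimation design of Li et al. (ref. 15), but the torchvision VGG-16 backbone, the 512-D pooled descriptor, the two-layer bidirectional LSTM head, and all names (e.g. HazyActionRecognizer, hidden sizes) are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of the dehaze -> spatial features -> DB-LSTM pipeline.
import torch
import torch.nn as nn
from torchvision import models

class AODNet(nn.Module):
    """Lightweight dehazing CNN in the spirit of AOD-Net: estimate a single
    map K(x) and recover the clear frame as J(x) = K(x) * I(x) - K(x) + b."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 3, 1)
        self.conv2 = nn.Conv2d(3, 3, 3, padding=1)
        self.conv3 = nn.Conv2d(6, 3, 5, padding=2)
        self.conv4 = nn.Conv2d(6, 3, 7, padding=3)
        self.conv5 = nn.Conv2d(12, 3, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, hazy):
        x1 = self.relu(self.conv1(hazy))
        x2 = self.relu(self.conv2(x1))
        x3 = self.relu(self.conv3(torch.cat([x1, x2], dim=1)))
        x4 = self.relu(self.conv4(torch.cat([x2, x3], dim=1)))
        k = self.relu(self.conv5(torch.cat([x1, x2, x3, x4], dim=1)))
        return self.relu(k * hazy - k + 1.0)  # b = 1, as in AOD-Net

class HazyActionRecognizer(nn.Module):
    def __init__(self, num_classes=101, hidden=256, lstm_layers=2):
        super().__init__()
        self.dehazer = AODNet()                      # stage 1: dehazing
        vgg = models.vgg16(weights="IMAGENET1K_V1")  # stage 2: spatial features
        self.features = vgg.features                 # conv backbone only
        self.pool = nn.AdaptiveAvgPool2d(1)          # 512-D descriptor per frame
        self.lstm = nn.LSTM(512, hidden, num_layers=lstm_layers,
                            batch_first=True,
                            bidirectional=True)      # stage 3: DB-LSTM
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, clip):                # clip: (B, T, 3, H, W) hazy frames
        b, t = clip.shape[:2]
        frames = clip.flatten(0, 1)         # (B*T, 3, H, W)
        dehazed = self.dehazer(frames)
        feats = self.pool(self.features(dehazed)).flatten(1)  # (B*T, 512)
        seq, _ = self.lstm(feats.view(b, t, -1))              # (B, T, 2*hidden)
        return self.classifier(seq[:, -1])  # logits over action classes

model = HazyActionRecognizer()
logits = model(torch.randn(2, 16, 3, 224, 224))  # 2 clips of 16 sampled frames
print(logits.shape)  # torch.Size([2, 101])
```

In practice the AODNet weights would be loaded from a separately trained dehazing model rather than learnt jointly, and the VGG-16 backbone would typically be frozen so that only the DB-LSTM head is trained on the action labels.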
References
Kopf, J., et al.: Deep photo: model-based photograph enhancement and viewing. In: SIGGRAPH Asia (2008)
Fattal, R.: Single image dehazing. ACM Trans. Graph. 27(3), 72:1–72:9 (2008)
Narasimhan, S.G., Nayar, S.K.: Interactive deweathering of an image using physical models. In: Workshop on Color and Photometric Methods in Computer Vision (2003)
Tan, R.: Visibility in bad weather from a single image. In: CVPR (2008)
He, K., Sun, J., Tang, X.: Single image haze removal using dark channel prior. In: CVPR. IEEE (2009)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Ren, Z., et al.: Bidirectional homeostatic regulation of a depression-related brain state by gamma-aminobutyric acidergic deficits and ketamine treatment. Biol. Psychiatry 80, 457–468 (2016)
Li, B., Peng, X., Wang, Z., Xu, J., Feng, D.: AOD-Net: all-in-one dehazing network. In: ICCV (2017)
Ren, W., et al.: Single image dehazing via multi-scale convolutional neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 154–169. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_10
Cai, B., Xu, X., Jia, K., Qing, C., Tao, D.: DehazeNet: an end-to-end system for single image haze removal. IEEE Trans. Image Process. 25(11), 5187–5198 (2016)
Santra, S., Mondal, R., Chanda, B.: Learning a patch quality comparator for single image dehazing. IEEE Trans. Image Process. 27(9), 4598–4607 (2018)
Hong, J., Cho, B., Hong, Y.W., Byun, H.: Contextual action cues from camera sensor for multi-stream action recognition. Sensors 19(6), 1382 (2019)
Crasto, N., Weinzaepfel, P., Alahari, K., Schmid, C.: MARS: motion-augmented RGB stream for action recognition. In: CVPR, pp. 7882–7891 (2019)
Hanson, A., PNVR, K., Krishnagopal, S., Davis, L.: Bidirectional convolutional LSTM for the detection of violence in videos. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11130, pp. 280–295. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11012-3_24
Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human action classes from videos in the wild. Report no. CRCV-TR-12-01 (November 2012)
Borkar, K., Mukherjee, S.: Video dehazing using LMNN with respect to augmented MRF. In: ICVGIP 2018, pp. 42:1–42:9. ACM (2018)
Kong, Y., Fu, Y.: Human action recognition and prediction: a survey. arXiv preprint arXiv:1806.11230 (2018)
Mukherjee, S., Biswas, S.K., Mukherjee, D.P.: Recognizing interactions between human performers by dominating pose doublet. Mach. Vis. Appl. 25(4), 1033–1052 (2014)
Mukherjee, S., Singh, K.K.: Human action and event recognition using a novel descriptor based on improved dense trajectories. Multimedia Tools Appl. 77(11), 13661–13678 (2018)
Laptev, I., Lindeberg, T.: Space-time interest points. In: ICCV (2003)
Vinodh, B., Sunitha Gowd, T., Mukherjee, S.: Event recognition in egocentric videos using a novel trajectory based feature. In: ICVGIP 2016, pp. 76:1–76:8 (2016)
Acknowledgement
The authors wish to thank NVIDIA for providing a TITAN X GPU, which was used for conducting the experiments related to this study.
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tanneru, S.G., Mukherjee, S. (2021). Action Recognition in Haze Using an Efficient Fusion of Spatial and Temporal Features. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds) Computer Vision and Image Processing. CVIP 2020. Communications in Computer and Information Science, vol 1377. Springer, Singapore. https://doi.org/10.1007/978-981-16-1092-9_3
DOI: https://doi.org/10.1007/978-981-16-1092-9_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1091-2
Online ISBN: 978-981-16-1092-9
eBook Packages: Computer Science, Computer Science (R0)