Abstract
Vehicle exhaust emissions are notoriously harmful to both human health and the environment. Smoke vehicles, which emit excessive visible black smoke, are representative heavy-pollution vehicles. Recognizing smoke vehicles in traffic surveillance video is challenging because of the large variation in smoke color and texture and because of interference. To address this problem, this paper proposes smoke vehicle detection methods that learn a spatial-temporal representation from image sequences. First, a motion detection algorithm is used to obtain the rear section of each vehicle that needs to be identified. Then, the spatial information of each suspected frame is captured by an Inception V3 convolutional neural network (CNN), and a temporal Multi-Layer Perceptron (MLP) or Long Short-Term Memory (LSTM) network is used to train the smoke vehicle model. The first method jointly models spatial-temporal cues for smoke vehicle detection in video with fully connected layers. The second method learns temporal dependencies between video frames with an LSTM, which can combine image information in the video over a longer period of time. Experimental results on our dataset show that the LSTM-based model achieves a high accuracy of 97.6875%, a 9.25% improvement over the single-frame model.
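As a rough illustration of the second method, the sketch below runs a single LSTM cell over a short sequence of per-frame feature vectors and maps the final hidden state to a smoke probability. It is a minimal sketch under stated assumptions, not the authors' implementation: the class name `TinyLSTM`, the 8-dimensional random vectors standing in for Inception V3 features, the hidden size, and the untrained random weights are all hypothetical, and the motion detection and CNN stages are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyLSTM:
    """Single LSTM cell, forward pass only (hypothetical, untrained weights)."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [h_prev, x_t].
        self.W = {k: init_matrix(hidden_dim, hidden_dim + input_dim, rng)
                  for k in self.GATES}
        self.b = {k: [0.0] * hidden_dim for k in self.GATES}
        # Linear read-out from the last hidden state to a single logit.
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]

    def step(self, x, h, c):
        z = h + x  # list concatenation of previous hidden state and input
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
               for k in self.GATES}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        g = [math.tanh(v) for v in pre["candidate"]]
        # Cell state keeps part of the past (forget gate) and adds new content.
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def predict(self, frame_features):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in frame_features:  # one feature vector per video frame
            h, c = self.step(x, h, c)
        logit = sum(w * v for w, v in zip(self.w_out, h))
        return sigmoid(logit)  # probability-like score for "smoke vehicle"


# Five frames of 8-dimensional stand-in features (random, for illustration only).
rng = random.Random(1)
clip = [[rng.random() for _ in range(8)] for _ in range(5)]
model = TinyLSTM(input_dim=8, hidden_dim=4)
prob = model.predict(clip)
```

Because the cell state is carried from frame to frame, the final score depends on the whole sequence rather than on any single frame, which is the property the abstract attributes to the LSTM-based model. In practice the weights would be learned from labeled sequences with a deep learning framework.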
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61871123), the Key Research and Development Program in Jiangsu Province (No. BE2016739), and a Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.
About this article
Cite this article
Cao, Y., Lu, X. Learning spatial-temporal representation for smoke vehicle detection. Multimed Tools Appl 78, 27871–27889 (2019). https://doi.org/10.1007/s11042-019-07926-1
DOI: https://doi.org/10.1007/s11042-019-07926-1