Learning spatial-temporal representation for smoke vehicle detection

Multimedia Tools and Applications


Abstract

Vehicle exhaust emissions are notoriously harmful to both human health and the environment. Smoke vehicles, which emit excessive visible black smoke, are representative heavily polluting vehicles. Recognizing smoke vehicles in traffic surveillance video is challenging because smoke varies widely in color and texture and is subject to interference. To solve this problem, this paper proposes smoke vehicle detection methods that learn a spatial-temporal representation from image sequences. First, a motion detection algorithm is used to obtain the rear region of each vehicle that needs to be identified. Then, the spatial information of each suspected frame is captured by an Inception V3 convolutional neural network (CNN), and a temporal Multi-Layer Perceptron (MLP) or Long Short-Term Memory (LSTM) network is used to train the smoke vehicle model. The first method jointly models spatial-temporal cues for smoke vehicle detection in the video with fully connected layers. The second method learns temporal dependencies between video frames with an LSTM, which can combine image information in the video over a longer period of time. Experimental results on our dataset show that the LSTM-based model achieves a high accuracy of 97.6875%, a 9.25% improvement over the single-frame model.
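The two temporal heads described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes each suspected frame has already been reduced to a feature vector by the CNN (here random vectors of length 2048, the size of the pooled feature commonly taken from Inception V3; the abstract does not say which layer is used). The clip length `T`, the hidden size `H`, the function names `mlp_head` and `lstm_head`, and the random untrained weights are all hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_head(features, w1, b1, w2, b2):
    """Temporal MLP: concatenate the T frame features into one vector,
    then apply two fully connected layers to get a smoke probability."""
    x = features.reshape(-1)               # shape (T * D,)
    hidden = np.maximum(0.0, w1 @ x + b1)  # ReLU hidden layer
    return float(sigmoid(w2 @ hidden + b2))

def lstm_head(features, W, U, b, w_out, b_out):
    """Single-layer LSTM over the frame sequence; the last hidden state
    is classified. W: (4H, D), U: (4H, H), b: (4H,)."""
    H = U.shape[1]
    h = np.zeros(H)                        # hidden state
    c = np.zeros(H)                        # cell state
    for x_t in features:                   # one step per frame, in order
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:H])                 # input gate
        f = sigmoid(z[H:2 * H])            # forget gate
        o = sigmoid(z[2 * H:3 * H])        # output gate
        g = np.tanh(z[3 * H:])             # candidate cell update
        c = f * c + i * g
        h = o * np.tanh(c)
    return float(sigmoid(w_out @ h + b_out))

# Assumed sizes: T frames per clip, D-dim CNN features, H hidden units.
T, D, H = 8, 2048, 16
rng = np.random.default_rng(0)
clip = rng.standard_normal((T, D))         # stand-in for per-frame CNN features

# Random, untrained weights: the outputs below are meaningless as predictions.
p_mlp = mlp_head(clip,
                 0.01 * rng.standard_normal((H, T * D)), np.zeros(H),
                 0.1 * rng.standard_normal(H), 0.0)
p_lstm = lstm_head(clip,
                   0.01 * rng.standard_normal((4 * H, D)),
                   0.1 * rng.standard_normal((4 * H, H)), np.zeros(4 * H),
                   0.1 * rng.standard_normal(H), 0.0)
print("MLP head score:", p_mlp)
print("LSTM head score:", p_lstm)
```

The MLP head sees all frames at once as a single concatenated vector, so its input size is tied to the clip length. The LSTM head processes frames one at a time and carries a cell state forward, which is why it can in principle handle longer or variable-length clips.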


Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61871123), the Key Research and Development Program in Jiangsu Province (No. BE2016739), and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.

Author information

Corresponding author

Correspondence to Xiaobo Lu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Cao, Y., Lu, X. Learning spatial-temporal representation for smoke vehicle detection. Multimed Tools Appl 78, 27871–27889 (2019). https://doi.org/10.1007/s11042-019-07926-1

  • DOI: https://doi.org/10.1007/s11042-019-07926-1
