Abstract
Smoke is a typical symptom of early fire, and the appearance of a large amount of abnormal smoke usually indicates an impending accident. Smart smoke detection can substantially reduce the damage caused by fires in cities, factories, and forests, and it is also an important component of intelligent surveillance systems. However, existing image-based detection methods often suffer from a lack of dynamic information, while video-based methods are usually computationally expensive because more input frames must be processed. In this work, we propose a novel and efficient Quasi Video Smoke Detector (QuasiVSD) to bridge the gap between image-based and video-based smoke detection. By treating an unannotated image as a reference, QuasiVSD obtains motion-aware attention from just two frames. Moreover, a Weakly Guided Attention Module is designed to further refine the feature representation of smoke regions. Finally, extensive experiments on a real-world dataset show that QuasiVSD achieves a clear improvement of 4.71 over the best image-based competitor (CenterNet) with almost the same parameters and FLOPs, while its computational complexity is only a fraction of that of a general video understanding framework. Code will be available at: https://github.com/Caoyichao/VSDT.
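The dual-frame idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the channel-averaged frame difference, and the sigmoid normalization are our own assumptions, chosen only to show how a reference frame can yield a motion-aware attention map that reweights features.

```python
import numpy as np

def motion_attention(frame_t, frame_ref):
    """Hypothetical motion-aware attention from two frames.

    The absolute difference between the current frame and an unannotated
    reference frame highlights moving (smoke-like) regions; a sigmoid
    squashes the motion magnitude into an attention map in [0.5, 1).
    """
    diff = np.abs(frame_t.astype(np.float32) - frame_ref.astype(np.float32))
    # Collapse the color channels into a single-channel motion map.
    motion = diff.mean(axis=-1, keepdims=True)
    # Scale by the map's own spread, then squash with a sigmoid.
    attn = 1.0 / (1.0 + np.exp(-motion / (motion.std() + 1e-6)))
    return attn

def apply_attention(features, attn):
    # Element-wise reweighting of spatial features by the attention map.
    return features * attn

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f_ref = rng.integers(0, 256, size=(64, 64, 3))
    f_t = f_ref.copy()
    f_t[20:40, 20:40] += 50  # simulated smoke motion in one region
    attn = motion_attention(f_t, f_ref)
    print(attn.shape)                        # (64, 64, 1)
    print(attn[30, 30, 0] > attn[0, 0, 0])   # moving region gets higher attention
```

Static pixels yield zero difference and thus the sigmoid's baseline value of 0.5, while moving regions are pushed toward 1, which is why a single extra (unannotated) frame is enough to inject motion cues without the cost of processing a full clip.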
References
Gaur A, Singh A, Kumar A, Kumar A, Kapoor K (2020) Video flame and smoke based fire detection algorithms: a literature review. Fire Technol 56(5):1943–1980. https://doi.org/10.1007/s10694-020-00986-y
Gunay O, Toreyin BU, Kose K, Cetin AE (2012) Entropy-functional-based online adaptive decision fusion framework with application to wildfire detection in video. IEEE Trans Image Process 21(5):2853–2865
Tian H, Li W, Wang L, Ogunbona P (2014) Smoke detection in video: an image separation approach. Int J Comput Vision 106(2):192–209
Tian H, Li W, Ogunbona PO, Wang L (2018) Detection and separation of smoke from single image frames. IEEE Trans Image Process 27(3):1164–1177. https://doi.org/10.1109/TIP.2017.2771499
Yin Z, Wan B, Yuan F, Xia X, Shi J (2017) A deep normalization and convolutional neural network for image smoke detection. IEEE Access 5:18429–18438
Yuan F, Zhang L, Xia X, Huang Q, Li X (2019) A wave-shaped deep neural network for smoke density estimation. IEEE Trans Image Process. https://doi.org/10.1109/TIP.2019.2946126
Muhammad K, Khan S, Palade V, Mehmood I, de Albuquerque VHC (2020) Edge intelligence-assisted smoke detection in foggy surveillance environments. IEEE Trans Industr Inf 16(2):1067–1075. https://doi.org/10.1109/TII.2019.2915592
Li S, Yan Q, Liu P (2020) An efficient fire detection method based on multiscale feature extraction, implicit deep supervision and channel attention mechanism. IEEE Trans Image Process 29:8467–8475. https://doi.org/10.1109/TIP.2020.3016431
Khan S, Muhammad K, Mumtaz S, Baik SW, de Albuquerque VHC (2019) Energy-efficient deep CNN for smoke detection in foggy IoT environment. IEEE Internet Things J 6(6):9237–9245. https://doi.org/10.1109/JIOT.2019.2896120
Dimitropoulos K, Barmpoutis P, Grammalidis N (2017) Higher order linear dynamical systems for smoke detection in video surveillance applications. IEEE Trans Circuits Syst Video Technol 27(5):1143–1154. https://doi.org/10.1109/TCSVT.2016.2527340
Lin G, Zhang Y, Xu G, Zhang Q (2019) Smoke detection on video sequences using 3D convolutional neural networks. Fire Technol. https://doi.org/10.1007/s10694-019-00832-w
Yuan FN (2012) A double mapping framework for extraction of shape-invariant features based on multi-scale partitions with AdaBoost for video smoke detection. Pattern Recogn 45(12):4326–4336
Long C et al (2010) Transmission: a new feature for computer vision based smoke detection. In: Wang FL, Deng H, Gao Y, Lei J (eds) Artificial Intelligence and Computational Intelligence, vol 6319. Springer, Berlin, Heidelberg, pp 389-396
Liu L et al (2020) Deep learning for generic object detection: a survey. Int J Comput Vis 128(2):261–318. https://doi.org/10.1007/s11263-019-01247-4
Zhao Z-Q, Zheng P, Xu S-T, Wu X (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232. https://doi.org/10.1109/TNNLS.2018.2876865
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv:1311.2524
Girshick R (2015) Fast R-CNN. arXiv:1504.08083
Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst 28:91–99
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. arXiv:1703.06870
Redmon J, Farhadi A (2016) YOLO9000: better, faster, stronger. arXiv:1612.08242
Liu W et al (2016) SSD: single shot multibox detector. In: Computer Vision—ECCV 2016. Springer, Cham, pp 21-37. https://doi.org/10.1007/978-3-319-46448-0_2
Zhou X, Wang D, Krähenbühl P (2019) Objects as points. arXiv:1904.07850
Donahue J et al (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2625-2634
Zhou B, Andonian A, Oliva A, Torralba A (2018) Temporal relational reasoning in videos. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer Vision—ECCV 2018, vol 11205. Springer International Publishing, Cham, pp 831–846
Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks. In 2015 IEEE International Conference on Computer Vision (ICCV), pp 4489-4497, https://doi.org/10.1109/ICCV.2015.510.
Zolfaghari M, Singh K, Brox T (2018) ECO: Efficient convolutional network for online video understanding. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer Vision—ECCV 2018, vol 11206. Springer International Publishing, Cham, pp 713–730
Carreira J, Zisserman A (2017) Quo vadis, action recognition? A new model and the Kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 6299-6308
Xie D, Deng C, Wang H, Li C, Tao D (2019) Semantic adversarial network with multi-scale pyramid attention for video classification. Proceed AAAI Conf Artif Intell 33:9030–9037. https://doi.org/10.1609/aaai.v33i01.33019030
Lin T-Y, Dollar P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, pp 936-944, https://doi.org/10.1109/CVPR.2017.106.
Szegedy C et al (2015) Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1-9
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770-778
Huang G, Liu Z, van der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. arXiv:1608.06993
Howard A et al (2019) Searching for MobileNetV3. arXiv:1905.02244
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. arXiv:1406.2199
Lin T-Y, Goyal P, Girshick R et al (2017) Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp 2980-2988
Huang Z, Zhang T, Heng W, Shi B, Zhou S (2020) RIFE: real-time intermediate flow estimation for video frame interpolation. arXiv:2011.06294
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
Bochkovskiy A, Wang CY, Liao HYM (2020) YOLOv4: optimal speed and accuracy of object detection. arXiv:2004.10934
Tan M, Pang R, Le QV (2020) EfficientDet: Scalable and efficient object detection. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, pp 10778-10787, https://doi.org/10.1109/CVPR42600.2020.01079.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61871123), the Key Research and Development Program of Jiangsu Province (No. BE2016739), and the Priority Academic Program Development of Jiangsu Higher Education Institutions. We thank the Big Data Center of Southeast University for providing facility support for the numerical calculations in this paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Cao, Y., Tang, Q., Xu, S. et al. QuasiVSD: efficient dual-frame smoke detection. Neural Comput & Applic 34, 8539–8550 (2022). https://doi.org/10.1007/s00521-021-06606-2