Abstract
Traffic-related multimedia analysis has become increasingly important in both the research community and industry. In this paper, we study the problem of image-based classification of abnormal traffic objects. Unlike previous work, which focuses only on normal object categories, our work aims to classify both the category and the working status of a traffic object. We construct a new dataset, Abnormal Traffic Object Classification (ATOC), to study this problem. ATOC contains six kinds of traffic objects, and each main category has two sub-categories covering the normal and abnormal status of the objects. We propose a novel deep-learning-based framework to solve this problem and provide a strong baseline for future studies. Specifically, we adopt a pre-trained deep convolutional network for feature extraction and use a support vector machine for classification. We also use random sample pairing to augment the dataset and introduce an attention mechanism to further refine the feature representation. Experimental results demonstrate that the proposed method outperforms state-of-the-art deep learning approaches in recognizing objects’ categories and their working status in traffic scenarios.
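As a rough illustration of two ideas mentioned in the abstract, the sketch below shows one possible way to encode the six categories and two working statuses as a single class index, and a minimal form of sample-pairing augmentation in which each image is averaged with a randomly chosen partner while keeping its own label. The names `joint_label`, `split_label`, and `sample_pairing` are hypothetical, the single 12-way label encoding is an assumption (the abstract does not say how the labels are organized), and images are represented as flat lists of pixel values purely for simplicity. It is not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple

NUM_CATEGORIES = 6  # from the abstract: six kinds of traffic objects
NUM_STATUSES = 2    # from the abstract: normal and abnormal


def joint_label(category: int, abnormal: bool) -> int:
    """Map (category, status) to one class index in [0, 12).

    Assumption: a single 12-way encoding; the abstract does not specify this.
    """
    if not 0 <= category < NUM_CATEGORIES:
        raise ValueError("category out of range")
    return category * NUM_STATUSES + int(abnormal)


def split_label(label: int) -> Tuple[int, bool]:
    """Invert joint_label: recover (category, abnormal) from a class index."""
    return label // NUM_STATUSES, bool(label % NUM_STATUSES)


def sample_pairing(
    images: Sequence[Sequence[float]],
    labels: Sequence[int],
    rng: random.Random,
) -> List[Tuple[List[float], int]]:
    """Average each image with a randomly chosen partner image.

    The label of the first image is kept and the partner's label is ignored.
    All images are assumed to have the same number of pixels.
    """
    augmented = []
    for image, label in zip(images, labels):
        partner = images[rng.randrange(len(images))]
        mixed = [(a + b) / 2.0 for a, b in zip(image, partner)]
        augmented.append((mixed, label))
    return augmented
```

In a pipeline like the one the abstract outlines, the augmented images would then be passed through the pre-trained network to obtain features, and those features would be used to train the support vector machine; both steps are omitted here because the abstract gives no details about them.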
Acknowledgments
This work is supported in part by the National Natural Science Foundation of China (No. 61702136) and the Central Guide to Local Science and Technology Project (No. ZY18A01).
About this article
Cite this article
Wang, C., Zhu, S., Lyu, D. et al. What is damaged: a benchmark dataset for abnormal traffic object classification. Multimed Tools Appl 79, 18481–18494 (2020). https://doi.org/10.1007/s11042-019-08265-x
DOI: https://doi.org/10.1007/s11042-019-08265-x