What is damaged: a benchmark dataset for abnormal traffic object classification

Wang, Chen; Zhu, Shifan; Lyu, Desheng; Sun, Xiaoshuai

doi:10.1007/s11042-019-08265-x

Published: 04 March 2020

Multimedia Tools and Applications, Volume 79, pages 18481–18494 (2020)

Chen Wang¹,², Shifan Zhu¹, Desheng Lyu² & Xiaoshuai Sun²

Abstract

Traffic-related multimedia analysis has become increasingly important in both the research community and industry. In this paper, we study the problem of image-based classification of abnormal traffic objects. Unlike previous works that focus only on normal object categories, our work aims to classify both the category and the working status of a traffic object. We construct a new dataset, Abnormal Traffic Object Classification (ATOC), for studying this problem. ATOC contains 6 kinds of traffic objects, and each main category has two sub-categories covering the normal and abnormal status of the objects. We propose a novel deep-learning-based framework to solve this problem and provide a strong baseline for future studies. Specifically, we adopt a pre-trained deep convolutional network for feature extraction and use a support vector machine for classification. We also use random sample pairing to augment the dataset and introduce an attention mechanism to further refine the feature representation. Experimental results demonstrate that the proposed method outperforms state-of-the-art deep learning approaches in recognizing objects’ categories and their working status in traffic scenarios.
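
To make the label structure and the augmentation step in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the 6 categories and 2 statuses are flattened into 12 class indices, that images are represented as flat lists of pixel values, and that sample pairing means averaging an image pixel-wise with a randomly chosen partner while keeping the first image's label (one common form of the technique). The names `joint_label`, `split_label`, and `pair_samples` are hypothetical; the feature extractor, SVM, and attention module are not shown.

```python
import random
from typing import List, Sequence, Tuple

NUM_CATEGORIES = 6                 # from the abstract: 6 kinds of traffic objects
STATUSES = ("normal", "abnormal")  # two sub-categories per main category


def joint_label(category: int, status: str) -> int:
    """Map (category index, status) to one of 12 flat class indices (assumed encoding)."""
    if not 0 <= category < NUM_CATEGORIES:
        raise ValueError(f"category must be in [0, {NUM_CATEGORIES})")
    # STATUSES.index raises ValueError for an unknown status string.
    return category * len(STATUSES) + STATUSES.index(status)


def split_label(label: int) -> Tuple[int, str]:
    """Inverse of joint_label: recover (category index, status)."""
    category, status_idx = divmod(label, len(STATUSES))
    return category, STATUSES[status_idx]


def pair_samples(
    images: Sequence[Sequence[float]],
    labels: Sequence[int],
    rng: random.Random,
) -> List[Tuple[List[float], int]]:
    """Average each image with a random partner; keep the first image's label."""
    augmented = []
    for image, label in zip(images, labels):
        partner = images[rng.randrange(len(images))]
        mixed = [(a + b) / 2.0 for a, b in zip(image, partner)]
        augmented.append((mixed, label))
    return augmented
```

For example, `pair_samples(images, labels, random.Random(0))` returns one mixed image per input image with the original labels unchanged. Whether the paper trains on one 12-way label or on two separate predictions is not stated in this preview, so the flat encoding here is only one plausible choice.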

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China (No. 61702136) and the Central Guide to Local Science and Technology Project (No. ZY18A01).

Author information

Authors and Affiliations

Harbin Engineering University, Harbin, China
Chen Wang & Shifan Zhu
Harbin Institute of Technology, Harbin, China
Chen Wang, Desheng Lyu & Xiaoshuai Sun

Corresponding author

Correspondence to Chen Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, C., Zhu, S., Lyu, D. et al. What is damaged: a benchmark dataset for abnormal traffic object classification. Multimed Tools Appl 79, 18481–18494 (2020). https://doi.org/10.1007/s11042-019-08265-x

Received: 06 December 2018
Revised: 24 January 2019
Accepted: 16 September 2019
Published: 04 March 2020
Issue Date: July 2020
DOI: https://doi.org/10.1007/s11042-019-08265-x
