
What is damaged: a benchmark dataset for abnormal traffic object classification

Published in Multimedia Tools and Applications

Abstract

Traffic-related multimedia analysis has become increasingly important in both the research community and industry. In this paper, we study the problem of image-based classification of abnormal traffic objects. Unlike previous works that focus only on normal object categories, our work aims to classify both the category and the working status of a traffic object. We construct a new dataset, named Abnormal Traffic Object Classification (ATOC), for the study of this problem. ATOC contains six kinds of traffic objects, and each main category is further divided into two sub-categories covering the normal and abnormal status of the objects. We propose a novel deep-learning-based framework for this problem and provide a strong baseline for future studies. Specifically, we adopt a pre-trained deep convolutional network for feature extraction and a support vector machine for classification. We also use random sample pairing to augment the dataset and introduce an attention mechanism to further refine the feature representation. Experimental results demonstrate that the proposed method outperforms state-of-the-art deep learning approaches in recognizing objects' categories and their corresponding working status in traffic scenarios.
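To make the pipeline in the abstract concrete, the following is a minimal, hypothetical Python sketch: a pre-trained CNN used as a fixed feature extractor, random sample pairing as augmentation, and a linear SVM (scikit-learn's SVC, which wraps libsvm) as the classifier. The choice of VGG-16 as the backbone, all label names, and all hyperparameters are assumptions for illustration only, the attention-based refinement step is omitted, and this is not the authors' code.

```python
# Hypothetical sketch of the abstract's pipeline: pre-trained CNN
# features -> SVM classifier, with random sample pairing as data
# augmentation. Backbone, labels, and hyperparameters are assumptions.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.svm import SVC

# Pre-trained backbone used as a fixed feature extractor (VGG-16 is
# an assumption; the paper only states "a pre-trained deep CNN").
backbone = models.vgg16(pretrained=True)
backbone.classifier = backbone.classifier[:-1]  # drop final FC layer -> 4096-d output
backbone.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_feature(path):
    """Return a 4096-d CNN feature vector for one image file."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return backbone(img).squeeze(0).numpy()

def sample_pairing(img_a, img_b):
    """Random sample pairing (Inoue, 2018): average two preprocessed
    image tensors pixel-wise; the label of the first image is kept."""
    return (img_a + img_b) / 2.0

# Linear SVM over the extracted features (SVC wraps libsvm, which the
# article's footnotes reference as the SVM implementation).
clf = SVC(kernel="linear")
# clf.fit(X_train, y_train)  # X_train: (N, 4096) features; y_train: fine-grained labels
```

In such a setup, each image would carry one fine-grained label combining category and working status (e.g., six object kinds, each either normal or abnormal), and paired images would be mixed before feature extraction.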




Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61702136 and by the Central Guide to Local Science and Technology Project under Grant No. ZY18A01.

Author information


Corresponding author

Correspondence to Chen Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, C., Zhu, S., Lyu, D. et al. What is damaged: a benchmark dataset for abnormal traffic object classification. Multimed Tools Appl 79, 18481–18494 (2020). https://doi.org/10.1007/s11042-019-08265-x

