
Transportation Object Detection with Bag of Visual Words Model by PLSA and MLP

Mobile Networks and Applications

Abstract

Visual big data is an important research topic because of its diverse applications. In this paper, we propose a new visual detection method for transportation objects based on probabilistic latent semantic analysis (PLSA) of visual data. Distinctiveness is detected by integrating three steps: first, representing the co-occurrence matrix of images, which are vectorized using the bag of visual words (BoVW) framework; then calculating the histograms of the visual words of each class; and finally applying the test images as the visual words. A multilayer perceptron (MLP) is used as the classifier in our system. The visual words are extracted by sampling patches from the current image. We propose a new neural network topology for the BoVW model and manage the learning rate by reducing it at specific iterations. PLSA is compared with the MLP on the Caltech-256 dataset, using the classes cars, motorbikes, and horses. The experimental results show that the MLP outperforms current methods in predicting transportation objects and properly approximates the transportation detection function from the extracted local features. The proposed method yields about 4.4% higher accuracy than conventional PLSA across all classes.
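The two mechanisms named in the abstract, visual-word histograms and a learning rate reduced at specific iterations, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: the function names, the nearest-codeword assignment by squared Euclidean distance, the L1 normalisation, the decay factor of 0.1, and the milestone iterations are all hypothetical.

```python
from typing import List, Sequence


def nearest_word(descriptor: Sequence[float],
                 codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry closest to the descriptor
    (squared Euclidean distance; an assumed assignment rule)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def bovw_histogram(descriptors: Sequence[Sequence[float]],
                   codebook: Sequence[Sequence[float]]) -> List[float]:
    """L1-normalised visual-word histogram for the patches of one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def step_decay(base_lr: float, iteration: int,
               milestones: Sequence[int], factor: float = 0.1) -> float:
    """Learning rate after multiplying by `factor` once for every
    milestone iteration that has already been reached."""
    drops = sum(1 for m in milestones if iteration >= m)
    return base_lr * factor ** drops


# Toy data: a 3-word codebook and four 2-D patch descriptors (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
patches = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.1, 0.9]]
hist = bovw_histogram(patches, codebook)
lr_start = step_decay(0.01, 0, milestones=[100, 200])
lr_mid = step_decay(0.01, 150, milestones=[100, 200])
```

In a pipeline of this kind, a histogram such as `hist` would serve as the fixed-length input vector for the classifier, and `step_decay` would be queried once per training iteration. Real descriptors would come from sampled image patches and the codebook from clustering, neither of which is shown here.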



Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (NRF-2010-0025512).

Author information

Corresponding author

Correspondence to Kwang Nam Choi.


About this article


Cite this article

Song, H.C., Choi, K.N. Transportation Object Detection with Bag of Visual Words Model by PLSA and MLP. Mobile Netw Appl 23, 1103–1110 (2018). https://doi.org/10.1007/s11036-018-1075-2


  • DOI: https://doi.org/10.1007/s11036-018-1075-2
