3D Tracking and Classification System Using a Monocular Camera

Published in: Wireless Personal Communications

Abstract

This paper details a 3D tracking and classification system that uses a single camera. The system can track and classify targets in outdoor and indoor scenarios, provided they move (at least approximately) on a plane. It first detects and validates targets, then tracks them in a state space with particle filters, using cylindrical target models (horizontal and vertical position on the ground plane, plus radius and height). The tracker fuses visual measurements derived from the targets' foreground and colour models. Finally, the system classifies the tracked objects based on the visual metrics extracted by our algorithm. We have tested the system in an outdoor setting with humans and automobiles passing through the camera's field of view at various speeds and distances. The results presented in this paper demonstrate the validity of our approach.
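The state-space tracking summarised above can be illustrated with a minimal bootstrap particle filter sketch in Python (assuming NumPy). The state layout `[x, y, radius, height]`, the random-walk motion model, and the Gaussian ground-position likelihood (a stand-in for the fused foreground/colour cues) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, measurement, motion_std, meas_std):
    """One predict/update/resample cycle of a bootstrap particle filter.

    particles:   (N, 4) array of cylindrical states [x, y, radius, height]
    measurement: observed [x, y] ground position (stand-in for the
                 fused foreground/colour measurements in the paper).
    """
    n = len(particles)
    # Predict: random-walk motion model over the full cylindrical state.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: weight particles by a Gaussian likelihood of the position cue.
    d2 = np.sum((particles[:, :2] - measurement) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * d2 / meas_std**2)
    weights = weights / weights.sum()
    # Systematic resampling when the effective sample size drops below N/2.
    if 1.0 / np.sum(weights**2) < n / 2:
        positions = (np.arange(n) + rng.random()) / n
        idx = np.searchsorted(np.cumsum(weights), positions)
        idx = np.minimum(idx, n - 1)  # guard against float round-off
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights

# Usage: track a target drifting along x at 0.05 units per frame.
n_particles = 500
particles = rng.normal([0.0, 0.0, 0.3, 1.7], 0.5, (n_particles, 4))
weights = np.full(n_particles, 1.0 / n_particles)
for t in range(20):
    z = np.array([0.05 * t, 0.0]) + rng.normal(0.0, 0.05, 2)
    particles, weights = particle_filter_step(particles, weights, z, 0.05, 0.1)
estimate = weights @ particles  # weighted-mean state [x, y, radius, height]
```

The weighted mean of the particle cloud serves as the state estimate; in the full system the radius and height components would also be constrained by image measurements rather than left to drift.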




Correspondence to Aristodemos Pnevmatikakis.

Additional information

Part of this work has been carried out in the scope of the EC co-funded projects ARGOS (FP7-SEC-2012-1) and eWALL (FP7-610658).


Cite this article

Bardas, G., Astaras, S., Diamantas, S. et al. 3D Tracking and Classification System Using a Monocular Camera. Wireless Pers Commun 92, 63–85 (2017). https://doi.org/10.1007/s11277-016-3839-y
