Abstract
This paper details a 3D tracking and recognition system that uses a single camera. The system can track and classify targets in outdoor and indoor scenarios, as long as they move (at least approximately) on a plane. The system first detects and validates targets and then tracks them with Particle Filters in a state space of cylindrical models, parameterised by each target's two-dimensional position on the ground plane, its radius and its height. The tracker fuses visual measurements derived from the targets' foreground and colour models. Finally, the system classifies the tracked objects based on the visual metrics extracted by our algorithm. We have tested our model in an outdoor setting with humans and automobiles passing through the camera's field of view at various speeds and distances. The results presented in this paper show the validity of our approach.
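The abstract summarises a particle-filter tracker over a cylindrical state (ground-plane position, radius, height) that fuses a foreground cue and a colour cue. The sketch below illustrates that loop under stated assumptions: a bootstrap (SIR) particle filter with a random-walk motion model, systematic resampling, and cue fusion by multiplying the two likelihoods (that is, treating the cues as conditionally independent given the state). The cues are hypothetical Gaussian stand-ins (`gaussian_cue`) rather than the paper's image measurements, the target values are toy numbers, and all function names are illustrative.

```python
import math
import random

# A particle is one hypothesis of a cylindrical target:
# (x, y) ground-plane position in metres, radius r and height h in metres.


def predict(particles, rng, pos_sigma=0.1, size_sigma=0.01):
    """Random-walk motion model (an assumption, not the paper's dynamics)."""
    moved = []
    for x, y, r, h in particles:
        moved.append((
            x + rng.gauss(0.0, pos_sigma),
            y + rng.gauss(0.0, pos_sigma),
            max(0.05, r + rng.gauss(0.0, size_sigma)),  # keep radius positive
            max(0.10, h + rng.gauss(0.0, size_sigma)),  # keep height positive
        ))
    return moved


def systematic_resample(particles, weights, rng):
    """Draw len(particles) samples with probability proportional to weights."""
    n = len(particles)
    step = sum(weights) / n
    offset = rng.uniform(0.0, step)
    resampled, i, cumulative = [], 0, weights[0]
    for k in range(n):
        target = offset + k * step
        # Advance until the cumulative weight strictly exceeds the target,
        # so zero-weight particles are never selected.
        while target >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(particles[i])
    return resampled


def pf_step(particles, foreground_cue, colour_cue, rng):
    """One bootstrap (SIR) step fusing two cues by multiplying likelihoods."""
    particles = predict(particles, rng)
    weights = [foreground_cue(p) * colour_cue(p) for p in particles]
    total = sum(weights)
    if total == 0.0:  # every hypothesis implausible: fall back to uniform
        weights, total = [1.0] * len(particles), float(len(particles))
    # Weighted-mean state estimate, computed before resampling.
    estimate = tuple(
        sum(w * p[d] for w, p in zip(weights, particles)) / total
        for d in range(4)
    )
    return systematic_resample(particles, weights, rng), estimate


def gaussian_cue(centre, sigmas):
    """Hypothetical stand-in for an image-based cue: a Gaussian score around
    a known state. A real cue would project the cylinder into the image and
    score it against the foreground mask or the target's colour model."""
    def cue(p):
        return math.exp(-0.5 * sum(((a - c) / s) ** 2
                                   for a, c, s in zip(p, centre, sigmas)))
    return cue


def run_demo(seed=0, n_particles=500, n_steps=30):
    rng = random.Random(seed)
    truth = (2.0, 5.0, 0.3, 1.75)  # toy pedestrian, not data from the paper
    # Foreground cue constrains position and size; colour cue mostly position.
    fg = gaussian_cue(truth, (0.3, 0.3, 0.1, 0.2))
    col = gaussian_cue(truth, (0.5, 0.5, 10.0, 10.0))
    particles = [(rng.uniform(1.0, 4.0), rng.uniform(3.5, 6.5),
                  rng.uniform(0.2, 0.6), rng.uniform(1.2, 2.0))
                 for _ in range(n_particles)]
    estimate = None
    for _ in range(n_steps):
        particles, estimate = pf_step(particles, fg, col, rng)
    return estimate


if __name__ == "__main__":
    x, y, r, h = run_demo()
    print(f"x={x:.2f} m, y={y:.2f} m, r={r:.2f} m, h={h:.2f} m")
```

Running the script prints the estimate for the toy target, which should lie close to the assumed state (2.0, 5.0, 0.3, 1.75). Multiplying the cue likelihoods is the simplest fusion rule; it lets either cue veto a hypothesis, which helps when one cue alone is ambiguous (for example, colour alone says little about a target's height).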
Additional information
Part of this work was carried out within the scope of the EC co-funded projects ARGOS (FP7-SEC-2012-1) and eWALL (FP7-610658).
About this article
Cite this article
Bardas, G., Astaras, S., Diamantas, S. et al. 3D Tracking and Classification System Using a Monocular Camera. Wireless Pers Commun 92, 63–85 (2017). https://doi.org/10.1007/s11277-016-3839-y
DOI: https://doi.org/10.1007/s11277-016-3839-y