Constant-time monocular object detection using scene geometry

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

This paper presents a structured approach for efficiently exploiting the perspective information of a scene to enhance object detection in monocular systems. It defines a finite grid of 3D positions on the dominant ground plane and computes occupancy maps from which object location estimates are extracted. The method works on top of any detection technique, either pixel-wise (e.g. background subtraction) or region-wise (e.g. detection-by-classification), which can be linked to the proposed scheme with minimal fine-tuning. This flexibility allows the approach to be applied in a wide variety of applications and sectors, such as surveillance (e.g. person detection) or driver assistance systems (e.g. vehicle or pedestrian detection). Extensive results provide evidence of its excellent performance and its ease of use in combination with different image processing techniques.
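To make the idea of a ground-plane grid with an occupancy map more concrete, the following is a minimal sketch in Python and is not the authors' implementation. It assumes a known 3x4 camera projection matrix `P`, a world frame whose ground plane is Z = 0, a pixel-wise foreground mask as the detector output, and a fixed object size (`obj_w`, `obj_h`); all function names, parameters and the integral-image scoring are illustrative assumptions. Because the grid is finite and each cell is scored with four integral-image lookups, the cost per frame in this sketch does not depend on how many objects are present.

```python
import numpy as np

def occupancy_map(fg_mask, P, xs, ys, obj_w=0.6, obj_h=1.7):
    """Score each ground cell (x, y) by the foreground fraction inside the
    image box of an upright obj_w x obj_h object standing on that cell.
    Assumption: the ground plane is Z = 0 and P is a 3x4 projection matrix."""
    h, w = fg_mask.shape
    # Integral image with a zero border: any box sum costs four lookups.
    ii = np.zeros((h + 1, w + 1))
    ii[1:, 1:] = fg_mask.astype(float).cumsum(0).cumsum(1)
    occ = np.zeros((len(ys), len(xs)))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            # Four corners of the upright object rectangle, homogeneous coords.
            corners = np.array([[x - obj_w / 2, y, 0.0, 1.0],
                                [x + obj_w / 2, y, 0.0, 1.0],
                                [x - obj_w / 2, y, obj_h, 1.0],
                                [x + obj_w / 2, y, obj_h, 1.0]])
            uvw = corners @ P.T
            if np.any(uvw[:, 2] <= 0):      # cell behind the camera: skip
                continue
            uv = uvw[:, :2] / uvw[:, 2:3]
            u0, v0 = np.floor(uv.min(0)).astype(int)
            u1, v1 = np.ceil(uv.max(0)).astype(int)
            u0, u1 = np.clip([u0, u1], 0, w)
            v0, v1 = np.clip([v0, v1], 0, h)
            area = (u1 - u0) * (v1 - v0)
            if area > 0:
                box_sum = ii[v1, u1] - ii[v0, u1] - ii[v1, u0] + ii[v0, u0]
                occ[i, j] = box_sum / area
    return occ

def extract_peaks(occ, xs, ys, thr=0.8):
    """Return ground positions whose score is >= thr and is the maximum of
    its 3x3 neighbourhood (a simple stand-in for location extraction)."""
    padded = np.pad(occ, 1, constant_values=-np.inf)
    peaks = []
    for i in range(occ.shape[0]):
        for j in range(occ.shape[1]):
            if occ[i, j] >= thr and occ[i, j] == padded[i:i + 3, j:j + 3].max():
                peaks.append((float(xs[j]), float(ys[i])))
    return peaks

# Hypothetical camera: 3 m above the ground, looking along world +Y.
f, cx, cy, cam_h = 500.0, 320.0, 240.0, 3.0
K = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1.0]])
Rt = np.array([[1.0, 0, 0, 0], [0, 0, -1, cam_h], [0, 1, 0, 0]])
P = K @ Rt

# Synthetic foreground mask: one person-sized blob standing at (0, 10) m.
mask = np.zeros((480, 640), dtype=bool)
mask[305:390, 305:335] = True

xs = np.linspace(-2.0, 2.0, 5)      # grid columns, metres
ys = np.array([6.0, 10.0, 14.0])    # grid rows, metres
occ = occupancy_map(mask, P, xs, ys)
peaks = extract_peaks(occ, xs, ys)
```

In this toy setup the only cell reported by `extract_peaks` is the one at (0, 10) m, where the projected box coincides with the blob. A region-wise detector could be plugged in the same way by rasterising its detection scores into `fg_mask` instead of a binary foreground mask.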

Acknowledgements

This work has been partially supported by the EU projects SAVASA (Grant Agreement 285621) and P-REACT (Grant Agreement 607881) under the 7th Framework Programme, and by the Basque Government under the projects IAB of the ETORGAI framework and EFITRANS of the ETORTEK framework.

Author information

Corresponding author

Correspondence to Marcos Nieto.

Cite this article

Nieto, M., Ortega, J.D., Leškovský, P. et al. Constant-time monocular object detection using scene geometry. Pattern Anal Applic 21, 1053–1066 (2018). https://doi.org/10.1007/s10044-017-0625-8

  • DOI: https://doi.org/10.1007/s10044-017-0625-8
