Abstract
This paper presents a structured approach for efficiently exploiting the perspective information of a scene to enhance object detection in monocular systems. It defines a finite grid of 3D positions on the dominant ground plane and computes occupancy maps from which object location estimates are extracted. The method works on top of any detection technique, whether pixel-wise (e.g. background subtraction) or region-wise (e.g. detection-by-classification), any of which can be linked to the proposed scheme with minimal fine-tuning. This flexibility allows the approach to be applied in a wide variety of applications and sectors, such as surveillance (e.g. person detection) or driver assistance systems (e.g. vehicle or pedestrian detection). Extensive results provide evidence of its excellent performance and its ease of use in combination with different image processing techniques.
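The grid-and-occupancy idea can be illustrated with a minimal sketch. It is not the authors' implementation: the pinhole camera model (focal length `f`, principal point `cx`, `cy`, camera height above a flat ground plane), the fixed object size, the foreground-fraction score, and all function names are assumptions made for illustration. Each grid position is projected to an image rectangle, and an integral image lets every rectangle be scored with four lookups, so the per-frame cost depends on the number of grid cells rather than on the scene content.

```python
def ground_grid(x_range, z_range, step):
    """List (X, Z) positions of a regular grid on the ground plane (metres)."""
    nx = int(round((x_range[1] - x_range[0]) / step)) + 1
    nz = int(round((z_range[1] - z_range[0]) / step)) + 1
    return [(x_range[0] + i * step, z_range[0] + k * step)
            for k in range(nz) for i in range(nx)]


def project(f, cx, cy, cam_height, x, y, z):
    """Assumed pinhole projection: camera looks along Z, image v grows downward.

    y is the height of the point above the ground, so the point lies
    (cam_height - y) below the optical centre.
    """
    return f * x / z + cx, f * (cam_height - y) / z + cy


def integral_image(mask):
    """Summed-area table with an extra leading row and column of zeros."""
    rows, cols = len(mask), len(mask[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        running = 0
        for c in range(cols):
            running += mask[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + running
    return table


def occupancy_map(mask, grid, camera, obj_w, obj_h):
    """Score each grid cell by the foreground fraction inside its projected box."""
    rows, cols = len(mask), len(mask[0])
    table = integral_image(mask)
    scores = []
    for x, z in grid:
        u0, v0 = project(*camera, x - obj_w / 2, obj_h, z)  # top-left corner
        u1, v1 = project(*camera, x + obj_w / 2, 0.0, z)    # bottom-right corner
        # Clip the box to the image so partially visible cells stay valid.
        c0, c1 = (min(max(int(round(u)), 0), cols) for u in (u0, u1))
        r0, r1 = (min(max(int(round(v)), 0), rows) for v in (v0, v1))
        area = (r1 - r0) * (c1 - c0)
        if area <= 0:
            scores.append(0.0)
            continue
        # Four lookups give the foreground count, whatever the box size.
        fg = table[r1][c1] - table[r0][c1] - table[r1][c0] + table[r0][c0]
        scores.append(fg / area)
    return scores


# Hypothetical usage: a 0.6 m x 1.8 m object, camera 1.5 m above the ground.
camera = (500.0, 320.0, 240.0, 1.5)           # f, cx, cy, cam_height
mask = [[0] * 640 for _ in range(480)]        # binary foreground mask
for r in range(225, 315):
    for c in range(305, 335):
        mask[r][c] = 1                        # synthetic blob
grid = ground_grid((-2.0, 2.0), (5.0, 15.0), 1.0)
scores = occupancy_map(mask, grid, camera, 0.6, 1.8)
```

The mask here stands in for the output of a pixel-wise detector such as background subtraction. The sketch assumes all grid positions lie in front of the camera (Z > 0). Extracting location estimates from the scores is left out: with this simple score, cells whose smaller projected boxes fall inside the same blob also score highly, so a real system needs a more discriminative measure or a selection step.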
Acknowledgements
This work has been partially supported by the EU projects SAVASA (Grant Agreement 285621) and P-REACT (Grant Agreement 607881) under the 7th Framework Programme, and by the Basque Government under the projects IAB of the ETORGAI framework and EFITRANS of the ETORTEK framework.
Cite this article
Nieto, M., Ortega, J.D., Leškovský, P. et al. Constant-time monocular object detection using scene geometry. Pattern Anal Applic 21, 1053–1066 (2018). https://doi.org/10.1007/s10044-017-0625-8
DOI: https://doi.org/10.1007/s10044-017-0625-8