Abstract
This paper addresses the problem of estimating camera motion from an uncalibrated monocular camera. In contrast to existing methods that rely on restrictive assumptions, we propose a method that estimates camera motion under far fewer restrictions by adopting new example-based techniques that compensate for the lack of information. Specifically, we estimate the focal length of the camera by referring to visually similar training images whose focal lengths are known. For one-step camera motion estimation, we refer to stationary points (landmark points) whose depths are estimated from RGB-D candidates. In addition to landmark points, moving objects can also serve as an information source for estimating the camera motion. Our method therefore simultaneously estimates the camera motion for a video and the 3D trajectories of objects in that video using Reversible Jump Markov Chain Monte Carlo (RJ-MCMC) particle filtering. Evaluation on challenging datasets demonstrates the method's effectiveness and efficiency.
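The example-based focal length estimation described above can be sketched as a nearest-neighbour retrieval: a global descriptor of the query frame is matched against descriptors of training images with known focal lengths, and the neighbours' focal lengths are aggregated. This is a minimal illustration only, not the paper's implementation; the descriptor, distance metric, neighbourhood size `k`, and the use of the median as the aggregator are all assumptions for the sketch.

```python
import numpy as np

def estimate_focal_length(query_feat, train_feats, train_focals, k=5):
    """Estimate the focal length of an uncalibrated camera by retrieving
    visually similar training images with associated focal lengths.

    query_feat   -- (d,) global descriptor of the query frame
    train_feats  -- (n, d) descriptors of the training images
    train_focals -- (n,) focal lengths of the training images
    """
    # Rank training images by Euclidean distance in descriptor space.
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    # Aggregate the neighbours' focal lengths; the median is robust to a
    # few visually similar but geometrically dissimilar retrievals.
    return float(np.median(train_focals[nearest]))

# Toy example with synthetic descriptors and focal lengths (in pixels).
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 64))
focals = rng.uniform(400.0, 1200.0, size=100)
f = estimate_focal_length(feats[0], feats, focals, k=5)
```

In this sketch the estimate is always a focal length observed in the training set's range, so the quality of the result depends entirely on how well the training images cover cameras similar to the query.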
Cite this article
Boukhers, Z., Shirahama, K. & Grzegorzek, M. Less restrictive camera odometry estimation from monocular camera. Multimed Tools Appl 77, 16199–16222 (2018). https://doi.org/10.1007/s11042-017-5195-7