Abstract
In this paper, we present a pipeline for camera pose and trajectory estimation and for image stabilization and rectification, applicable to dense as well as wide-baseline omnidirectional images. The proposed pipeline transforms a set of images taken by a single hand-held camera into a set of stabilized and rectified images augmented by the computed 3D camera trajectory and a reconstruction of feature points facilitating visual object recognition. The paper generalizes previous work on camera trajectory estimation from perspective images to omnidirectional images and introduces a new technique for omnidirectional image rectification that is suited to recognizing people and cars in images. The performance of the pipeline is demonstrated on real image sequences acquired in urban as well as natural environments.
Torii, A., Havlena, M. & Pajdla, T. Omnidirectional Image Stabilization for Visual Object Recognition. Int J Comput Vis 91, 157–174 (2011). https://doi.org/10.1007/s11263-010-0350-x