Abstract
This paper presents algorithms for tracking unknown objects in the presence of zoom. Since prior models are unavailable, point and line matches in affine views are used to characterize the structure and to transfer a fixation point into new images in a sequence. Because any affine projection matrix is permitted, intrinsic camera parameters such as focal length may change freely. Moreover, since the techniques do not require long feature tracks, they are insensitive to partial occlusion caused, for instance, by part of the object falling off the image plane while zooming in. If only point matches are available, a previous method based on factorization is applied. When lines are also incorporated, the affine trifocal and quadrifocal tensors are used for tracking in monocular and stereo systems respectively. Methods for computing the tensors by minimizing algebraic error are developed. In comparison with their projective counterparts, the affine tensors offer significant advantages in terms of computation time and convenience of parameterization, and the relations between the different tensors are shown to be much simpler. Successful tracking is demonstrated on several real image sequences.
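As an illustration of the point-only case, the following is a minimal sketch of transferring a fixation point by affine factorization. It is not the paper's implementation: the function names (`affine_structure`, `transfer_fixation`), the array layouts, and the simplifying assumption that every point is visible in all reference views are hypothetical choices made for this example. Only a subset of the points needs to be matched in the new view, which mirrors the insensitivity to partial occlusion described above.

```python
import numpy as np

def affine_structure(ref_views):
    """Rank-3 factorization of centred image points.

    ref_views: array of shape (m, n, 2), n points seen in m >= 2 reference views
    (assumed complete, i.e. no missing points; n >= 4, not all coplanar).
    Returns motion M (2m x 3), affine structure S (3 x n), centroids (m x 2).
    """
    m, n, _ = ref_views.shape
    centroids = ref_views.mean(axis=1)
    # Measurement matrix: rows are x and y of view 0, then x and y of view 1, ...
    W = (ref_views - centroids[:, None, :]).transpose(0, 2, 1).reshape(2 * m, n)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    M = U[:, :3] * s[:3]   # stacked affine camera matrices (up to an affinity)
    S = Vt[:3]             # affine structure (up to the inverse affinity)
    return M, S, centroids

def transfer_fixation(ref_views, fix_ref, new_pts, visible):
    """Transfer a fixation point into a new affine view.

    fix_ref: (m, 2) fixation point in each reference view.
    new_pts: (k, 2) matched points in the new view, k >= 4.
    visible: length-k index array saying which structure points they are.
    """
    M, S, centroids = affine_structure(ref_views)
    # Affine coordinates of the fixation point in the recovered frame.
    Xg, *_ = np.linalg.lstsq(M, (fix_ref - centroids).reshape(-1), rcond=None)
    # Fit the new view's affine camera (3 x 2 linear part plus translation row).
    A = np.hstack([S[:, visible].T, np.ones((len(visible), 1))])
    P, *_ = np.linalg.lstsq(A, new_pts, rcond=None)
    # Project the fixation point with the fitted camera.
    return np.append(Xg, 1.0) @ P
```

Because the new view's camera is fitted as an unconstrained affine map, a change of focal length between views is absorbed into the fitted matrix and needs no special handling in this sketch.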
Cite this article
Hayman, E., Thórhallsson, T. & Murray, D. Tracking While Zooming Using Affine Transfer and Multifocal Tensors. International Journal of Computer Vision 51, 37–62 (2003). https://doi.org/10.1023/A:1020988723254