Object Detection and Localization by Dynamic Template Warping

International Journal of Computer Vision

Abstract

A simple method is presented for detecting, localizing, and recognizing instances of object classes while accommodating wide variation in an object's pose. The method warps a small two-dimensional template into an image and reduces localization to a one-dimensional sub-problem, in which the search for a match between image and template is carried out by dynamic programming. For roughly cylindrical objects (such as heads), the method recovers three of the six degrees of freedom of motion (two translations, one rotation) and accommodates two more in the search process (one rotation, one translation). Experiments in the example domain of face detection and localization demonstrate that the method provides an efficient search strategy that outperforms normalized correlation; the approach can be extended to more general detection tasks. An additional technique recovers rough object pose from the match results and is used, in conjunction with maximization of mutual information, in a two-stage recognition experiment.
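The abstract does not specify the cost function or the warping constraints used in the paper, so the following is only a generic illustration of how a one-dimensional matching sub-problem can be solved by dynamic programming. It is a minimal sketch of classic dynamic-time-warping alignment, not the authors' algorithm; the function name `warp_cost`, the absolute-difference cost, and the three allowed moves are all assumptions made for illustration.

```python
def warp_cost(template, signal):
    """Align a 1-D template to a 1-D signal by dynamic programming.

    Hypothetical helper: generic dynamic-time-warping alignment with an
    absolute-difference local cost (an assumption, not the paper's cost).
    Returns (total_cost, path), where path is a list of
    (template_index, signal_index) pairs from start to end.
    """
    n, m = len(template), len(signal)
    if n == 0 or m == 0:
        raise ValueError("template and signal must be non-empty")

    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]   # best cumulative cost to reach (i, j)
    back = [[None] * m for _ in range(n)]  # predecessor of (i, j) on the best path

    for i in range(n):
        for j in range(m):
            local = abs(template[i] - signal[j])
            if i == 0 and j == 0:
                cost[i][j] = local
                continue
            # Allowed moves: advance both, advance template only, advance signal only.
            best, arg = inf, None
            for pi, pj in ((i - 1, j - 1), (i - 1, j), (i, j - 1)):
                if pi >= 0 and pj >= 0 and cost[pi][pj] < best:
                    best, arg = cost[pi][pj], (pi, pj)
            cost[i][j] = local + best
            back[i][j] = arg

    # Trace the optimal path back from the final cell.
    path, node = [], (n - 1, m - 1)
    while node is not None:
        path.append(node)
        node = back[node[0]][node[1]]
    path.reverse()
    return cost[n - 1][m - 1], path


if __name__ == "__main__":
    total, path = warp_cost([1, 2, 3], [1, 1, 2, 3, 3])
    print(total, path)
```

The table has one cell per (template element, signal element) pair, so the search costs time proportional to the product of the two lengths, while still considering every monotonic stretching of the template. That is the general reason a dynamic-programming formulation can be an efficient search strategy.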




About this article

Cite this article

Ratan, A.L., Grimson, W.E.L. & Wells, W.M. Object Detection and Localization by Dynamic Template Warping. International Journal of Computer Vision 36, 131–147 (2000). https://doi.org/10.1023/A:1008147915077


  • DOI: https://doi.org/10.1023/A:1008147915077
