Pattern Recognition

Volume 36, Issue 9, September 2003, Pages 2127-2141

A 2D/3D model-based object tracking framework

https://doi.org/10.1016/S0031-3203(03)00041-4

Abstract

This paper presents a robust framework for tracking complex objects in video sequences. The multiple hypothesis tracking (MHT) algorithm reported in (IEEE Trans. Pattern Anal. Mach. Intell. 18(2) (1996)) is modified to accommodate high-level representations (2D edge maps, 3D models) of objects for tracking. The framework exploits the ability of the MHT algorithm to resolve data-association uncertainty and integrates it with object matching techniques to provide robust behavior while tracking complex objects. To track objects in 2D, edge/line segments are represented by a 4D feature and tracked using MHT. In many practical applications, 3D models provide more information about the object's pose (i.e., rotation information in the transformation space) that cannot be recovered from 2D edge information alone. Hence, a 3D model-based object tracking algorithm is also presented. A probabilistic Hausdorff image matching algorithm is incorporated into the framework to determine the geometric transformation that best maps the model features onto their corresponding ones in the image plane. The 3D model of the object constrains the tracker to operate in a consistent manner. Experimental results on real and synthetic image sequences demonstrate the efficacy of the proposed framework.

Introduction

Designing a robust object tracking framework requires reliable feature extraction, a way of combining features into objects, and an accurate estimation process that is robust to background clutter and occlusion. Many methods have been developed for this purpose, including probabilistic methods [2], [3] as well as non-probabilistic methods [4]. There has been important work on multiple object tracking in the domain of person tracking [5], [6], [7]; these methods use a blob representation of objects, which makes it difficult to distinguish objects when they come close together and occlude one another in the image. This in turn makes it difficult to assign a unique track label to each object, which is important for applications such as surveillance and target tracking. To avoid this problem, the tracking framework needs to be capable of resolving the data-association uncertainty caused by background clutter and closely spaced features or objects. The multiple hypothesis tracking (MHT) algorithm [1], [8] is one solution to the multiple object tracking and data-association problems. MHT provides a Bayesian framework for motion analysis of multiple objects and has the advantage of handling statistical data-association problems such as track initiation, termination, continuation, and the assignment of measurements to the proper tracks.
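The core of the data-association step is the enumeration of feasible hypotheses: each new measurement must be explained by an existing track, a new track, or a false alarm, with no track claiming more than one measurement. As an illustrative sketch of this branching step only (the function and labels are ours, not the paper's; a real MHT implementation additionally scores, propagates, and prunes the hypothesis tree):

```python
def enumerate_hypotheses(n_meas, tracks):
    """Enumerate feasible data-association hypotheses: measurement i is
    explained by an unused existing track, a new track ('N'), or a
    false alarm ('F'). Sketch of the hypothesis branching in MHT."""
    def rec(i, used):
        if i == n_meas:
            yield []
            return
        options = [t for t in tracks if t not in used] + ['N', 'F']
        for opt in options:
            nxt = used | {opt} if opt in tracks else used
            for rest in rec(i + 1, nxt):
                yield [opt] + rest
    return list(rec(0, frozenset()))

# With 2 measurements and 1 existing track there are 8 feasible
# hypotheses; e.g. ['T1', 'F'] but never ['T1', 'T1'].
hyps = enumerate_hypotheses(2, ['T1'])
print(len(hyps))  # → 8
```

The combinatorial growth visible even in this toy case is why practical MHT implementations such as [1] rely on gating and hypothesis pruning.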

Choosing an optimal object representation is key to designing a robust and efficient tracking framework. A higher-level representation of the object is important for tracking complex objects in natural environments. For example, an object can be represented by a point (x,y), which gives only position information in 2D space. While this representation is simple, compact, relatively easy to extract, and even sufficient for many applications such as target tracking, it is very sensitive to noise and does not contain enough information about the shape and appearance of the objects being tracked. Hence, many applications, such as creating 3D models from image sequences, require more than position information; in such cases, shape and viewpoint information are also required.

Model-based tracking requires prior knowledge of the shape and appearance of the objects of interest, and hence a model of each object. Once the model is available, reliable identification can be made by determining consistent partial matches between the models and the features extracted from image sequences. To achieve this, image matching based on the Hausdorff measure is employed. Hausdorff-based image matching has been used in many applications [9], [10]. While standard methods have been largely successful, they have lacked a probabilistic formulation of the matching process, which has made it difficult to incorporate probabilistic information, such as feature uncertainties and prior probabilities of model positions, into these applications.
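For reference, the symmetric Hausdorff distance between point sets A and B is H(A,B) = max(h(A,B), h(B,A)), where h(A,B) = max over a in A of the distance from a to its nearest point in B. A minimal NumPy sketch of this classical measure (not the paper's probabilistic variant, which additionally accounts for feature uncertainty):

```python
import numpy as np

def directed_hausdorff(a, b):
    """h(A, B): the worst-case distance from a point of A to set B."""
    # Pairwise distances via broadcasting, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return d.min(axis=1).max()

def hausdorff(a, b):
    """Symmetric Hausdorff distance H(A, B) = max(h(A,B), h(B,A))."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Toy example: the same three collinear points shifted by (1, 0).
a = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
b = a + np.array([1.0, 0.0])
print(hausdorff(a, b))  # → 1.0
```

In matching, this distance is evaluated over candidate transformations of the model point set, and the transformation minimizing it is taken as the best model-to-image alignment.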

In this paper, we present a framework for tracking single as well as multiple objects in image sequences. First, an edge-based feature tracking method in 4D space using the MHT algorithm is introduced. Then, a framework for localizing and tracking multiple complex objects in video sequences is described. The framework takes advantage of the MHT algorithm, which can track multiple edge-based features under limited occlusion and is suitable for resolving data-association uncertainty caused by closely spaced features. A Hausdorff matching algorithm, which performs model-based matching that preserves the shape and viewpoint information of the objects, is integrated into the framework to provide more robust tracking.

The rest of the paper is organized as follows: Section 2 gives a brief overview of related work. In Section 3, we present our 2D model-based tracking framework; subsequently, in Section 4, the 3D model-based tracking framework is described. Section 5 provides experimental results with evaluation and quantitative analysis. The conclusions and discussion are given in Section 6. Finally, we give background information on multiple hypothesis tracking and Hausdorff image matching in the Appendix.

Section snippets

Related work

Object models can be used in determining the object's motion [11], [12], [13] if geometric models are known a priori. If multiple viewpoints of the object are available [14], [15], tracking results from each viewpoint can be integrated for better tracking performance. Ref. [11] presents a method for tracking a planar patch while simultaneously estimating its pose and the texture appearing on the planar portion of the object. The research in model-based object tracking not only focuses …

2D model-based tracking framework

We describe a novel framework for localizing and tracking multiple complex objects in video sequences. The framework utilizes a multi-dimensional MHT algorithm (see Appendix for details) to track multiple line segments that represent the objects of interest in the scene. These individual segments are matched and combined into the object level using the Hausdorff image matching algorithm (see Appendix for details) based on given 2D object models. The Hausdorff matching takes unorganized individual features …
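The exact 4D parameterization of a line segment is given in the full text; one common choice, assumed here purely for illustration, is (x_center, y_center, orientation, length). A sketch of such a feature together with the chi-square gating test that MHT-style trackers use to decide which measurements may be associated with a predicted track state:

```python
import numpy as np

def segment_feature(p1, p2):
    """Hypothetical 4D line-segment feature: (x_c, y_c, angle, length)."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    c = (p1 + p2) / 2.0          # segment midpoint
    d = p2 - p1                  # direction vector
    return np.array([c[0], c[1], np.arctan2(d[1], d[0]), np.linalg.norm(d)])

def gate(predicted, measured, cov, threshold=9.49):
    """Mahalanobis gating: accept the measurement if the innovation
    lies inside the chi-square gate (4 dof, 95% quantile ~9.49)."""
    v = measured - predicted
    return float(v @ np.linalg.inv(cov) @ v) <= threshold

# The horizontal segment (0,0)-(4,0) maps to (2, 0, 0, 4).
f = segment_feature((0, 0), (4, 0))
print(f)  # → [2. 0. 0. 4.]
```

Only measurements passing the gate spawn association hypotheses, which keeps the MHT hypothesis tree tractable.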

3D model-based tracking framework

In some cases, 3D models of objects are available. 3D models provide more information about the object's pose (i.e., the rotation transformation), which cannot be recovered from edge information alone. Using 3D models as a priori knowledge constrains the feature trackers to operate in a consistent manner. A 3D model-based object tracking framework is presented in this section. The framework employs known geometric, dynamic and appearance models of objects during tracking. The tracking problem becomes the …
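A central step in any such 3D model-based tracker is projecting the model's features into the image under a hypothesized pose, so they can be compared against the extracted edges. A standard pinhole-camera sketch of that projection step (the function name and interface are ours, for illustration):

```python
import numpy as np

def project_points(points3d, R, t, K):
    """Project 3D model points into the image under pose (R, t) with
    pinhole intrinsics K; returns pixel coordinates (u, v)."""
    cam = points3d @ R.T + t          # world frame -> camera frame
    uvw = cam @ K.T                   # camera frame -> homogeneous image
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide

# Example: focal length 500, principal point (320, 240), identity pose.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
pts = np.array([[0.0, 0.0, 2.0]])    # point on the optical axis
print(project_points(pts, np.eye(3), np.zeros(3), K))  # → [[320. 240.]]
```

Comparing these projected model edges with image edges (e.g., via the Hausdorff measure above) yields the pose update, and the 3D model keeps the per-feature trackers mutually consistent.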

Experimental results

The objective of the framework is to achieve efficient and reliable tracking of objects based on their given 2D/3D edge-based models. In Section 5.1, we demonstrate the tracking of line segments in 4D space using a synthetic image sequence. Subsequently, in Section 5.2, experimental results are presented on real image sequences for the 2D model-based tracking framework. We applied our framework to several application domains to demonstrate the broad range of …

Conclusions

This paper presents a robust framework for model-based tracking of complex objects in unstructured environments. A high-level representation of the object (2D edge map, 3D model) is used for tracking by combining MHT with a Hausdorff image matching algorithm. In the case of 2D model-based tracking, MHT is used to track individual line segments, and the Hausdorff measure is used to find a match of the model in the image plane. The integration of the MHT algorithm, which is suitable for …

Summary

This paper presents a tracking framework that employs high-level features (2D edge maps, 3D models) to represent the objects to be tracked in video sequences. Choosing an optimal object representation is key to designing a robust and efficient tracking framework. Representing objects by a point is not sufficient for many tracking applications, since the shape and viewpoint of the objects cannot be recovered. In addition, using more complicated features provides more information about the …

Acknowledgements

This work was supported in part by NSF CAREER Grant IIS-97-33644 and NSF Grant IIS-0081935.

About the Author: EDIZ POLAT received his Ph.D. and M.S. degrees from Pennsylvania State University in 2002 and 1996, respectively. He was a member of the Computer Vision Group in the Computer Science and Engineering Department at Penn State, where his research focused on computer vision, in particular the detection and visual tracking of objects in video sequences. He is currently an Assistant Professor in the Electrical and Electronics Engineering Department at Kirikkale University, Turkey.

References (35)

  • I.J. Cox et al.

    An efficient implementation of Reid's multiple hypothesis tracking algorithm and its evaluation for the purpose of visual tracking

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1996)
  • J. MacCormick, A. Blake, A probabilistic exclusion principle for tracking multiple objects, in: Proceedings of the IEEE...
  • C. Rasmussen et al.

    Probabilistic data association methods for tracking complex visual objects

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2001)
  • I. Haritaoglu et al.

    W4: real-time surveillance of people and their activities

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2000)
  • E. Polat, M. Yeasin, R. Sharma, Tracking body parts of multiple people: a new approach, in: IEEE International...
  • S. Intille, J. Davis, A. Bobick, Real-time closed-world tracking, in: Proceedings of the IEEE Computer Society...
  • W.E.L. Grimson, C. Stauffer, R. Romano, L. Lee, Using adaptive tracking to classify and monitor activities in a site,...
  • D.B. Reid

    An algorithm for tracking multiple targets

    IEEE Trans. Automatic Control

    (1979)
  • D.P. Huttenlocher, M.E. Leventon, W.J. Rucklidge, Visually-guided navigation by comparing two-dimensional edge images,...
  • C.F. Olson, Mobile robot self localization by iconic matching of range maps, in: International Conference on Advanced...
  • F. Dellaert, S. Thrun, C. Thorpe, Jacobian images of super-resolved texture maps for model-based motion estimation and...
  • S. Dettmer, A. Seetharamaiah, L. Wang, M. Shah, Model based approach for recognizing human activities from video...
  • L. Goncalves, E. Dibernardo, E. Ursella, P. Perona, Monocular tracking of human arm in 3-d, in: Proceedings of the...
  • D.M. Gavrila, L.S. Davis, 3d model-based tracking of humans in action: a multi-view approach, in: Proceedings of the...
  • I. Kakadiaris, D. Metaxas, Model-based estimation of 3-d human motion with occlusion based on active multi-viewpoint...
  • M.J. Black et al.

    EigenTracking: robust matching and tracking of articulated objects using a view-based representation

    Int. J. Comput. Vision

    (1998)
  • C. Bregler, J. Malik, Tracking people with twist and exponential maps, in: Proceedings of the IEEE Computer Society...

    About the Author: MOHAMMED YEASIN joined the faculty of the Pennsylvania State University (Penn State), University Park, in July 2000. Prior to joining Penn State, he spent one academic year as a visiting assistant professor at the University of West Florida. He has served in the Electrotechnical Laboratory, Japan, as a Center of Excellence (COE) research fellow. He obtained his Ph.D. degree in Electrical Engineering (computer vision) from the Indian Institute of Technology, Bombay (deputed from the Bangladesh Institute of Technology (BIT), Chittagong) in May 1998 and served in the Electrical & Electronic Engineering Department, BIT Chittagong, as a lecturer from 1990 to 1998. Dr. Yeasin's chief interest is the role of computational vision in achieving robust behavior by autonomous systems. The goal is to address issues that arise in integrating computer vision with a system that benefits from real-time feedback. Such systems include robots operating in dynamic and uncertain environments, bio-medical systems, and advanced human-computer interfaces. Currently, he is researching advanced human-computer interaction (HCI) systems. The research results are being used for developing commercial products by Advanced Interface Technologies (AIT) Inc., a private company in partnership with Penn State.

    About the Author: RAJEEV SHARMA is an Associate Professor of Computer Science and Engineering at Pennsylvania State University. He received a Ph.D. from the University of Maryland, College Park, in 1993. Prior to joining Penn State, he served as a Beckman Fellow at the University of Illinois at Urbana-Champaign for 3 years. His main research focus has been on the use of computer vision to develop novel human-computer interaction and intelligence-gathering techniques. Dr. Sharma's research has been funded through several grants from the National Science Foundation and the Department of Defense. He is also helping to transfer the results of the university research to commercial applications through a State College, PA based company, Advanced Interfaces (www.AdvancedInterfaces.com), in partnership with Penn State. Advanced Interfaces is a leader in new intelligence and interaction technologies using computer vision, focusing on public spaces such as retail and museums.