Abstract
This paper describes a probabilistic framework for simultaneous object tracking and event detection in monocular video. We cast joint tracking and semantic event detection as a model-based search problem in a multi-dimensional state space, in which the tracking trajectory and the event type are found by maximum a posteriori (MAP) optimization. The approach benefits from combining a particle-based probabilistic representation, retention of multiple hypotheses, efficient particle propagation, and temporal optimization. We present qualitative and quantitative results on realistic video sequences to demonstrate its effectiveness.
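To make the idea of a MAP search over retained particles concrete, the following is a minimal sketch of a Viterbi-style dynamic program that picks one particle per frame so that the summed log-likelihood and log-transition score is maximal. It is an illustration under stated assumptions, not the authors' implementation: the joint state (position, event label), the function names, the uniform prior on the first frame, and the toy densities are all hypothetical.

```python
import math


def viterbi_over_particles(particles, log_lik, log_trans):
    """Return the highest-scoring path that visits one particle per frame.

    particles: list over frames; each entry is a list of hypothetical joint
               states (position, event_label).
    log_lik(t, state): log observation likelihood of `state` at frame t.
    log_trans(prev, cur): log transition score between two joint states.
    """
    # Frame 0: score each particle by its likelihood (uniform prior assumed).
    score = [log_lik(0, s) for s in particles[0]]
    back = []  # back[t-1][j]: best predecessor index of particle j at frame t
    for t in range(1, len(particles)):
        new_score, pointers = [], []
        for s in particles[t]:
            # Best predecessor among the particles retained at frame t-1.
            cand = [score[i] + log_trans(p, s)
                    for i, p in enumerate(particles[t - 1])]
            best = max(range(len(cand)), key=cand.__getitem__)
            new_score.append(cand[best] + log_lik(t, s))
            pointers.append(best)
        score = new_score
        back.append(pointers)
    # Backtrack from the best particle in the final frame.
    j = max(range(len(score)), key=score.__getitem__)
    best_score = score[j]
    path = [particles[-1][j]]
    for t in range(len(particles) - 1, 0, -1):
        j = back[t - 1][j]
        path.append(particles[t - 1][j])
    path.reverse()
    return path, best_score


# Toy models (assumptions for illustration only).
observations = [0.0, 1.0, 2.0]


def toy_log_lik(t, state):
    x, _event = state
    return -0.5 * (x - observations[t]) ** 2  # unnormalized Gaussian


def toy_log_trans(prev, cur):
    (x0, e0), (x1, e1) = prev, cur
    keep_event = math.log(0.9 if e0 == e1 else 0.1)  # events tend to persist
    return keep_event - 0.5 * (x1 - x0) ** 2         # smooth motion


particles = [
    [(0.0, "walk"), (5.0, "run")],
    [(1.0, "walk"), (4.0, "run")],
    [(2.0, "walk"), (9.0, "run")],
]
path, score = viterbi_over_particles(particles, toy_log_lik, toy_log_trans)
```

In this toy run the returned `path` carries both the trajectory and the event label for each frame, which is the sense in which tracking and event detection are solved jointly. The cost is quadratic in the number of particles per frame, so a practical system would need to keep the particle sets small or prune candidate predecessors.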
Cite this article
Wang, R.R., Huang, T. A framework of joint object tracking and event detection. Pattern Anal Applic 7, 343–355 (2004). https://doi.org/10.1007/s10044-004-0231-4
DOI: https://doi.org/10.1007/s10044-004-0231-4