Abstract
We present a novel off-line algorithm for target segmentation and tracking in video. In our approach, video data is represented by a multi-label Markov Random Field model, and segmentation is accomplished by finding the minimum-energy label assignment. We propose an energy formulation that incorporates both segmentation and motion estimation in a single framework. Our energy functions enforce motion coherence both within and across frames. We utilize state-of-the-art methods to optimize efficiently over a large number of discrete labels. In addition, we introduce a new ground-truth dataset, the Georgia Tech Segmentation and Tracking Dataset (GT-SegTrack), for evaluating segmentation accuracy in video tracking. We compare our method with several recent on-line tracking algorithms and report both quantitative and qualitative results.
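To make the idea of a joint segmentation-and-motion label concrete, the following is a minimal toy sketch and not the formulation or optimizer used in the article. It assumes a 1-D "frame" of pixel intensities, a hypothetical label space of (segment, displacement) pairs, hand-picked unary and pairwise costs, and iterated conditional modes (ICM) as a simple stand-in for the efficient discrete optimizers the abstract refers to. All function names and cost terms are illustrative assumptions.

```python
from itertools import product

# Hypothetical joint label space: each pixel gets (segment, displacement).
SEGMENTS = (0, 1)            # 0 = background, 1 = target
DISPLACEMENTS = (-1, 0, 1)   # toy 1-D motion, in pixels
LABELS = list(product(SEGMENTS, DISPLACEMENTS))


def unary(frame, next_frame, p, label):
    """Assumed data cost: appearance mismatch plus brightness change along the motion."""
    seg, d = label
    q = min(max(p + d, 0), len(next_frame) - 1)  # clamp to the frame border
    appearance = abs(frame[p] - seg)             # toy model: target pixels are bright
    motion = abs(frame[p] - next_frame[q])
    return appearance + motion


def pairwise(a, b, weight=0.5):
    """Assumed smoothness cost: neighbours prefer the same segment and similar motion."""
    return weight * ((a[0] != b[0]) + abs(a[1] - b[1]))


def energy(frame, next_frame, labeling):
    """Total energy of a labeling on a 1-D chain of pixels."""
    e = sum(unary(frame, next_frame, p, l) for p, l in enumerate(labeling))
    e += sum(pairwise(labeling[p], labeling[p + 1]) for p in range(len(labeling) - 1))
    return e


def init_labeling(frame, next_frame):
    """Start from the label that minimizes the unary cost at each pixel."""
    return [min(LABELS, key=lambda l: unary(frame, next_frame, p, l))
            for p in range(len(frame))]


def icm(frame, next_frame, sweeps=5):
    """Iterated conditional modes: greedily re-label one pixel at a time."""
    labeling = init_labeling(frame, next_frame)
    for _ in range(sweeps):
        for p in range(len(frame)):
            def local(l):
                trial = labeling[:p] + [l] + labeling[p + 1:]
                return energy(frame, next_frame, trial)
            labeling[p] = min(LABELS, key=local)
    return labeling


if __name__ == "__main__":
    frame = [0, 1, 1, 0, 0]       # bright target occupies pixels 1-2
    next_frame = [0, 0, 1, 1, 0]  # target has shifted one pixel to the right
    result = icm(frame, next_frame)
    print(result, energy(frame, next_frame, result))
```

Because each pixel's label carries both a segment and a displacement, the smoothness term penalizes segment boundaries and motion discontinuities together, which is one simple way to picture motion coherence in a single energy. ICM only finds a local minimum; it is used here purely because it fits in a few lines.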
Cite this article
Tsai, D., Flagg, M., Nakazawa, A. et al. Motion Coherent Tracking Using Multi-label MRF Optimization. Int J Comput Vis 100, 190–202 (2012). https://doi.org/10.1007/s11263-011-0512-5
DOI: https://doi.org/10.1007/s11263-011-0512-5