short-paper

Detecting motion synchrony by video tubes

Authors:

Carlo Tomasi

MM '11: Proceedings of the 19th ACM international conference on Multimedia

Pages 1197 - 1200

https://doi.org/10.1145/2072298.2071973

Published: 28 November 2011

Abstract

Motion synchrony, i.e., the coordinated motion of a group of individuals, is an interesting phenomenon in nature and daily life: fish swim in schools, birds fly in flocks, soldiers march in platoons. Our goal is to detect motion synchrony present in video data and to track the group of moving objects as a whole, which opens the door to new algorithms and applications. To this end, we model individual motions as video tubes in space-time, define motion synchrony by the geometric relation among video tubes, and track a whole set of tubes by dynamic programming. The resulting algorithm is highly efficient in practice: given a video clip of T frames at resolution X×Y, we show that finding the K spatially correlated video tubes and determining whether synchrony is present can be solved optimally in O(XYTK) time. Preliminary experiments show that the method is both effective and efficient. Typical running times are 30 to 100 VGA-resolution frames per second after feature extraction, and the accuracy of synchrony detection is more than 90% on our annotated data set.
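The abstract does not state the recurrence used for tracking. As a rough illustration only, the Python sketch below extracts a single tube by dynamic programming in O(XYT) time under assumptions of our own: a per-pixel score volume `score[t][y][x]` (for example, a feature response) and a hard limit `r` on how far a tube may move per frame along each axis. The maximum over each (2r+1)×(2r+1) neighborhood is computed as two separable sliding-window maxima, so `r` does not enter the running time. All function names (`window_max_1d`, `window_max_2d`, `best_tube`) are hypothetical. Repeating the search once per tube would be consistent with an O(XYTK) bound for K tubes, but this sketch does not model the geometric relation among tubes that the paper uses to define synchrony.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical sketch, not the paper's model: one tube, hard displacement bound r.
Grid = List[List[float]]


def window_max_1d(vals: List[float], r: int) -> Tuple[List[float], List[int]]:
    """Max and argmax of vals over each window [i - r, i + r], clipped, in O(n)."""
    n = len(vals)
    out_v: List[float] = [0.0] * n
    out_i: List[int] = [0] * n
    dq: deque = deque()  # candidate indices; their values are non-increasing
    nxt = 0  # next index to enter the window
    for i in range(n):
        while nxt < n and nxt <= i + r:  # grow the right edge to i + r
            while dq and vals[dq[-1]] <= vals[nxt]:
                dq.pop()
            dq.append(nxt)
            nxt += 1
        while dq[0] < i - r:  # shrink the left edge to i - r
            dq.popleft()
        out_v[i], out_i[i] = vals[dq[0]], dq[0]
    return out_v, out_i


def window_max_2d(grid: Grid, r: int) -> Tuple[Grid, List[List[Tuple[int, int]]]]:
    """Max over each (2r+1)x(2r+1) square (clipped) and where it occurs, as (y, x)."""
    Y, X = len(grid), len(grid[0])
    row_v, row_ix = zip(*(window_max_1d(row, r) for row in grid))  # pass along x
    out_v = [[0.0] * X for _ in range(Y)]
    arg = [[(0, 0)] * X for _ in range(Y)]
    for x in range(X):  # pass along y over the row maxima
        col_v, col_iy = window_max_1d([row_v[y][x] for y in range(Y)], r)
        for y in range(Y):
            yy = col_iy[y]
            out_v[y][x] = col_v[y]
            arg[y][x] = (yy, row_ix[yy][x])
    return out_v, arg


def best_tube(score: List[Grid], r: int) -> Tuple[float, List[Tuple[int, int]]]:
    """Max-score path with one (y, x) per frame, moving <= r per axis per frame.

    Runs in O(T * Y * X) for any r; assumes T >= 1 and non-empty frames.
    """
    acc = [row[:] for row in score[0]]  # best total of a path ending at (y, x)
    back = []  # back[t - 1][y][x] = predecessor (y, x) in frame t - 1
    for frame in score[1:]:
        prev_best, arg = window_max_2d(acc, r)
        back.append(arg)
        acc = [[s + p for s, p in zip(srow, prow)] for srow, prow in zip(frame, prev_best)]
    Y, X = len(acc), len(acc[0])
    total, y, x = max((acc[yy][xx], yy, xx) for yy in range(Y) for xx in range(X))
    path = [(y, x)]
    for arg in reversed(back):  # follow back-pointers to frame 0
        y, x = arg[y][x]
        path.append((y, x))
    path.reverse()
    return total, path


if __name__ == "__main__":
    # A 3x4 frame sequence with one bright pixel moving right by one column per frame.
    demo = [
        [[0, 0, 0, 0], [5, 0, 0, 0], [0, 0, 0, 0]],
        [[0, 0, 0, 0], [0, 5, 0, 0], [0, 0, 0, 0]],
        [[0, 0, 0, 0], [0, 0, 5, 0], [0, 0, 0, 0]],
    ]
    print(best_tube(demo, r=1))  # (15, [(1, 0), (1, 1), (1, 2)])
```

A hard bound keeps the sketch short; a quadratic motion penalty could instead be handled in linear time per frame with a generalized distance transform.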

Cited By

Chu W, Torre F, Cohn J, Messinger D (2017). A Branch-and-Bound Framework for Unsupervised Common Event Discovery. International Journal of Computer Vision, 123(3):372-391. doi:10.1007/s11263-017-0989-7. Online publication date: 1-Jul-2017.
https://dl.acm.org/doi/10.1007/s11263-017-0989-7
Chu W, Zeng J, Torre F, Cohn J, Messinger D (2015). Unsupervised Synchrony Discovery in Human Interaction. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pages 3146-3154. doi:10.1109/ICCV.2015.360. Online publication date: 7-Dec-2015.
https://dl.acm.org/doi/10.1109/ICCV.2015.360
Xu P, Ye M, Li X, Liu Q, Yang Y, Ding J (2014). Dynamic Background Learning through Deep Auto-encoder Networks. Proceedings of the 22nd ACM international conference on Multimedia (eds. Hua K, Rui Y, Steinmetz R, Hanjalic A, Natsev A, Zhu W), pages 107-116. doi:10.1145/2647868.2654914. Online publication date: 3-Nov-2014.
https://dl.acm.org/doi/10.1145/2647868.2654914

Index Terms

Detecting motion synchrony by video tubes
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Video summarization
      2. Image and video acquisition
        Motion capture
  2. Computer graphics
    1. Animation
      1. Motion capture
      2. Motion processing


Information

Published In

MM '11: Proceedings of the 19th ACM international conference on Multimedia

November 2011

944 pages

ISBN:9781450306164

DOI:10.1145/2072298

General Chairs:
K. Selçuk Candan, Arizona State University, USA
Sethuraman Panchanathan, Arizona State University, USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Program Chairs:
Hari Sundaram, Arizona State University, USA
Wu-Chi Feng, Portland State University, USA
Nicu Sebe, University of Trento, Italy

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Conference

MM '11

Sponsor:

SIGMM

MM '11: ACM Multimedia Conference

November 28 - December 1, 2011

Scottsdale, Arizona, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Article Metrics

Total citations: 3
Total downloads: 118
Downloads (last 12 months): 1
Downloads (last 6 weeks): 0

Reflects downloads up to 19 Feb 2025
