poster

What Can We Learn about Motion Videos from Still Images?

Authors:
Jianguang Zhang
School of Computer Science and Technology, Tianjin University, Tianjin, China

Yahong Han
School of Computer Science and Technology, Tianjin University & Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, China

Jinhui Tang
School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China

Qinghua Hu
School of Computer Science and Technology, Tianjin University & Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, China

Jianmin Jiang
School of Computer Science & Software Engineering, Shenzhen, China

MM '14: Proceedings of the 22nd ACM international conference on Multimedia, November 2014, Pages 973–976
https://doi.org/10.1145/2647868.2654992

Published: 03 November 2014

ABSTRACT

Human action recognition from motion videos plays an important role in multimedia analysis. Beyond the temporal cues of action sequences in motion videos, motion tendency can also be revealed by still images or key frames. Thus, if the action knowledge in related still images can be adapted well to the target motion videos, there is a good chance of improving video action recognition. In this paper, we propose a Still-to-Motion Adaptation (SMA) framework for human action recognition. Common visual features are extracted from both the related images and the key frames of the target videos, which bridges the gap between still images and videos. Meanwhile, to make use of the unlabeled training videos in the target domain, we incorporate a semi-supervised process into our framework. By minimizing the difference between the action predictions from still features and from motion features, we formulate still-to-motion adaptation as a joint optimization process. Experiments demonstrate the effectiveness of the proposed framework and show better action recognition performance than state-of-the-art methods. We also analyze how knowledge adaptation from still images affects the recognition results on target videos.
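
The joint optimization described in the abstract can be illustrated with a small sketch. The abstract does not give the actual SMA objective, so everything below is an assumption made for illustration: two linear predictors (`Ws` on still features, `Wm` on motion features), squared-error losses, an agreement term over all training videos (labeled and unlabeled), ridge regularization, and the hypothetical names `objective`, `fit_sketch`, `lam`, and `mu`. The alternating closed-form updates are a generic solver for this assumed objective, not the authors' algorithm.

```python
import numpy as np


def objective(Ws, Wm, Xs_img, Y_img, Xs_vid, Xm_vid, Y_lab, lam, mu):
    """Assumed joint objective (illustrative, not the paper's formulation)."""
    n = Y_lab.shape[0]  # convention: the first n videos are the labeled ones
    fit_img = np.sum((Xs_img @ Ws - Y_img) ** 2)         # labeled still images
    fit_still = np.sum((Xs_vid[:n] @ Ws - Y_lab) ** 2)   # key frames of labeled videos
    fit_motion = np.sum((Xm_vid[:n] @ Wm - Y_lab) ** 2)  # motion features of labeled videos
    # Agreement between still-based and motion-based predictions on ALL
    # training videos; this is where unlabeled videos contribute.
    agree = np.sum((Xs_vid @ Ws - Xm_vid @ Wm) ** 2)
    reg = np.sum(Ws ** 2) + np.sum(Wm ** 2)
    return fit_img + fit_still + fit_motion + lam * agree + mu * reg


def fit_sketch(Xs_img, Y_img, Xs_vid, Xm_vid, Y_lab, lam=1.0, mu=0.1, iters=20):
    """Alternate exact ridge-style solves for Ws (Wm fixed) and Wm (Ws fixed)."""
    n = Y_lab.shape[0]
    ds, dm, c = Xs_img.shape[1], Xm_vid.shape[1], Y_lab.shape[1]
    Ws, Wm = np.zeros((ds, c)), np.zeros((dm, c))

    # Stack every labeled sample that has still features.
    A = np.vstack([Xs_img, Xs_vid[:n]])
    B = np.vstack([Y_img, Y_lab])
    Xm_lab = Xm_vid[:n]

    # Left-hand sides of the normal equations do not change across iterations.
    Hs = A.T @ A + lam * (Xs_vid.T @ Xs_vid) + mu * np.eye(ds)
    Hm = Xm_lab.T @ Xm_lab + lam * (Xm_vid.T @ Xm_vid) + mu * np.eye(dm)

    args = (Xs_img, Y_img, Xs_vid, Xm_vid, Y_lab, lam, mu)
    history = [objective(Ws, Wm, *args)]
    for _ in range(iters):
        Ws = np.linalg.solve(Hs, A.T @ B + lam * (Xs_vid.T @ (Xm_vid @ Wm)))
        Wm = np.linalg.solve(Hm, Xm_lab.T @ Y_lab + lam * (Xm_vid.T @ (Xs_vid @ Ws)))
        history.append(objective(Ws, Wm, *args))
    return Ws, Wm, history


def predict(Ws, Wm, Xs, Xm):
    """Fuse both predictors by summing scores and taking the top class."""
    return np.argmax(Xs @ Ws + Xm @ Wm, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_img, n_vid, n_lab, ds, dm, c = 40, 30, 10, 8, 12, 3
    Xs_img = rng.normal(size=(n_img, ds))
    Xs_vid = rng.normal(size=(n_vid, ds))
    Xm_vid = rng.normal(size=(n_vid, dm))
    Y_img = np.eye(c)[rng.integers(0, c, size=n_img)]  # one-hot labels
    Y_lab = np.eye(c)[rng.integers(0, c, size=n_lab)]
    Ws, Wm, history = fit_sketch(Xs_img, Y_img, Xs_vid, Xm_vid, Y_lab)
    print("objective at start and end:", history[0], history[-1])
```

Because the assumed objective is convex and each update minimizes it exactly over one block of variables, the recorded objective values never increase. Setting `lam=0` decouples the two predictors, which gives a simple baseline for checking whether the agreement term helps on a given dataset.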

Index Terms

Computing methodologies

Published in
MM '14: Proceedings of the 22nd ACM international conference on Multimedia
November 2014
1310 pages
ISBN:9781450330633
DOI:10.1145/2647868
General Chairs:
Kien A. Hua, University of Central Florida, USA
Yong Rui, Microsoft Research, China
Ralf Steinmetz, Technische Universität Darmstadt, Germany
Program Chairs:
Alan Hanjalic, Delft University of Technology, Netherlands
Apostol (Paul) Natsev, Google, USA
Wenwu Zhu, Tsinghua University, China
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
action recognition
domain adaptation
images
videos
Qualifiers
- poster
Conference

Acceptance Rates
MM '14 Paper Acceptance Rate: 55 of 286 submissions, 19%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%
Article Metrics
- Total Citations: 5
- Total Downloads: 219
- Downloads (Last 12 months): 1
- Downloads (Last 6 weeks): 0