DOI: 10.1145/3271553.3271563
Research Article

Action Recognition by Jointly Using Video Proposal and Trajectory

Published: 27 August 2018

Abstract

Human action recognition in videos is a popular yet challenging task in the computer vision community. In recent years, trajectory-based methods have proven effective for action recognition. However, because trajectories are generated around motion regions, trajectory-based methods tend to attend only to regions of high motion salience in a video and to ignore motionless but semantically meaningful objects. To compensate for this shortcoming, this paper exploits video proposals for their ability to discover semantic objects. In the proposed method, video proposals and trajectories are extracted simultaneously to capture both object and motion information. The method consists of three steps: 1) trajectories and video proposals are extracted from the video to capture motion and object information, respectively; 2) a trained Convolutional Neural Network (CNN) model is employed to describe the extracted trajectories and video proposals; 3) a holistic video representation is constructed with the Fisher Vector model and fed to a classifier to obtain the action label. The complementarity between trajectories and video proposals gives the proposed method discriminative power across a wide range of actions. The method is evaluated on UCF101 and HMDB51, where promising results demonstrate its effectiveness.
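Step 3 of the pipeline above (encoding a video's local descriptors into one holistic Fisher Vector) can be sketched as follows. This is a minimal illustration under common assumptions (a diagonal-covariance GMM, gradients with respect to means and standard deviations, power and L2 normalization, as in Sánchez et al., IJCV 2013), using random stand-in arrays in place of the CNN descriptors; it is not the authors' actual implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Encode local descriptors (N x D) as a Fisher Vector of length 2*K*D,
    given a fitted diagonal-covariance GMM with K components."""
    n = descriptors.shape[0]
    gamma = gmm.predict_proba(descriptors)            # (N, K) soft assignments
    w, mu, var = gmm.weights_, gmm.means_, gmm.covariances_
    sigma = np.sqrt(var)                              # (K, D) per-dim std-devs

    # Normalized deviations of each descriptor from each component mean
    diff = (descriptors[:, None, :] - mu[None, :, :]) / sigma          # (N, K, D)
    # Gradients w.r.t. means and standard deviations, averaged over descriptors
    g_mu = (gamma[:, :, None] * diff).sum(0) / (n * np.sqrt(w)[:, None])
    g_sig = (gamma[:, :, None] * (diff ** 2 - 1)).sum(0) / (n * np.sqrt(2 * w)[:, None])

    fv = np.hstack([g_mu.ravel(), g_sig.ravel()])     # length 2*K*D
    fv = np.sign(fv) * np.sqrt(np.abs(fv))            # power (signed-sqrt) normalization
    return fv / (np.linalg.norm(fv) + 1e-12)          # L2 normalization

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 16))                    # stand-in for CNN descriptors
gmm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(train)
video_fv = fisher_vector(rng.normal(size=(40, 16)), gmm)
print(video_fv.shape)                                 # 2 * 8 components * 16 dims = (256,)
```

The resulting fixed-length vector can then be passed to a linear classifier (e.g. a linear SVM) regardless of how many trajectories or proposals a video contains, which is what makes this encoding suitable as the holistic video representation.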


Cited By

  • (2020) Posture Recognition Technology Based on Kinect. IEICE Transactions on Information and Systems, E103.D(3):621-630. DOI: 10.1587/transinf.2019EDP7221. Online publication date: 1 March 2020.
  • (2020) Spatial attention based visual semantic learning for action recognition in still images. Neurocomputing, 413:383-396. DOI: 10.1016/j.neucom.2020.07.016. Online publication date: November 2020.
  • (2019) Unsupervised Learning of Human Action Categories in Still Images with Deep Representations. ACM Transactions on Multimedia Computing, Communications, and Applications, 15(4):1-20. DOI: 10.1145/3362161. Online publication date: 16 December 2019.

Published In

ICVISP 2018: Proceedings of the 2nd International Conference on Vision, Image and Signal Processing
August 2018
402 pages
ISBN:9781450365291
DOI:10.1145/3271553
© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Action recognition
  2. CNN
  3. Trajectory
  4. Video proposal

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICVISP 2018

Acceptance Rates

Overall Acceptance Rate 186 of 424 submissions, 44%

