Exploring dense trajectory feature and encoding methods for human interaction recognition

Authors:
Xiaojiang Peng, Southwest Jiaotong University, Chengdu, China and Shenzhen Key Lab of CVPR, Shenzhen Institute of Advanced Technology, CAS
Xiao Wu, Southwest Jiaotong University, Chengdu, China
Qiang Peng, Southwest Jiaotong University, Chengdu, China
Xianbiao Qi, Beijing University of Posts and Telecommunications, Beijing, China
Yu Qiao, Shenzhen Key Lab of CVPR, Shenzhen Institute of Advanced Technology, CAS
Yanhua Liu, Samsung Guangzhou Mobile, R&D Center, Guangzhou, China

ICIMCS '13: Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service, August 2013, Pages 23–27, https://doi.org/10.1145/2499788.2499795

Published: 17 August 2013

ABSTRACT

Human activity recognition has attracted increasing attention because of its wide range of potential applications. Much progress has been made on recognizing single actions in videos, but far less on collective and interactive activities. Recognizing human interaction is more challenging because multiple actors take part in each activity. In this paper, we use multi-scale dense trajectories and explore four advanced feature encoding methods on a human interaction dataset within a bag-of-features framework. Specifically, dense trajectories are described by shape, histogram of gradient orientation, histogram of flow orientation, and motion boundary histogram, all of which are computed with integral images. Experimental results on the UT-Interaction dataset show that our approach outperforms state-of-the-art methods by 7-14%. We also thoroughly analyse the finding that, when dense trajectories are used in videos, vector quantization performs on par with or even better than the other, more sophisticated feature encoding methods.
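To make the vector quantization step of a bag-of-features pipeline concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function name `vq_encode`, the toy 2-D codebook, and the toy descriptors are all assumptions chosen for illustration. In a real pipeline the codebook would be learned from training descriptors (for instance by clustering) and the descriptors would be the much higher-dimensional trajectory features.

```python
import numpy as np

def vq_encode(descriptors, codebook):
    """Hard-assign each descriptor to its nearest codeword (vector quantization)
    and return an L1-normalised histogram of codeword counts.

    Hypothetical helper for illustration only.
    """
    # Squared Euclidean distance between every descriptor and every codeword,
    # shape (n_descriptors, n_codewords).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Index of the closest codeword for each descriptor.
    nearest = d2.argmin(axis=1)
    # Count how many descriptors fall on each codeword.
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy data (assumed values): 3 codewords and 4 local descriptors in 2-D.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [3.9, 4.2]])

hist = vq_encode(descriptors, codebook)
print(hist)  # [0.25 0.5  0.25]
```

The resulting histogram is the fixed-length video representation that a classifier would consume. The more sophisticated encoders mentioned in the abstract replace the hard nearest-codeword assignment with a different assignment rule.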

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision tasks → Scene understanding

Published in
ICIMCS '13: Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service
August 2013
419 pages
ISBN: 9781450322522
DOI: 10.1145/2499788
Conference Chair:
Tat-Seng Chua, National University of Singapore, Singapore
General Chairs:
Ke Lu, University of Chinese Academy of Sciences, China
Tao Mei, Microsoft Research Asia, China
Xindong Wu, Hefei University of Technology, China
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
bag-of-features
dense trajectory
feature encoding
human activity recognition
Qualifiers
- research-article
Acceptance Rates
ICIMCS '13 Paper Acceptance Rate: 20 of 94 submissions, 21%
Overall Acceptance Rate: 163 of 456 submissions, 36%