Attribute-based supervised deep learning model for action recognition

Chen, Kai; Ding, Guiguang; Han, Jungong

doi:10.1007/s11704-016-6066-5

Research Article
Published: 23 February 2017

Volume 11, pages 219–229, (2017)
Frontiers of Computer Science

Kai Chen¹,
Guiguang Ding¹ &
Jungong Han²

166 Accesses
11 Citations
6 Altmetric

Abstract

Deep learning has been the most popular feature learning method for a variety of computer vision applications over the past three years. Not surprisingly, this technique, and the convolutional neural network (ConvNet) structure in particular, has been applied to human action recognition with great success. Most existing algorithms directly adopt the basic ConvNet structure, which works well in ideal situations, e.g., under stable lighting conditions. However, its performance degrades significantly when image appearance varies widely within the same category. To solve this problem, we propose a new method that integrates semantically meaningful attributes into the hierarchical structure of deep learning. The idea is to add simple yet effective attributes to the category level of ConvNets so that the attribute information can drive the learning procedure. Experimental results on three popular action recognition databases show that embedding multiple auxiliary attributes into the deep learning framework significantly improves classification accuracy.
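
The abstract does not spell out how the attributes enter the training objective, so the following is only an illustrative sketch of one common way attribute supervision can sit alongside the category level: a joint loss that adds a weighted attribute term to the usual category loss. The function names, the use of sigmoid cross-entropy for binary attributes, and the `attr_weight` parameter are all assumptions made for illustration, not the authors' formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of category logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def category_loss(cat_logits, label):
    """Softmax cross-entropy for a single action-category label."""
    return -math.log(softmax(cat_logits)[label])


def attribute_loss(attr_logits, attr_targets):
    """Mean sigmoid cross-entropy over binary attributes (assumed form).

    Uses the stable identity: max(z, 0) - z * t + log(1 + exp(-|z|)).
    """
    total = 0.0
    for z, t in zip(attr_logits, attr_targets):
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(attr_targets)


def joint_loss(cat_logits, label, attr_logits, attr_targets, attr_weight=0.5):
    """Category loss plus a weighted attribute loss.

    attr_weight is a hypothetical hyperparameter; with attr_weight=0 the
    objective reduces to plain category classification.
    """
    return (category_loss(cat_logits, label)
            + attr_weight * attribute_loss(attr_logits, attr_targets))


if __name__ == "__main__":
    # Toy example: 3 action categories, 4 binary attributes (made-up values).
    cat_logits = [2.0, 0.5, -1.0]
    attr_logits = [1.5, -2.0, 0.3, 0.0]
    attr_targets = [1, 0, 1, 0]
    print(joint_loss(cat_logits, 0, attr_logits, attr_targets))
```

In a setup like this, gradients from the attribute term flow back into the shared features, which is one plausible reading of attribute information driving the learning procedure.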

Author information

Authors and Affiliations

School of Software, Tsinghua University, Beijing, 100084, China
Kai Chen & Guiguang Ding
Department of Computer Science, Northumbria University, Newcastle, NE1 8ST, UK
Jungong Han

Corresponding author

Correspondence to Guiguang Ding.

Additional information

Kai Chen received the BS degree from the School of Software, Tsinghua University, China in 2014, where he is currently pursuing the MS degree with the School of Software. His research interests include multimedia information retrieval, computer vision, and machine learning.

Guiguang Ding received the PhD degree in electronic engineering from Xidian University, China. He is currently an associate professor with the School of Software, Tsinghua University, China. His current research focuses on multimedia information retrieval and management, in particular visual object classification, automatic semantic annotation, content-based multimedia indexing, and social multimedia retrieval, mining, and recommendation. He has published about 40 research papers in international conferences and journals and has applied for eight patents in China.

Jungong Han is a senior lecturer with the Department of Computer Science at Northumbria University, UK. Previously, he was a senior scientist (2012–2015) with Civolution Technology (a combining synergy of Philips CI and Thomson STS), a research staff member (2010–2012) with the Centre for Mathematics and Computer Science, and a researcher (2005–2010) with the Technical University of Eindhoven in the Netherlands. Dr. Han’s research interests include multimedia content identification, computer vision, and artificial intelligence. He has written and co-authored over 100 papers, one of which, as first author, has been cited more than 500 times to date. He is an associate editor of Elsevier Neurocomputing and Springer Multimedia Tools and Applications.

Electronic supplementary material

Supplementary material, approximately 213 KB.

About this article

Cite this article

Chen, K., Ding, G. & Han, J. Attribute-based supervised deep learning model for action recognition. Front. Comput. Sci. 11, 219–229 (2017). https://doi.org/10.1007/s11704-016-6066-5

Received: 01 February 2016
Accepted: 08 December 2016
Published: 23 February 2017
Issue Date: April 2017
DOI: https://doi.org/10.1007/s11704-016-6066-5
