research-article

Cooperative Hierarchical Framework for Group Activity Recognition: From Group Detection to Multi-activity Recognition

Authors:
Mohammed Al-Habib
Dongjun Huang
Majjed Al-Qatf
Kamal Al-Sabahi

Affiliation (all authors): School of Information Science and Engineering, Central South University, Changsha, China

ICSCA '19: Proceedings of the 2019 8th International Conference on Software and Computer Applications, February 2019, Pages 291–298
https://doi.org/10.1145/3316615.3316722

Published: 19 February 2019

ABSTRACT

Deep neural network algorithms have shown promising performance on many computer vision tasks, and several neural-network-based methods have been proposed to recognize group activities in video sequences. However, scenes that contain multiple groups performing different activities remain challenging. The strong correlation among individual motions, groups, and activities can be exploited to detect groups and recognize their concurrent activities. Motivated by these observations, we propose a unified deep learning framework, based on Long Short-Term Memory (LSTM) networks, that detects multiple groups and recognizes the collective activity of each group. In this framework, a pre-trained convolutional neural network (CNN) extracts features from the frames and from the appearance of each person. We propose an objective function that learns the degree of pairwise interaction between persons. The resulting individual features are passed to a clustering algorithm that detects the groups in the scene, and an LSTM-based model then recognizes the activity of each group. In parallel, a scene-level CNN followed by an LSTM extracts and learns scene-level features. Finally, the activities inferred at the group level and at the scene-context level are integrated to infer the collective activity. The proposed method is evaluated on the benchmark Collective Activity Dataset and compared with several baselines; the experimental results show competitive performance on the collective activity recognition task.
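The group-detection and fusion steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract says only that individual features are passed to "a clustering algorithm", so DBSCAN (Ester et al., 1996) is assumed here because it does not need the number of groups in advance and can leave isolated persons unassigned. The 2-D feature vectors, `eps`, `min_pts`, and the fusion weight `alpha` are hypothetical, as are the helper names `dbscan`, `group_members`, `fuse`, and `argmax`. The abstract also does not state how group-level and scene-level activities are integrated; a weighted average of class probabilities is assumed.

```python
import math
from typing import Dict, List, Optional, Sequence

NOISE = -1  # label for a person not assigned to any group


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(points: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN: one cluster (group) id per person, NOISE for isolated persons."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited

    def neighbours(i: int) -> List[int]:
        # Every point within eps of point i, including i itself.
        return [j for j in range(len(points)) if euclidean(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new group
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the group, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
        cluster += 1
    # Every point has been labelled by now; the None check only satisfies the type hint.
    return [NOISE if label is None else label for label in labels]


def group_members(labels: List[int]) -> Dict[int, List[int]]:
    """Map each detected group id to the indices of its members (noise excluded)."""
    groups: Dict[int, List[int]] = {}
    for person, label in enumerate(labels):
        if label != NOISE:
            groups.setdefault(label, []).append(person)
    return groups


def fuse(group_probs: Sequence[float], scene_probs: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Assumed late fusion: convex combination of group- and scene-level class probabilities."""
    return [alpha * g + (1 - alpha) * s for g, s in zip(group_probs, scene_probs)]


def argmax(xs: Sequence[float]) -> int:
    return max(range(len(xs)), key=lambda k: xs[k])


if __name__ == "__main__":
    # Hypothetical per-person features in which interacting persons lie close together.
    features = [[0.0, 0.0], [0.3, 0.1], [0.2, 0.4],  # persons 0-2
                [5.0, 5.0], [5.2, 4.9], [4.8, 5.3],  # persons 3-5
                [20.0, 0.0]]                         # person 6, alone
    labels = dbscan(features, eps=1.0, min_pts=2)
    print(labels)                 # [0, 0, 0, 1, 1, 1, -1]
    print(group_members(labels))  # {0: [0, 1, 2], 1: [3, 4, 5]}

    group_p = [0.6, 0.3, 0.1]  # hypothetical group-level LSTM softmax over 3 activities
    scene_p = [0.2, 0.7, 0.1]  # hypothetical scene-level CNN+LSTM softmax
    print(argmax(fuse(group_p, scene_p, alpha=0.5)))  # 1
```

In this toy run, the scene-level evidence outweighs the group-level evidence and shifts the fused prediction to the second activity class. In a real pipeline, each detected group would get its own group-level probability vector from the LSTM before fusion.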

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision tasks → Activity recognition and understanding

Published in

ICSCA '19: Proceedings of the 2019 8th International Conference on Software and Computer Applications
February 2019
611 pages
ISBN: 9781450365734
DOI: 10.1145/3316615

Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
CNN
Clustering Algorithm
Group Activity Recognition
LSTM
Qualifiers
- research-article
- Research
- Refereed limited

Article Metrics
- Total Citations: 1
- Total Downloads: 143
- Downloads (last 12 months): 15
- Downloads (last 6 weeks): 2
