ABSTRACT
Human activities are predominantly spatio-temporal, involving spatial changes over time. Qualitative spatial relations between interacting entities are often used to describe spatial change. To derive such relations, each interacting entity is approximated as a single bounding box or a set of bounding boxes. A set of bounding boxes abstracting a single entity is termed an extended object, where each box bounds one component of the entity. Extended object abstraction of spatial entities has been shown to be more effective for representing human activities [10]. The temporal aspect of an activity is characterized by the changing spatial relations between components of interacting extended objects over time. In this paper, we propose a Temporal Activity Graph (TAG) based representation model that keeps track of the sequences of relations between components of the extended objects. A kernel is designed for classifying spatio-temporal interactions in the TAG based model. The TAG kernel uses the concepts of label sequence similarity and interestingness to compute the similarity of a pair of TAGs. It is a generic solution that can be used with any kernel based method; here, it is used within a Support Vector Machine classifier. In experiments on the Mind's Eye, UT Interaction, and SBU Kinect Interaction datasets, TAG kernel based classification of activities is found to be on par with state-of-the-art approaches.
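To make the idea of label sequence similarity concrete, the following minimal sketch compares two sequences of qualitative relation labels using a normalized edit distance. It is an illustration only and not the TAG kernel itself: the function names, the RCC-style labels (DC, EC, PO), and the normalization are assumptions, and a similarity of this kind is not guaranteed to be a valid positive semi-definite kernel.

```python
def edit_distance(a, b):
    """Edit distance between two label sequences (dynamic programming).

    Insertions, deletions and substitutions each cost 1.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def sequence_similarity(a, b):
    """Hypothetical similarity in [0, 1]: 1 minus the normalized edit distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    return 1.0 - edit_distance(a, b) / longest


# Illustrative relation sequences between two components over time
# (labels are assumed, not taken from the paper).
approach_touch_leave = ["DC", "EC", "PO", "EC", "DC"]
approach_leave = ["DC", "EC", "DC"]
print(sequence_similarity(approach_touch_leave, approach_leave))
```

A pairwise matrix of such similarities could in principle be passed to a classifier that accepts a precomputed kernel, provided the matrix is checked or adjusted to be positive semi-definite.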
- M. Ahmad and Seong-Whan Lee. 2006. HMM-based Human Action Recognition Using Multiview Image Sequences. In 18th Intl. Conf. on Pattern Recognition (ICPR). IEEE, 263--266. https://doi.org/10.1109/ICPR.2006.630
- Thomas Bittner and Maureen Donnelly. 2007. A formal theory of qualitative size and distance relations between regions. In Proc. of the 21st Annual Workshop on Qualitative Reasoning (QR 2007).
- Henri Bouma, Gertjan Burghouts, Leo de Penning, Patrick Hanckmann, Johan-Martijn ten Hove, Sanne Korzec, Maarten Kruithof, Sander Landsmeer, Coen van Leeuwen, Sebastiaan van den Broek, Arvid Halma, Richard den Hollander, and Klamer Schutte. 2013. Recognition and localization of relevant human behavior in videos. In Sensors, and Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Security and Homeland Defense XII. SPIE, 87110B-10. https://doi.org/10.1117/12.2015877
- A. G. Cohn and S. M. Hazarika. 2001. Qualitative Spatial Representation and Reasoning: An Overview. Fundamenta Informaticae 46, 1-2 (2001), 1--29.
- Anthony G. Cohn, Derek R. Magee, Aphrodite Galata, David Hogg, and Shyamanta M. Hazarika. 2003. Towards an Architecture for Cognitive Vision Using Qualitative Spatio-temporal Representations and Abduction. In Spatial Cognition. Springer Berlin Heidelberg, 232--248. https://doi.org/10.1007/3-540-45004-1_14
- Anthony G. Cohn, Jochen Renz, and Muralikrishna Sridhar. 2012. Thinking Inside the Box: A Comprehensive Spatial Representation for Video Analysis. In Proc. 13th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR2012). AAAI Press, 588--592.
- Krishna S. R. Dubba, Mehul Bhatt, Frank Dylla, David C. Hogg, and Anthony G. Cohn. 2012. Interleaved Inductive-Abductive Reasoning for Learning Complex Event Models. In ILP (Lecture Notes in Computer Science), Vol. 7207. Springer, 113--129.
- Christian Freksa. 1992. Temporal Reasoning Based on Semi-intervals. Artificial Intelligence 54, 1-2 (1992), 199--227. https://doi.org/10.1016/0004-3702(92)90090-K
- M. Humayun Kabir, M. Robiul Hoque, Keshav Thapa, and Sung-Hyun Yang. 2016. Two-Layer Hidden Markov Model for Human Activity Recognition in Home Environments. Intl. Journal of Distributed Sensor Networks 12, 1 (2016), 12. https://doi.org/10.1155/2016/4560365
- Shobhanjana Kalita, Arindam Karmakar, and Shyamanta M. Hazarika. 2018. Efficient extraction of spatial relations for extended objects vis-à-vis human activity recognition in video. Applied Intelligence 48, 1 (2018), 204--219. https://doi.org/10.1007/s10489-017-0970-8
- Keerthy Kusumam. 2012. Relational Learning using body parts for Human Activity Recognition in Videos. Master's thesis. University of Leeds.
- Pierre Latouche and Fabrice Rossi. 2015. Graphs in machine learning: An introduction. In 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN). 207--218.
- Vlad I. Morariu, David Harwood, and Larry S. Davis. 2013. Tracking People's Hands and Feet Using Mixed Network AND/OR Search. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI) 35, 5 (2013), 1248--1262.
- S.U. Park, J.H. Park, M.A. Al-masni, M.A. Al-antari, Md.Z. Uddin, and T.-S. Kim. 2016. A Depth Camera-based Human Activity Recognition via Deep Learning Recurrent Neural Network for Health and Social Care Services. Procedia Computer Science 100 (2016), 78--84.
- V. Ramakrishna, T. Kanade, and Y. Sheikh. 2013. Tracking Human Pose by Tracking Symmetric Parts. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition. 3728--3735. https://doi.org/10.1109/CVPR.2013.478
- David A. Randell, Zhan Cui, and Anthony Cohn. 1992. A Spatial Logic Based on Regions and Connection. In KR'92. Principles of Knowledge Representation and Reasoning: Proc. of the 3rd Int. Conf., Bernhard Nebel, Charles Rich, and William Swartout (Eds.). Morgan Kaufmann, 165--176.
- M. S. Ryoo and J. K. Aggarwal. 2010. UT-Interaction Dataset, ICPR contest on Semantic Description of Human Activities (SDHA). http://cvrc.ece.utexas.edu/SDHA2010/Human_Interaction.html. (2010).
- Spiros Skiadopoulos and Manolis Koubarakis. 2005. On the consistency of cardinal directions constraints. Artificial Intelligence 163 (2005), 91--135.
- Hajar Sadeghi Sokeh, Stephen Gould, and Jochen J. 2013. Efficient Extraction and Representation of Spatial Information from Video Data. In Proc. of the 23rd Int. Joint Conf. on Artificial Intelligence (IJCAI'13). AAAI Press/IJCAI, 1076--1082.
- Muralikrishna Sridhar, Anthony G. Cohn, and David C. Hogg. 2010. Relational Graph Mining for Learning Events from Video. In 5th Starting AI Researchers Symposium (STAIRS). 315--327. https://doi.org/10.3233/978-1-60750-676-8-315
- D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. 2015. Learning Spatiotemporal Features with 3D Convolutional Networks. In 2015 IEEE Intl. Conf. on Computer Vision (ICCV). 4489--4497.
- R. Wagner and M. Fischer. 1974. The String-to-String Correction Problem. J. ACM 21, 1 (1974), 168--173.
- Y. Wang and G. Mori. 2011. Hidden Part Models for Human Action Recognition: Probabilistic versus Max Margin. IEEE Trans. on Pattern Analysis and Machine Intelligence 33, 7 (2011), 1310--1323. https://doi.org/10.1109/TPAMI.2010.214
- W. Xu, Z. Miao, and X. P. Zhang. 2015. Structured feature-graph model for human activity recognition. In IEEE Intl. Conf. on Image Processing (ICIP). IEEE, 1245--1249. https://doi.org/10.1109/ICIP.2015.7350999
- Kiwon Yun, Jean Honorio, Debaleena Chattopadhyay, Tamara L. Berg, and Dimitris Samaras. 2012. Two-person Interaction Detection Using Body-Pose Features and Multiple Instance Learning. In IEEE Comp. Soc. Conf. on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE.
- Yimeng Zhang, Xiaoming Liu, Ming-Ching Chang, Weina Ge, and Tsuhan Chen. 2012. Spatio-Temporal Phrases for Activity Recognition. In Proc. of 12th European Conf. on Computer Vision (ECCV)-Part III. Springer Berlin Heidelberg, 707--721. https://doi.org/10.1007/978-3-642-33712-3_51
- Yibiao Zhao, Steven Holtzen, Tao Gao, and Song-Chun Zhu. 2015. Represent and Infer Human Theory of Mind for Human-Robot Interaction. In AAAI Fall Symposium Series.
- Wentao Zhu, Cuiling Lan, Junliang Xing, Wenjun Zeng, Yanghao Li, Li Shen, and Xiaohui Xie. 2016. Co-Occurrence Feature Learning for Skeleton Based Action Recognition Using Regularized Deep LSTM Networks. In Proc. of the 30th AAAI Conf. on Artificial Intelligence, 2016. 3697--3704.