
Human Activity Role Identification using Feature Vector and Encoding Techniques on Natural Language Sentences

Authors:

Mayank Lovanshi,

Rahul Shrivastava

IVSP '23: Proceedings of the 2023 5th International Conference on Image, Video and Signal Processing

Pages 1 - 9

https://doi.org/10.1145/3591156.3591157

Published: 16 June 2023

Abstract

Role identification has the potential to enhance activity recognition applications because it supplies additional information. Most work on activity recognition and role identification is dominated by models that use images and videos, and existing human activity datasets do not support role identification. This work therefore attempts to develop a novel Human Activity Role Identification Dataset and a novel Computational Recurrent Model that takes textual data as input. Feature vector generation methods such as N-gram extraction, unique word extraction, and Word2Vec encode the input data into feature vectors that describe the relationships between sequences of words. To determine the fundamental roles, several types of recurrent neural network (RNN, LSTM, and GRU) are trained on these feature vectors. The proposed model is evaluated with metrics such as precision, recall, and F1 score. The combination of LSTM with unique word extraction performs best, with an F1 score, precision, and recall of 0.44, 0.36, and 0.58, respectively. This role identification work may help bind roles to entities and objects in human activity recognition.
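The encoding steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example sentences, the whitespace tokenization, and the choice to reserve id 0 for padding and unknown words are all assumptions made for illustration. The last lines only check that the reported F1 score is consistent with the reported precision and recall.

```python
def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocab(sentences):
    """Assign each unique lowercase word an integer id, starting at 1.

    Id 0 is reserved here for padding and unknown words (an assumption).
    """
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def encode(sentence, vocab, max_len):
    """Encode a sentence as a fixed-length list of word ids, zero-padded."""
    ids = [vocab.get(w, 0) for w in sentence.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences; the paper's dataset is not reproduced here.
sentences = ["The man throws the ball", "The dog catches the ball"]
vocab = build_vocab(sentences)
bigrams = ngrams(sentences[0].lower().split(), 2)
encoded = encode(sentences[0], vocab, max_len=6)

# Consistency check on the figures reported in the abstract.
reported_f1 = round(f1_score(0.36, 0.58), 2)
```

Fixed-length integer sequences like `encoded` are the kind of input a recurrent layer would typically consume after an embedding step. How the paper actually pads, tokenizes, or embeds its sentences is not stated in the abstract.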

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction
2. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction techniques
      1. Text input

Published In

IVSP '23: Proceedings of the 2023 5th International Conference on Image, Video and Signal Processing

March 2023

207 pages

ISBN:9781450398381

DOI:10.1145/3591156

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

IVSP 2023

IVSP 2023: 2023 5th International Conference on Image, Video and Signal Processing

March 24 - 26, 2023

Singapore, Singapore
