ABSTRACT
In recent years, computer- and game-based cognitive tests have become popular with the advancement of mobile technology. However, these tests require very little body movement and do not consider the influence that physical motion has on cognitive development. Our work focuses on assessing cognition in children through their physical movements. To this end, we use "Ball-Drop-to-the-Beat," an assessment test that is both physically and cognitively demanding, in which the child is expected to perform certain actions in response to commands. The task is specifically designed to measure attention, response inhibition, and coordination in children. A dataset has been created with 25 children performing this test. To automate the scoring, a computer-vision-based assessment system has been developed. The vision system employs an attention-based fusion mechanism that combines multiple modalities, such as optical flow, human poses, and objects in the scene, to predict a child's action. The proposed method outperforms other state-of-the-art approaches, achieving an average accuracy of 89.8 percent on predicting the actions and 88.5 percent on predicting the rhythm on the Ball-Drop-to-the-Beat dataset.
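The attention-based fusion mentioned above can be sketched in a few lines. This is a minimal illustration, not the paper's actual architecture: the feature dimension, the three modality vectors, and the scoring vector `w_score` are all assumptions standing in for learned components, and the sketch shows only the core idea of weighting each modality's features by a softmax attention score before summing them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative per-modality feature vectors (dimension is an assumption);
# in a real system these would come from optical-flow, pose, and
# object-detection feature extractors.
feat_dim = 8
features = {
    "optical_flow": rng.standard_normal(feat_dim),
    "pose": rng.standard_normal(feat_dim),
    "objects": rng.standard_normal(feat_dim),
}

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

# A learned scoring vector would normally produce the attention logits;
# a fixed random vector stands in for it here.
w_score = rng.standard_normal(feat_dim)
logits = np.array([f @ w_score for f in features.values()])
attn = softmax(logits)  # one weight per modality, summing to 1

# Fused representation: attention-weighted sum of the modality features,
# which a classifier head would then map to an action label.
fused = sum(a * f for a, f in zip(attn, features.values()))
```

The attention weights let the model emphasize whichever modality is most informative for a given clip (e.g., optical flow during fast motion, object features when the ball's position matters), rather than concatenating all modalities with fixed importance.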