DOI: 10.1145/3382507.3418829

A Multi-modal System to Assess Cognition in Children from their Physical Movements

Published: 22 October 2020

ABSTRACT

In recent years, computer- and game-based cognitive tests have become popular with the advancement of mobile technology. However, these tests require very little body movement and do not consider the influence that physical motion has on cognitive development. Our work focuses on assessing cognition in children through their physical movements. To this end, we use "Ball-Drop-to-the-Beat", an assessment test that is both physically and cognitively demanding, in which the child is expected to perform certain actions in response to commands. The task is specifically designed to measure attention, response inhibition, and coordination in children. A dataset has been created with 25 children performing this test. To automate the scoring, a computer vision-based assessment system has been developed. The vision system employs an attention-based fusion mechanism to combine multiple modalities, such as optical flow, human poses, and objects in the scene, to predict a child's action. The proposed method outperforms other state-of-the-art approaches, achieving an average accuracy of 89.8 percent on predicting actions and an average accuracy of 88.5 percent on predicting rhythm on the Ball-Drop-to-the-Beat dataset.
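The abstract does not include code, but the attention-based fusion it describes can be illustrated with a minimal sketch. The sketch below is not the authors' implementation: it assumes each modality (optical flow, human poses, scene objects) has already been encoded into a fixed-size feature vector, and the AttentionFusion class name, feature dimension (256), and number of action classes (5) are all hypothetical placeholders.

```python
# Minimal sketch of attention-based fusion over per-modality embeddings.
# Assumes upstream encoders have already produced one feature vector per
# modality; all sizes and names here are illustrative, not from the paper.
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    def __init__(self, feat_dim: int = 256, num_classes: int = 5):
        super().__init__()
        # Scores each modality's feature vector; a softmax over modalities
        # turns the scores into attention weights for a weighted sum.
        self.score = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_modalities, feat_dim)
        weights = torch.softmax(self.score(feats), dim=1)  # (B, M, 1)
        fused = (weights * feats).sum(dim=1)               # (B, feat_dim)
        return self.classifier(fused)                      # (B, num_classes)


# Example: fuse three modality embeddings (flow, pose, objects) for a batch of 8.
model = AttentionFusion(feat_dim=256, num_classes=5)
logits = model(torch.randn(8, 3, 256))
print(logits.shape)  # torch.Size([8, 5])
```

The design choice being illustrated is that the attention weights let the network emphasize whichever modality is most informative for a given clip, rather than concatenating all features with fixed importance.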


Supplemental Material

3382507.3418829.mp4 (mp4, 8.4 MB)


Published in

ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction
October 2020, 920 pages
ISBN: 978-1-4503-7581-8
DOI: 10.1145/3382507

      Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • research-article

      Acceptance Rates

Overall Acceptance Rate: 453 of 1,080 submissions, 42%
