ABSTRACT
In recent years, computer- and game-based cognitive tests have become popular with the advancement of mobile technology. However, these tests require very little body movement and do not consider the influence that physical motion has on cognitive development. Our work focuses on assessing cognition in children through their physical movements. To this end, we use "Ball-Drop-to-the-Beat," an assessment test that is both physically and cognitively demanding, in which the child is expected to perform certain actions in response to commands. The task is specifically designed to measure attention, response inhibition, and coordination in children. A dataset has been created with 25 children performing this test. To automate the scoring, a computer-vision-based assessment system has been developed. The vision system employs an attention-based fusion mechanism that combines multiple modalities, such as optical flow, human poses, and objects in the scene, to predict a child's action. The proposed method outperforms other state-of-the-art approaches, achieving an average accuracy of 89.8 percent on predicting the actions and 88.5 percent on predicting the rhythm on the Ball-Drop-to-the-Beat dataset.
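The attention-based fusion mentioned above can be sketched in a few lines. This is a minimal illustration, not the paper's actual architecture: the feature dimension, the three modality vectors, and the scoring vector `w_score` are all assumptions standing in for learned components, and the sketch shows only the core idea of weighting each modality's features by a softmax attention score before summing them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative per-modality feature vectors (dimension is an assumption);
# in a real system these would come from optical-flow, pose, and
# object-detection feature extractors.
feat_dim = 8
features = {
    "optical_flow": rng.standard_normal(feat_dim),
    "pose": rng.standard_normal(feat_dim),
    "objects": rng.standard_normal(feat_dim),
}

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

# A learned scoring vector would normally produce the attention logits;
# a fixed random vector stands in for it here.
w_score = rng.standard_normal(feat_dim)
logits = np.array([f @ w_score for f in features.values()])
attn = softmax(logits)  # one weight per modality, summing to 1

# Fused representation: attention-weighted sum of the modality features,
# which a classifier head would then map to an action label.
fused = sum(a * f for a, f in zip(attn, features.values()))
```

The attention weights let the model emphasize whichever modality is most informative for a given clip (e.g., optical flow during fast motion, object features when the ball's position matters), rather than concatenating all modalities with fixed importance.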