ABSTRACT
Image captioning is the task of generating natural language descriptions of an image's content, and it has the potential to support healthcare providers in monitoring patient conditions and routines at home. Remote monitoring can reveal changes in patient behavior and enable timely interventions. In this study, we examine the usability of transformer neural networks for generating captions from surveillance camera footage captured at one-minute intervals. Our objective is to develop and evaluate a transformer neural network model for generating captions of patient behavior, trained and evaluated on the Common Objects in Context (COCO) dataset. Our study provides a proof of concept for transformer-based image captioning in remote monitoring of patient behavior: natural language descriptions of patient activity can give healthcare providers insight into patient routines and conditions and flag changes that may require intervention. Furthermore, our study highlights the potential for transformer neural networks to help providers identify patterns and trends in patient behavior over time.
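At the core of the transformer models discussed above is scaled dot-product attention, which lets the caption decoder weight image features when producing each word. The following is a minimal NumPy sketch of that operation (illustrative only; the shapes, names, and toy inputs are ours, not the paper's implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core transformer operation: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n_q, n_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of values

# Toy example: 2 decoder tokens attending over 3 image-feature vectors (d = 4)
rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))   # queries from the caption decoder
K = rng.standard_normal((3, 4))   # keys from encoded image regions
V = rng.standard_normal((3, 4))   # values from encoded image regions
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4): one attended feature vector per query token
```

In a full captioning model this operation is stacked into multi-head attention layers over patch embeddings of the input frame, with the decoder generating the caption one token at a time.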
Index Terms
- Patient smart home monitoring using vision neural network transformers