ABSTRACT
This paper develops hybrid transformer architectures for detecting risk events in multimodal data recorded from a person equipped with visual and signal sensors. The proposed two-stream architecture combines a visual transformer with a linear transformer for time series. The linear transformer is benchmarked on the publicly available UCI-HAR dataset, and experiments with the full architecture are conducted on BIRDS, an in-the-wild dataset. The hybrid transformer architecture empirically outperforms the 3D CNNs and RNNs used in previous work, and it improves risk-situation detection accuracy by 10% over single-stream transformers.
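The abstract does not say which linear-attention variant the time-series stream uses or how the two streams are combined. The sketch below is therefore only an illustration under stated assumptions. It uses Linformer-style attention for the signal stream, where keys and values are projected along the time axis from n steps down to a fixed length k, so the score matrix is n x k instead of n x n. It then uses late fusion by concatenation. The visual-transformer embedding is a random placeholder. All weights, sizes, and helper names (`linformer_attention`, `fuse_and_classify`, etc.) are hypothetical and untrained.

```python
import math
import random


def matmul(a, b):
    """Multiply an m x p matrix by a p x q matrix (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([v / total for v in exps])
    return out


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]


def linformer_attention(x, wq, wk, wv, e, f):
    """Single-head Linformer-style self-attention over x (n x d).

    e and f (k x n) compress keys and values along the time axis, so the
    score matrix is n x k instead of n x n and cost grows linearly in n.
    """
    q = matmul(x, wq)                  # n x d
    k_proj = matmul(e, matmul(x, wk))  # k x d
    v_proj = matmul(f, matmul(x, wv))  # k x d
    scale = 1.0 / math.sqrt(len(wq[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k_proj))]
    weights = softmax_rows(scores)     # n x k, each row sums to 1
    return weights, matmul(weights, v_proj)  # output is n x d


def mean_pool(x):
    """Average over the sequence axis to get one embedding per stream."""
    return [sum(col) / len(x) for col in zip(*x)]


def fuse_and_classify(visual_feat, signal_feat, w_cls):
    """Hypothetical late fusion: concatenate both stream embeddings, apply a linear layer."""
    z = visual_feat + signal_feat  # list concatenation
    return softmax_rows(matmul([z], w_cls))[0]


# Hypothetical sizes: n time steps, d sensor channels, k projected length.
rng = random.Random(0)
n, d, k, n_classes = 128, 6, 16, 2
signal = rand_matrix(n, d, rng)                      # stand-in for one sensor window
wq, wk, wv = (rand_matrix(d, d, rng) for _ in range(3))
e, f = rand_matrix(k, n, rng), rand_matrix(k, n, rng)
weights, out = linformer_attention(signal, wq, wk, wv, e, f)

visual_feat = [rng.gauss(0.0, 1.0) for _ in range(8)]  # placeholder for a visual-transformer embedding
w_cls = rand_matrix(len(visual_feat) + d, n_classes, rng)
probs = fuse_and_classify(visual_feat, mean_pool(out), w_cls)
print(len(weights), len(weights[0]))  # 128 16
```

With n = 128 and k = 16, each row of attention weights holds 16 entries rather than 128. This saving grows with window length and is the usual motivation for linear-complexity attention on long sensor sequences. Concatenation is only the simplest fusion choice. The paper's actual fusion scheme may differ.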