DOI: 10.1145/3536221.3557037
Demonstration

MIDriveSafely: Multimodal Interaction for Drive Safely

Published: 07 November 2022

Abstract

In this paper, we present MIDriveSafely, a novel multimodal interaction application that assists car drivers and increases road safety. MIDriveSafely is a mobile application that provides the following functions: (1) it detects dangerous situations, such as drowsiness/sleepiness, phone usage while driving, eating, smoking, and an unfastened seat belt, from the video stream of a smartphone front-facing camera, and gives feedback to the driver; (2) it provides entertainment (e.g., a rock-paper-scissors game based on automatic speech recognition); and (3) it provides voice control of the smartphone's navigation/multimedia systems (and potentially of vehicle systems such as lighting and climate control). Speech recognition in driving conditions is highly challenging due to acoustic noise, active head turns, pose variation, the distance to the recording devices, etc. MIDriveSafely incorporates the driver's audio-visual speech recognition (DAVIS) system and uses it for multimodal interaction; the original DriveSafely system is used for dangerous-state detection. MIDriveSafely improves upon existing driver monitoring applications by using multimodal (mainly audio-visual) information, and it motivates people to drive in a safer manner by giving drivers feedback and creating a fun user experience.
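As an illustration only (not taken from the paper), the rock-paper-scissors entertainment function described above could be driven by a recognized speech hypothesis roughly as follows; all names here are hypothetical, and the ASR front end is assumed to deliver a single top-scoring word:

```python
import random

# Which move each move defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def play_round(recognized_word: str) -> str:
    """Resolve one rock-paper-scissors round from an ASR hypothesis.

    `recognized_word` is assumed to be the top ASR hypothesis for the
    driver's utterance; anything outside the three valid moves is
    rejected so the app can prompt the driver to repeat.
    """
    move = recognized_word.strip().lower()
    if move not in BEATS:
        return "unrecognized"  # e.g., noise or an out-of-vocabulary word
    app_move = random.choice(list(BEATS))
    if app_move == move:
        return f"draw: both chose {move}"
    if BEATS[move] == app_move:
        return f"driver wins: {move} beats {app_move}"
    return f"app wins: {app_move} beats {move}"
```

Rejecting anything outside the three-word vocabulary is one simple way to keep a noisy in-car ASR channel from producing spurious game moves.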


Cited By

  • (2024) A Comprehensive Review of Recent Advances in Deep Neural Networks for Lipreading With Sign Language Recognition. IEEE Access, 12, 136846–136879. DOI: 10.1109/ACCESS.2024.3463969
  • (2024) Audio-visual speech recognition based on regulated transformer and spatio-temporal fusion strategy for driver assistive systems. Expert Systems with Applications, 252(PA). DOI: 10.1016/j.eswa.2024.124159
  • (2023) Audio-Visual Speech and Gesture Recognition by Sensors of Mobile Devices. Sensors, 23(4), 2284. DOI: 10.3390/s23042284
  • (2023) A Review of Recent Advances on Deep Learning Methods for Audio-Visual Speech Recognition. Mathematics, 11(12), 2665. DOI: 10.3390/math11122665


Published In

ICMI '22: Proceedings of the 2022 International Conference on Multimodal Interaction
November 2022
830 pages
ISBN:9781450393904
DOI:10.1145/3536221
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Driver monitoring
  2. Mobile multimodal systems
  3. Multimodal interaction

Qualifiers

  • Demonstration
  • Research
  • Refereed limited

Conference

ICMI '22

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%

