
Towards Lip Motion Based Speaking Mode Detection Using Residual Neural Networks

  • Conference paper
  • First Online:
Proceedings of the 12th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2020)

Abstract

Speaking mode (i.e., talking or non-talking) detection is a significant research problem in the areas of HCI and computer vision. Detecting the speaking mode of a speaker is quite challenging owing to coarse-resolution images, varying interaction styles, and various sources of noise. This paper proposes a vision-based technique to identify a human's speaking mode, in terms of talking and non-talking states, using residual neural networks. Visual lip motion is a prominent cue and plays a pivotal role in detecting a person's speaking mode. Thus, we adopt a vision-based technique rather than a voice-based one, which can be noisy or interrupted. Evaluation on two datasets shows better performance (\(99.56\%\) accuracy) in mouth-state detection than previous approaches. Moreover, analysis of 36 min of video data from 15 participants reveals that the proposed technique achieved an accuracy of \(98.88\%\) in detecting speaking mode.
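The abstract describes a residual (ResNet-style) classifier applied to mouth-region images, but the exact architecture and input size are not given here. The following is therefore only a minimal sketch in Keras under assumed settings: \(64\times64\) RGB mouth crops, two identity-style residual blocks, and a sigmoid output for a per-frame talking/non-talking decision. The layer counts, filter sizes, and optimizer are illustrative assumptions, not the authors' exact network.

```python
# Hedged sketch of a small ResNet-style binary classifier for mouth-state
# (talking vs. non-talking) crops. Input shape, filter counts, and block
# depth are illustrative assumptions, not the paper's reported network.
import tensorflow as tf
from tensorflow.keras import layers, Model

def residual_block(x, filters):
    """Identity-style residual block: two 3x3 convs plus a skip connection."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    if shortcut.shape[-1] != filters:  # match channel count before the add
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    y = layers.Add()([y, shortcut])
    return layers.ReLU()(y)

def build_mouth_state_net(input_shape=(64, 64, 3)):
    """Small residual CNN mapping a mouth crop to P(talking) for that frame."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = residual_block(x, 64)
    x = layers.MaxPooling2D()(x)
    x = residual_block(x, 128)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return Model(inputs, outputs)

model = build_mouth_state_net()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

In practice, per-frame predictions of this kind would presumably be aggregated over a short temporal window (e.g., by majority vote over consecutive frames) to decide the speaking mode for a video segment; that aggregation step is likewise an assumption here rather than a detail taken from the abstract.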


Notes

  1. https://www.youtube.com/watch?v=u0zP9TPMfNg&t=19s.


Author information

Corresponding author

Correspondence to Mohammed Moshiul Hoque.



Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Afroze, S., Hoque, M.M. (2021). Towards Lip Motion Based Speaking Mode Detection Using Residual Neural Networks. In: Abraham, A., et al. Proceedings of the 12th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2020). SoCPaR 2020. Advances in Intelligent Systems and Computing, vol 1383. Springer, Cham. https://doi.org/10.1007/978-3-030-73689-7_17
