Using Spasmodic Closure Patterns to Simplify Visual Voice Activity Detection

  • Original Research
  • SN Computer Science

Abstract

While speaking, humans exhibit a number of recognizable patterns, most notably the repetitive movement of the mouth from closed to open. This paper presents a novel method for computationally determining when video data contains a person speaking, by recognizing and tallying lip closures within a given interval. A combination of Haar-feature detection and eigenvectors is used to recognize when a target individual is present, but by detecting and quantifying spasmodic lip movements and comparing them to the ranges seen in true positives, we are able to predict when true speech occurs without the need for complex facial mappings. Although the results are within a reasonable accuracy range when compared to current methods, the comprehensibility and simplicity of the approach can reduce the strenuousness of current techniques and, if paired with synchronous audio recognition methods, can streamline voice activity detection as a whole.
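The tally-and-compare idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes an upstream detector has already reduced each video frame to a boolean "mouth open" flag, and the function names (`count_closures`, `detect_speech`), the window length, and the closure range are all hypothetical values chosen for illustration.

```python
from typing import List, Sequence, Tuple


def count_closures(mouth_open: Sequence[bool]) -> int:
    """Count open-to-closed transitions in a per-frame mouth state sequence."""
    return sum(
        1 for prev, cur in zip(mouth_open, mouth_open[1:]) if prev and not cur
    )


def detect_speech(
    mouth_open: Sequence[bool],
    window: int,
    closure_range: Tuple[int, int],
) -> List[bool]:
    """Label each non-overlapping window of frames as speech or non-speech.

    A window counts as speech when its closure tally falls inside
    closure_range (inclusive). In this sketch the range is a hypothetical
    stand-in for the ranges observed in true positives.
    """
    low, high = closure_range
    labels = []
    for start in range(0, len(mouth_open) - window + 1, window):
        tally = count_closures(mouth_open[start:start + window])
        labels.append(low <= tally <= high)
    return labels


# Illustrative input: 10 frames of rapid open/close movement, then 10 closed frames.
talking = [True, False] * 5
silent = [False] * 10
labels = detect_speech(talking + silent, window=10, closure_range=(2, 8))
```

An upper bound on the range is included as well as a lower one so that implausibly rapid flicker, such as detector noise, is not counted as speech. That choice is an assumption of this sketch, not a detail taken from the abstract.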

Data Availability

All datasets used are publicly available: the LFW image dataset and the LiLir lip-tracking dataset.

Acknowledgements

I would like to thank the editorial board and review team from the SN Computer Science Research Journal for their kind and constructive feedback. I would also like to thank Professor Jeffery Ullman, Mr. Sudhir Kamath, Mr. Robert Gendron, and Mrs. Katie MacDougall for their continual support throughout my research work.

Funding

No funding was received to support or conduct this research. Ananth Goyal authored the entirety of this paper.

Author information

Corresponding author

Correspondence to Ananth Goyal.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Goyal, A. Using Spasmodic Closure Patterns to Simplify Visual Voice Activity Detection. SN COMPUT. SCI. 2, 10 (2021). https://doi.org/10.1007/s42979-020-00395-6

  • DOI: https://doi.org/10.1007/s42979-020-00395-6
