Using Spasmodic Closure Patterns to Simplify Visual Voice Activity Detection

Goyal, Ananth

doi:10.1007/s42979-020-00395-6

Original Research
Published: 24 November 2020

SN Computer Science, Volume 2, article number 10 (2021)

Ananth Goyal ORCID: orcid.org/0000-0002-5860-8946¹

Abstract

While speaking, humans exhibit a number of recognizable patterns, most notably the repetitive movement of the mouth between closed and open. This paper presents a novel method for determining computationally when video data contains a person speaking, by recognizing and tallying lip closures within a given interval. A combination of Haar-feature detection and eigenvectors is used to recognize when a target individual is present. By detecting and quantifying spasmodic lip movements and comparing them to the ranges seen in true positives, we are able to predict when true speech occurs without the need for complex facial mappings. The results fall within a reasonable accuracy range when compared to current methods, and the simplicity and comprehensibility of the approach can reduce the effort that current techniques demand; if paired with synchronous audio recognition methods, it can streamline voice activity detection as a whole.
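The tally-and-compare idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes an upstream lip detector has already produced one boolean per video frame (`True` when the lips are closed), and the function names, window length, and closure range are all hypothetical placeholders rather than values from the article.

```python
from typing import List, Sequence


def count_closures(closed: Sequence[bool]) -> int:
    """Count open-to-closed transitions in a run of per-frame lip states.

    `closed[i]` is True when the lips are judged closed in frame i
    (assumed to come from some upstream lip detector, not shown here).
    """
    return sum(1 for prev, cur in zip(closed, closed[1:]) if not prev and cur)


def is_speaking(closed: Sequence[bool], min_closures: int, max_closures: int) -> bool:
    """Flag an interval as speech if its closure tally lies in the given range.

    The range stands in for the closure counts observed in true positives;
    the bounds are hypothetical and would have to be measured on labelled data.
    """
    return min_closures <= count_closures(closed) <= max_closures


def label_windows(closed: Sequence[bool], window: int,
                  min_closures: int, max_closures: int) -> List[bool]:
    """Split the frame sequence into non-overlapping windows and label each."""
    return [
        is_speaking(closed[start:start + window], min_closures, max_closures)
        for start in range(0, len(closed) - window + 1, window)
    ]


if __name__ == "__main__":
    talking = [False, True, False, True, False, False, True, False]
    silent = [True] * 8
    # Hypothetical bounds: 2 to 5 closures per 8-frame window counts as speech.
    print(label_windows(talking + silent, window=8, min_closures=2, max_closures=5))
```

An upper bound is included as well as a lower one because the abstract speaks of comparing against a range, so an implausibly high closure count is rejected in the same way as a count that is too low.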

Data Availability

All datasets used are publicly available: the LFW image dataset and the LiLiR lip-tracking dataset.

Acknowledgements

I would like to thank the editorial board and review team from the SN Computer Science Research Journal for their kind and constructive feedback. I would also like to thank Professor Jeffery Ullman, Mr. Sudhir Kamath, Mr. Robert Gendron, and Mrs. Katie MacDougall for their continual support throughout my research work.

Funding

No funding was received to support or conduct this research. Ananth Goyal authored the entirety of this paper.

Author information

Authors and Affiliations

Dougherty Valley High School, San Ramon, CA, USA
Ananth Goyal

Corresponding author

Correspondence to Ananth Goyal.

Ethics declarations

Conflict of Interest

The author declares no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Goyal, A. Using Spasmodic Closure Patterns to Simplify Visual Voice Activity Detection. SN COMPUT. SCI. 2, 10 (2021). https://doi.org/10.1007/s42979-020-00395-6

Received: 16 June 2020
Accepted: 09 November 2020
Published: 24 November 2020
DOI: https://doi.org/10.1007/s42979-020-00395-6
