Abstract
While speaking, humans exhibit a number of recognizable patterns, most notably the repetitive movement of the mouth from closed to open. This paper presents a novel method for computationally determining when video data contains a person speaking, by recognizing and tallying lip closures within a given interval. A combination of Haar-feature detection and eigenvectors is used to recognize when a target individual is present. By detecting and quantifying spasmodic lip movements and comparing them to the ranges seen in true positives, we are able to predict when true speech occurs without the need for complex facial mappings. The results fall within a reasonable accuracy range when compared to current methods, and the simplicity and comprehensibility of the approach can reduce the effort that current techniques require and, if paired with synchronous audio recognition methods, can streamline voice activity detection as a whole.
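The tally-and-compare idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes an upstream detector has already labeled each frame as mouth open or closed, and the names `count_closures`, `is_speaking`, and `classify_windows`, along with the threshold values, are hypothetical placeholders rather than figures from the paper.

```python
from typing import List, Sequence


def count_closures(mouth_open: Sequence[bool]) -> int:
    """Count open-to-closed transitions in a per-frame mouth state sequence."""
    return sum(
        1
        for prev, curr in zip(mouth_open, mouth_open[1:])
        if prev and not curr
    )


def is_speaking(
    mouth_open: Sequence[bool],
    min_closures: int = 2,  # hypothetical lower bound, not from the paper
    max_closures: int = 8,  # hypothetical upper bound, not from the paper
) -> bool:
    """Flag an interval as speech if its closure count lies in the given range."""
    return min_closures <= count_closures(mouth_open) <= max_closures


def classify_windows(
    mouth_open: Sequence[bool],
    window: int,
    min_closures: int = 2,
    max_closures: int = 8,
) -> List[bool]:
    """Apply is_speaking to consecutive, non-overlapping windows of frames."""
    return [
        is_speaking(mouth_open[start:start + window], min_closures, max_closures)
        for start in range(0, len(mouth_open) - window + 1, window)
    ]


if __name__ == "__main__":
    talking = [True, False, True, False, True, False]  # three closures
    silent = [False] * 6                               # no closures
    print(classify_windows(talking + silent, window=6))
```

In practice the closure range would be fitted to intervals known to contain speech, as the abstract suggests, rather than fixed by hand.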
Data Availability
All datasets used are publicly available: the LFW Image dataset and LiLir lip tracking dataset.
Acknowledgements
I would like to thank the editorial board and review team from the SN Computer Science Research Journal for their kind and constructive feedback. I would also like to thank Professor Jeffery Ullman, Mr. Sudhir Kamath, Mr. Robert Gendron, and Mrs. Katie MacDougall for their continual support throughout my research work.
Funding
No funding was received to support or conduct this research. Ananth Goyal authored the entirety of this paper.
Ethics declarations
Conflict of Interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Goyal, A. Using Spasmodic Closure Patterns to Simplify Visual Voice Activity Detection. SN COMPUT. SCI. 2, 10 (2021). https://doi.org/10.1007/s42979-020-00395-6
DOI: https://doi.org/10.1007/s42979-020-00395-6