Abstract
Recent research shows that the static and dynamic features of a lip utterance contain abundant identity-related information. In this paper, a new deep convolutional neural network scheme is proposed. The entire lip utterance is first divided into a series of overlapping segments; an adaptive scheme then automatically examines the discriminative power of each segment and assigns it a corresponding weight within the utterance. The final authentication result for the entire utterance is determined by weighted voting over the results of all the segments. In addition, to account for the varying lighting conditions in natural environments, an illumination normalization procedure is proposed. Experimental results show that different segments of the same utterance have different discriminative power for user authentication, and that focusing on the discriminative details is more effective. The proposed method shows superior performance compared with the two state-of-the-art lip authentication approaches investigated.
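The segment-and-vote idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the segment length, the overlap, how the adaptive weights are computed, or the decision rule, so the function names, the `stride` parameter, the assumption that each segment yields a score in [0, 1], and the `threshold` value are all hypothetical. The weights are taken as given inputs here.

```python
from typing import List, Sequence


def split_into_segments(num_frames: int, seg_len: int, stride: int) -> List[range]:
    """Return frame-index ranges of fixed-length segments.

    Segments overlap whenever stride < seg_len (assumed layout, not from the paper).
    """
    if seg_len <= 0 or stride <= 0:
        raise ValueError("seg_len and stride must be positive")
    if num_frames < seg_len:
        # Utterance shorter than one segment: use it whole (or nothing if empty).
        return [range(0, num_frames)] if num_frames > 0 else []
    return [range(s, s + seg_len) for s in range(0, num_frames - seg_len + 1, stride)]


def fuse_scores(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weight-normalised average of per-segment scores."""
    if len(scores) != len(weights) or len(scores) == 0:
        raise ValueError("scores and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total


def weighted_vote(scores: Sequence[float], weights: Sequence[float],
                  threshold: float = 0.5) -> bool:
    """Accept the utterance if the fused score reaches the (hypothetical) threshold."""
    return fuse_scores(scores, weights) >= threshold


if __name__ == "__main__":
    # A 10-frame utterance, 4-frame segments, stride 2 (50% overlap).
    segments = split_into_segments(num_frames=10, seg_len=4, stride=2)
    print([(seg.start, seg.stop) for seg in segments])

    # Placeholder per-segment scores and weights; in the paper these would come
    # from the CNN and the adaptive weighting scheme, respectively.
    print(weighted_vote([0.9, 0.2, 0.8], [0.5, 0.1, 0.4]))
```

In this sketch a segment with a small weight contributes little to the decision, which mirrors the abstract's point that segments differ in discriminative power; how those weights are learned is left to the paper itself.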
Acknowledgment
The work described in this paper was fully supported by the NSFC Fund (No. 61771310).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Sun, J., Wang, S., Zhang, Q. (2019). Visual Speaker Authentication by a CNN-Based Scheme with Discriminative Segment Analysis. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Communications in Computer and Information Science, vol 1142. Springer, Cham. https://doi.org/10.1007/978-3-030-36808-1_18
DOI: https://doi.org/10.1007/978-3-030-36808-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36807-4
Online ISBN: 978-3-030-36808-1
eBook Packages: Computer Science, Computer Science (R0)