Visual Speaker Authentication by a CNN-Based Scheme with Discriminative Segment Analysis

Sun, Jiahui; Wang, Shilin; Zhang, Quanhai

doi:10.1007/978-3-030-36808-1_18

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1142)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Recent research shows that the static and dynamic features of a lip utterance contain abundant identity-related information. In this paper, a new deep convolutional neural network scheme is proposed. The entire lip utterance is first divided into a series of overlapping segments; an adaptive scheme then automatically examines the discriminative power of each segment and assigns it a corresponding weight. The final authentication result for the entire utterance is determined by weighted voting over the results for all segments. In addition, to account for the varying lighting conditions in natural environments, an illumination normalization procedure is proposed. Experimental results show that different segments of the same utterance have different discriminative power for user authentication, and that focusing on the discriminative details is more effective. The proposed method shows superior performance compared with the two state-of-the-art lip authentication approaches investigated.
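
The segment-and-vote idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the segment length, the stride, the decision threshold, and the use of soft (score-level) voting are all assumptions made for illustration. In the paper the segment weights come from the adaptive scheme and the segment scores from the CNN; here both are simply passed in as numbers.

```python
from typing import List, Sequence


def split_into_segments(num_frames: int, seg_len: int, stride: int) -> List[range]:
    """Return frame-index ranges of fixed-length segments.

    Consecutive segments overlap whenever stride < seg_len.
    (seg_len and stride are hypothetical parameters, not values from the paper.)
    """
    if seg_len <= 0 or stride <= 0:
        raise ValueError("seg_len and stride must be positive")
    if num_frames < seg_len:
        # Utterance shorter than one segment: treat it as a single segment.
        return [range(0, num_frames)] if num_frames > 0 else []
    last_start = num_frames - seg_len
    return [range(s, s + seg_len) for s in range(0, last_start + 1, stride)]


def fuse_scores(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-segment scores (soft voting, an assumption)."""
    if len(scores) != len(weights) or len(scores) == 0:
        raise ValueError("scores and weights must be non-empty and equally long")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total


def authenticate(scores: Sequence[float], weights: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Accept the claimed identity if the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold


# A 10-frame utterance, 4-frame segments, stride 2: four overlapping segments.
segments = split_into_segments(num_frames=10, seg_len=4, stride=2)

# Placeholder per-segment scores (stand-ins for CNN outputs) and weights
# (stand-ins for the adaptively assigned discriminative power).
scores = [0.9, 0.2, 0.8, 0.4]
weights = [0.4, 0.1, 0.4, 0.1]
accepted = authenticate(scores, weights)
```

With these placeholder numbers the two highly weighted segments dominate the fused score, which is the effect the abstract attributes to focusing on discriminative details; with equal weights the same scores would give a lower fused value.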

Acknowledgment

The work described in this paper is fully supported by NSFC Fund (No. 61771310).

Author information

Authors and Affiliations

School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
Jiahui Sun, Shilin Wang & Quanhai Zhang

Corresponding author

Correspondence to Shilin Wang.

Editor information

Editors and Affiliations

Australian National University, Canberra, ACT, Australia
Tom Gedeon
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee

About this paper

Cite this paper

Sun, J., Wang, S., Zhang, Q. (2019). Visual Speaker Authentication by a CNN-Based Scheme with Discriminative Segment Analysis. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Communications in Computer and Information Science, vol 1142. Springer, Cham. https://doi.org/10.1007/978-3-030-36808-1_18

DOI: https://doi.org/10.1007/978-3-030-36808-1_18
Published: 05 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36807-4
Online ISBN: 978-3-030-36808-1
eBook Packages: Computer Science, Computer Science (R0)
