Speaker Recognition Based on Lightweight Neural Network for Smart Home Solutions

Ai, Haojun; Xia, Wuyang; Zhang, Quanxin

doi:10.1007/978-3-030-37352-8_37

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11983)

Included in the following conference series:

International Symposium on Cyberspace Safety and Security

Abstract

Advances in smart home devices have gradually changed people's lifestyles, and speaker recognition is now available in almost all such devices. Currently, mainstream speaker recognition services rely on very deep neural networks trained on cloud servers. However, these deep networks are not suitable for deployment and operation on smart home devices. To address this problem, we propose a packet bottleneck method to improve SqueezeNet, which has been widely used in speaker recognition tasks, and we design a lightweight structure named TrimNet. In addition, a model updating strategy based on transfer learning is adopted to keep the model from deteriorating on cold speech. The experimental results demonstrate that TrimNet is superior to SqueezeNet in classification accuracy, number of parameters, and computational cost. Moreover, the model updating method increases the recognition rate for cold speech without reducing the recognition rate for other speakers.
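The abstract compares structures by parameter count but does not define the packet bottleneck. As a rough illustration only, the sketch below counts the parameters of one SqueezeNet Fire module (a 1x1 squeeze convolution followed by parallel 1x1 and 3x3 expand convolutions) and shows how the count would shrink if the expand convolutions were split into channel groups. The grouped variant is an assumption made for illustration, not the TrimNet design, and the helper names `conv_params` and `fire_params` are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel, groups=1, bias=True):
    """Parameter count of a 2D convolution with square kernels.

    With `groups` > 1, each output channel only sees in_ch / groups
    input channels, so the weight count is divided by `groups`.
    """
    if in_ch % groups or out_ch % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = (in_ch // groups) * out_ch * kernel * kernel
    return weights + (out_ch if bias else 0)


def fire_params(in_ch, squeeze, expand1x1, expand3x3, groups=1):
    """Parameter count of a Fire module.

    `groups` is applied to the two expand convolutions only; this is an
    illustrative assumption, not the structure proposed in the paper.
    """
    return (
        conv_params(in_ch, squeeze, 1)                 # 1x1 squeeze
        + conv_params(squeeze, expand1x1, 1, groups)   # 1x1 expand
        + conv_params(squeeze, expand3x3, 3, groups)   # 3x3 expand
    )


# fire2 of the original SqueezeNet: 96 input channels, 16 squeeze
# filters, 64 + 64 expand filters.
standard = fire_params(96, 16, 64, 64)
grouped = fire_params(96, 16, 64, 64, groups=4)
print(standard)  # 11920
print(grouped)   # 4240
```

In this example, grouping by 4 cuts the module to roughly a third of its original size, because the expand weights fall by a factor of 4 while the squeeze layer and all biases stay the same. Whether accuracy is preserved under such a change is an empirical question that only the paper's experiments can answer.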

Acknowledgement

This paper is supported by the National Natural Science Foundation of China (General Program), Grant No. 61971316.

Author information

Authors and Affiliations

School of Cyber Science and Engineering, Wuhan University, Wuhan, People’s Republic of China
Haojun Ai & Wuyang Xia
Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan, China
Haojun Ai
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, People’s Republic of China
Quanxin Zhang

Corresponding authors

Correspondence to Haojun Ai or Wuyang Xia.

Editor information

Editors and Affiliations

Rutgers University, Newark, NJ, USA
Jaideep Vaidya
Beihang University, Beijing, China
Xiao Zhang
Guangzhou University, Guangzhou, China
Jin Li

About this paper

Cite this paper

Ai, H., Xia, W., Zhang, Q. (2019). Speaker Recognition Based on Lightweight Neural Network for Smart Home Solutions. In: Vaidya, J., Zhang, X., Li, J. (eds) Cyberspace Safety and Security. CSS 2019. Lecture Notes in Computer Science, vol 11983. Springer, Cham. https://doi.org/10.1007/978-3-030-37352-8_37

DOI: https://doi.org/10.1007/978-3-030-37352-8_37
Published: 03 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37351-1
Online ISBN: 978-3-030-37352-8
eBook Packages: Computer Science, Computer Science (R0)
