Abstract
This paper proposes a method for structuring and transforming a speech signal. To this end, it proposes a segmentation method; methods for determining the fundamental tone of a vocal segment and, on that basis, the boundaries of the segment's quasiperiodic oscillations; and a geometric transformation of these quasiperiodic oscillations. The proposed segmentation uses a statistical estimate of short-term energies. This permits an adaptive threshold and thereby improves the accuracy with which vocal segments are detected. The proposed fundamental-tone estimation uses bandpass filtering and a statistical estimate of local extrema. This reduces computational complexity and sensitivity to noise, and it permits an adaptive threshold, which improves the accuracy of both the fundamental tone and the oscillation boundaries. The proposed geometric transformation maps the quasiperiodic oscillations of a vocal segment onto a single amplitude-time window, so that patterns of the vocal segment can be formed that take its structure into account. Finally, a method is proposed for selecting the structure of the model that transforms these speech signal patterns. It is based on a statistical evaluation of transformation quality and provides both a high degree of compression and identification based on the speech signal.
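The first three stages named above (energy-based segmentation, fundamental-tone and period-boundary estimation, and mapping periods onto one amplitude-time window) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`segment_voiced`, `bandpass_fft`, `period_boundaries`, `normalize_periods`) are hypothetical, and so are several parameter choices: the 20 ms frame length, the 60 to 400 Hz passband, the percentile-based energy threshold, the half-mean peak threshold, and the 64-sample window. The model-structure selection step is not sketched.

```python
import numpy as np


def segment_voiced(x, fs, frame_ms=20.0):
    """Split x into active (vocal) segments using short-term frame energy.

    Assumption: the adaptive threshold is the midpoint between the 10th and
    90th percentiles of the frame energies; the paper's statistic may differ.
    """
    n = int(fs * frame_ms / 1000)                 # samples per frame
    n_frames = len(x) // n
    frames = np.asarray(x[: n_frames * n], dtype=float).reshape(n_frames, n)
    energy = (frames ** 2).mean(axis=1)           # short-term energy per frame
    lo, hi = np.percentile(energy, [10, 90])
    active = energy > 0.5 * (lo + hi)             # data-dependent threshold
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            segments.append((start * n, i * n))   # (start_sample, end_sample)
            start = None
    if start is not None:
        segments.append((start * n, n_frames * n))
    return segments


def bandpass_fft(x, fs, f_lo=60.0, f_hi=400.0):
    """Crude band-pass filter: zero every FFT bin outside [f_lo, f_hi] Hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def period_boundaries(segment, fs):
    """Estimate the fundamental tone and the quasiperiodic-oscillation boundaries.

    Assumption: boundaries are local maxima of the band-passed segment whose
    amplitude is at least half the mean amplitude of all local maxima.
    """
    y = bandpass_fft(np.asarray(segment, dtype=float), fs)
    i = np.arange(1, len(y) - 1)
    peaks = i[(y[i] > y[i - 1]) & (y[i] >= y[i + 1])]   # local maxima
    peaks = peaks[y[peaks] >= 0.5 * y[peaks].mean()]    # adaptive amplitude threshold
    f0 = fs / np.median(np.diff(peaks))                 # Hz, from the median period
    return f0, peaks


def normalize_periods(segment, boundaries, width=64):
    """Map every period onto one amplitude-time window of `width` samples."""
    rows = []
    for a, b in zip(boundaries[:-1], boundaries[1:]):
        period = np.asarray(segment[a:b], dtype=float)
        t_new = np.linspace(0, len(period) - 1, width)
        resampled = np.interp(t_new, np.arange(len(period)), period)  # time axis
        rows.append(resampled / np.max(np.abs(resampled)))            # amplitude axis
    return np.vstack(rows)


# Demo on a synthetic signal: silence, a 125 Hz tone, silence.
fs = 8000
tone = np.sin(2 * np.pi * 125 * np.arange(3840) / fs)
x = np.concatenate([np.zeros(1920), tone, np.zeros(2240)])
segs = segment_voiced(x, fs)
a, b = segs[0]
f0, bounds = period_boundaries(x[a:b], fs)
patterns = normalize_periods(x[a:b], bounds)
```

On this synthetic tone the sketch should find one segment, estimate a fundamental tone of 125 Hz, and return 59 normalized periods of 64 samples each. Real speech would likely need overlapping frames, a proper FIR or IIR band-pass filter, and safeguards against octave errors in the peak picking.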
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Fedorov, E., Utkina, T., Nechyporenko, O., Korpan, Y. (2020). Method of Speech Signal Structuring and Transforming for Biometric Personality Identification. In: Babichev, S., Peleshko, D., Vynokurova, O. (eds) Data Stream Mining & Processing. DSMP 2020. Communications in Computer and Information Science, vol 1158. Springer, Cham. https://doi.org/10.1007/978-3-030-61656-4_20
DOI: https://doi.org/10.1007/978-3-030-61656-4_20
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61655-7
Online ISBN: 978-3-030-61656-4
eBook Packages: Computer Science, Computer Science (R0)