Rapid Speaker Adaptation Based on Combination of KPCA and Latent Variable Model

Ansari, Zohreh; Almasganj, Farshad; Kabudian, Seyed Jahanshah

doi:10.1007/s00034-021-01660-6

Published: 22 March 2021

Volume 40, pages 3996–4017, (2021)
Circuits, Systems, and Signal Processing

Zohreh Ansari¹,
Farshad Almasganj² &
Seyed Jahanshah Kabudian³

Abstract

Speaker adaptation shifts a speaker-independent acoustic model toward the speech characteristics of a new speaker in order to improve speech recognition performance. Kernel eigenspace-based speaker adaptation methods perform satisfactorily with only a small amount of adaptation data. In these methods, kernel principal component analysis (KPCA) is applied to the training speaker space to construct a kernel eigenspace, and the acoustic model adapted to the new speaker is then estimated in that space. One limitation of KPCA is that it cannot define an exact pre-image, in the speaker space, of the model adapted in the kernel eigenspace. As a result, a large amount of computation is required to perform adaptation. Previously developed solutions for calculating an approximate pre-image of the adapted model do not necessarily yield an optimal result. In this paper, we therefore propose an efficient solution to this problem that constructs a more reliable pre-image of the adapted model in the speaker space. To this end, we use a latent variable model to define a probabilistic description of the mapping between the kernel eigenspace and the speaker space. Experiments were conducted on two speech databases: FARSDAT (Persian) and TIMIT (English). Using a typical HMM-based automatic speech recognition system, we verified that the proposed method, with about three seconds of adaptation data, achieves relative improvements in phoneme recognition accuracy of up to 4.4% and 7.6% over the speaker-independent model on FARSDAT and TIMIT, respectively. Moreover, the proposed approach outperformed the other kernel eigenspace-based adaptation methods.
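To make the pre-image problem mentioned in the abstract concrete, the following is a minimal sketch and not the method proposed in the paper. It runs KPCA with a Gaussian kernel on a handful of toy 2-D vectors (stand-ins for speaker supervectors), projects a new vector onto the leading kernel principal component, and then recovers an approximate pre-image with the classic fixed-point iteration for Gaussian kernels, one of the earlier approximate solutions rather than the latent variable model used by the authors. All function names, the toy data, and the kernel width `sigma` are assumptions made for illustration.

```python
import math

def gaussian_kernel(a, b, sigma):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))

def centered_gram(X, sigma):
    """Gram matrix K and its doubly centered version Kc."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma) for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    Kc = [[K[i][j] - row_mean[i] - row_mean[j] + total_mean for j in range(n)]
          for i in range(n)]
    return Kc, row_mean, total_mean

def leading_component(Kc, iters=500):
    """Power iteration for the top eigenpair of Kc (assumes a nonzero top eigenvalue)."""
    n = len(Kc)
    v = [float(i + 1) for i in range(n)]  # deterministic start vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / lam for wi in w]
    # Rescale so the feature-space eigenvector has unit norm: lam * (alpha . alpha) = 1.
    return lam, [vi / math.sqrt(lam) for vi in v]

def project(x, X, alpha, row_mean, total_mean, sigma):
    """Coordinate of x along the leading kernel principal component."""
    n = len(X)
    k = [gaussian_kernel(xi, x, sigma) for xi in X]
    k_mean = sum(k) / n
    k_c = [k[i] - k_mean - row_mean[i] + total_mean for i in range(n)]
    return sum(a * kc for a, kc in zip(alpha, k_c))

def fixed_point_preimage(beta, X, alpha, sigma, z0, iters=100):
    """Approximate pre-image of the projected point via fixed-point iteration."""
    n = len(X)
    # Expansion weights of the projected point over the uncentered phi(x_i).
    shift = (1.0 - beta * sum(alpha)) / n
    gamma = [beta * a + shift for a in alpha]
    z = list(z0)
    for _ in range(iters):
        w = [g * gaussian_kernel(z, xi, sigma) for g, xi in zip(gamma, X)]
        den = sum(w)
        if abs(den) < 1e-12:  # the iteration is undefined here; stop early
            break
        z = [sum(wi * xi[d] for wi, xi in zip(w, X)) / den for d in range(len(z))]
    return z

# Toy "training speaker" vectors and one new vector (illustrative values only).
X = [[0.0, 0.0], [1.0, 0.2], [2.0, -0.1], [3.0, 0.3], [4.5, 0.0]]
sigma = 2.0
Kc, row_mean, total_mean = centered_gram(X, sigma)
lam, alpha = leading_component(Kc)
x_new = [2.5, 0.1]
beta = project(x_new, X, alpha, row_mean, total_mean, sigma)
z = fixed_point_preimage(beta, X, alpha, sigma, z0=x_new)
print("projection:", beta, "approximate pre-image:", z)
```

The iteration only searches for a point whose feature-space image is close to the projected point. It can stall or settle in a local optimum, and the result depends on the starting point `z0`, which is the kind of sub-optimality the abstract attributes to earlier approximate pre-image solutions.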

Data Availability

The datasets analyzed during the current study are not freely available. Access to each dataset requires payment via the following links. FARSDAT: http://catalog.elra.info/en-us/repository/browse/ELRA-S0112/. TIMIT: https://www.ldc.upenn.edu/language-resources/data/obtaining.

Code Availability

The code will be made available on reasonable request.

Author information

Authors and Affiliations

Engineering Faculty, Meybod University, Yahyazadeh Blv., Khorramshahr Blv., PO Box: 8961699557, Meybod, Yazd, Iran
Zohreh Ansari
Speech Processing Laboratory, Biomedical Engineering Department, Amirkabir University of Technology, Tehran, Iran
Farshad Almasganj
Engineering Faculty, Razi University, Kermanshah, Iran
Seyed Jahanshah Kabudian

Corresponding author

Correspondence to Zohreh Ansari.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ansari, Z., Almasganj, F. & Kabudian, S.J. Rapid Speaker Adaptation Based on Combination of KPCA and Latent Variable Model. Circuits Syst Signal Process 40, 3996–4017 (2021). https://doi.org/10.1007/s00034-021-01660-6

Received: 23 October 2019
Revised: 13 January 2021
Accepted: 23 January 2021
Published: 22 March 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s00034-021-01660-6
