Abstract
This paper presents a basis-based speaker adaptation method that includes the approaches based on principal component analysis (PCA) and two-dimensional PCA (2DPCA) as special cases. The proposed method partitions the hidden Markov model (HMM) mean vectors of the training speaker models into subvectors of smaller dimension. Consequently, the dimension of the sample covariance matrix computed from the partitioned HMM mean vectors is determined by the dimension of the subvectors. Basis vectors are constructed from the eigendecomposition of this sample covariance matrix, so their dimension also varies with the dimension of the sample covariance matrix; in this way, the proposed method includes both the PCA- and 2DPCA-based approaches. We present the adaptation equations in both the maximum likelihood (ML) and maximum a posteriori (MAP) frameworks and perform continuous speech recognition experiments on the Wall Street Journal (WSJ) corpus. The results show that models with basis vectors whose dimensions lie between those of the PCA- and 2DPCA-based approaches perform well overall. The proposed approach in the MAP framework yields an additional performance improvement over its ML counterpart when the number of adaptation parameters is large but the amount of available adaptation data is small. Furthermore, the performance of the MAP approach is less sensitive to the choice of model order than that of its ML counterpart.
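The covariance construction described above can be sketched as follows. This is a minimal, dependency-free Python illustration under stated assumptions, not the paper's implementation. It assumes that each training speaker model is a supervector of length D formed by concatenating its Gaussian mean vectors, and that subvectors of dimension p are centred on the mean of the subvectors at the same position across speakers, with their outer products pooled and divided by the number of speakers (2DPCA-style centring). The names `partition`, `partitioned_covariance` and `jacobi_eigh` are hypothetical. Under these assumptions, p = D gives the usual PCA (eigenvoice) sample covariance, and p equal to the feature dimension gives a 2DPCA-style covariance. The adaptation step itself (ML or MAP estimation of the weights) is not shown.

```python
import math
import random


def partition(supervector, p):
    """Split a length-D supervector into D // p consecutive subvectors of length p."""
    if len(supervector) % p:
        raise ValueError("p must divide the supervector length")
    return [supervector[i:i + p] for i in range(0, len(supervector), p)]


def partitioned_covariance(supervectors, p):
    """p x p sample covariance pooled over speakers and subvector positions.

    Assumption: each subvector is centred on the mean of the subvectors at the
    same position across the S speakers, and the pooled sum is divided by S.
    """
    parts = [partition(sv, p) for sv in supervectors]
    n_spk, n_blocks = len(parts), len(parts[0])
    cov = [[0.0] * p for _ in range(p)]
    for k in range(n_blocks):
        mean = [sum(parts[s][k][i] for s in range(n_spk)) / n_spk for i in range(p)]
        for s in range(n_spk):
            diff = [parts[s][k][i] - mean[i] for i in range(p)]
            for i in range(p):
                for j in range(p):
                    cov[i][j] += diff[i] * diff[j]
    return [[c / n_spk for c in row] for row in cov]


def jacobi_eigh(matrix, max_sweeps=100):
    """Eigendecomposition of a real symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors) sorted by decreasing eigenvalue;
    eigenvectors[k] is the unit-norm eigenvector paired with eigenvalues[k].
    """
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-22 * sum(x * x for row in a for x in row):
            break
        for r in range(n - 1):
            for q in range(r + 1, n):
                if a[r][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (r, q) entry becomes zero.
                theta = (a[q][q] - a[r][r]) / (2.0 * a[r][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J  (update columns r and q)
                    akr, akq = a[k][r], a[k][q]
                    a[k][r], a[k][q] = c * akr - s * akq, s * akr + c * akq
                for k in range(n):  # A <- J^T A  (update rows r and q)
                    ark, aqk = a[r][k], a[q][k]
                    a[r][k], a[q][k] = c * ark - s * aqk, s * ark + c * aqk
                for k in range(n):  # V <- V J  (eigenvectors accumulate as columns)
                    vkr, vkq = v[k][r], v[k][q]
                    v[k][r], v[k][q] = c * vkr - s * vkq, s * vkr + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


if __name__ == "__main__":
    random.seed(0)
    S, D, d = 5, 12, 3  # toy sizes: 5 speakers, 4 Gaussians with 3-dim means
    models = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(S)]
    for p in (D, 6, d):  # p = D: PCA; p = d: 2DPCA-style; p = 6: in between
        vals, vecs = jacobi_eigh(partitioned_covariance(models, p))
        basis = vecs[:2]  # keep the two leading basis vectors of dimension p
        print(f"p={p:2d}  basis dim={len(basis[0])}  leading eigenvalues={vals[0]:.3f}, {vals[1]:.3f}")
```

The cyclic Jacobi method is used only to keep the sketch free of third-party dependencies; a library symmetric eigensolver such as `numpy.linalg.eigh` would be the natural choice in practice. Because every partition covers the same centred entries, the trace of the covariance (the total variance) is the same for every p; what changes is how that variance is spread over basis vectors of dimension p.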
Jeong, Y. Basis-Based Speaker Adaptation Using Partitioned HMM Mean Parameters of Training Speaker Models. J Sign Process Syst 82, 303–310 (2016). https://doi.org/10.1007/s11265-015-0996-2