Regular Article
Structural maximum a posteriori linear regression for fast HMM adaptation
Cited by (76)
Ensemble environment modeling using affine transform group
2015, Speech Communication
Citation excerpt: The mapping function is then estimated to adapt the source model to the target model. Several estimation algorithms have been proposed such as linear and nonlinear stochastic matching approaches (Lee, 1998; Sankar and Lee, 1996; Surendran et al., 1999), signal bias removal (SBR) (Rahim and Juang, 1996), maximum likelihood linear regression (MLLR) (Leggetter and Woodland, 1995; Gales, 1997), maximum a posteriori linear regression (MAPLR) (Chesta et al., 1999; Siohan et al., 2001; Siohan et al., 2002), structural Bayesian linear regression (SBLR) (Watanabe et al., 2014), VTS-based model adaptation (Kim et al., 1998), joint compensation of additive and convolutive distortions (JAC) (Gong, 2005; Hu and Huo, 2007; Li et al., 2009), and JAC with unscented transform (JAC-UT) (Hu and Huo, 2006; Li et al., 2010). For Category-2, multiple models ({Λ1, Λ2, …, ΛP} in Fig. 1) that are trained using subsets of the entire training data allow more effective local statistics of environment conditions.
Prior-shared feature and model space speaker adaptation by consistently employing MAP estimation
2013, Speech Communication
Citation excerpt: However, because the estimation criteria for both spaces are based on MAP, setting the prior information for both feature and model spaces is a crucial issue. Siohan et al. (2002) have already pointed out the importance of the choice of the prior densities for the transformation parameters (Siohan et al., 2002). In this work, by sharing the prior distribution we keep the consistency of the adaptation in the two different spaces.
SNAC: Speaker-Normalized Affine Coupling Layer in Flow-Based Architecture for Zero-Shot Multi-Speaker Text-to-Speech
2022, IEEE Signal Processing Letters
Feature Extraction Methods for Speaker Recognition: A Review
2017, International Journal of Pattern Recognition and Artificial Intelligence
Multi-scale patch prior learning for image denoising using student's-t mixture model
2017, Journal of Internet Technology