Abstract
This paper presents a cross-language development method for speech recognition and synthesis applications for the Macedonian language. A unified system for speech recognition and synthesis, trained on German language data, was used for acoustic model bootstrapping and adaptation. Both knowledge-based and data-driven approaches to mapping phonemes between the source and target languages were used for the initial transcription and labeling of a small amount of recorded speech. Recognition experiments using the source-language acoustic model on the target-language dataset showed significant degradation in recognition performance. Acceptable performance was achieved after maximum a posteriori (MAP) model adaptation with a limited amount of target-language data, making the system suitable for small- to medium-vocabulary speech recognition applications. The same unified system was then used to train a new, separate acoustic model for HMM-based synthesis. Qualitative analysis showed that, despite the low quality of the available recordings and sub-optimal phoneme mapping, HMM synthesis produces perceptually good and intelligible synthetic speech.
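As a rough illustration of the MAP adaptation step mentioned in the abstract, the sketch below applies the standard MAP update to the mean of a single Gaussian: the source-language mean acts as the prior and is pulled toward the target-language frames in proportion to how much data is observed. This is a minimal sketch, not the authors' implementation; the function name `map_adapt_mean`, the prior weight `tau`, and the toy values are assumptions made for illustration only.

```python
from typing import List, Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    posteriors: Sequence[float],
    tau: float = 10.0,
) -> List[float]:
    """MAP update of one Gaussian mean (hypothetical helper).

    prior_mean : mean from the source-language model
    frames     : target-language feature vectors
    posteriors : occupation probability of this Gaussian for each frame
    tau        : prior weight; larger values keep the result closer to the prior
    """
    if len(frames) != len(posteriors):
        raise ValueError("frames and posteriors must have the same length")

    dim = len(prior_mean)
    occupancy = sum(posteriors)  # soft count of frames assigned to this Gaussian
    weighted_sum = [0.0] * dim   # posterior-weighted sum of the frames
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]

    # Interpolate between the prior mean and the data, weighted by tau and occupancy.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Toy usage: two target-language frames fully assigned to this Gaussian.
adapted = map_adapt_mean([0.0, 0.0], [[2.0, 4.0], [2.0, 4.0]], [1.0, 1.0], tau=2.0)
unchanged = map_adapt_mean([1.0, 3.0], [], [], tau=2.0)
```

With no adaptation data the update returns the prior mean unchanged, which is why this kind of adaptation can remain usable when only a limited amount of target-language speech is available.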
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kraljevski, I., Strecha, G., Wolff, M., Jokisch, O., Chungurski, S., Hoffmann, R. (2013). Cross-Language Acoustic Modeling for Macedonian Speech Technology Applications. In: Markovski, S., Gusev, M. (eds) ICT Innovations 2012. ICT Innovations 2012. Advances in Intelligent Systems and Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37169-1_4
DOI: https://doi.org/10.1007/978-3-642-37169-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37168-4
Online ISBN: 978-3-642-37169-1
eBook Packages: Engineering, Engineering (R0)