Cross-Language Acoustic Modeling for Macedonian Speech Technology Applications

Kraljevski, Ivan; Strecha, Guntram; Wolff, Matthias; Jokisch, Oliver; Chungurski, Slavcho; Hoffmann, Rüdiger

doi:10.1007/978-3-642-37169-1_4

Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 207)

Abstract

This paper presents a cross-language development method for speech recognition and synthesis applications for the Macedonian language. A unified system for speech recognition and synthesis, trained on German language data, was used for acoustic model bootstrapping and adaptation. Both knowledge-based and data-driven approaches to mapping phonemes between the source and target languages were used for the initial transcription and labeling of a small amount of recorded speech. Recognition experiments using the source language acoustic model on the target language dataset showed significant degradation in recognition performance. Acceptable performance was achieved after maximum a posteriori (MAP) model adaptation with a limited amount of target language data, making the adapted model suitable for small to medium vocabulary speech recognition applications. The same unified system was then used to train a new, separate acoustic model for HMM-based synthesis. Qualitative analysis showed that, despite the low quality of the available recordings and the sub-optimal phoneme mapping, HMM synthesis produces perceptually good and intelligible synthetic speech.
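As a rough illustration of the MAP adaptation step mentioned in the abstract, the sketch below applies the standard MAP update for a single Gaussian mean: the adapted mean is a weighted blend of the source-language (prior) mean and the occupancy-weighted target-language frames. It is not the authors' implementation. The function name `map_adapt_mean`, the prior weight `tau`, and the toy feature values are all assumptions made for illustration, since the preview gives no adaptation settings.

```python
from typing import List, Sequence


def map_adapt_mean(prior_mean: Sequence[float],
                   frames: Sequence[Sequence[float]],
                   posteriors: Sequence[float],
                   tau: float = 10.0) -> List[float]:
    """MAP update of one Gaussian mean (hypothetical helper, illustration only).

    prior_mean : mean from the source-language model (e.g. German)
    frames     : target-language feature vectors aligned to this Gaussian
    posteriors : occupation probability of this Gaussian for each frame
    tau        : prior weight (assumed value); larger tau keeps the mean
                 closer to the source-language model
    """
    if len(frames) != len(posteriors):
        raise ValueError("frames and posteriors must have the same length")

    occupancy = sum(posteriors)
    if tau + occupancy <= 0.0:
        raise ValueError("tau plus total occupancy must be positive")

    dim = len(prior_mean)
    weighted_sum = [0.0] * dim
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]

    # mu_hat = (tau * mu_prior + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
            for d in range(dim)]


if __name__ == "__main__":
    # Toy two-dimensional example with made-up numbers.
    source_mean = [0.0, 0.0]
    target_frames = [[2.0, 4.0], [2.0, 4.0]]
    gammas = [1.0, 1.0]
    print(map_adapt_mean(source_mean, target_frames, gammas, tau=2.0))
```

With little adaptation data the total occupancy is small relative to `tau`, so the adapted mean stays near the source-language value; with more data it moves toward the target-language sample mean. This behavior is what makes MAP adaptation attractive when only a limited amount of target-language speech is available.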

Author information

Authors and Affiliations

Chair for System Theory and Speech Technology, TU Dresden, Dresden, Germany
Ivan Kraljevski, Guntram Strecha, Oliver Jokisch & Rüdiger Hoffmann
Electronics and Information Technology Institute, BTU Cottbus, Cottbus, Germany
Matthias Wolff
Faculty of Informatics, FON University, Skopje, Macedonia
Slavcho Chungurski

Corresponding author

Correspondence to Ivan Kraljevski.

Editor information

Editors and Affiliations

Faculty of Computer Science, Ss Cyrill and Methodius University, Ruger Boskovic 16, POBox 393, Skopje, 1000, Macedonia
Smile Markovski
Faculty of Information Sciences, Ss Cyrill and Methodius University, Ruger Boskovic 16, Skopje, 1000, Macedonia
Marjan Gusev

About this paper

Cite this paper

Kraljevski, I., Strecha, G., Wolff, M., Jokisch, O., Chungurski, S., Hoffmann, R. (2013). Cross-Language Acoustic Modeling for Macedonian Speech Technology Applications. In: Markovski, S., Gusev, M. (eds) ICT Innovations 2012. Advances in Intelligent Systems and Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37169-1_4

DOI: https://doi.org/10.1007/978-3-642-37169-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37168-4
Online ISBN: 978-3-642-37169-1
eBook Packages: Engineering, Engineering (R0)