A-STAR: Toward translating Asian spoken languages☆
Highlights
► The first Asian network-based speech-to-speech translation system, developed by the A-STAR consortium. ► A-STAR field-testing experiments were carried out in July 2009, covering eight Asian languages and English. ► All of the speech-to-speech translation engines have been successfully implemented as Web servers that can be accessed via client applications worldwide.
Introduction
With an area of 17 million square miles, Asia is the world's largest and most populous continent, home to approximately four billion people, or 60% of the world's current human population. A wide variety of societies, religions, and ethnicities shapes the culture of Asia. There is also great linguistic diversity: the majority of Asian countries have more than one natively spoken language. In fact, of the more than 6800 living languages in the world, one-third are found in Asia (Lewis, 2009).
As we enter the 21st century, information exchange between Asia and the rest of the world is increasing. Meanwhile, the destinations of international travelers – whether for tourism, emigration, or foreign study – are becoming increasingly diverse. The Asian region in particular has witnessed a strengthening of its social and economic relationships, and enhancing mutual understanding and economic relations has thus become a key challenge. Socio-economic relations in the region are more vital today than ever before. These changes increase the need for means of facilitating interaction among people in Asia who speak different languages; the language barriers between Asian countries are a critical problem to overcome.
Spoken language translation is one of the innovative technologies that enable people to communicate with each other while speaking their own languages. However, translating a spoken language is an extremely complex task. The technology involves research in automatic speech recognition (ASR), machine translation (MT), and text-to-speech generation (TTS), and supporting many languages makes the challenge even greater. Ideally, to create effective systems, many different countries would conduct joint research. Currently, a number of research groups are making progress in a bid to break down the language barrier. As one of the first examples of international cooperation, the Consortium for Speech TrAnslation Research (C-STAR)2 was formed as a voluntary group of institutions committed to building speech translation systems. Institutions in European countries have also formed the TC-STAR3 (Technology and Corpora for Speech-to-speech Translation) consortium, which has the objective of translating public speeches and discussions in international meetings. However, such activities are still rare in the Asian community. Therefore, the Asian Speech Translation Advanced Research (A-STAR)4 consortium was established in June 2006 as a speech translation consortium for creating the basic infrastructure for spoken language communication technologies and overcoming language barriers in Asia (Nakamura et al., 2007).
The A-STAR consortium was founded by the National Institute of Information and Communications Technology/Advanced Telecommunications Research (NICT/ATR), Japan, in collaboration with other research institutes in Asia. Currently, the A-STAR consortium consists of the following members: the Electronics and Telecommunication Research Institute (ETRI) in Korea, the National Electronics and Computer Technology Center (NECTEC) in Thailand, the Institute of Automation, Chinese Academy of Sciences (CASIA) in China, the Agency for Assessment and Application Technology (BPPT) in Indonesia, the Center for Development of Advanced Computing (CDAC) in India, the Institute of Information Technology (IOIT) in Vietnam, and the Institute for Infocomm Research (I2R) in Singapore. The consortium is working collaboratively to collect Asian language corpora, create common speech recognition and translation dictionaries, develop Web service speech translation modules for various Asian languages, and standardize interfaces and data formats that facilitate international connections between the different speech translation modules from different countries. The main objective is to create a network-based speech-to-speech translation system in the Asian region, as illustrated in Fig. 1.
In this paper, we outline the development of the Asian network-based speech-to-speech translation system. The system was designed to translate common spoken utterances of travel conversations from a given source language into multiple target languages in order to facilitate multiparty travel conversations between people speaking different Asian languages. Each A-STAR member contributed one or more of the following spoken language technologies: ASR, MT, and TTS through Web servers. Currently, the system successfully covers nine languages: Hindi (Hi), Indonesian (Id), Japanese (Ja), Korean (Ko), Malay (Ms), Thai (Th), Vietnamese (Vi), Chinese (Zh) and English (En). The system covers travel expressions including proper nouns such as the names of famous places or attractions in Asian countries.
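The multiparty setting described above implies that each recognized utterance must be fanned out from its source language to the language of every other participant. A minimal sketch of that routing logic follows; the `translate` stub and the exact language codes are illustrative placeholders, not the actual A-STAR interface.

```python
# Sketch of multiparty fan-out: one recognized utterance is translated
# into the language of every other participant. The translate() stub is
# a placeholder for a call to a remote MT web service.

LANGUAGES = ["hi", "id", "ja", "ko", "ms", "th", "vi", "zh", "en"]

def translate(text, src, tgt):
    # Placeholder: a real system would send a request to an MT server here.
    return f"[{src}->{tgt}] {text}"

def fan_out(utterance, src, participants):
    """Translate one utterance into each participant's language."""
    results = {}
    for tgt in participants:
        if tgt == src:
            results[tgt] = utterance  # no translation needed
        else:
            results[tgt] = translate(utterance, src, tgt)
    return results

out = fan_out("Where is Kyoto station?", "en", ["en", "ja", "th"])
```

With nine supported languages, a single utterance in an N-party conversation triggers up to N−1 translation requests, which is one reason the engines are exposed as independently scalable Web services.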
The rest of this paper is organized as follows. Section 2 gives an overview of Asian language processing. Section 3 describes the development of the spoken language translation engines for ASR, MT, and TTS. Issues related to handling the additional proper nouns are discussed in Section 4. Next, the overall architecture of the A-STAR speech-to-speech translation system is presented in Section 5, where we also describe the standardized data format and client application. Details of the first A-STAR demo experiments and the speech-translation results are reported and discussed in Section 6. Some highlights of the challenges that we faced in multiparty multilingual speech-to-speech translation are summarized in Section 7. Finally, our conclusion is presented in Section 8.
Section snippets
Asian language processing
One of the first stages of developing a spoken language technology is the design, collection, and annotation of data corpora. However, processing natural Asian languages into suitable structured data formats poses a great challenge.
Table 1 summarizes the characteristics of the eight Asian languages (LNG) used in this experiment; English is also included as a reference. It shows that the characteristics of the Asian languages are highly diverse in many ways. In contrast to Western languages that…
Spoken language technologies
The development of the A-STAR spoken language technologies is described in the following section. Each A-STAR member contributes its ASR, MT, and TTS systems. The systems may be trained on any available corpora; there is no restriction on the type of resources used.
Multiparty dialog scenario and proper nouns
We have also collected additional multiparty dialog scenarios that include famous proper nouns, as follows:
- • Proper nouns
The proper nouns are collected from ten countries: India, Indonesia, Japan, Korea, Malaysia, Singapore, Thailand, Vietnam, China, and the United States. They mainly comprise city names (e.g., Kyoto in Japan, Beijing in China), tourist areas (e.g., Bulkuksa in Korea, Wat Pra Kaew in Thailand), attractions (e.g., Wayang kulit in Indonesia, Kathak in India), and similar nouns.
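One simple way to make such proper nouns available to every engine is a shared lexicon keyed by a language-independent entry ID, with per-language surface forms that each ASR, MT, or TTS dictionary can import. The structure below is an illustrative sketch, not the consortium's actual dictionary format.

```python
# Illustrative shared proper-noun lexicon: one entry per named entity,
# with per-language surface forms that each engine can import into its
# recognition or translation dictionary.

PROPER_NOUNS = [
    {"id": "kyoto", "category": "city", "country": "Japan",
     "forms": {"en": "Kyoto", "ja": "京都"}},
    {"id": "wat_pra_kaew", "category": "tourist_area", "country": "Thailand",
     "forms": {"en": "Wat Pra Kaew", "th": "วัดพระแก้ว"}},
]

def surface_forms(lang):
    """Collect every surface form available for one language."""
    return {e["id"]: e["forms"][lang]
            for e in PROPER_NOUNS if lang in e["forms"]}

en_dict = surface_forms("en")
```

Keying on a language-independent ID lets the MT engines map a recognized name directly to its target-language form instead of attempting to translate it word by word.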
Integration of network-based speech-to-speech translation
The spoken language technologies, including the ASR, MT, and TTS engines described in Section 3, are provided by A-STAR members through Web servers and together form a unified multilingual network-based speech-to-speech translation system. The overall structure is illustrated in Fig. 2, and the component systems are described in the following sections.
Evaluation of translation server components
The first A-STAR demo experiments connecting spoken language technologies in the Asian region were carried out in July 2009. Two evaluations, one based on basic travel expression corpus (BTEC) sentences and one on dialog utterances, were conducted as described in the following sections.
Challenges on multiparty multilingual speech-to-speech translation
In addition to the questionnaire described above, users were also able to report problems that occurred during the A-STAR demo experiments. Here are some highlights of the challenges that we faced in multiparty multilingual speech-to-speech translation.
Conclusion
The first Asian network-based speech-to-speech translation experiments were performed in July 2009. Eight research groups comprising the A-STAR consortium members participated in the experiments, covering eight Asian languages and English. All of the speech-to-speech translation engines have been successfully implemented as Web servers that can be accessed via client applications worldwide. This implementation has achieved the desired objective of real-time, location-free…
References (54)
- et al., Korean large vocabulary continuous speech recognition with morpheme-based recognition units, Speech Communication (2003).
- et al., An introduction to the Chinese speech recognition front-end of the NICT/ATR multi-lingual speech translation system, Tsinghua Science and Technology (2008).
- et al., Handling of out-of-vocabulary words in phrase-based statistical machine translation for Hindi–Japanese.
- et al., Development of HMM-based Hindi speech synthesis system.
- et al., Extended models and tools for high-performance part-of-speech tagger.
- et al., Piramid: Bahasa Indonesia and Bahasa Malaysia translation system enhanced through comparable corpora.
- et al., A statistical-machine-translation approach to word boundary identification: a projective analogy of bilingual translation.
- et al., Resource report: building parallel text corpora for multi-domain translation system.
- Absolute Category Rating (ACR) method for subjective testing of digital processors (1984).
- et al., I2R multi-pass machine translation system for IWSLT 2008.
- A Bayesian model of bilingual segmentation for transliteration.
- The NICT/ATR speech translation system for IWSLT 2007.
- TLex: Thai lexeme analyser based on conditional random fields.
- Customizing a Korean–English MT system for patent translation.
- Construction of Chinese segmented and POS-tagged conversational corpora and their evaluations on spontaneous speech recognitions.
- Spoken Language Processing.
- Automatic generation of non-uniform HMM topologies based on the MDL criterion, IEICE Trans. Inf. & Syst.
- Thai speech database for speech recognition.
- XIMERA: a new TTS from ATR based on corpus-based technologies.
- Comparative study on corpora for speech translation, IEEE Transactions on Audio, Speech, and Language Processing.
- Statistical phrase-based translation.
- Moses: open source toolkit for statistical machine translation.
- A method for English–Korean target word selection using multiple knowledge sources, IEICE Trans. Fundamentals.
- An overview of Korean–English speech-to-speech translation system.
- Query-by-example spoken document retrieval – the Star Challenge 2008.
☆ This paper has been recommended for acceptance by the Guest Editors of 'Speech–Speech Translation'.
1 Currently at the Nara Institute of Science and Technology, Japan.