DOI: 10.1145/2857218.2857234

Robust deep-learning models for text-to-speech synthesis support on embedded devices

Published: 25 October 2015

Abstract

Smartphones and tablets are now firmly embedded in our daily lives. These devices have an entire ecosystem devoted to them, with applications and tools designed around their constraints: they use touch-enabled interfaces and give each app only a limited amount of memory and CPU time (a 16/32 MB limit on Android and iOS devices). A well-established research domain is the development of natural human-computer interfaces (HCI) based on voice and gestures. However, such interfaces are bound by the hardware resources available to them, and they typically rely on network/Internet access to send and receive data, delegating the decision-making process to dedicated servers. This paper focuses on the development of small, robust deep-learning models designed to provide high-quality text-to-speech (TTS) functionality (one of the three main components of HCI) on smart devices, without requiring network access. We obtain very good results on the text-processing sub-tasks of TTS using models significantly smaller than those used in state-of-the-art approaches.
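
To make the on-device constraint concrete, below is a minimal sketch (not the authors' code; the toy word list, window size, and network dimensions are all illustrative assumptions) of how one of the TTS text sub-tasks the paper targets, syllabification, can be cast as per-character classification with a deliberately tiny neural network, trained here with plain batch gradient descent:

```python
# Minimal sketch, NOT the authors' implementation: a tiny MLP that tags
# each character of a word with "a syllable boundary follows here" (1)
# or not (0), using a fixed window of surrounding characters as input.
# The word list, window size and layer sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: '-' marks syllable boundaries.
WORDS = ["ta-ble", "ro-bust", "mo-del", "de-vice", "syn-the-sis"]
WINDOW = 2   # characters of context on each side of the current one
PAD = "_"

def examples(word):
    """Yield (character window, label) pairs for one hyphenated word."""
    plain = word.replace("-", "")
    bounds, i = set(), 0
    for ch in word:
        if ch == "-":
            bounds.add(i - 1)   # boundary follows the previous character
        else:
            i += 1
    padded = PAD * WINDOW + plain + PAD * WINDOW
    for j in range(len(plain)):
        yield padded[j : j + 2 * WINDOW + 1], 1.0 if j in bounds else 0.0

# One-hot encode each character window.
data = [ex for w in WORDS for ex in examples(w)]
chars = sorted({c for ctx, _ in data for c in ctx})
idx = {c: k for k, c in enumerate(chars)}

def encode(ctx):
    v = np.zeros(len(chars) * len(ctx))
    for pos, c in enumerate(ctx):
        v[pos * len(chars) + idx[c]] = 1.0
    return v

X = np.array([encode(ctx) for ctx, _ in data])
y = np.array([lab for _, lab in data])

# Deliberately tiny network: one hidden layer of 8 tanh units,
# one sigmoid output unit.
H = 8
W1 = rng.normal(0.0, 0.1, (X.shape[1], H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.1, H);               b2 = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(2000):                    # plain batch gradient descent
    h = np.tanh(X @ W1 + b1)             # hidden activations, (N, H)
    p = sigmoid(h @ W2 + b2)             # boundary probabilities, (N,)
    g = (p - y) / len(y)                 # dLoss/dlogit for cross-entropy
    gh = np.outer(g, W2) * (1 - h ** 2)  # backprop through tanh
    W2 -= lr * (h.T @ g);  b2 -= lr * g.sum()
    W1 -= lr * (X.T @ gh); b1 -= lr * gh.sum(axis=0)

# The entire model is W1, b1, W2, b2 -- well under a thousand floats.
print("parameters:", W1.size + b1.size + W2.size + 1)

# Quick check on a training word: P(boundary) after each character.
for ctx, _ in examples("ta-ble"):
    prob = sigmoid(np.tanh(encode(ctx) @ W1 + b1) @ W2 + b2)
    print(ctx, round(float(prob), 2))
```

A model of this shape holds only a few hundred parameters and fits in a few kilobytes, which is the kind of footprint an embedded TTS front-end can afford even under the 16/32 MB app limits mentioned above.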


Cited By

  • (2024) "Deep Learning’s Impact on Speech Synthesis for Mobile Devices". In Artificial Intelligence and Sustainable Computing, pp. 117-131. DOI: 10.1007/978-981-97-0327-2_9. Online publication date: 24-Apr-2024.
  • (2016) "Search based applications for speech processing". In 2016 8th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), pp. 1-6. DOI: 10.1109/ECAI.2016.7861101. Online publication date: Jun-2016.


Published In

MEDES '15: Proceedings of the 7th International Conference on Management of computational and collective intElligence in Digital EcoSystems
October 2015
271 pages

Sponsors

  • The French Chapter of ACM Special Interest Group on Applied Computing
  • IFSP: Federal Institute of São Paulo


Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. deep learning
  2. lexical stress prediction
  3. part-of-speech tagging
  4. phonetic transcription
  5. syllabification
  6. text processing
  7. text-to-speech synthesis

Qualifiers

  • Research-article

Conference

MEDES '15
Sponsor: IFSP

Acceptance Rates

MEDES '15 Paper Acceptance Rate: 13 of 64 submissions (20%)
Overall Acceptance Rate: 267 of 682 submissions (39%)

Article Metrics

  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 0

Reflects downloads up to 20 Feb 2025

