Indonesian graphemic syllabification using a nearest neighbour classifier and recovery procedure

Parande, Edwina Anky; Suyanto, Suyanto

doi:10.1007/s10772-018-09569-3

Published: 08 November 2018

Volume 22, pages 13–20, (2019)

International Journal of Speech Technology


Abstract

Automatic syllabification, the decomposition of a word into syllables, is an important component of automatic speech recognition (ASR) systems that use syllable-based acoustic and language models. It can be applied to either phoneme or grapheme sequences. Phonemic syllabification is the more complex of the two because it requires grapheme-to-phoneme conversion (G2P) as a preceding step. It generally gives high accuracy for formal words, but its accuracy may decrease for person names. In contrast, graphemic syllabification is simpler and has more potential for person names. This research develops a graphemic syllabification model that combines phonotactic rules with Fuzzy k-nearest neighbour in every Class (FkNNC). The phonotactic rules are designed to find the deterministic syllabification points, while FkNNC, a statistical classifier, is expected to find the remaining stochastic syllabification points. A recovery procedure is proposed to correct wrong syllabification points produced by FkNNC. Fivefold cross-validation on a dataset of 50k formal words, selected from the great dictionary of the Indonesian language, shows that the proposed model gives a syllable error rate (SER) of 2.48%, which the recovery procedure reduces to 2.27%. This is still higher than the SER of phonemic syllabification (0.99%). However, the model can also handle a dataset of 15k high-variance person names, with an SER of 7.45% that the recovery procedure reduces to 6.78%.
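To make the classification step more concrete, the following is a minimal sketch of how a fuzzy k-nearest-neighbour rule could decide whether a syllable boundary falls between two graphemes. It is not the paper's FkNNC variant, its phonotactic rules, or its recovery procedure, none of which are specified in the abstract. The context-window encoding, the Hamming distance, the padding symbol, the parameter values, and the five-word training set are all assumptions chosen for illustration.

```python
from collections import defaultdict

PAD = "*"  # hypothetical padding symbol for word edges


def context_windows(word, half_width=3):
    """One grapheme window per candidate boundary (between letters i-1 and i)."""
    padded = PAD * half_width + word + PAD * half_width
    windows = []
    for i in range(1, len(word)):
        centre = i + half_width
        windows.append(padded[centre - half_width:centre + half_width])
    return windows


def labelled_windows(syllabified, half_width=3):
    """Turn 'ma.kan' into (window, label) pairs; label 1 marks a boundary."""
    word = syllabified.replace(".", "")
    boundaries, pos = set(), 0
    for syllable in syllabified.split(".")[:-1]:
        pos += len(syllable)
        boundaries.add(pos)
    windows = context_windows(word, half_width)
    return [(w, int(i in boundaries)) for i, w in enumerate(windows, start=1)]


def hamming(a, b):
    """Number of positions at which two equal-length windows differ."""
    return sum(x != y for x, y in zip(a, b))


def fuzzy_knn_memberships(query, training, k=3, m=2.0):
    """Fuzzy k-NN with crisp training labels and inverse-distance weights.

    Each of the k nearest neighbours votes with weight 1 / d^(2/(m-1));
    the small constant avoids division by zero for exact matches.
    """
    nearest = sorted(training, key=lambda item: hamming(query, item[0]))[:k]
    weights = defaultdict(float)
    for window, label in nearest:
        d = hamming(query, window)
        weights[label] += 1.0 / (d ** (2.0 / (m - 1.0)) + 1e-9)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def syllabify(word, training, k=3):
    """Insert '.' wherever the boundary class has membership above 0.5."""
    out = word[0]
    for i, window in enumerate(context_windows(word), start=1):
        memberships = fuzzy_knn_memberships(window, training, k)
        if memberships.get(1, 0.0) > 0.5:
            out += "."
        out += word[i]
    return out


# Toy training data (assumed for illustration, not the paper's dataset).
TRAINING = [
    pair
    for entry in ["ma.kan", "mi.num", "ba.kar", "lam.pu", "tan.da"]
    for pair in labelled_windows(entry)
]
```

With this toy data, `syllabify("makar", TRAINING)` returns `"ma.kar"` for a word that is not in the training set. A real system would need a far larger lexicon, and the abstract indicates that rule-based boundary detection precedes the classifier and a recovery step follows it.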

Acknowledgements

We would like to thank Forum Alumni Universitas Telkom (FAST) for the dataset of 15k high variance person-names.

Funding

This work is supported by Forum Alumni Universitas Telkom (FAST).

Author information

Authors and Affiliations

School of Computing, Telkom University, Bandung, West Java, 40257, Indonesia
Edwina Anky Parande & Suyanto Suyanto

Corresponding author

Correspondence to Suyanto Suyanto.

About this article

Cite this article

Parande, E.A., Suyanto, S. Indonesian graphemic syllabification using a nearest neighbour classifier and recovery procedure. Int J Speech Technol 22, 13–20 (2019). https://doi.org/10.1007/s10772-018-09569-3

Received: 08 June 2018
Accepted: 13 October 2018
Published: 08 November 2018
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10772-018-09569-3
