Comparison of algorithms to divide noisy phone sequences into syllables for automatic unconstrained English speaking proficiency scoring

Johnson, David O.; Kang, Okim

doi:10.1007/s10462-017-9594-y

Published: 22 November 2017

Volume 52, pages 1781–1804, (2019)

Artificial Intelligence Review

Abstract

Four algorithms for syllabifying phones are compared in automatically scoring English oral proficiency. The first algorithm groups each consonant with the vowel nearest to it in time, taking into account the maximal onset principle. In the second algorithm, a Hidden Markov Model (HMM) predicts syllable boundaries from the sonority values of the phones. The third algorithm employs three HMMs, each tuned to a specific category of utterances. The fourth uses a genetic algorithm to identify a set of rules for syllabifying the phones. The algorithms were evaluated by (1) how well they syllabified utterances from the Boston University Radio News Corpus (BURNC) and (2) how well they worked as part of a process to automatically score English speaking proficiency. Syllabification quality was judged with a measure of the temporal alignment of the syllables. Suitability in the proficiency process was assessed with the Pearson correlation between the computer's predicted proficiency scores and the scores assigned by human examiners. We found that syllabification-by-genetic-algorithm performed best in syllabifying the BURNC, but that syllabification-by-grouping (i.e., syllables are made by grouping non-syllabic consonant phones with the vowel or syllabic consonant phone nearest to them in time) performed best in the English oral proficiency rating application.
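The syllabification-by-grouping idea described in the abstract can be illustrated with a short sketch. It is not the authors' implementation. It assumes each phone arrives with start and end times and a flag marking it as a vowel or syllabic consonant, and it measures nearness between phone midpoints. The `Phone` class, the `group_into_syllables` function, and the rule that a tie goes to the following nucleus (a simplified nod to the maximal onset principle) are all hypothetical choices made for illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List


@dataclass
class Phone:
    """One recognized phone (hypothetical structure, not from the paper)."""
    label: str
    start: float    # seconds
    end: float      # seconds
    syllabic: bool  # True for vowels and syllabic consonants

    @property
    def mid(self) -> float:
        return (self.start + self.end) / 2.0


def group_into_syllables(phones: List[Phone]) -> List[List[Phone]]:
    """Attach each non-syllabic phone to the temporally nearest nucleus."""
    nuclei = [i for i, p in enumerate(phones) if p.syllabic]
    if not nuclei:
        # No vowel or syllabic consonant: return the phones as one group.
        return [list(phones)] if phones else []

    syllables: List[List[Phone]] = [[] for _ in nuclei]
    for i, phone in enumerate(phones):
        pos = bisect_left(nuclei, i)
        if pos < len(nuclei) and nuclei[pos] == i:
            target = pos              # the phone is itself a nucleus
        elif pos == 0:
            target = 0                # only a following nucleus exists
        elif pos == len(nuclei):
            target = pos - 1          # only a preceding nucleus exists
        else:
            gap_prev = phone.mid - phones[nuclei[pos - 1]].mid
            gap_next = phones[nuclei[pos]].mid - phone.mid
            # Assumption: ties go to the following nucleus (onset preference).
            target = pos if gap_next <= gap_prev else pos - 1
        syllables[target].append(phone)
    return syllables


if __name__ == "__main__":
    demo = [
        Phone("p", 0.00, 0.05, False),
        Phone("AA", 0.05, 0.22, True),
        Phone("t", 0.22, 0.28, False),
        Phone("AA", 0.28, 0.40, True),
        Phone("n", 0.40, 0.46, False),
    ]
    for syllable in group_into_syllables(demo):
        print([p.label for p in syllable])
```

Because phones are visited in time order and the distance to the preceding nucleus only grows while the distance to the following one shrinks, each resulting syllable is a contiguous run of phones.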

Acknowledgements

The authors would like to thank Michael Albanese, Tory Bottiglieri, Trent Cooper, Drew McDaniel, and Adam Thomas for developing the initial prototypes of the HMM, k-means, and genetic algorithm methods of syllabification as part of their senior Computer Science Capstone project at Northern Arizona University.

Author information

Authors and Affiliations

Northern Arizona University, Flagstaff, AZ, 86011, USA
David O. Johnson & Okim Kang

Corresponding author

Correspondence to David O. Johnson.

About this article

Cite this article

Johnson, D.O., Kang, O. Comparison of algorithms to divide noisy phone sequences into syllables for automatic unconstrained English speaking proficiency scoring. Artif Intell Rev 52, 1781–1804 (2019). https://doi.org/10.1007/s10462-017-9594-y

Published: 22 November 2017
Issue Date: 01 October 2019
DOI: https://doi.org/10.1007/s10462-017-9594-y
