Abstract
The performance of machine learning classifiers in automatically scoring the English proficiency of unconstrained speech has been explored. Suprasegmental measures were computed by software that identifies the basic elements of Brazil’s model in human discourse. This paper explores machine learning training with multiple corpora to improve two of those algorithms: prominent syllable detection and tone choice classification. The results show that machine learning training with the Boston University Radio News Corpus can improve automatic English proficiency scoring of unconstrained speech from a Pearson’s correlation of 0.677 to 0.718. This correlation is higher than that of any other existing computer program for automatically scoring the proficiency of unconstrained speech, and it approaches the inter-rater reliability of human raters.
References
Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater® V. 2. The Journal of Technology, Learning and Assessment, 4(3), 3–30.
Bernstein, J. (1999). PhonePass testing: Structure and construct. Menlo Park: Ordinate Corporation.
Bernstein, J., Van Moere, A., & Cheng, J. (2010). Validating automated speaking tests. Language Testing, 27(3), 355–377.
Boersma, P., & Weenink, D. (2014). Praat: doing phonetics by computer (Version 5.3.83), [Computer program]. Retrieved August 19, 2014.
Brazil, D. (1997). The communicative value of intonation in English. Cambridge: Cambridge University Press.
Burstein, J., Kukich, K., Braden-Harder, L., Chodorow, M., Hua, S., Kaplan, B., et al. (1998). Computer analysis of essay content for automated score prediction: A prototype automated scoring system for GMAT analytical writing assessment essays. ETS Research Report Series, 1998(1), i-67.
Cambridge English Language Assessment (2015). Retrieved March 29, 2015 from www.cambridgeenglish.org.
Černý, V. (1985). Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. Journal of Optimization Theory and Applications, 45(1), 41–51.
Chodorow, M., & Burstein, J. (2004). Beyond essay length: Evaluating e-rater®’s performance on TOEFL® essays. ETS Research Report Series, 2004(1), i-38.
Chun, D. M. (2002). Discourse intonation in L2: From theory and research to practice (Vol. 1). Philadelphia: John Benjamins Publishing.
Evanini, K., & Wang, X. (2013). Automated speech scoring for non-native middle school students with multiple task types. In INTERSPEECH (pp. 2435–2439).
Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., & Pallett, D. S. (1993). DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1. NASA STI/Recon Technical Report N, 93, 27403.
Hastie, T., & Tibshirani, R. (1998). Classification by pairwise coupling. The Annals of Statistics, 26(2), 451–471.
Johnson, D. O., & Kang, O. (2015). Automatic prominent syllable detection with machine learning classifiers. International Journal of Speech Technology, 18(4), 583–592.
Johnson, D. O., & Kang, O. (2016). Automatic prosodic tone choice classification with Brazil’s intonation model. International Journal of Speech Technology, 19(1), 95–109.
Kahn, D. (1976). Syllable-based generalizations in English phonology (Vol. 156). Bloomington: Indiana University Linguistics Club.
Kang, O. (2010). Relative salience of suprasegmental features on judgments of L2 comprehensibility and accentedness. System, 38(2), 301–315.
Kang, O., & Johnson, D. O. (2015). Comparison of inter-rater reliability of human and computer prosodic annotation using Brazil’s prosody model. English Linguistics Research, 4(4), 58.
Kang, O., & Johnson, D. O. (2016). Systems and Methods for Automated Evaluation of Human Speech. U.S. Patent Application No. 15/054,128. Washington, DC: U.S. Patent and Trademark Office.
Kang, O., Rubin, D., & Pickering, L. (2010). Suprasegmental measures of accentedness and judgments of language learner proficiency in oral English. The Modern Language Journal, 94(4), 554–566.
Kang, O., & Wang, L. (2014). Impact of different task types on candidates’ speaking performances and interactive features that distinguish between CEFR levels. ISSN 1756-509X, 40.
KayPENTAX. (2008). Multi-speech and CSL software. Lincoln Park: KayPENTAX.
Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671–680.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211.
Leacock, C. (2004). Scoring free-responses automatically: A case study of a large-scale assessment. Examens, 1(3).
Leacock, C., & Chodorow, M. (2003). C-rater: Automated scoring of short-answer questions. Computers and the Humanities, 37(4), 389–405.
Longman, P. (2013). Official guide to Pearson Test of English Academic.
MathWorks, Inc. (2013). MATLAB Release 2013a. [Computer program]. Retrieved February 15, 2013.
Ostendorf, M., Price, P. J., & Shattuck-Hufnagel, S. (1995). The Boston University radio news corpus. Linguistic Data Consortium, pp. 1–19.
Pearson Education, Inc. (2015). Versant English Test. Retrieved from https://www.versanttest.com/products/english.jsp.
Pickering, L. (1999). An analysis of prosodic systems in the classroom discourse of native speaker and nonnative speaker teaching assistants (Doctoral dissertation, University of Florida).
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlíček, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., & Veselý, K. (2011). The Kaldi speech recognition toolkit. In IEEE 2011 workshop on automatic speech recognition and understanding (ASRU).
Rudner, L. M., Garcia, V., & Welch, C. (2006). An evaluation of IntelliMetric™ essay scoring system. The Journal of Technology, Learning and Assessment, 4(4), 1–22.
Zechner, K., Higgins, D., Xi, X., & Williamson, D. M. (2009). Automatic scoring of non-native spontaneous speech in tests of spoken English. Speech Communication, 51(10), 883–895.
Appendix
The 35 suprasegmental measures are computed as follows:
Definitions
1. # (i.e., number of) syllables includes coughs, laughs, etc., and filled pauses.
2. # runs = # silent pauses + 1.
3. Duration of utterance includes silent and filled pauses (in seconds).
4. (Prominent) syllable pitch = maximum F0 of the low-pass-filtered Praat pitch contour.
5. Tone choice is calculated only for termination prominent syllables.
6. Relative pitch is calculated only for key and termination prominent syllables.
7. Paratone boundary = a termination followed by a key with higher relative pitch.
8. Lexical item = a prominent syllable that occurs more than once in an utterance.
9. New lexical item = the first occurrence of a lexical item.
10. Given lexical item = any subsequent occurrence of a lexical item.
Calculations
SYPS = # syllables/duration of utterance.
PHTR = (duration of utterance − duration of silent pauses)/duration of utterance.
ARTI = # syllables/(duration of utterance − duration of silent pauses) = SYPS/PHTR.
RNLN = # syllables/# runs.
SPRT = # silent pauses/duration of utterance * 60.
SPLN = duration of silent pauses/# silent pauses.
FPRT = # filled pauses/duration of utterance * 60.
FPLN = duration of filled pauses/# filled pauses.
SPAC = # prominent syllables/# syllables.
PACE = # prominent syllables/# runs = SPAC * RNLN.
PCHR = # tone units (termination prominent syllables) / # runs.
RISL = % of termination prominent syllables with rising tone choice & low relative pitch.
RISM = % of termination prominent syllables with rising tone choice & mid relative pitch.
RISH = % of termination prominent syllables with rising tone choice & high relative pitch.
NEUL = % of termination prominent syllables with neutral tone choice & low relative pitch.
NEUM = % of termination prominent syllables with neutral tone choice & mid relative pitch.
NEUH = % of termination prominent syllables with neutral tone choice & high relative pitch.
FALL = % of termination prominent syllables with falling tone choice & low relative pitch.
FALM = % of termination prominent syllables with falling tone choice & mid relative pitch.
FALH = % of termination prominent syllables with falling tone choice & high relative pitch.
FRSL = % of termination prominent syllables with fall-rise tone choice & low relative pitch.
FRSM = % of termination prominent syllables with fall-rise tone choice & mid relative pitch.
FRSH = % of termination prominent syllables with fall-rise tone choice & high relative pitch.
RFAL = % of termination prominent syllables with rise-fall tone choice & low relative pitch.
RFAM = % of termination prominent syllables with rise-fall tone choice & mid relative pitch.
RFAH = % of termination prominent syllables with rise-fall tone choice & high relative pitch.
PRAN = maximum prominent syllable pitch of utterance – minimum prominent syllable pitch of utterance.
AVNP = average non-prominent syllable pitch.
AVPP = average prominent syllable pitch.
PARA = # paratone boundaries / duration of utterance.
TPTH = average pitch of termination prominent syllables at paratone boundaries.
OPTH = average pitch of key prominent syllables at paratone boundaries.
PPLN = average duration of silent pauses at paratone boundaries (if present).
NEWP = average pitch of new lexical items.
GIVP = average pitch of given lexical items.
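The rate and pause measures above follow directly from the counts and durations given in the definitions. The sketch below is a minimal worked example under those definitions (the function and parameter names are assumptions for illustration, not the authors’ implementation):

```python
# Sketch of the fluency-related measures (SYPS, PHTR, ARTI, RNLN, SPRT,
# SPLN, FPRT, FPLN, SPAC, PACE) from basic counts and durations.
# Parameter names are assumed for this example.

def fluency_measures(n_syllables, n_silent_pauses, silent_pause_dur,
                     n_filled_pauses, filled_pause_dur, utterance_dur,
                     n_prominent):
    """Return a dict of the rate and pause measures defined above."""
    n_runs = n_silent_pauses + 1                      # definition 2
    phonation = utterance_dur - silent_pause_dur      # speaking time in seconds
    m = {}
    m["SYPS"] = n_syllables / utterance_dur           # syllables per second
    m["PHTR"] = phonation / utterance_dur             # phonation-time ratio
    m["ARTI"] = n_syllables / phonation               # articulation rate
    m["RNLN"] = n_syllables / n_runs                  # mean run length
    m["SPRT"] = n_silent_pauses / utterance_dur * 60  # silent pauses per minute
    m["SPLN"] = silent_pause_dur / n_silent_pauses    # mean silent-pause length
    m["FPRT"] = n_filled_pauses / utterance_dur * 60  # filled pauses per minute
    m["FPLN"] = filled_pause_dur / n_filled_pauses    # mean filled-pause length
    m["SPAC"] = n_prominent / n_syllables             # prominence density
    m["PACE"] = n_prominent / n_runs                  # = SPAC * RNLN
    return m

# Example: a 60-second utterance with 120 syllables, 9 silent pauses
# totalling 6 s, 3 filled pauses totalling 1.5 s, and 30 prominent syllables.
measures = fluency_measures(n_syllables=120, n_silent_pauses=9,
                            silent_pause_dur=6.0, n_filled_pauses=3,
                            filled_pause_dur=1.5, utterance_dur=60.0,
                            n_prominent=30)
```

In this example SYPS is 2.0 syllables per second and PHTR is 0.9, i.e., 90 % of the utterance is phonation.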
Cite this article
Johnson, D.O., Kang, O. & Ghanem, R. Improved automatic English proficiency rating of unconstrained speech with multiple corpora. Int J Speech Technol 19, 755–768 (2016). https://doi.org/10.1007/s10772-016-9366-0