Improved automatic English proficiency rating of unconstrained speech with multiple corpora

International Journal of Speech Technology

Abstract

This paper explores the performance of machine learning classifiers in automatically scoring the English proficiency of unconstrained speech. Suprasegmental measures were computed by software that identifies the basic elements of Brazil's model of discourse intonation. Machine learning training with multiple corpora was used to improve two of the software's algorithms: prominent syllable detection and tone choice classification. The results show that training with the Boston University Radio News Corpus improves automatic English proficiency scoring of unconstrained speech from a Pearson's correlation of 0.677 to 0.718. This correlation is higher than that of any other existing computer program for automatically scoring the proficiency of unconstrained speech, and it approaches the inter-rater reliability of human raters.
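The reported gains are expressed as Pearson correlations between machine-assigned proficiency scores and human ratings. As a minimal sketch of how such an agreement figure is computed (the score arrays below are illustrative, not the study's data):

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical proficiency scores for six speech samples:
# machine-predicted vs. human-rated values on the same scale.
machine_scores = np.array([2.1, 3.4, 1.8, 2.9, 3.7, 2.5])
human_ratings  = np.array([2.0, 3.5, 2.0, 2.7, 3.9, 2.4])

# Pearson's r quantifies linear agreement; the paper reports r improving
# from 0.677 to 0.718 after retraining with the additional corpus.
r, p_value = pearsonr(machine_scores, human_ratings)
print(f"Pearson's r = {r:.3f} (p = {p_value:.3f})")
```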


References

  • Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater® V.2. The Journal of Technology, Learning and Assessment, 4(3), 3–30.

  • Bernstein, J. (1999). PhonePass testing: Structure and construct. Menlo Park: Ordinate Corporation.

  • Bernstein, J., Van Moere, A., & Cheng, J. (2010). Validating automated speaking tests. Language Testing, 27(3), 355–377.

  • Boersma, P., & Weenink, D. (2014). Praat: Doing phonetics by computer (Version 5.3.83) [Computer program]. Retrieved August 19, 2014.

  • Brazil, D. (1997). The communicative value of intonation in English. Cambridge: Cambridge University Press.

  • Burstein, J., Kukich, K., Braden-Harder, L., Chodorow, M., Hua, S., Kaplan, B., et al. (1998). Computer analysis of essay content for automated score prediction: A prototype automated scoring system for GMAT analytical writing assessment essays. ETS Research Report Series, 1998(1), i–67.

  • Cambridge English Language Assessment. (2015). Retrieved March 29, 2015 from www.cambridgeenglish.org.

  • Černý, V. (1985). Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. Journal of Optimization Theory and Applications, 45(1), 41–51.

  • Chodorow, M., & Burstein, J. (2004). Beyond essay length: Evaluating e-rater®'s performance on TOEFL® essays. ETS Research Report Series, 2004(1), i–38.

  • Chun, D. M. (2002). Discourse intonation in L2: From theory and research to practice (Vol. 1). Philadelphia: John Benjamins Publishing.

  • Evanini, K., & Wang, X. (2013). Automated speech scoring for non-native middle school students with multiple task types. In INTERSPEECH (pp. 2435–2439).

  • Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., & Pallett, D. S. (1993). DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1. NASA STI/Recon Technical Report N, 93, 27403.

  • Hastie, T., & Tibshirani, R. (1998). Classification by pairwise coupling. The Annals of Statistics, 26(2), 451–471.

  • Johnson, D. O., & Kang, O. (2015). Automatic prominent syllable detection with machine learning classifiers. International Journal of Speech Technology, 18(4), 583–592.

  • Johnson, D. O., & Kang, O. (2016). Automatic prosodic tone choice classification with Brazil's intonation model. International Journal of Speech Technology, 19(1), 95–109.

  • Kahn, D. (1976). Syllable-based generalizations in English phonology (Vol. 156). Bloomington: Indiana University Linguistics Club.

  • Kang, O. (2010). Relative salience of suprasegmental features on judgments of L2 comprehensibility and accentedness. System, 38(2), 301–315.

  • Kang, O., & Johnson, D. O. (2015). Comparison of inter-rater reliability of human and computer prosodic annotation using Brazil's prosody model. English Linguistics Research, 4(4), 58.

  • Kang, O., & Johnson, D. O. (2016). Systems and methods for automated evaluation of human speech. U.S. Patent Application No. 15/054,128. Washington, DC: U.S. Patent and Trademark Office.

  • Kang, O., Rubin, D., & Pickering, L. (2010). Suprasegmental measures of accentedness and judgments of language learner proficiency in oral English. The Modern Language Journal, 94(4), 554–566.

  • Kang, O., & Wang, L. (2014). Impact of different task types on candidates' speaking performances and interactive features that distinguish between CEFR levels. ISSN 1756-509X, 40.

  • KayPENTAX. (2008). Multi-speech and CSL software. Lincoln Park: KayPENTAX.

  • Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671–680.

  • Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211.

  • Leacock, C. (2004). Scoring free-responses automatically: A case study of a large-scale assessment. Examens, 1(3).

  • Leacock, C., & Chodorow, M. (2003). C-rater: Automated scoring of short-answer questions. Computers and the Humanities, 37(4), 389–405.

  • Longman, P. (2013). Official guide to Pearson Test of English Academic.

  • MathWorks, Inc. (2013). MATLAB Release 2013a [Computer program]. Retrieved February 15, 2013.

  • Ostendorf, M., Price, P. J., & Shattuck-Hufnagel, S. (1995). The Boston University Radio News Corpus. Linguistic Data Consortium, pp. 1–19.

  • Pearson Education, Inc. (2015). Versant English Test. Retrieved from https://www.versanttest.com/products/english.jsp.

  • Pickering, L. (1999). An analysis of prosodic systems in the classroom discourse of native speaker and nonnative speaker teaching assistants (Doctoral dissertation, University of Florida).

  • Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlíček, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., & Veselý, K. (2011). The Kaldi speech recognition toolkit.

  • Rudner, L. M., Garcia, V., & Welch, C. (2006). An evaluation of IntelliMetric™ essay scoring system. The Journal of Technology, Learning and Assessment, 4(4), 1–22.

  • Zechner, K., Higgins, D., Xi, X., & Williamson, D. M. (2009). Automatic scoring of non-native spontaneous speech in tests of spoken English. Speech Communication, 51(10), 883–895.


Author information

Correspondence to David O. Johnson.

Appendix

The 35 suprasegmental measures are computed as follows:

Definitions

  1. # (i.e. number of) syllables includes coughs, laughs, etc. and filled pauses.

  2. # runs = # silent pauses + 1.

  3. Duration of utterance includes silent and filled pauses (in seconds).

  4. (Prominent) syllable pitch = maximum F0 of low-pass filtered Praat pitch contour.

  5. Tone choice only calculated for termination prominent syllables.

  6. Relative pitch only calculated for key and termination prominent syllables.

  7. Paratone boundary = termination followed by key with higher relative pitch.

  8. Lexical item = prominent syllable that occurs more than once in an utterance.

  9. New lexical item = first time a lexical item occurs.

  10. Given lexical item = subsequent times a lexical item occurs.
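Read together, these definitions amount to a small data model for an analyzed utterance: syllables with prominence, pitch, tone choice, and relative pitch, plus silent and filled pauses. The sketch below illustrates one way such a model could be represented; the class and field names are assumptions for illustration, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Syllable:
    """One detected syllable (filled pauses, coughs, laughs, etc. count as syllables)."""
    start: float                          # seconds
    end: float                            # seconds
    pitch: float                          # max F0 of the low-pass filtered Praat pitch contour
    prominent: bool = False
    tone_choice: Optional[str] = None     # e.g. 'rise', 'fall' (termination prominent syllables only)
    relative_pitch: Optional[str] = None  # 'low' / 'mid' / 'high' (key and termination syllables only)

@dataclass
class Pause:
    """A silent or filled pause within the utterance."""
    start: float
    end: float
    filled: bool                          # True for filled pauses ('uh', 'um'), False for silence

@dataclass
class Utterance:
    duration: float                       # includes silent and filled pauses (seconds)
    syllables: List[Syllable] = field(default_factory=list)
    pauses: List[Pause] = field(default_factory=list)

    @property
    def num_runs(self) -> int:
        # Definition 2: # runs = # silent pauses + 1
        return sum(1 for p in self.pauses if not p.filled) + 1
```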

Calculations

SYPS = # syllables/duration of utterance.

PHTR = (duration of utterance − duration of silent pauses)/duration of utterance.

ARTI = # syllables/(duration of utterance − duration of silent pauses) = SYPS/PHTR.

RNLN = # syllables/# runs.

SPRT = # silent pauses/duration of utterance * 60.

SPLN = duration of silent pauses/# silent pauses.

FPRT = # filled pauses/duration of utterance * 60.

FPLN = duration of filled pauses/# filled pauses.

SPAC = # prominent syllables/# syllables.

PACE = # prominent syllables/# runs = SPAC * RNLN.

PCHR = # tone units (termination prominent syllables) / # runs.

RISL = % of termination prominent syllables with rising tone choice & low relative pitch.

RISM = % of termination prominent syllables with rising tone choice & mid relative pitch.

RISH = % of termination prominent syllables with rising tone choice & high relative pitch.

NEUL = % of termination prominent syllables with neutral tone choice & low relative pitch.

NEUM = % of termination prominent syllables with neutral tone choice & mid relative pitch.

NEUH = % of termination prominent syllables with neutral tone choice & high relative pitch.

FALL = % of termination prominent syllables with falling tone choice & low relative pitch.

FALM = % of termination prominent syllables with falling tone choice & mid relative pitch.

FALH = % of termination prominent syllables with falling tone choice & high relative pitch.

FRSL = % of termination prominent syllables with fall-rise tone choice & low relative pitch.

FRSM = % of termination prominent syllables with fall-rise tone choice & mid relative pitch.

FRSH = % of termination prominent syllables with fall-rise tone choice & high relative pitch.

RFAL = % of termination prominent syllables with rise-fall tone choice & low relative pitch.

RFAM = % of termination prominent syllables with rise-fall tone choice & mid relative pitch.

RFAH = % of termination prominent syllables with rise-fall tone choice & high relative pitch.

PRAN = maximum prominent syllable pitch of utterance – minimum prominent syllable pitch of utterance.

AVNP = average non-prominent syllable pitch.

AVPP = average prominent syllable pitch.

PARA = # paratone boundaries / duration of utterance.

TPTH = average pitch of termination prominent syllables at paratone boundaries.

OPTH = average pitch of key prominent syllables at paratone boundaries.

PPLN = average duration of silent pauses at paratone boundaries (if present).

NEWP = average pitch of new lexical items.

GIVP = average pitch of given lexical items.
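As a minimal worked example of how the rate and pause measures defined above combine, the following sketch computes several of them from illustrative (made-up) counts and durations; the variable names are assumptions, not the paper's code.

```python
# Illustrative inputs for a single utterance (made-up values).
num_syllables      = 180    # includes filled pauses, coughs, laughs, etc.
num_silent_pauses  = 11
num_filled_pauses  = 4
num_prominent      = 55     # prominent syllables
num_tone_units     = 30     # termination prominent syllables
dur_utterance      = 62.0   # seconds, includes silent and filled pauses
dur_silent_pauses  = 9.5    # seconds
dur_filled_pauses  = 2.1    # seconds

num_runs = num_silent_pauses + 1                             # definition 2

SYPS = num_syllables / dur_utterance                         # syllables per second
PHTR = (dur_utterance - dur_silent_pauses) / dur_utterance   # share of the utterance spent speaking
ARTI = num_syllables / (dur_utterance - dur_silent_pauses)   # equals SYPS / PHTR
RNLN = num_syllables / num_runs                              # syllables per run
SPRT = num_silent_pauses / dur_utterance * 60                # silent pauses per minute
SPLN = dur_silent_pauses / num_silent_pauses                 # mean silent-pause duration
FPRT = num_filled_pauses / dur_utterance * 60                # filled pauses per minute
FPLN = dur_filled_pauses / num_filled_pauses                 # mean filled-pause duration
SPAC = num_prominent / num_syllables                         # prominent syllables per syllable
PACE = num_prominent / num_runs                              # equals SPAC * RNLN
PCHR = num_tone_units / num_runs                             # tone units per run

print(f"SYPS={SYPS:.2f}  ARTI={ARTI:.2f}  PHTR={PHTR:.2f}  PACE={PACE:.2f}  PCHR={PCHR:.2f}")
```

The remaining measures (the tone-choice percentages, pitch statistics, and the paratone and lexical-item measures) follow the same pattern but require the per-syllable tone choice, relative pitch, and pitch values rather than simple counts.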

About this article


Cite this article

Johnson, D.O., Kang, O. & Ghanem, R. Improved automatic English proficiency rating of unconstrained speech with multiple corpora. Int J Speech Technol 19, 755–768 (2016). https://doi.org/10.1007/s10772-016-9366-0

