S-Capade: Spelling Correction Aimed at Particularly Deviant Errors

  • Conference paper
  • Statistical Language and Speech Processing (SLSP 2020)

Abstract

S-capade (spelling correction aimed at particularly deviant errors) is a spellchecking tool based on phonemic distance, intended for correcting misspellings made by children; its source code repository is listed in the references [35]. Whilst typographic misspellings typically deviate from the target by only one or two characters, children’s misspellings tend to be more phonetic: they are influenced both by how the child perceives the pronunciation of a word and by the letters the child chooses to represent that pronunciation. As such, these misspellings are particularly deviant from the target and can degrade the performance of conventional spellcheckers. In this paper we demonstrate that S-capade is capable of correcting a significant portion of children’s misspellings that conventional correction tools fail to correct.
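The abstract's central contrast, that a phonetic misspelling can be far from its target in letters yet close in sound, can be illustrated with a short sketch. The Python below is a minimal sketch under stated assumptions, not S-capade's implementation (for that, see [35]). It uses a plain, unweighted Levenshtein distance (computed with the Wagner and Fischer dynamic programme [37]) over characters and over ARPAbet phoneme sequences. The four-word lexicon, its stress-stripped pronunciations (hand-typed in the style of the CMU Pronouncing Dictionary [38]), and the pronunciation assumed for the misspelling nite are all hypothetical; in practice the latter would presumably come from a grapheme-to-phoneme model. S-capade's own distance measure may differ, for example by weighting phoneme substitutions by perceptual similarity rather than counting them equally.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical toy lexicon: stress-stripped ARPAbet pronunciations, hand-typed
# in the style of the CMU Pronouncing Dictionary for illustration only.
LEXICON: Dict[str, Tuple[str, ...]] = {
    "night": ("N", "AY", "T"),
    "knit": ("N", "IH", "T"),
    "net": ("N", "EH", "T"),
    "note": ("N", "OW", "T"),
}


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Unweighted Levenshtein distance (Wagner-Fischer dynamic programming).

    Works on any sequences, so it can compare strings character by character
    or tuples of phoneme symbols phoneme by phoneme.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct_by_spelling(word: str, lexicon: Dict[str, Tuple[str, ...]]) -> str:
    """Pick the lexicon word closest in characters (conventional baseline)."""
    return min(lexicon, key=lambda w: edit_distance(word, w))


def correct_by_sound(phones: Tuple[str, ...],
                     lexicon: Dict[str, Tuple[str, ...]]) -> str:
    """Pick the lexicon word whose pronunciation is closest in phonemes."""
    return min(lexicon, key=lambda w: edit_distance(phones, lexicon[w]))


misspelling = "nite"
# Assumed pronunciation of the misspelling, standing in for G2P output.
misspelling_phones = ("N", "AY", "T")

print(correct_by_spelling(misspelling, LEXICON))      # note
print(correct_by_sound(misspelling_phones, LEXICON))  # night
```

Under these assumptions the character-level baseline prefers note (one substitution away, versus three edits for night), whereas the phoneme-level comparison recovers night, whose assumed pronunciation matches the misspelling exactly.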

E. O’Neill and R. Young contributed equally to this paper.

Notes

  1. Throughout this paper we use the ARPAbet notation when referring to phonemes.

References

  1. Atkinson, K.: Aspell spell checker test data (2002). http://aspell.net/test/cur-all/batch0.tab. Accessed 19 May 2020

  2. Aw, A., Zhang, M., Xiao, J., Su, J.: A phrase-based statistical model for SMS text normalization. In: COLING/ACL, pp. 33–40 (2006)

  3. Brill, E., Moore, R.C.: An improved error model for noisy channel spelling correction. In: ACL, pp. 286–293 (2000)

  4. Church, K.W., Gale, W.A.: Probability scoring for spelling correction. Stat. Comput. 1(2), 93–103 (1991)

  5. CMUSphinx: Grapheme-to-phoneme tool based on sequence-to-sequence learning (2016). https://github.com/cmusphinx/g2p-seq2seq

  6. Collins, B., Mees, I.M.: Practical Phonetics and Phonology: A Resource Book for Students. Routledge, London (2013)

  7. Daffern, T., Critten, S.: Student and teacher perspectives on spelling. Aust. J. Lang. Literacy 42(1), 40–57 (2019)

  8. Damerau, F.J.: A technique for computer detection and correction of spelling errors. Commun. ACM 7(3), 171–176 (1964)

  9. Fisher, W.M.: A statistical text-to-phone function using ngrams and rules. In: ICASSP, vol. 2, pp. 649–652 (1999)

  10. Fourakis, M., Port, R.: Stop epenthesis in English. J. Phonetics 14(2), 197–221 (1986)

  11. Francis, W.N., Kucera, H.: Brown corpus manual (1979)

  12. FrequencyWords: Frequency word list generator (2020). https://github.com/hermitdave/FrequencyWords. Accessed 21 May 2020

  13. Gallagher, G., Graff, P.: The role of similarity in phonology. Lingua 122(2), 107–111 (2012)

  14. Garbe, W.: Symspell (2020). https://github.com/wolfgarbe/symspell. Accessed 21 May 2020

  15. Gimson, A.C., Ramsaran, S.: An Introduction to the Pronunciation of English, vol. 4. Edward Arnold, London (1970)

  16. Google: Google books Ngram viewer (2012). http://storage.googleapis.com/books/ngrams/books/datasetsv2.html. Accessed 21 May 2020

  17. Hodge, V.J., Austin, J.: An evaluation of phonetic spell checkers (2001)

  18. Itô, J.: A prosodic theory of epenthesis. Nat. Lang. Linguist. Theory 7(2), 217–259 (1989)

  19. Kane, M., Carson-Berndsen, J.: Enhancing data-driven phone confusions using restricted recognition. In: INTERSPEECH, pp. 3693–3697 (2016)

  20. Atkinson, K.: How aspell works (2004). http://aspell.net/0.50-doc/man-html/8_How.html. Accessed 21 May 2020

  21. Khoury, R.: Microtext normalization using probably-phonetically-similar word discovery. In: WiMob, pp. 384–391 (2015)

  22. Kukich, K.: Techniques for automatically correcting words in text. ACM Comput. Surv. (CSUR) 24(4), 377–439 (1992)

  23. de Mendonça Almeida, G.A., Avanço, L., Duran, M.S., Fonseca, E.R., Nunes, M.d.G.V., Aluísio, S.M.: Evaluating phonetic spellers for user-generated content in Brazilian Portuguese. In: International Conference on Computational Processing of the Portuguese Language, pp. 361–373 (2016)

  24. Mitton, R.: Birkbeck spelling error corpus (1980). https://ota.bodleian.ox.ac.uk/repository/xmlui/handle/20.500.12024/0643. Accessed 19 May 2020

  25. Mitton, R.: Corpora of misspellings for download (2007). https://www.dcs.bbk.ac.uk/~ROGER/corpora.html. Accessed 19 May 2020

  26. Norvig, P.: Pyspellchecker (2020). https://pypi.org/project/pyspellchecker/. Accessed 21 May 2020

  27. O’Neill, E., Carson-Berndsen, J.: The effect of phoneme distribution on perceptual similarity in English. In: INTERSPEECH, pp. 1941–1945 (2019)

  28. Philips, L.: The double metaphone search algorithm. C/C++ Users J. 18(6), 38–43 (2000)

  29. Read, C.: Children’s Creative Spelling. Routledge, London (2018)

  30. Russell, R., Odell, M.: Soundex. US patent 1,261,167 (1918)

  31. Silfverberg, M., Kauppinen, P., Lindén, K.: Data-driven spelling correction using weighted finite-state methods. In: SIGFSM Workshop on Statistical NLP and Weighted Automata, pp. 51–59 (2016)

  32. Stüker, S., Fay, J., Berkling, K.: Towards context-dependent phonetic spelling error correction in children’s freely composed text for diagnostic and pedagogical purposes. In: INTERSPEECH (2011)

  33. Torgerson, C., Brooks, G., Hall, J.: A Systematic Review of the Research Literature on the Use of Phonics in the Teaching of Reading and Spelling. DfES Publications, Nottingham (2006)

  34. Toutanova, K., Moore, R.C.: Pronunciation modeling for improved spelling correction. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 144–151 (2002)

  35. University College Dublin: S-capade github repository (2020). https://github.com/ucd-csl/Scapade. Accessed 15 Jul 2020

  36. Veronis, J.: Computerized correction of phonographic errors. Comput. Humanit. 22(1), 43–56 (1988)

  37. Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. JACM 21(1), 168–173 (1974)

  38. Weide, R.L.: The CMU pronouncing dictionary (1998). http://www.speech.cs.cmu.edu/cgi-bin/cmudict

  39. Wikipedia: Wikipedia:lists of common misspellings (2020). https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings. Accessed 19 May 2020

  40. Wolf Garbe, S.: Symspellpy (2020). https://github.com/mammothb/symspellpy. Accessed 21 May 2020

  41. Wordlist, A.: Scowl (spell checker oriented word lists) (2019). http://wordlist.aspell.net. Accessed 21 May 2020

  42. Yip, M.: English vowel epenthesis. Nat. Lang. Linguist. Theory 5, 463–484 (1987). https://doi.org/10.1007/BF00138986

  43. Zeeko: Free text survey responses (2020). https://zeeko.ie. Accessed 19 May 2020

Acknowledgements

This work was supported by the Science Foundation Ireland grant 13/RC/2094 to Lero, the SFI Research Centre for Software (www.lero.ie). The ADAPT Centre for Digital Content Technology (www.adaptcentre.ie) is funded under the SFI Research Centres Programme (Grant 13/RC/2106). The authors would like to thank the team at Zeeko (https://zeeko.ie/) for supporting this research.

Author information

Correspondence to Emma O’Neill.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

O’Neill, E., Young, R., Thiaville, E., MacCarthy, M., Carson-Berndsen, J., Ventresque, A. (2020). S-Capade: Spelling Correction Aimed at Particularly Deviant Errors. In: Espinosa-Anke, L., Martín-Vide, C., Spasić, I. (eds) Statistical Language and Speech Processing. SLSP 2020. Lecture Notes in Computer Science, vol. 12379. Springer, Cham. https://doi.org/10.1007/978-3-030-59430-5_7

  • DOI: https://doi.org/10.1007/978-3-030-59430-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59429-9

  • Online ISBN: 978-3-030-59430-5

  • eBook Packages: Computer Science, Computer Science (R0)
