Abstract
S-capade (Spelling Correction Aimed at Particularly Deviant Errors) is a spellchecking tool based on phonemic distance, intended for correcting misspellings made by children (the source code repository is listed in the references [35]). Whilst typographic misspellings typically deviate from the target word by only one or two characters, children’s misspellings tend to be phonetic: they are shaped both by how the child perceives the pronunciation of a word and by the letters the child chooses to represent that pronunciation. As a result, these misspellings can deviate substantially from the target and degrade the performance of conventional spellcheckers. In this paper we demonstrate that S-capade corrects a significant portion of children’s misspellings that conventional correction tools fail to correct.
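The contrast between typographic and phonetic errors can be made concrete with a minimal sketch. It compares a plain character-level Levenshtein distance, the kind of measure many conventional spellcheckers build on, with the same distance computed over ARPAbet phoneme sequences. Everything here is illustrative and hypothetical: the four-word LEXICON mimics CMU Pronouncing Dictionary entries with stress markers removed, MISSPELLING_PRON stands in for the output of a grapheme-to-phoneme model, and every edit costs 1. It is not S-capade's actual distance measure or candidate ranking.

```python
# Minimal, hypothetical sketch: character edit distance vs phoneme edit distance.
# LEXICON and MISSPELLING_PRON are toy stand-ins (CMUdict-style entries without
# stress markers, and assumed grapheme-to-phoneme output); unit edit costs are a
# simplification. This is not S-capade's own distance or ranking.
from typing import Callable, Dict, List, Sequence, Tuple


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost edit distance (insert, delete, substitute) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free when x == y)
            ))
        prev = curr
    return prev[-1]


# Hypothetical ARPAbet pronunciations for the dictionary words.
LEXICON: Dict[str, List[str]] = {
    "phone": ["F", "OW", "N"],
    "bone": ["B", "OW", "N"],
    "enough": ["IH", "N", "AH", "F"],
    "phonics": ["F", "AA", "N", "IH", "K", "S"],
}

# Hypothetical G2P output for two phonetic misspellings.
MISSPELLING_PRON: Dict[str, List[str]] = {
    "fone": ["F", "OW", "N"],
    "enuf": ["IH", "N", "AH", "F"],
}


def char_distance(misspelling: str, word: str) -> int:
    # Compares spellings letter by letter.
    return levenshtein(misspelling, word)


def phoneme_distance(misspelling: str, word: str) -> int:
    # Compares pronunciations phoneme by phoneme.
    return levenshtein(MISSPELLING_PRON[misspelling], LEXICON[word])


def rank(misspelling: str,
         distance: Callable[[str, str], int]) -> List[Tuple[str, int]]:
    """Score every lexicon word and sort by (distance, word) for a stable order."""
    scored = [(word, distance(misspelling, word)) for word in LEXICON]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    for m in MISSPELLING_PRON:
        print(m, "| chars:", rank(m, char_distance)[:2],
              "| phonemes:", rank(m, phoneme_distance)[:2])
```

With this toy data, the character metric ranks "bone" (1 edit) above "phone" (2 edits) for "fone", whereas the phoneme metric places "phone" first at distance 0. Likewise, "enuf" is 3 character edits from "enough" but 0 phoneme edits, so a tool that only generates candidates within 2 character edits would never propose it.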
E. O’Neill and R. Young contributed equally to this paper.
Notes
1. Throughout this paper we use the ARPAbet notation when referring to phonemes.
References
Atkinson, K.: Aspell spell checker test data (2002). http://aspell.net/test/cur-all/batch0.tab. Accessed 19 May 2020
Aw, A., Zhang, M., Xiao, J., Su, J.: A phrase-based statistical model for SMS text normalization. In: COLING/ACL, pp. 33–40 (2006)
Brill, E., Moore, R.C.: An improved error model for noisy channel spelling correction. In: ACL, pp. 286–293 (2000)
Church, K.W., Gale, W.A.: Probability scoring for spelling correction. Stat. Comput. 1(2), 93–103 (1991)
CMUSphinx: Grapheme-to-phoneme tool based on sequence-to-sequence learning (2016). https://github.com/cmusphinx/g2p-seq2seq
Collins, B., Mees, I.M.: Practical Phonetics and Phonology: A Resource Book for Students. Routledge, London (2013)
Daffern, T., Critten, S.: Student and teacher perspectives on spelling. Aust. J. Lang. Literacy 42(1), 40–57 (2019)
Damerau, F.J.: A technique for computer detection and correction of spelling errors. Commun. ACM 7(3), 171–176 (1964)
Fisher, W.M.: A statistical text-to-phone function using ngrams and rules. In: ICASSP, vol. 2, pp. 649–652 (1999)
Fourakis, M., Port, R.: Stop epenthesis in English. J. Phonetics 14(2), 197–221 (1986)
Francis, W.N., Kucera, H.: Brown corpus manual (1979)
FrequencyWords: Frequency word list generator (2020). https://github.com/hermitdave/FrequencyWords. Accessed 21 May 2020
Gallagher, G., Graff, P.: The role of similarity in phonology. Lingua 122(2), 107–111 (2012)
Garbe, W.: Symspell (2020). https://github.com/wolfgarbe/symspell. Accessed 21 May 2020
Gimson, A.C., Ramsaran, S.: An Introduction to the Pronunciation of English, vol. 4. Edward Arnold, London (1970)
Google: Google books Ngram viewer (2012). http://storage.googleapis.com/books/ngrams/books/datasetsv2.html. Accessed 21 May 2020
Hodge, V.J., Austin, J.: An evaluation of phonetic spell checkers (2001)
Itô, J.: A prosodic theory of epenthesis. Nat. Lang. Linguist. Theory 7(2), 217–259 (1989)
Kane, M., Carson-Berndsen, J.: Enhancing data-driven phone confusions using restricted recognition. In: INTERSPEECH, pp. 3693–3697 (2016)
Kevin Atkinson, G.A.: How aspell works (2004). http://aspell.net/0.50-doc/man-html/8_How.html. Accessed 21 May 2020
Khoury, R.: Microtext normalization using probably-phonetically-similar word discovery. In: WiMob, pp. 384–391 (2015)
Kukich, K.: Techniques for automatically correcting words in text. ACM Comput. Surv. (CSUR) 24(4), 377–439 (1992)
de Mendonça Almeida, G.A., Avanço, L., Duran, M.S., Fonseca, E.R., Nunes, M.d.G.V., Aluísio, S.M.: Evaluating phonetic spellers for user-generated content in Brazilian Portuguese. In: International Conference on Computational Processing of the Portuguese Language, pp. 361–373 (2016)
Mitton, R.: Birkbeck spelling error corpus (1980). https://ota.bodleian.ox.ac.uk/repository/xmlui/handle/20.500.12024/0643. Accessed 19 May 2020
Mitton, R.: Corpora of misspellings for download (2007). https://www.dcs.bbk.ac.uk/~ROGER/corpora.html. Accessed 19 May 2020
Norvig, P.: Pyspellchecker (2020). https://pypi.org/project/pyspellchecker/. Accessed 21 May 2020
O’Neill, E., Carson-Berndsen, J.: The effect of phoneme distribution on perceptual similarity in English. In: INTERSPEECH, pp. 1941–1945 (2019)
Philips, L.: The double metaphone search algorithm. C/C++ Users J. 18(6), 38–43 (2000)
Read, C.: Children’s Creative Spelling. Routledge, London (2018)
Russell, R., Odell, M.: Soundex. US patent 1,261,167 (1918)
Silfverberg, M., Kauppinen, P., Lindén, K.: Data-driven spelling correction using weighted finite-state methods. In: SIGFSM Workshop on Statistical NLP and Weighted Automata, pp. 51–59 (2016)
Stüker, S., Fay, J., Berkling, K.: Towards context-dependent phonetic spelling error correction in children’s freely composed text for diagnostic and pedagogical purposes. In: INTERSPEECH (2011)
Torgerson, C., Brooks, G., Hall, J.: A Systematic Review of the Research Literature on the Use of Phonics in the Teaching of Reading and Spelling. DfES Publications, Nottingham (2006)
Toutanova, K., Moore, R.C.: Pronunciation modeling for improved spelling correction. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 144–151 (2002)
University College Dublin: S-capade github repository (2020). https://github.com/ucd-csl/Scapade. Accessed 15 Jul 2020
Veronis, J.: Computerized correction of phonographic errors. Comput. Humanit. 22(1), 43–56 (1988)
Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. JACM 21(1), 168–173 (1974)
Weide, R.L.: The CMU pronouncing dictionary (1998). http://www.speech.cs.cmu.edu/cgi-bin/cmudict
Wikipedia: Wikipedia:lists of common misspellings (2020). https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings. Accessed 19 May 2020
Wolf Garbe, S.: Symspellpy (2020). https://github.com/mammothb/symspellpy. Accessed 21 May 2020
Wordlist, A.: Scowl (spell checker oriented word lists) (2019). http://wordlist.aspell.net. Accessed 21 May 2020
Yip, M.: English vowel epenthesis. Nat. Lang. Linguist. Theory 5, 463–484 (1987). https://doi.org/10.1007/BF00138986
Zeeko: Free text survey responses (2020). https://zeeko.ie. Accessed 19 May 2020
Acknowledgements
This work was supported by Science Foundation Ireland grant 13/RC/2094 to Lero, the SFI Research Centre for Software (www.lero.ie). The ADAPT Centre for Digital Content Technology (www.adaptcentre.ie) is funded under the SFI Research Centres Programme (Grant 13/RC/2106). The authors would like to thank the team at Zeeko (https://zeeko.ie/) for supporting their research.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
O’Neill, E., Young, R., Thiaville, E., MacCarthy, M., Carson-Berndsen, J., Ventresque, A. (2020). S-Capade: Spelling Correction Aimed at Particularly Deviant Errors. In: Espinosa-Anke, L., Martín-Vide, C., Spasić, I. (eds) Statistical Language and Speech Processing. SLSP 2020. Lecture Notes in Computer Science, vol 12379. Springer, Cham. https://doi.org/10.1007/978-3-030-59430-5_7
DOI: https://doi.org/10.1007/978-3-030-59430-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59429-9
Online ISBN: 978-3-030-59430-5
eBook Packages: Computer Science, Computer Science (R0)