A Word Prediction Methodology Based on Posgrams

  • Conference paper
Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2015)

Abstract

This work introduces a two-step methodology for predicting missing words in incomplete sentences. In the first step, the set of candidate words is restricted to those matching the predicted part of speech; to this end, a novel algorithm based on “posgram” analysis is also proposed. In the second step, a word prediction algorithm is applied to the reduced word set. The work quantifies the advantages, in terms of accuracy and execution time, of predicting a word's part of speech before predicting the word itself. The methodology can be applied to several tasks, such as text autocompletion, speech recognition, and optical text recognition.
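
The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes that a posgram is an n-gram of part-of-speech tags, uses plain bigram counts with left context only, and runs on a tiny made-up tagged corpus. All names (`TAGGED`, `train`, `predict_pos`, `predict_word`) are hypothetical.

```python
from collections import Counter, defaultdict

# Toy tagged corpus (made-up data, for illustration only).
TAGGED = [
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]


def train(tagged_sentences):
    """Collect tag-bigram counts, word-bigram counts and a tag -> words lexicon."""
    pos_bigrams = defaultdict(Counter)   # previous tag  -> counts of next tag
    word_bigrams = defaultdict(Counter)  # previous word -> counts of next word
    lexicon = defaultdict(set)           # tag -> words observed with that tag
    for sent in tagged_sentences:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            pos_bigrams[t1][t2] += 1
            word_bigrams[w1][w2] += 1
        for word, tag in sent:
            lexicon[tag].add(word)
    return pos_bigrams, word_bigrams, lexicon


def predict_pos(pos_bigrams, prev_tag):
    """Step 1: most frequent tag following prev_tag, or None if unseen."""
    counts = pos_bigrams.get(prev_tag)
    return counts.most_common(1)[0][0] if counts else None


def predict_word(model, prev_word, prev_tag):
    """Step 2: pick the best word, searching only words with the predicted tag."""
    pos_bigrams, word_bigrams, lexicon = model
    tag = predict_pos(pos_bigrams, prev_tag)
    candidates = lexicon.get(tag, set())           # reduced word set
    counts = word_bigrams.get(prev_word, Counter())
    # sorted() makes tie-breaking deterministic (alphabetical order).
    best = max(sorted(candidates), key=lambda w: counts[w], default=None)
    return tag, best


model = train(TAGGED)
print(predict_word(model, "dog", "NOUN"))
```

Restricting the search to `lexicon[tag]` is what shrinks the candidate set before the word-level model is consulted; a real system would replace both count tables with properly smoothed models and could also use the context to the right of the gap.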

Author information

Corresponding author

Correspondence to Agnese Augello.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Spiccia, C., Augello, A., Pilato, G. (2016). A Word Prediction Methodology Based on Posgrams. In: Fred, A., Dietz, J., Aveiro, D., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2015. Communications in Computer and Information Science, vol 631. Springer, Cham. https://doi.org/10.1007/978-3-319-52758-1_9

  • DOI: https://doi.org/10.1007/978-3-319-52758-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52757-4

  • Online ISBN: 978-3-319-52758-1

  • eBook Packages: Computer Science, Computer Science (R0)
