A Word Prediction Methodology Based on Posgrams

Spiccia, Carmelo; Augello, Agnese; Pilato, Giovanni

doi:10.1007/978-3-319-52758-1_9

Part of the book series: Communications in Computer and Information Science (CCIS, volume 631)

Included in the following conference series:

International Joint Conference on Knowledge Discovery, Knowledge Engineering, and Knowledge Management

Abstract

This work introduces a two-step methodology for predicting missing words in incomplete sentences. In the first step, the set of candidate words is restricted to those that match the predicted part of speech; to this end, a novel algorithm based on the analysis of “posgrams” is proposed. In the second step, a word prediction algorithm is applied to the reduced word set. The work quantifies the advantages, in terms of accuracy and execution time, of predicting a word's part of speech before predicting the word itself. The methodology can be applied to several tasks, such as Text Autocompletion, Speech Recognition and Optical Text Recognition.
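The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes that a posgram is an n-gram of part-of-speech tags (trigrams here), that the context words are already tagged, and that plain word-bigram counts stand in for the second-step word predictor. The toy corpus, the tag names and the functions `train`, `predict_pos` and `predict_word` are all hypothetical.

```python
from collections import Counter, defaultdict

# Toy tagged corpus (hypothetical data, for illustration only).
CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("dogs", "NOUN"), ("run", "VERB"), ("fast", "ADV")],
]


def train(corpus):
    """Count POS trigrams ("posgrams", by assumption) and word bigrams."""
    posgrams = Counter()              # (tag, tag, tag) -> count
    bigrams = Counter()               # (word, word) -> count
    words_by_tag = defaultdict(set)   # tag -> words seen with that tag
    for sentence in corpus:
        for word, tag in sentence:
            words_by_tag[tag].add(word)
        # Pad with sentence-boundary markers so edge positions have context.
        padded = [("<s>", "<s>")] + sentence + [("</s>", "</s>")]
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            posgrams[(a[1], b[1], c[1])] += 1
        for a, b in zip(padded, padded[1:]):
            bigrams[(a[0], b[0])] += 1
    return posgrams, bigrams, words_by_tag


def predict_pos(posgrams, left_tag, right_tag):
    """Step 1: most frequent tag between the two context tags, or None."""
    scores = Counter()
    for (t1, t2, t3), count in posgrams.items():
        if t1 == left_tag and t3 == right_tag:
            scores[t2] += count
    return scores.most_common(1)[0][0] if scores else None


def predict_word(model, left, right):
    """Step 2: rank only the words carrying the predicted tag.

    `left` and `right` are (word, tag) pairs surrounding the gap.
    """
    posgrams, bigrams, words_by_tag = model
    tag = predict_pos(posgrams, left[1], right[1])
    if tag is None:
        return None
    candidates = sorted(words_by_tag[tag])  # sorted for deterministic ties
    return max(
        candidates,
        key=lambda w: bigrams[(left[0], w)] + bigrams[(w, right[0])],
    )


MODEL = train(CORPUS)
# Fill the gap in "the ___ sleeps".
print(predict_word(MODEL, ("the", "DET"), ("sleeps", "VERB")))  # prints "dog"
```

In this sketch the first step narrows the candidates to the three nouns in the toy vocabulary, so the second step scores three words instead of the whole vocabulary. A smaller candidate set of this kind is where an execution-time gain could plausibly come from.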

Author information

Authors and Affiliations

Italian National Research Council (CNR), Istituto di Calcolo e Reti ad Alte Prestazioni (ICAR), 90100, Palermo, Italy
Carmelo Spiccia, Agnese Augello & Giovanni Pilato

Corresponding author

Correspondence to Agnese Augello.

Editor information

Editors and Affiliations

Instituto de Telecomunicações/IST, Lisbon, Portugal
Ana Fred
Delft University of Technology, Delft, The Netherlands
Jan L.G. Dietz
University of Madeira, Funchal, Portugal
David Aveiro
University of Reading, Reading, United Kingdom
Kecheng Liu
Polytechnic Institute of Setúbal/INSTICC, Setúbal, Portugal
Joaquim Filipe

Cite this paper

Spiccia, C., Augello, A., Pilato, G. (2016). A Word Prediction Methodology Based on Posgrams. In: Fred, A., Dietz, J., Aveiro, D., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2015. Communications in Computer and Information Science, vol 631. Springer, Cham. https://doi.org/10.1007/978-3-319-52758-1_9

DOI: https://doi.org/10.1007/978-3-319-52758-1_9
Published: 22 January 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52757-4
Online ISBN: 978-3-319-52758-1
eBook Packages: Computer Science, Computer Science (R0)
