LIT: Rule Based Italian Lemmatizer

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1038)

Abstract

In natural language processing applications, such as question answering systems and, more specifically, semantic role labelling, an important task during the text normalization phase is lemmatization, which consists in determining whether two words share the same root despite their surface differences. Because Italian is a highly inflectional language and lacks a practical lemmatizing tool, in this paper we present LIT, a rule-based Italian lemmatizer consisting of a full rule-based lemmatization of all dictionary words and a discovery algorithm that attempts to predict the grammar of neologisms. This is followed by a practical application of LIT to Europarl v7, a well-known open corpus.
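To make the two-part design in the abstract concrete, the following is a minimal Python sketch of a lemmatizer that first looks a word up in a dictionary and, for unknown words such as neologisms, falls back to suffix rules. It is not LIT's implementation: the names `LEXICON`, `SUFFIX_RULES`, and `lemmatize`, as well as the handful of entries and rules, are hypothetical and chosen only for illustration.

```python
# Toy illustration only: the lexicon entries and suffix rules below are
# hypothetical and are NOT taken from LIT.
LEXICON = {
    "gatti": "gatto",      # cats -> cat
    "case": "casa",        # houses -> house
    "parlano": "parlare",  # they speak -> to speak
}

# (suffix, replacement) pairs used as a fallback for out-of-dictionary words.
SUFFIX_RULES = [
    ("ano", "are"),  # guess: third-person plural of an -are verb
    ("i", "o"),      # guess: masculine plural -> masculine singular
    ("e", "a"),      # guess: feminine plural -> feminine singular
]


def lemmatize(word: str) -> str:
    """Return a lemma guess: dictionary lookup first, then suffix rules."""
    token = word.lower()
    if token in LEXICON:
        return LEXICON[token]
    # Try the longest suffixes first so "ano" wins over shorter matches.
    for suffix, replacement in sorted(SUFFIX_RULES, key=lambda r: -len(r[0])):
        # Require at least two characters of stem to avoid mangling short words.
        if token.endswith(suffix) and len(token) > len(suffix) + 1:
            return token[: -len(suffix)] + replacement
    # No rule applies: return the lowercased word unchanged.
    return token


if __name__ == "__main__":
    for w in ["gatti", "Case", "googlano", "libri", "blu"]:
        print(w, "->", lemmatize(w))
```

A greedy fallback like this will mis-lemmatize many real words (for example, it would turn "cane" into "cana"), which is why a practical tool needs a full dictionary and a more careful discovery step than this sketch provides.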

Acknowledgment

All the research activities described here are related to the project “ABAUT - Application for Brand Auditing and Trend”, funded by the Italian Ministry of Economic Development for SPHERA Srl (project no. F/050420/00/X32, “HORIZON 2020” call, PON I&C 2014–2020).

Corresponding author

Correspondence to Antonio Guerrieri.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Molendini, S., Guerrieri, A., Filieri, A. (2020). LIT: Rule Based Italian Lemmatizer. In: Bi, Y., Bhatia, R., Kapoor, S. (eds) Intelligent Systems and Applications. IntelliSys 2019. Advances in Intelligent Systems and Computing, vol 1038. Springer, Cham. https://doi.org/10.1007/978-3-030-29513-4_28
