Abstract
In this paper, we study how existing natural language processing (NLP) tools for Italian perform on ancient texts. Our first goal is to understand to what extent such tools can be used “as they are” for the automatic analysis of old literary works. While NLP tools for Italian achieve good performance today, it is not clear whether they can be used successfully in the humanities to support the critical study of historical works. Our analysis shows that the tools’ performance varies systematically across time periods and within literary movements. Our second goal is to verify whether simple customization methods can improve the tools’ performance on old works.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pennacchiotti, M., Zanzotto, F.M. (2008). Natural Language Processing Across Time: An Empirical Investigation on Italian. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_36
DOI: https://doi.org/10.1007/978-3-540-85287-2_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science, Computer Science (R0)