Abstract
In this paper we describe a stemming algorithm for the Galician language that simultaneously supports the four orthographic regulations currently in use for Galician. The algorithm has already been implemented, and we have begun using it in order to refine it. However, this stemming algorithm cannot be applied to documents that predate the first Galician orthographic regulation, which appeared in 1977. To stem documents written before that date we have therefore adopted an exhaustive approach, which consists of defining a very large collection of wordsets that allow systematic word comparisons. We also describe a tool for building the wordsets that this approach requires.
This work was partially supported by CICYT (TEL99-0335-C04-02) and the Vicerrectorado de Innovation Tecnoloxica (University of A Coruña).
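As a rough illustration of the two approaches named in the abstract, the sketch below contrasts rule-based suffix stripping with lookup in a collection of wordsets. It is a minimal sketch under stated assumptions, not the authors' algorithm: the suffix rules, the minimum stem lengths, the choice of the shortest form as the representative of a wordset, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical (suffix, minimum remaining stem length) rules.
# These are illustrative only and are NOT the published Galician rules.
SUFFIX_RULES: List[Tuple[str, int]] = [
    ("mente", 4),
    ("iños", 3),
    ("iño", 3),
    ("os", 3),
    ("as", 3),
    ("o", 3),
    ("a", 3),
    ("s", 3),
]


def rule_stem(word: str) -> str:
    """Strip the first matching suffix if enough of the word remains."""
    word = word.lower()
    for suffix, min_len in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            return word[: len(word) - len(suffix)]
    return word


def build_index(wordsets: Iterable[Iterable[str]]) -> Dict[str, str]:
    """Map every form in each wordset to one representative form.

    Assumption: the representative is the shortest form (ties broken
    alphabetically); the source does not say how one is chosen.
    """
    index: Dict[str, str] = {}
    for wordset in wordsets:
        forms = [form.lower() for form in wordset]
        representative = min(forms, key=lambda f: (len(f), f))
        for form in forms:
            index[form] = representative
    return index


def wordset_stem(word: str, index: Dict[str, str]) -> str:
    """Exhaustive approach: look the word up, fall back to the word itself."""
    return index.get(word.lower(), word.lower())


if __name__ == "__main__":
    # Rule-based stemming, usable when spelling follows known regulations.
    print(rule_stem("libros"))

    # Wordset lookup, usable when spelling is too irregular for rules.
    index = build_index([{"libro", "libros", "libriño"}])
    print(wordset_stem("Libros", index))
```

The design difference is the point of the sketch: the rule-based function needs no vocabulary but assumes regular spelling, whereas the wordset index makes no assumption about spelling but must list every form in advance, which is why a tool for building wordsets is needed.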
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brisaboa, N.R., Callón, C., López, JR., Places, Á.S., Sanmartín, G. (2002). Stemming Galician Texts. In: Laender, A.H.F., Oliveira, A.L. (eds) String Processing and Information Retrieval. SPIRE 2002. Lecture Notes in Computer Science, vol 2476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45735-6_9
DOI: https://doi.org/10.1007/3-540-45735-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44158-8
Online ISBN: 978-3-540-45735-0
eBook Packages: Springer Book Archive