Stemming Galician Texts

  • Conference paper
String Processing and Information Retrieval (SPIRE 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2476)

Abstract

In this paper we describe a stemming algorithm for the Galician language that simultaneously supports the four current orthographic regulations for Galician. The algorithm has already been implemented, and we have begun using it in order to improve it. However, this stemming algorithm cannot be applied to documents that predate the first Galician orthographic regulation, which appeared in 1977. To stem documents written before that date, we have therefore adopted an exhaustive approach that consists of defining a large collection of wordsets to allow systematic word comparisons. We also describe a tool for building the wordsets this approach requires.
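The two strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the suffix rules, the wordset contents, and every function name below are hypothetical, and the sketch assumes that a wordset groups spelling variants of one word under a single representative form.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical suffix rules: (suffix, replacement, minimum remaining length).
# Illustrative only; these are NOT the rules published with the paper.
SUFFIX_RULES: List[Tuple[str, str, int]] = [
    ("cións", "ción", 3),  # plural noun -> singular
    ("mente", "", 4),      # adverbial suffix
    ("s", "", 3),          # generic plural
]

# Hypothetical wordsets: representative form -> invented spelling variants.
WORDSETS: Dict[str, Set[str]] = {
    "hoxe": {"oxe", "oje", "hoje"},
}


def stem_by_rules(word: str) -> str:
    """Apply the first matching suffix rule (rule-based strategy)."""
    word = word.lower()
    for suffix, replacement, min_rest in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_rest:
            return word[: len(word) - len(suffix)] + replacement
    return word


def build_variant_index(wordsets: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map every variant, and each representative, to its representative."""
    index: Dict[str, str] = {}
    for representative, variants in wordsets.items():
        index[representative] = representative
        for variant in variants:
            index[variant] = representative
    return index


def normalize(word: str, index: Dict[str, str], pre_regulation: bool) -> str:
    """Use wordset lookup for pre-regulation text, suffix rules otherwise."""
    word = word.lower()
    if pre_regulation:
        # Exhaustive strategy: compare against the known wordsets.
        return index.get(word, word)
    return stem_by_rules(word)


index = build_variant_index(WORDSETS)
modern = normalize("cancións", index, pre_regulation=False)
older = normalize("oje", index, pre_regulation=True)
```

In this sketch the lookup table stands in for the "systematic word comparisons" of the exhaustive approach, since documents without a normative spelling cannot be reduced reliably by suffix rules alone.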

This work was partially supported by CICYT (TEL99-0335-C04-02) and the Vicerrectorado de Innovación Tecnolóxica (University of A Coruña).

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brisaboa, N.R., Callón, C., López, JR., Places, Á.S., Sanmartín, G. (2002). Stemming Galician Texts. In: Laender, A.H.F., Oliveira, A.L. (eds) String Processing and Information Retrieval. SPIRE 2002. Lecture Notes in Computer Science, vol 2476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45735-6_9

  • DOI: https://doi.org/10.1007/3-540-45735-6_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44158-8

  • Online ISBN: 978-3-540-45735-0

  • eBook Packages: Springer Book Archive
