Using Stemming Algorithms on a Grid Environment

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5336)

Abstract

Stemming algorithms are commonly used in Information Retrieval to reduce words that are morphological variants of one another to a common representation. Stemming is a time-consuming task in the pre-processing phase of text mining. This study proposes a model of distributed stemming analysis on a grid environment that reduces stemming time and thereby speeds up text preparation. The model can be integrated into a grid-based text mining tool, helping to improve the overall performance of the text mining process.
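The general idea of distributing stemming work can be illustrated with a minimal sketch. It is not the model proposed in the paper, whose details are not given in the abstract. It assumes a toy suffix-stripping rule (not the Porter, Lovins, or Paice algorithm) and uses a local thread pool as a stand-in for grid nodes. All names (`toy_stem`, `split_into_chunks`, `distributed_stem`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy suffix list, checked in order. This is an illustrative assumption,
# not the rule set of any published stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def stem_document(text):
    """Stem every whitespace-separated token of one document."""
    return [toy_stem(token) for token in text.split()]


def split_into_chunks(items, n_chunks):
    """Round-robin partition: item j goes to chunk j % n_chunks."""
    return [items[i::n_chunks] for i in range(n_chunks)]


def stem_chunk(chunk):
    """Work unit that one (hypothetical) grid node would process."""
    return [stem_document(doc) for doc in chunk]


def distributed_stem(documents, n_workers=2):
    """Partition documents, stem each partition concurrently, then merge."""
    chunks = split_into_chunks(documents, n_workers)
    # A thread pool stands in for remote grid nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(stem_chunk, chunks))
    # Document j sits in chunk j % n_workers at position j // n_workers.
    return [results[j % n_workers][j // n_workers] for j in range(len(documents))]
```

Because each document is stemmed independently, the partitions need no communication until the final merge, which is what makes this kind of pre-processing a natural candidate for distribution.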

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Roncero, V.G., Costa, M.C.A., Ebecken, N.F.F. (2008). Using Stemming Algorithms on a Grid Environment. In: Palma, J.M.L.M., Amestoy, P.R., Daydé, M., Mattoso, M., Lopes, J.C. (eds) High Performance Computing for Computational Science - VECPAR 2008. VECPAR 2008. Lecture Notes in Computer Science, vol 5336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92859-1_52

  • DOI: https://doi.org/10.1007/978-3-540-92859-1_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-92858-4

  • Online ISBN: 978-3-540-92859-1

  • eBook Packages: Computer Science, Computer Science (R0)
