Abstract
Stemming algorithms are commonly used in Information Retrieval to reduce the morphological variants of a word to a common representation. Stemming analysis is one of the tasks in the pre-processing phase of text mining, and it consumes a lot of time. This study proposes a model of distributed stemming analysis on a grid environment that reduces stemming processing time and so speeds up text preparation. The model can be integrated into a grid-based text mining tool, helping to improve the overall performance of the text mining process.
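As a rough illustration of the idea in the abstract, the sketch below splits a document collection into partitions, stems each partition on a separate worker, and gathers the results in the original order. It is not the authors' model: the thread pool only stands in for grid nodes, `toy_stem` is a deliberately crude suffix stripper rather than the Porter or Lovins algorithm, and every name in it (`SUFFIXES`, `toy_stem`, `partition`, `stem_partition`, `distributed_stem`) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, deliberately crude suffix list; NOT the Porter or Lovins rules.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_partition(documents):
    """Work done by one simulated node: stem every token of its documents."""
    return [[toy_stem(tok) for tok in doc.split()] for doc in documents]


def partition(items, n_parts):
    """Split items into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def distributed_stem(documents, n_nodes=2):
    """Scatter documents to workers, stem in parallel, gather in order."""
    chunks = partition(documents, n_nodes)
    # Assumption: threads stand in for grid nodes; a real grid deployment
    # would ship each chunk to a remote service instead.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        results = list(pool.map(stem_partition, chunks))
    return [doc for chunk in results for doc in chunk]
```

Calling `distributed_stem(["connected connecting connects", "mining texts"], n_nodes=2)` maps the three variants of "connect" to the same stem. Because each document is stemmed independently, the partitions need no coordination beyond the final gather, which is what makes this pre-processing step a natural candidate for distribution.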
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Roncero, V.G., Costa, M.C.A., Ebecken, N.F.F. (2008). Using Stemming Algorithms on a Grid Environment. In: Palma, J.M.L.M., Amestoy, P.R., Daydé, M., Mattoso, M., Lopes, J.C. (eds) High Performance Computing for Computational Science - VECPAR 2008. VECPAR 2008. Lecture Notes in Computer Science, vol 5336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92859-1_52
DOI: https://doi.org/10.1007/978-3-540-92859-1_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92858-4
Online ISBN: 978-3-540-92859-1
eBook Packages: Computer Science, Computer Science (R0)