Abstract
Stemming algorithms (stemmers) convert words to their root form (stem); this process is used in the pre-processing stage of Information Retrieval Systems. Stemmers affect indexing time by reducing the size of the index file, and they improve the performance of the retrieval process. Of the several stemming algorithms available, the most widely used is the Porter stemming algorithm because of its efficiency, simplicity, and speed, and because it handles exceptions easily. However, it has some drawbacks, and although many attempts have been made to improve its structure, these remain incomplete. This paper presents an efficient information retrieval technique and proposes a new stemming algorithm called the Enhanced Porter’s Stemming Algorithm (EPSA). The objective of this technique is to overcome the drawbacks of the Porter algorithm and improve web searching. The EPSA was applied to two datasets to measure its performance. The results show improved precision over the original Porter algorithm while achieving approximately the same recall.
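To make the two ideas in the abstract concrete, the following minimal sketch shows how conflating word variants to one stem shrinks the set of index terms, and how precision and recall are computed for a retrieved set. It is an illustration only: `toy_stem` is a hypothetical single-pass suffix stripper, not the Porter algorithm and not the EPSA, and the word list and document ids are made up for the example.

```python
def toy_stem(word):
    """Strip one common English suffix (hypothetical rule set, NOT Porter or EPSA)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Require at least three remaining characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four surface forms collapse to one index term after stemming.
words = ["connect", "connected", "connecting", "connects"]
stems = {toy_stem(w) for w in words}
print(len(words), "terms before stemming,", len(stems), "after")

# Made-up ids: 4 documents retrieved, 3 relevant, 2 in common.
precision, recall = precision_recall([1, 2, 3, 4], [2, 4, 5])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A stemmer that conflates too aggressively tends to retrieve more non-relevant documents, which lowers precision; this is the quantity the abstract reports as improved while recall stays roughly the same.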
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hajeer, S.I., Ismail, R.M., Badr, N.L., Tolba, M.F. (2014). An Adaptive Information Retrieval System for Efficient Web Searching. In: Hassanien, A.E., Tolba, M.F., Taher Azar, A. (eds) Advanced Machine Learning Technologies and Applications. AMLTA 2014. Communications in Computer and Information Science, vol 488. Springer, Cham. https://doi.org/10.1007/978-3-319-13461-1_44
DOI: https://doi.org/10.1007/978-3-319-13461-1_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13460-4
Online ISBN: 978-3-319-13461-1
eBook Packages: Computer Science (R0)