Abstract
In this study we used domain engineering as a method for gaining a deeper formal understanding of a class of algorithms. Specifically, we analyzed six stemming algorithms from four sub-domains of the conflation algorithms domain and developed formal domain models and generators based on these models. The application generator produces source code not only for affix removal stemmers but also for successor variety, table lookup, and n-gram stemmers. The generated stemmers were compared with manually developed stemmers in terms of stem similarity, source and executable sizes, and development and execution times. Five of the generated stemmers produced stems more than 99.9% identical to those of the manually developed stemmers. Some of the generated stemmers were as efficient as their manual equivalents and some were not.
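The stem similarity comparison mentioned above can be illustrated with a minimal sketch. It assumes that similarity is measured as the percentage of input words for which two stemmers return the same stem; the abstract does not define the measure, so this reading is an assumption. The two toy stemmers and the word list are hypothetical and stand in for a generated stemmer and its manually developed counterpart.

```python
def strip_s(word):
    """Hypothetical toy stemmer: drop one trailing 's' from longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def strip_s_or_ing(word):
    """Hypothetical toy stemmer: drop a trailing 'ing', else act like strip_s."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    return strip_s(word)


def percent_identical(stem_a, stem_b, words):
    """Percentage of words for which both stemmers return the same stem."""
    if not words:
        raise ValueError("word list must not be empty")
    same = sum(1 for w in words if stem_a(w) == stem_b(w))
    return 100.0 * same / len(words)


# Illustrative word list only; not taken from the study's corpora.
words = ["stems", "stemming", "generator", "models"]
print(percent_identical(strip_s, strip_s_or_ing, words))
```

Under this assumed measure, a figure such as "more than 99.9% identical" would mean that the two stemmers disagree on fewer than one word in a thousand.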
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yilmaz, O., Frakes, W.B. (2009). A Case Study of Using Domain Engineering for the Conflation Algorithms Domain. In: Edwards, S.H., Kulczycki, G. (eds) Formal Foundations of Reuse and Domain Engineering. ICSR 2009. Lecture Notes in Computer Science, vol 5791. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04211-9_9
DOI: https://doi.org/10.1007/978-3-642-04211-9_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04210-2
Online ISBN: 978-3-642-04211-9
eBook Packages: Computer Science (R0)