A Case Study of Using Domain Engineering for the Conflation Algorithms Domain

Yilmaz, Okan; Frakes, William B.

doi:10.1007/978-3-642-04211-9_9

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 5791)

Included in the following conference series:

International Conference on Software Reuse

Abstract

In this study we used domain engineering as a method for gaining a deeper formal understanding of a class of algorithms. Specifically, we analyzed six stemming algorithms from four sub-domains of the conflation algorithms domain and developed formal domain models and generators based on these models. The application generator produces source code not only for affix removal stemmers but also for successor variety, table lookup, and n-gram stemmers. The generated stemmers were compared with manually developed stemmers in terms of stem similarity, source and executable sizes, and development and execution times. Five of the generated stemmers agreed with their manually developed counterparts on more than 99.9% of stems. Some of the generated stemmers were as efficient as their manual equivalents and some were not.
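
The stem similarity comparison mentioned in the abstract can be illustrated with a small sketch. The procedure actually used in the paper is not shown on this page, so the following is only one plausible reading: the share of input words for which two stemmers return the same stem. Both stemmers, their suffix list, and their lookup table are hypothetical toy examples, not the generated or manual stemmers from the study.

```python
from typing import Callable, Iterable

Stemmer = Callable[[str], str]


def toy_suffix_stemmer(word: str) -> str:
    """Hypothetical affix-removal stemmer: strips the first matching suffix."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_table_stemmer(word: str) -> str:
    """Hypothetical table-lookup stemmer: unknown words map to themselves."""
    table = {
        "stemming": "stem",
        "stemmed": "stem",
        "stems": "stem",
        "running": "run",
    }
    return table.get(word, word)


def identical_stem_rate(words: Iterable[str], a: Stemmer, b: Stemmer) -> float:
    """Fraction of words for which stemmers a and b return the same stem."""
    word_list = list(words)
    if not word_list:
        raise ValueError("need at least one word to compare stemmers")
    same = sum(1 for w in word_list if a(w) == b(w))
    return same / len(word_list)


if __name__ == "__main__":
    sample = ["stemming", "stemmed", "stems", "running", "cat"]
    rate = identical_stem_rate(sample, toy_suffix_stemmer, toy_table_stemmer)
    print(f"identical stems: {rate:.1%}")
```

On this five-word sample the two toy stemmers agree only on "stems" and "cat". A figure such as the 99.9% reported above would come from running the same kind of comparison over a large corpus with a generated stemmer and its manual equivalent.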

Author information

Authors and Affiliations

Computer Science Department, Virginia Tech, USA
Okan Yilmaz & William B. Frakes

Editor information

Editors and Affiliations

Department of Computer Science, Virginia Tech, VA 24061, Blacksburg, USA
Stephen H. Edwards
Computer Science Department, Virginia Tech, VA 22043, Falls Church, USA
Gregory Kulczycki

Cite this paper

Yilmaz, O., Frakes, W.B. (2009). A Case Study of Using Domain Engineering for the Conflation Algorithms Domain. In: Edwards, S.H., Kulczycki, G. (eds) Formal Foundations of Reuse and Domain Engineering. ICSR 2009. Lecture Notes in Computer Science, vol 5791. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04211-9_9

DOI: https://doi.org/10.1007/978-3-642-04211-9_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04210-2
Online ISBN: 978-3-642-04211-9
eBook Packages: Computer Science, Computer Science (R0)
