Abstract
This paper reports a detailed evaluation of a system developed to identify and retrieve morphological variants in searches of Latin text databases. A user of the retrieval system enters the principal parts of the search term (two parts for a noun or adjective, three parts for a deponent verb, and four parts for other verbs). This input identifies the type of word to be processed and the rules to be followed in determining which morphological variants should be retrieved. Two different search algorithms are described. The algorithms are applied to the Latin portion of the Hartlib Papers Collection and to a range of classical, vulgar and medieval Latin texts drawn from the Patrologia Latina and from the PHI Disk 5.3 datasets. The results of these searches demonstrate the effectiveness of our procedures in providing access to the full range of classical and post-classical Latin text databases.
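The first step described in the abstract, inferring the word type from the number of principal parts entered, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `classify_search_term` and the type labels are hypothetical, and the rules that then generate the morphological variants are not given in the abstract, so they are deliberately left out.

```python
def classify_search_term(principal_parts):
    """Return a word-type label based on how many principal parts were entered.

    Mapping taken from the abstract: two parts for a noun or adjective,
    three for a deponent verb, four for any other verb. The label strings
    are illustrative assumptions, not names used in the paper.
    """
    # Ignore empty entries so stray separators do not change the count.
    parts = [p.strip() for p in principal_parts if p.strip()]
    word_types = {
        2: "noun_or_adjective",
        3: "deponent_verb",
        4: "verb",
    }
    if len(parts) not in word_types:
        raise ValueError(
            f"expected 2, 3 or 4 principal parts, got {len(parts)}"
        )
    return word_types[len(parts)]


# Example inputs using standard Latin principal parts.
print(classify_search_term(["rex", "regis"]))
print(classify_search_term(["loquor", "loqui", "locutus sum"]))
print(classify_search_term(["amo", "amare", "amavi", "amatum"]))
```

In a fuller system the returned label would select the rule set used to generate the variants to be retrieved; that stage depends on details the abstract does not provide.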
Cite this article
Schinke, R., Greengrass, M., Robertson, A.M. et al. Retrieval Of Morphological Variants In Searches Of Latin Text Databases. Computers and the Humanities 31, 409–432 (1997). https://doi.org/10.1023/A:1000996413558
DOI: https://doi.org/10.1023/A:1000996413558