Full text indexing based on lexical relations. An application: software libraries

DOI: 10.1145/75334.75355

Published: 01 May 1989

ABSTRACT

In contrast to other kinds of libraries, software libraries need to be conceptually organized. When looking for a component, users are mainly concerned with the functionality of the desired component; implementation details are secondary. Software reuse would be enhanced by large, conceptually organized libraries of software components. In this paper, we present GURU, a tool that automatically builds such large software libraries from documented software components. We focus here on GURU's indexing component, which extracts conceptual attributes from natural language documentation. The indexing method is based on word co-occurrences. It first uses EXTRACT, a co-occurrence knowledge compiler, to extract potential attributes from textual documents. Conceptually relevant collocations are then selected according to their resolving power, which scales down the noise due to context words. This fully automated indexing tool thus goes further than keyword-based tools in understanding a document, without the brittleness of knowledge-based tools. The indexing component of GURU is fully implemented, and some results are given in the paper.
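The two-stage idea in the abstract (collect candidate word pairs by co-occurrence, then rank them so that pairs made of common context words count for less) can be illustrated with a small sketch. This is not EXTRACT or GURU: the abstract does not define resolving power, so the scoring used here (pair frequency in the document multiplied by the summed information content of the two words, estimated from a reference corpus) is an assumption chosen for illustration. The window size, the add-one smoothing, and all function names are likewise hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?()\"'")
        if word:
            words.append(word)
    return words


def cooccurrences(words, window=5):
    """Count unordered pairs of distinct words at most `window` positions apart."""
    pairs = Counter()
    for i, left in enumerate(words):
        for right in words[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


def information(word, corpus_counts, corpus_total):
    """Information content -log2 P(word), with add-one smoothing (an assumption)."""
    probability = (corpus_counts.get(word, 0) + 1) / (corpus_total + 1)
    return -math.log2(probability)


def rank_pairs(document, corpus, window=5):
    """Rank word pairs of `document` by an assumed resolving-power score.

    Score = pair frequency in the document * summed information of both words.
    Words that are frequent in the reference `corpus` carry little information,
    so pairs built from them are scaled down.
    """
    corpus_counts = Counter(tokenize(corpus))
    corpus_total = sum(corpus_counts.values())
    scores = {}
    for (w1, w2), freq in cooccurrences(tokenize(document), window).items():
        info = (information(w1, corpus_counts, corpus_total)
                + information(w2, corpus_counts, corpus_total))
        scores[(w1, w2)] = freq * info
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    doc = "Open the file. Close the file."
    reference = "the the the the the the open file close"
    for pair, score in rank_pairs(doc, reference, window=2):
        print(pair, round(score, 2))
```

In this toy setup the pair (file, the) occurs more often than (close, file), yet it ranks lower because "the" is common in the reference corpus. That is the sense in which such a score scales down noise from context words; a real system would need a much larger corpus and a more careful probability estimate.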

Published in

SIGIR '89: Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval
May 1989
257 pages
ISBN: 0897913213
DOI: 10.1145/75334

ACM SIGIR Forum, Volume 23, Issue SI
Special issue: Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval, N.J. Belkin and C.J. van Rijsbergen (Eds.), June 25-28, 1989, Cambridge, MA.
June 1989
243 pages
ISSN: 0163-5840
DOI: 10.1145/75335

Copyright © 1989 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: Article

Overall acceptance rate: 792 of 3,983 submissions, 20%
