ABSTRACT
In contrast to other kinds of libraries, software libraries need to be conceptually organized. When looking for a component, users are concerned mainly with the functionality of the desired component; implementation details are secondary. Large, conceptually organized libraries of software components would therefore enhance software reuse. In this paper, we present GURU, a tool that automatically builds such large software libraries from documented software components. We focus here on GURU's indexing component, which extracts conceptual attributes from natural language documentation. The indexing method is based on word co-occurrences. It first uses EXTRACT, a co-occurrence knowledge compiler, to extract potential attributes from textual documents. Conceptually relevant collocations are then selected according to their resolving power, which scales down the noise due to context words. This fully automated indexing tool thus goes further than keyword-based tools in understanding a document, without the brittleness of knowledge-based tools. The indexing component of GURU is fully implemented, and some results are given in the paper.
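The two-stage idea described above (collect co-occurring word pairs, then rank them so that common context words contribute less) can be illustrated with a minimal sketch. This is not GURU or EXTRACT: the abstract does not define resolving power, so the score used here (in-document pair frequency multiplied by an information term estimated from corpus word frequencies) is an assumption made for illustration. The stop-word list, the window size of 5, and all function names are likewise hypothetical.

```python
import math
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "or",
              "is", "are", "for", "from", "that", "it"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def cooccurrences(tokens, window=5):
    """Count unordered pairs of distinct words at most `window` positions apart."""
    pairs = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if v != w:
                pairs[tuple(sorted((w, v)))] += 1
    return pairs


def rank_pairs(docs, index, window=5):
    """Rank the word pairs of docs[index], highest score first.

    Assumed score: pair frequency in the document times
    -log2(P(w) * P(v)), with P estimated over the whole corpus.
    Pairs of frequent (less informative) words are scaled down.
    """
    tokenized = [tokenize(d) for d in docs]
    corpus = [t for doc in tokenized for t in doc]
    word_freq = Counter(corpus)
    total = len(corpus)

    scores = {}
    for (w, v), freq in cooccurrences(tokenized[index], window).items():
        info = -math.log2((word_freq[w] / total) * (word_freq[v] / total))
        scores[(w, v)] = freq * info
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    docs = [
        "move file to directory",
        "remove file",
        "list directory contents",
    ]
    for pair, score in rank_pairs(docs, 0):
        print(pair, round(score, 2))
```

In this sketch a pair made of two corpus-frequent words, such as `("directory", "file")`, ranks below pairs that contain a rarer word, which mirrors the stated goal of scaling down noise from context words.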
Full text indexing based on lexical relations an application: software libraries
Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval, N.J. Belkin and C.J. van Rijsbergen (Eds.), June 25-28, 1989, Cambridge, MA.