Full text indexing based on lexical relations. An application: software libraries

Authors:
Y. S. Maarek

IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY

F. Z. Smadja

Department of Computer Science, Columbia University, New York, NY


SIGIR '89: Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval, May 1989, Pages 198–206, https://doi.org/10.1145/75334.75355

Published: 01 May 1989


ABSTRACT

In contrast to other kinds of libraries, software libraries need to be conceptually organized. When looking for a component, users care primarily about the functionality of the desired component; implementation details are secondary. Large, conceptually organized libraries of software components would therefore enhance software reuse. In this paper, we present GURU, a tool that automatically builds such large software libraries from documented software components. We focus here on GURU's indexing component, which extracts conceptual attributes from natural-language documentation. The indexing method is based on word co-occurrences. It first uses EXTRACT, a co-occurrence knowledge compiler, to extract potential attributes from textual documents. Conceptually relevant collocations are then selected according to their resolving power, which scales down the noise due to context words. This fully automated indexing tool thus goes further than keyword-based tools in understanding a document, without the brittleness of knowledge-based tools. The indexing component of GURU is fully implemented, and some results are given in the paper.
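
The two-stage idea described in the abstract (collect co-occurring word pairs, then rank them so that pairs made of common context words score low) can be illustrated with a minimal Python sketch. It is not GURU or EXTRACT: the abstract does not give the window size, the stop-word list, or the resolving-power formula, so everything below is an assumption made for illustration. In particular, the five-word window, the tiny `STOP_WORDS` set, the add-one smoothing, and the score "pair count in the document times the summed information content of the two words" are hypothetical choices, as are all function names.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption, not from the paper).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "it", "for"}


def tokenize(text):
    """Lowercase the text, keep alphabetic words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def cooccurrences(tokens, window=5):
    """Count unordered pairs of distinct words at most `window` tokens apart."""
    pairs = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs


def resolving_power(doc_tokens, corpus_tokens, window=5):
    """Score each pair in a document (assumed formula, see the text above).

    score = pair count in the document * (info(w1) + info(w2)), where
    info(w) = -log2 of the smoothed relative frequency of w in the corpus.
    Words that are frequent across the corpus carry little information,
    so pairs built from them are scaled down.
    """
    corpus_freq = Counter(corpus_tokens)
    total = len(corpus_tokens)

    def info(word):
        # Add-one smoothing keeps the probability positive for unseen words.
        return -math.log2((corpus_freq[word] + 1) / (total + 1))

    return {
        pair: count * (info(pair[0]) + info(pair[1]))
        for pair, count in cooccurrences(doc_tokens, window).items()
    }


def top_attributes(doc_tokens, corpus_tokens, k=3):
    """Return the k highest-scoring pairs as candidate index attributes."""
    scores = resolving_power(doc_tokens, corpus_tokens)
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    corpus = tokenize("copy the file. open the file. close the file. copy a directory")
    doc = tokenize("copy the directory and the file")
    for pair in top_attributes(doc, corpus):
        print(pair)
```

In this toy corpus "file" occurs more often than "directory", so under the assumed score the pair (copy, directory) ranks above (copy, file), which is the kind of down-weighting of context words the abstract attributes to resolving power.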


Published in
SIGIR '89: Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval
May 1989
257 pages
ISBN: 0897913213
DOI: 10.1145/75334
Editors: N. J. Belkin, C. J. van Rijsbergen

ACM SIGIR Forum Volume 23, Issue SI
Special issue: Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval, N.J. Belkin and C.J. van Rijsbergen (Eds.), June 25-28, 1989, Cambridge, MA.
June 1989
243 pages
ISSN: 0163-5840
DOI: 10.1145/75335
Editors: Vijay Raghavan (University of Southwestern Louisiana, Lafayette, LA), William B. Frakes (AT&T Bell Laboratories, Holmdel, NJ), N. J. Belkin, C. J. van Rijsbergen
Copyright © 1989 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States