Enhancing Concept-Based Retrieval Based on Minimal Term Sets

Journal of Intelligent Information Systems

Abstract

There is considerable interest in bridging the terminological gap between the way users prefer to specify their information needs and the way queries are expressed in terms of keywords or text expressions that occur in documents. One approach proposed for bridging this gap is based on expert-system technology. The central idea of this approach was introduced in the context of a system called Rule Based Information Retrieval by Computer (RUBRIC). In RUBRIC, user query topics (or concepts) are captured in a rule base represented by an AND/OR tree. The evaluation of an AND/OR tree is essentially based on the minimum and maximum weights of query terms for conjunctions and disjunctions, respectively. The time to generate the retrieval output of an AND/OR tree for a given query topic is exponential in m, the number of conjunctions in the DNF expression associated with the query topic. In this paper, we propose a new approach for computing the retrieval output. The proposed approach involves preprocessing the rule base to generate Minimal Term Sets (MTSs) that speed up the retrieval process. The computational complexity of on-line query evaluation following the preprocessing is polynomial in m. We show that the computation and use of MTSs allow a user to choose query topics that best suit their needs and to use retrieval functions that yield a more refined and controlled retrieval output than is possible with the AND/OR tree when document terms are binary. We incorporate the p-Norm model into the process of evaluating MTSs to handle the case where the weights of both document and query terms are non-binary.
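
As an illustration of the min/max evaluation described in the abstract, the following minimal Python sketch scores one document against a small AND/OR tree. It is not the RUBRIC implementation or the MTS algorithm of the paper: the class names (`Term`, `And`, `Or`), the function names, the example topic, and the term weights are all hypothetical. The two `p_norm_*` functions follow the extended Boolean formulas of Salton, Fox, and Wu (1983) for a single flat conjunction or disjunction; how the paper combines them with MTSs is not shown here.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union


@dataclass
class Term:
    name: str  # leaf node: a single query term


@dataclass
class And:
    children: List["Node"]  # conjunction of sub-trees


@dataclass
class Or:
    children: List["Node"]  # disjunction of sub-trees


Node = Union[Term, And, Or]


def evaluate(node: Node, doc_weights: Dict[str, float]) -> float:
    """Score a document: min over AND children, max over OR children."""
    if isinstance(node, Term):
        # Assumption: a term absent from the document has weight 0.
        return doc_weights.get(node.name, 0.0)
    scores = [evaluate(child, doc_weights) for child in node.children]
    return min(scores) if isinstance(node, And) else max(scores)


def p_norm_or(doc_w: Sequence[float], query_w: Sequence[float], p: float = 2.0) -> float:
    """p-Norm similarity for a flat OR query with weighted terms."""
    num = sum((a * d) ** p for a, d in zip(query_w, doc_w))
    den = sum(a ** p for a in query_w)
    return (num / den) ** (1.0 / p)


def p_norm_and(doc_w: Sequence[float], query_w: Sequence[float], p: float = 2.0) -> float:
    """p-Norm similarity for a flat AND query with weighted terms."""
    num = sum((a * (1.0 - d)) ** p for a, d in zip(query_w, doc_w))
    den = sum(a ** p for a in query_w)
    return 1.0 - (num / den) ** (1.0 / p)


# Hypothetical topic: (retrieval AND concept) OR rubric
topic = Or([And([Term("retrieval"), Term("concept")]), Term("rubric")])
doc = {"retrieval": 0.8, "concept": 0.5, "rubric": 0.3}

score = evaluate(topic, doc)  # max(min(0.8, 0.5), 0.3)
print(score)
```

With p = 1 both p-Norm functions reduce to a weighted average of the term weights, and as p grows they approach the max (OR) and min (AND) used by `evaluate`, which is why the p-Norm model is a natural generalization when weights are non-binary.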

References

  • Alsaffar, A.H., Deogun, J.S., Raghavan, V.V., and Sever, H. (1999). Concept based retrieval by minimal term sets. In Z.W. Ras and A. Skowron (Eds.), Foundations of Intelligent Systems: Eleventh Int'l Symposium on Methodologies for Intelligent Systems, ISMIS'99 Proceedings (pp. 114–122). Warsaw, Poland: Springer Verlag.

  • Croft, W.B. (1977). Clustering Large Files of Documents Using the Single Link Method, Journal of the American Society for Information Science (JASIS), 28(6), 341–344.

  • McCune, B.P., Tong, R.M., Dean, J.S., and Shapiro, D.G. (1985). RUBRIC: A System for Rule-Based Information Retrieval, IEEE Transactions on Software Engineering, 11(9), 939–944.

  • Noreault, T., Koll, M., and McGill, M.J. (1981). Automatic Ranked Output from Boolean Searches in SIRE, Journal of the American Society for Information Science, 32(4), 275–279.

  • Raghavan, V.V. and Sever, H. (1995). On the Reuse of Past Optimal Queries. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 344–350).

  • Raghavan, V.V. and Yu, C.T. (1979). Experiments on the Determination of the Relationships Between Terms, ACM Transactions on Database Systems, 4(2), 240–260.

  • Salton, G. (1980). Automatic Term Class Construction Using Relevance-A Summary of Work in Automatic Pseudoclassification, Information Processing and Management, 16(1), 1–15.

  • Salton, G. (1989). Automatic Text Processing. The Transformation and Retrieval of Information by Computer, Reading, Massachusetts: Addison-Wesley Publishing Co.

  • Salton, G., Allan, J., and Buckley, C. (1994). Automatic Structuring and Retrieval of Large Text Files, Communications of the ACM, 37(2), 97–108.

  • Salton, G. and Buckley, C. (1990). Improving Retrieval Performance by Relevance Feedback, Journal of the American Society for Information Science, 41(4), 288–297.

  • Salton, G., Fox, E.A., and Wu, H. (1983). Extended Boolean Information Retrieval, Communications of the ACM, 26(11), 1022–1036.

Cite this article

Alsaffar, A., Deogun, J., Raghavan, V. et al. Enhancing Concept-Based Retrieval Based on Minimal Term Sets. Journal of Intelligent Information Systems 14, 155–173 (2000). https://doi.org/10.1023/A:1008783718847

  • DOI: https://doi.org/10.1023/A:1008783718847
