Abstract
There is considerable interest in bridging the terminological gap between the way users prefer to specify their information needs and the way queries are expressed, namely as keywords or text expressions that occur in documents. One approach proposed for bridging this gap is based on expert-system technology. The central idea of this approach was introduced in a system called Rule Based Information Retrieval by Computer (RUBRIC). In RUBRIC, user query topics (or concepts) are captured in a rule base represented by an AND/OR tree. Evaluation of the AND/OR tree is essentially based on the minimum and maximum weights of query terms for conjunctions and disjunctions, respectively. The time to generate the retrieval output of the AND/OR tree for a given query topic is exponential in m, the number of conjunctions in the DNF expression associated with the query topic. In this paper, we propose a new approach for computing the retrieval output. The proposed approach involves preprocessing the rule base to generate Minimal Term Sets (MTSs) that speed up the retrieval process. The computational complexity of on-line query evaluation following the preprocessing is polynomial in m. We show that the computation and use of MTSs allow a user to choose query topics that best suit their needs and to use retrieval functions that yield a more refined and controlled retrieval output than is possible with the AND/OR tree when document terms are binary. We incorporate the p-Norm model into the process of evaluating MTSs to handle the case where the weights of both document and query terms are non-binary.
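As an illustration of the min/max evaluation mentioned above, the following is a minimal sketch and not the RUBRIC implementation. It assumes a simplified tree with no weights attached to rules, and the class and function names (Term, And, Or, evaluate, pnorm_or, pnorm_and) are hypothetical. The two p-Norm functions follow the standard extended Boolean formulation, which for p = 1 reduces to a weighted average and approaches max/min behaviour as p grows.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union


@dataclass
class Term:
    """Leaf node: a single query term (hypothetical representation)."""
    name: str


@dataclass
class And:
    """Conjunction of sub-expressions."""
    children: List["Node"]


@dataclass
class Or:
    """Disjunction of sub-expressions."""
    children: List["Node"]


Node = Union[Term, And, Or]


def evaluate(node: Node, doc_weights: Dict[str, float]) -> float:
    """Score a document against an AND/OR tree: min for AND, max for OR."""
    if isinstance(node, Term):
        # A term absent from the document contributes weight 0.
        return doc_weights.get(node.name, 0.0)
    scores = [evaluate(child, doc_weights) for child in node.children]
    return min(scores) if isinstance(node, And) else max(scores)


def pnorm_or(query_w: Sequence[float], doc_w: Sequence[float], p: float) -> float:
    """p-Norm similarity for a disjunction of weighted terms."""
    num = sum((a ** p) * (d ** p) for a, d in zip(query_w, doc_w))
    den = sum(a ** p for a in query_w)
    return (num / den) ** (1.0 / p)


def pnorm_and(query_w: Sequence[float], doc_w: Sequence[float], p: float) -> float:
    """p-Norm similarity for a conjunction of weighted terms."""
    num = sum((a ** p) * ((1.0 - d) ** p) for a, d in zip(query_w, doc_w))
    den = sum(a ** p for a in query_w)
    return 1.0 - (num / den) ** (1.0 / p)


# Illustrative topic (a AND b) OR c, with made-up document term weights.
topic = Or([And([Term("a"), Term("b")]), Term("c")])
doc = {"a": 0.8, "b": 0.4, "c": 0.3}
score = evaluate(topic, doc)  # max(min(0.8, 0.4), 0.3)
```

With binary document terms, the min/max evaluation can only return 0 or 1, which is the coarse output that the abstract contrasts with the more refined retrieval functions available once MTSs are computed.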
Cite this article
Alsaffar, A., Deogun, J., Raghavan, V. et al. Enhancing Concept-Based Retrieval Based on Minimal Term Sets. Journal of Intelligent Information Systems 14, 155–173 (2000). https://doi.org/10.1023/A:1008783718847
DOI: https://doi.org/10.1023/A:1008783718847