Using stem rules to refine document retrieval queries

Liu, Ye; Chen, Hanxiong; Yu, Jeffrey Xu; Ohbo, Nobuo

doi:10.1007/BFb0056006

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1495)

Included in the following conference series:

International Conference on Flexible Query Answering Systems

Abstract

This paper proposes a data mining approach to query refinement that uses association rules (ARs) among keywords extracted from a document database. When a query is under-specified or contains ambiguous keywords, a set of association rules is displayed to help the user choose additional keywords and refine the original query. To the best of our knowledge, no reported study has discussed how to use ARs to screen the number of documents retrieved. We address two issues. First, an AR X ⟹ Y with high confidence tends to indicate that a large number of documents contain both keyword sets X and Y, so minimum support and minimum confidence may do little to screen documents. To address this issue, maximum support and maximum confidence are used. Second, a large number of rules must be stored in a rule base and displayed at run time in response to a user query. To reduce the number of rules, we introduce two related concepts: “stem rule” and “coverage”. Stem rules are the rules from which other rules can be derived. A set of keywords is said to be a coverage of a set of documents if these documents can be retrieved using that set of keywords. A minimum coverage reduces the number of keywords needed to cover a given number of documents, and therefore helps reduce the number of rules to be managed. To demonstrate the applicability of the proposed method, we have built an interactive interface and maintain a medium-sized document database. The effectiveness of using ARs to screen documents is also addressed.
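
As a rough illustration of the quantities the abstract refers to, the sketch below computes support and confidence for keyword rules over a toy corpus, using the standard association-rule definitions. The corpus, the function names, and the way the minimum and maximum thresholds are combined into a single band in `keep_rule` are assumptions made for illustration; the abstract does not state how the paper applies these thresholds.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy corpus: document id -> keywords extracted from it.
DOCS: Dict[str, Set[str]] = {
    "d1": {"data", "mining", "rules"},
    "d2": {"data", "mining"},
    "d3": {"data", "retrieval"},
    "d4": {"query", "retrieval"},
}


def matching(docs: Dict[str, Set[str]], keywords: Iterable[str]) -> Set[str]:
    """Ids of the documents that contain every keyword in `keywords`."""
    wanted = set(keywords)
    return {doc_id for doc_id, words in docs.items() if wanted <= words}


def support(docs: Dict[str, Set[str]], keywords: Iterable[str]) -> float:
    """Fraction of all documents that contain every keyword."""
    return len(matching(docs, keywords)) / len(docs)


def confidence(docs: Dict[str, Set[str]], x: Iterable[str], y: Iterable[str]) -> float:
    """Confidence of X => Y: documents with X and Y over documents with X."""
    n_x = len(matching(docs, x))
    if n_x == 0:
        return 0.0
    return len(matching(docs, set(x) | set(y))) / n_x


def keep_rule(docs, x, y, min_sup=0.0, max_sup=1.0, min_conf=0.0, max_conf=1.0) -> bool:
    """Assumed screening: keep X => Y only if both measures fall inside a band."""
    sup = support(docs, set(x) | set(y))
    conf = confidence(docs, x, y)
    return min_sup <= sup <= max_sup and min_conf <= conf <= max_conf


if __name__ == "__main__":
    print(support(DOCS, {"data"}))                # 0.75
    print(confidence(DOCS, {"mining"}, {"data"}))  # 1.0
    print(keep_rule(DOCS, {"mining"}, {"data"}, max_conf=0.9))  # False
```

In this toy corpus the rule mining ⟹ data has confidence 1.0, so adding "data" to the query "mining" removes no documents. An upper bound on confidence discards such a rule, which is one way to read the abstract's motivation for maximum thresholds.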

Author information

Authors and Affiliations

Institute of Electronic & Information Science, University of Tsukuba, 305, Tsukuba, Japan
Ye Liu & Nobuo Ohbo
Tsukuba International University, Manabe 6, 300, Tsuchiura, Japan
Hanxiong Chen
Department of Computer Science, Australian National University, 0200, Canberra, ACT, Australia
Jeffrey Xu Yu

Editor information

Troels Andreasen, Henning Christiansen, Henrik Legind Larsen

About this paper

Cite this paper

Liu, Y., Chen, H., Yu, J.X., Ohbo, N. (1998). Using stem rules to refine document retrieval queries. In: Andreasen, T., Christiansen, H., Larsen, H.L. (eds) Flexible Query Answering Systems. FQAS 1998. Lecture Notes in Computer Science, vol 1495. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0056006

DOI: https://doi.org/10.1007/BFb0056006
Published: 31 May 2006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65082-9
Online ISBN: 978-3-540-49655-7
eBook Packages: Springer Book Archive
