Using stem rules to refine document retrieval queries

  • Conference paper
Flexible Query Answering Systems (FQAS 1998)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1495)

Abstract

This paper proposes a data mining approach to query refinement that uses association rules (ARs) among keywords extracted from a document database. When a query is under-specified or contains ambiguous keywords, a set of association rules is displayed to help the user choose additional keywords that refine the original query. To the best of our knowledge, no reported study has discussed how to use ARs to screen the number of documents retrieved. We address two issues. First, an AR X ⟹ Y with high confidence tends to indicate that the number of documents containing both keyword sets X and Y is large, so minimum support and minimum confidence are of little use for screening documents. To address this issue, we use maximum support and maximum confidence. Second, a large number of rules must be stored in a rule base and displayed at run time in response to a user query. To reduce the number of rules, we introduce two related concepts: “stem rule” and “coverage”. Stem rules are the rules from which other rules can be derived. A set of keywords is said to be a coverage of a set of documents if those documents can be retrieved using that set of keywords. A minimum coverage reduces the number of keywords needed to cover a certain number of documents, and therefore helps reduce the number of rules to be managed. To demonstrate the applicability of the proposed method, we have built an interactive interface and maintain a medium-sized document database. The paper also addresses the effectiveness of using ARs to screen documents.
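
The role of maximum support and maximum confidence can be illustrated with a small sketch. It is not the authors' implementation: it assumes the standard definitions of support and confidence (fraction of documents containing all keywords, and support of X ∪ Y divided by support of X), and the toy documents, function names, and threshold values are hypothetical.

```python
# Toy document database: each document is a set of keywords (hypothetical data).
DOCS = {
    "d1": {"database", "query", "mining"},
    "d2": {"database", "query"},
    "d3": {"database", "mining", "rules"},
    "d4": {"query", "retrieval"},
    "d5": {"database", "query", "retrieval"},
}


def matching(keywords, docs=DOCS):
    """Return the ids of documents that contain every keyword in `keywords`."""
    wanted = set(keywords)
    return {doc_id for doc_id, words in docs.items() if wanted <= words}


def support(keywords, docs=DOCS):
    """Fraction of all documents that contain every keyword (assumed definition)."""
    return len(matching(keywords, docs)) / len(docs)


def confidence(x, y, docs=DOCS):
    """Confidence of the rule X => Y: |docs(X and Y)| / |docs(X)| (assumed definition)."""
    docs_x = len(matching(x, docs))
    if docs_x == 0:
        return 0.0
    return len(matching(set(x) | set(y), docs)) / docs_x


def refinements(query, max_sup, max_conf, docs=DOCS):
    """Suggest single keywords y for which the rule query => {y} is non-empty
    and stays at or below the maximum support and maximum confidence."""
    query = set(query)
    vocabulary = set().union(*docs.values()) - query
    suggestions = []
    for y in sorted(vocabulary):
        sup = support(query | {y}, docs)
        conf = confidence(query, {y}, docs)
        # Skip keywords that never co-occur with the query (empty result),
        # and keywords whose rule is too frequent to narrow the result set.
        if 0 < sup <= max_sup and conf <= max_conf:
            suggestions.append((y, sup, conf))
    return suggestions


if __name__ == "__main__":
    for keyword, sup, conf in refinements({"database"}, max_sup=0.4, max_conf=0.5):
        print(f"database => {keyword}: support={sup:.2f}, confidence={conf:.2f}")
```

In this toy data, the rule database ⟹ query has confidence 0.75, so adding "query" would still return three of the four documents that match "database". An upper bound on confidence filters out such rules and keeps keywords that actually narrow the result set.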

Editor information

Troels Andreasen Henning Christiansen Henrik Legind Larsen

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, Y., Chen, H., Yu, J.X., Ohbo, N. (1998). Using stem rules to refine document retrieval queries. In: Andreasen, T., Christiansen, H., Larsen, H.L. (eds) Flexible Query Answering Systems. FQAS 1998. Lecture Notes in Computer Science, vol 1495. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0056006

  • DOI: https://doi.org/10.1007/BFb0056006

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65082-9

  • Online ISBN: 978-3-540-49655-7

  • eBook Packages: Springer Book Archive
