Term weighting in query-based document clustering

Extended abstract

  • Regular Papers
  • Conference paper
Advances in Databases and Information Systems (ADBIS 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1475)

Abstract

Search agents on the World-Wide Web find documents that match user-submitted queries. Because the number of matching documents returned by an agent is often very large, it would be easier for users to browse clusters rather than individual documents when looking for the documents that actually satisfy their information needs. In this paper we introduce a new approach to term weighting for the document similarity measures needed in clustering. The approach is based on term occurrence frequencies within the set of documents that match an initial query. We suggest that this approach may be useful for noise reduction in the very rich and heterogeneous environment of the Web.
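The abstract does not state the weighting formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes an IDF-style weight computed over the result set alone (rather than over a whole collection) and plugs that weight into a cosine similarity. The function names `result_set_weights` and `weighted_cosine`, the logarithmic weight, and the toy documents are all illustrative assumptions.

```python
import math
from collections import Counter


def result_set_weights(docs):
    """Assumed weighting: log(n / df), with df counted only inside the result set.

    docs is a list of token lists, one per document that matched the query.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    # A term present in every matching document (typically a query term)
    # gets weight log(1) = 0, so it cannot influence similarity.
    return {term: math.log(n / count) for term, count in df.items()}


def weighted_cosine(doc_a, doc_b, weights):
    """Cosine similarity of two token lists under the given term weights."""
    tf_a, tf_b = Counter(doc_a), Counter(doc_b)
    vec_a = {t: tf_a[t] * weights.get(t, 0.0) for t in tf_a}
    vec_b = {t: tf_b[t] * weights.get(t, 0.0) for t in tf_b}
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical result set for the query "jaguar".
docs = [
    ["jaguar", "car", "engine"],
    ["jaguar", "car", "dealer"],
    ["jaguar", "cat", "jungle"],
]
weights = result_set_weights(docs)
sim_cars = weighted_cosine(docs[0], docs[1], weights)
sim_mixed = weighted_cosine(docs[0], docs[2], weights)
```

In this toy result set the query term "jaguar" occurs in every document and so receives zero weight. The two car-related documents still share a weighted term, while the car document and the animal document share none, which is the kind of separation a clustering step could exploit.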

Editor information

Witold Litwin, Tadeusz Morzy, Gottfried Vossen

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Korpimies, K., Ukkonen, E. (1998). Term weighting in query-based document clustering. In: Litwin, W., Morzy, T., Vossen, G. (eds) Advances in Databases and Information Systems. ADBIS 1998. Lecture Notes in Computer Science, vol 1475. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0057726

  • DOI: https://doi.org/10.1007/BFb0057726

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64924-3

  • Online ISBN: 978-3-540-68309-4

  • eBook Packages: Springer Book Archive
