An On-Line Document Clustering Method Based on Forgetting Factors

Ishikawa, Yoshiharu; Chen, Yibing; Kitagawa, Hiroyuki

doi:10.1007/3-540-44796-2_28


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2163)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries


Abstract

With the rapid development of on-line information services, technologies for on-line information processing have been receiving much attention. Clustering plays an important role in various on-line applications, such as extracting useful information from news feed services and selecting relevant documents from incoming scientific articles in digital libraries. In on-line environments, users are generally more interested in newer documents than in older ones, and have no interest in obsolete documents.

Based on this observation, we propose an on-line document clustering method, F²ICM (Forgetting-Factor-based Incremental Clustering Method), that incorporates the notion of a forgetting factor into the calculation of document similarities. The idea is that every document gradually loses its weight (or memory) as time passes, at a rate determined by this factor. Since F²ICM generates clusters using a document similarity measure based on the forgetting factor, newer documents have a greater effect on the resulting cluster structure than older ones. In this paper, we present the fundamental idea of the F²ICM method and describe its details, such as the similarity measure and the clustering algorithm. We also present an efficient method for incrementally maintaining the statistics used by F²ICM, which is indispensable in on-line dynamic environments.
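The abstract does not state the weighting formula, so what follows is a minimal Python sketch under stated assumptions, not F²ICM itself. It assumes that a document acquired at time T_i has weight λ^(τ − T_i) at the current time τ, with 0 < λ < 1, and that λ is derived from a half-life β via λ^β = 1/2. The names `forgetting_factor`, `document_weight`, and `DecayedTotal` are hypothetical. The class shows why such weights suit incremental maintenance. When time advances by Δ, every document decays by the same factor λ^Δ. An aggregate such as the total document weight can therefore be rescaled in constant time instead of being recomputed over the whole collection.

```python
def forgetting_factor(half_life: float) -> float:
    """Return lam in (0, 1) such that lam ** half_life == 0.5.

    Assumption: the decay rate is parameterized by a half-life.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 0.5 ** (1.0 / half_life)


def document_weight(lam: float, now: float, acquired: float) -> float:
    """Assumed decay rule: a document's weight is lam ** (now - acquired)."""
    return lam ** (now - acquired)


class DecayedTotal:
    """Incrementally maintains sum_i lam ** (now - T_i) over all documents seen.

    Hypothetical helper: since every term shrinks by the same factor when the
    clock advances, the stored total is rescaled once instead of being rebuilt.
    """

    def __init__(self, lam: float, start: float = 0.0) -> None:
        self.lam = lam
        self.time = start
        self.total = 0.0

    def advance(self, now: float) -> None:
        """Move the clock forward, decaying the stored total accordingly."""
        if now < self.time:
            raise ValueError("time must not go backwards")
        self.total *= self.lam ** (now - self.time)
        self.time = now

    def add_document(self, acquired: float) -> None:
        """Register a document arriving at time `acquired` (non-decreasing)."""
        self.advance(acquired)
        # A document arriving "now" has weight lam ** 0 == 1.0.
        self.total += 1.0
```

For example, with a half-life of 7 time units, a document acquired 14 units ago contributes a weight of about 0.25. A `DecayedTotal` updated one document at a time agrees with a full recomputation up to floating-point rounding. The similarity measure and clustering algorithm of F²ICM are not reproduced here.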



Author information

Yibing Chen
Present address: Yamatake Building System Co. Ltd., Japan

Authors and Affiliations

Institute of Information Sciences and Electronics, University of Tsukuba, Japan
Yoshiharu Ishikawa & Hiroyuki Kitagawa
Master’s Program in Science and Engineering, University of Tsukuba, Japan
Yibing Chen


Editor information

Editors and Affiliations

Department of Computer Science, University of Crete, Leof. Knossou, P.O. Box 1470, 71409, Heraklion, Greece
Panos Constantopoulos
Foundation for Research and Technology - Hellas, Institute of Computer Science, Vassilika Vouton, P.O. Box 1385, 71110, Heraklion, Greece
Panos Constantopoulos
Department of Computer and Information Science, The Norwegian University of Science and Technology, 7491, Trondheim, Norway
Ingeborg T. Sølvberg


About this paper

Cite this paper

Ishikawa, Y., Chen, Y., Kitagawa, H. (2001). An On-Line Document Clustering Method Based on Forgetting Factors. In: Constantopoulos, P., Sølvberg, I.T. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2001. Lecture Notes in Computer Science, vol 2163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44796-2_28


DOI: https://doi.org/10.1007/3-540-44796-2_28
Published: 30 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42537-3
Online ISBN: 978-3-540-44796-2
eBook Packages: Springer Book Archive
