Abstract
With the rapid development of on-line information services, technologies for on-line information processing have recently received much attention. Clustering plays an important role in various on-line applications, such as extracting useful information from news feed services and selecting relevant documents from incoming scientific articles in digital libraries. In on-line environments, users are generally more interested in newer documents than in older ones and have no interest in obsolete documents.
Based on this observation, we propose an on-line document clustering method, F2ICM (Forgetting-Factor-based Incremental Clustering Method), that incorporates the notion of a forgetting factor into the calculation of document similarities. The idea is that every document gradually loses its weight (or memory) as time passes, according to this factor. Since F2ICM generates clusters using a document similarity measure based on the forgetting factor, newer documents have a greater effect on the resulting cluster structure than older ones. In this paper, we present the fundamental idea of the F2ICM method and describe its details, such as the similarity measure and the clustering algorithm. We also show an efficient incremental statistics maintenance method for F2ICM, which is indispensable in on-line dynamic environments.
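The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea, assuming an exponential decay in which a document's weight is multiplied by a constant factor per unit of time. The names doc_weight, DecayedTotal, and half_life are hypothetical and are not taken from the paper. The sketch also illustrates why incremental maintenance is possible under this assumption: a running sum of weights can be aged by a single multiplication instead of being recomputed over all documents.

```python
def forgetting_factor(half_life):
    """Per-unit-time decay factor (assumed exponential model), 0 < factor < 1."""
    return 0.5 ** (1.0 / half_life)


def doc_weight(acquired_at, now, half_life):
    """Weight at time `now` of a document acquired at time `acquired_at`."""
    return forgetting_factor(half_life) ** (now - acquired_at)


class DecayedTotal:
    """Running sum of all document weights, maintained incrementally."""

    def __init__(self, half_life):
        self.lam = forgetting_factor(half_life)
        self.total = 0.0
        self.last_update = None

    def advance(self, now):
        # Age every previously added document with one multiplication.
        if self.last_update is not None:
            self.total *= self.lam ** (now - self.last_update)
        self.last_update = now

    def add_document(self, now):
        # A newly acquired document starts with full weight 1.
        self.advance(now)
        self.total += 1.0
```

With a half-life of 7 time units, a document acquired 7 units ago carries about half the weight of one acquired now, and the incrementally maintained total agrees with the sum of the individually computed weights.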
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ishikawa, Y., Chen, Y., Kitagawa, H. (2001). An On-Line Document Clustering Method Based on Forgetting Factors. In: Constantopoulos, P., Sølvberg, I.T. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2001. Lecture Notes in Computer Science, vol 2163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44796-2_28
DOI: https://doi.org/10.1007/3-540-44796-2_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42537-3
Online ISBN: 978-3-540-44796-2
eBook Packages: Springer Book Archive