Abstract
As document collections accumulate over time, some of their discussion subjects become outdated while new ones emerge, so old classification schemes must be updated. In this paper, we address the challenge of finding emerging and persistent “themes”, i.e., subjects that live long enough to be incorporated into a taxonomy or ontology describing the document collection. We focus on identifying cluster labels that “survive” changes in the composition of the underlying document population, including changes in the feature space of dominant words, since the terminology of the document archive also changes over time. We report promising experiments on identifying the themes that manifested themselves in section H2.8 of the ACM Digital Library and compare them with the classes foreseen in the ACM taxonomy for this section.
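To make the notion of a label that “survives” across time periods concrete, the following is a minimal, hypothetical sketch. It is not the authors' method. It assumes that documents have already been clustered separately in each time period and that each cluster is labelled by the set of its dominant terms. A cluster is linked to the cluster of the previous period whose label it overlaps most (Jaccard similarity of term sets, at least `min_overlap`), and chains of linked clusters spanning at least `min_length` periods are reported as persistent theme candidates. The names `jaccard`, `persistent_themes`, `min_overlap` and `min_length`, as well as the default thresholds, are illustrative assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple

Cluster = Tuple[int, int]  # (period index, cluster index)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of terms common to two cluster labels (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def persistent_themes(periods: List[List[Set[str]]],
                      min_overlap: float = 0.5,
                      min_length: int = 3) -> List[List[Cluster]]:
    """Hypothetical sketch: link each cluster to its best-matching cluster in
    the previous period (label overlap >= min_overlap) and return the maximal
    chains that span at least min_length periods."""
    chains: Dict[Cluster, List[Cluster]] = {}
    for t, labels in enumerate(periods):
        for j, label in enumerate(labels):
            best: Optional[int] = None
            best_score = 0.0
            if t > 0:
                for i, prev in enumerate(periods[t - 1]):
                    score = jaccard(prev, label)
                    if score >= min_overlap and score > best_score:
                        best, best_score = i, score
            # Extend the predecessor's chain, or start a new theme candidate.
            prefix = chains[(t - 1, best)] if best is not None else []
            chains[(t, j)] = prefix + [(t, j)]
    # A chain is maximal if no cluster in the next period extends it.
    extended = {c[-2] for c in chains.values() if len(c) > 1}
    return [c for key, c in chains.items()
            if key not in extended and len(c) >= min_length]
```

Because labels are compared as term sets rather than as vectors over one fixed vocabulary, the dominant words may differ from period to period, which mirrors the changing terminology described above. A label that drifts from {association, mining, rules} to {frequent, mining, rules} still matches (overlap 0.5), whereas a brand-new label starts a new chain.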
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schult, R., Spiliopoulou, M. (2006). Discovering Emerging Topics in Unlabelled Text Collections. In: Manolopoulos, Y., Pokorný, J., Sellis, T.K. (eds) Advances in Databases and Information Systems. ADBIS 2006. Lecture Notes in Computer Science, vol 4152. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11827252_27
DOI: https://doi.org/10.1007/11827252_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37899-0
Online ISBN: 978-3-540-37900-3
eBook Packages: Computer Science; Computer Science (R0)