Abstract
Incremental learning of topic hierarchies is useful for organizing and managing growing text collections, and the resulting hierarchies summarize the implicit knowledge in the textual data. However, currently available methods have limitations in the incremental learning phase. In particular, when the initial topic hierarchy does not model the data well, new documents are inserted into inappropriate topics, and this error propagates into subsequent hierarchy updates, decreasing the quality of the knowledge extraction process. We introduce a method that uses consensus clustering to obtain more robust initial topic hierarchies. Experimental results on several text collections show that our method significantly reduces the degradation of the topic hierarchies during incremental learning compared with a traditional method.
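As a rough illustration of the general idea of consensus clustering, the sketch below combines several base partitions of the same documents into a co-association matrix and then builds a hierarchy from it by average-link agglomeration. This is a generic textbook-style construction and not necessarily the procedure used in the paper; the function names `co_association` and `average_link_hierarchy` and the toy partitions are assumptions made only for this example.

```python
from itertools import combinations


def co_association(partitions):
    """Entry (i, j) is the fraction of partitions that place items i and j together."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def average_link_hierarchy(sim):
    """Repeatedly merge the two clusters with the highest average similarity.

    Returns the merges in order as (left_members, right_members, similarity).
    """
    def avg(pair):
        a, b = pair
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    clusters = [(i,) for i in range(len(sim))]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=avg)
        merges.append((a, b, avg((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    return merges


# Hypothetical base partitions of four documents (one label list per clustering run).
base_partitions = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
for left, right, score in average_link_hierarchy(co_association(base_partitions)):
    print(left, right, round(score, 3))
```

Pairs of documents that most base partitions agree on are merged first, so a single poor clustering run has limited influence on the resulting initial hierarchy.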
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Marcacini, R.M., Hruschka, E.R., Rezende, S.O. (2012). On the Use of Consensus Clustering for Incremental Learning of Topic Hierarchies. In: Barros, L.N., Finger, M., Pozo, A.T., Giménez-Lugo, G.A., Castilho, M. (eds) Advances in Artificial Intelligence - SBIA 2012. SBIA 2012. Lecture Notes in Computer Science, vol 7589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34459-6_12
DOI: https://doi.org/10.1007/978-3-642-34459-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34458-9
Online ISBN: 978-3-642-34459-6
eBook Packages: Computer Science (R0)