
On the Use of Consensus Clustering for Incremental Learning of Topic Hierarchies

  • Conference paper
Advances in Artificial Intelligence - SBIA 2012 (SBIA 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7589)

Abstract

Incremental learning of topic hierarchies is very useful for organizing and managing growing text collections, thereby summarizing the implicit knowledge in textual data. However, currently available methods have some limitations in the incremental learning phase. In particular, when the initial topic hierarchy is not suitable for modeling the data, new documents are inserted into inappropriate topics, and this error is propagated to future hierarchy updates, thus decreasing the quality of the knowledge extraction process. We introduce a method for obtaining more robust initial topic hierarchies by using consensus clustering. Experimental results on several text collections show that our method significantly reduces the degradation of the topic hierarchies during incremental learning compared to a traditional method.
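The abstract does not say which consensus function the method uses, so what follows is only a minimal sketch of one common approach to consensus clustering: evidence accumulation through a co-association matrix. It assumes that several base partitions of the same documents are already available as label lists (for example, from k-means runs with different numbers of clusters). The co-association value of two documents is the fraction of base partitions that put them in the same cluster. Running average-linkage agglomerative clustering on that matrix yields a hierarchy that depends less on any single base clustering. The function names `co_association`, `average_linkage` and `cut` are hypothetical, the toy data is illustrative, and the naive loop runs in cubic time overall.

```python
from itertools import combinations


def co_association(partitions, n):
    """Fraction of base partitions in which each pair of documents shares a cluster."""
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("every base partition must label all n documents")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    k = len(partitions)
    return [[c / k for c in row] for row in counts]


def average_linkage(sim):
    """Agglomerative clustering on a similarity matrix (average linkage).

    Returns the merges in order as (cluster_a, cluster_b, similarity),
    where each cluster is a frozenset of document indices.
    """
    clusters = [frozenset([i]) for i in range(len(sim))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            # Mean pairwise co-association between the two clusters.
            s = sum(sim[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
            if best is None or s > best[0]:
                best = (s, a, b)
        s, a, b = best
        merges.append((clusters[a], clusters[b], s))
        merged = clusters[a] | clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges


def cut(merges, n, threshold):
    """Flat topics: apply every merge whose similarity is at least `threshold`."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for ca, cb, s in merges:
        if s >= threshold:
            root = find(next(iter(ca)))
            for x in ca | cb:
                parent[find(x)] = root
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Three toy base partitions of six documents (illustrative data only).
    partitions = [
        [0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 2, 2],
        [0, 0, 0, 1, 1, 2],
    ]
    sim = co_association(partitions, 6)
    merges = average_linkage(sim)
    print(cut(merges, 6, 0.4))  # [[0, 1, 2], [3, 4, 5]]
```

Lowering `threshold` merges more topics, and raising it splits them. In the setting the abstract describes, a consensus hierarchy of this kind would play the role of the initial hierarchy into which new documents are later inserted.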




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Marcacini, R.M., Hruschka, E.R., Rezende, S.O. (2012). On the Use of Consensus Clustering for Incremental Learning of Topic Hierarchies. In: Barros, L.N., Finger, M., Pozo, A.T., Giménez-Lugo, G.A., Castilho, M. (eds) Advances in Artificial Intelligence - SBIA 2012. SBIA 2012. Lecture Notes in Computer Science (LNAI), vol 7589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34459-6_12


  • DOI: https://doi.org/10.1007/978-3-642-34459-6_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34458-9

  • Online ISBN: 978-3-642-34459-6

  • eBook Packages: Computer Science, Computer Science (R0)
