DOI:10.1145/2396761.2398483
short-paper

Hierarchical topic integration through semi-supervised hierarchical topic modeling

Published: 29 October 2012

Abstract

Many document collections are organized in a hierarchical structure, which helps users browse and understand them. Many other collections, however, are plain and only loosely organized, which makes them difficult for users to understand effectively. In this paper, we study how to automatically integrate the latent topics of a plain collection with the topics of a hierarchically structured collection. We propose semi-supervised topic modeling as a principled solution to this problem. Experiments show that the proposed method can both generate meaningful latent topics and produce high-quality expanded hierarchical topic structures.


Cited By

  • (2019) Hierarchical Topic Models for Expanding Category Hierarchies. 2019 IEEE International Conference on Big Data and Smart Computing (BigComp), pages 1-8. DOI:10.1109/BIGCOMP.2019.8679402. Online publication date: Feb-2019
  • (2017) A Topic Trend on P2P Based Social Media. Advances in Network-Based Information Systems, pages 1136-1143. DOI:10.1007/978-3-319-65521-5_105. Online publication date: 24-Aug-2017
  • (2014) Constrained-hLDA for Topic Discovery in Chinese Microblogs. Advances in Knowledge Discovery and Data Mining, pages 608-619. DOI:10.1007/978-3-319-06605-9_50. Online publication date: 2014


Information

Published In

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN:9781450311564
DOI:10.1145/2396761

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. hierarchical topic modeling
  2. topical integration

Qualifiers

  • Short-paper

Conference

CIKM'12

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%



