DOI: 10.1145/3308558.3313623
research-article

A Novel Generative Topic Embedding Model by Introducing Network Communities

Published: 13 May 2019

ABSTRACT

Topic models have many important applications in fields such as natural language processing. Topic embedding models introduce word and topic embeddings into topic models to describe correlations between topics. Existing topic embedding methods use documents alone and therefore suffer from topical fuzziness, which arises when embeddings of semantically fuzzy words, e.g. polysemous words or misleading academic terms, are introduced. Documents are often connected by links, which form document networks. Using these links may alleviate the semantic fuzziness, but links are sparse and noisy and may themselves mislead the topics. In this paper, we use community structure to address both problems. Community structure alleviates the topical fuzziness of topic embeddings, since communities are often believed to be topic related, and it also overcomes the drawbacks caused by the sparsity and noise of networks, because communities are high-order network information. We propose a new generative topic embedding model that incorporates documents (with topics) and the network (with communities) together, and that uses probability transition to describe the relationship between topics and communities, making the model robust when topics and communities do not match. We then propose an efficient variational inference algorithm to learn the model. We validate the superiority of our approach on two tasks: document classification and visualization of topic embeddings.
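
The abstract does not spell out how the probability transition between communities and topics is parameterized. As a rough illustration only, one common way to read such a transition is as a row-stochastic matrix that maps a document's distribution over communities to a distribution over topics. The sketch below assumes that reading; the function name `community_to_topic`, the matrix layout, and all numbers are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the names and numbers are hypothetical,
# not the model defined in the paper.

def community_to_topic(community_probs, transition):
    """Map a distribution over communities to a distribution over topics.

    transition[c][k] is the assumed probability of topic k given
    community c, so every row must sum to 1.
    """
    for row in transition:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of the transition matrix must sum to 1")
    n_topics = len(transition[0])
    return [
        sum(community_probs[c] * transition[c][k] for c in range(len(transition)))
        for k in range(n_topics)
    ]


# A document that belongs mostly to community 0.
community_probs = [0.8, 0.2]

# Community 0 leans towards topic 0; community 1 is spread over topics 1 and 2.
# Off-diagonal mass lets a community contribute to several topics, which is
# one way a model can tolerate communities and topics that do not line up.
transition = [
    [0.7, 0.2, 0.1],
    [0.1, 0.5, 0.4],
]

topic_probs = community_to_topic(community_probs, transition)
```

Because the transition is soft rather than a one-to-one assignment, a community that mixes several subjects still yields a valid topic distribution, which is the kind of robustness to topic and community mismatch that the abstract describes.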


  • Published in

    WWW '19: The World Wide Web Conference
    May 2019
    3620 pages
    ISBN: 9781450366748
    DOI: 10.1145/3308558

    Copyright © 2019 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
