ABSTRACT
Topic models have many important applications in fields such as natural language processing. Topic embedding modelling introduces word and topic embeddings into topic models to describe correlations between topics. Existing topic embedding methods use documents alone and therefore suffer from topical fuzziness, which arises when the embeddings of semantically fuzzy words (e.g., polysemous words or misleading academic terms) are introduced. Documents are often connected by links, forming document networks. Links may alleviate this semantic fuzziness, but they are sparse and noisy and may in turn mislead topics. In this paper, we use community structure to address both problems. Community structure can alleviate the topical fuzziness of topic embeddings, since communities are often believed to be topic related, and it can also overcome the drawbacks caused by the sparsity and noise of networks, because a community is high-order network information. We propose a new generative topic embedding model that incorporates documents (with topics) and the network (with communities) together, and uses probability transition to describe the relationship between topics and communities, which keeps the model robust when topics and communities do not match. We then propose an efficient variational inference algorithm to learn the model. We validate the superiority of our approach on two tasks: document classification and visualization of topic embeddings.
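The abstract does not spell out how the probability transition between communities and topics is parameterized. As a minimal sketch only, assuming a row-stochastic matrix in which each row gives one community's distribution over topics, a document's community memberships could be mapped to a topic distribution as follows. The function name `community_to_topic`, the matrix values, and the sizes (2 communities, 3 topics) are all hypothetical and are not taken from the paper.

```python
def community_to_topic(community_dist, transition):
    """Map a distribution over communities to a distribution over topics.

    Assumption (not from the paper): transition[c][k] is the probability
    of topic k given community c, so every row sums to 1.
    """
    for row in transition:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each transition row must sum to 1")
    n_topics = len(transition[0])
    # Marginalize over communities: p(k) = sum_c p(c) * p(k | c)
    return [
        sum(community_dist[c] * transition[c][k]
            for c in range(len(community_dist)))
        for k in range(n_topics)
    ]


# Hypothetical example: 2 communities, 3 topics.
transition = [
    [0.8, 0.1, 0.1],  # community 0 mostly discusses topic 0
    [0.1, 0.2, 0.7],  # community 1 mostly discusses topic 2
]
doc_communities = [0.5, 0.5]  # document belongs equally to both communities
doc_topics = community_to_topic(doc_communities, transition)
```

Because the mapping is soft, a community does not have to coincide with a single topic, which is one way to read the robustness claim for the case where topics and communities do not match.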