A Novel Generative Topic Embedding Model by Introducing Network Communities

Authors:
Di Jin, College of Intelligence and Computing, Tianjin University, China
Jiantao Huang, College of Intelligence and Computing, Tianjin University, China
Pengfei Jiao, College of Intelligence and Computing, Tianjin University, China
Liang Yang, Hebei University of Technology, China
Dongxiao He, College of Intelligence and Computing, Tianjin University, China
Françoise Soulie-Fogelman, College of Intelligence and Computing, Tianjin University, China
Yuxiao Huang, George Washington University, USA

WWW '19: The World Wide Web Conference, May 2019, Pages 2886–2892
https://doi.org/10.1145/3308558.3313623

Published: 13 May 2019

ABSTRACT

Topic models have many important applications in fields such as natural language processing. Topic embedding modelling introduces word and topic embeddings into topic models to describe correlations between topics. Existing topic embedding methods use documents alone and therefore suffer from topical fuzziness, which arises when embeddings of semantically fuzzy words (e.g. polysemous words or misleading academic terms) are introduced. Documents are often connected by links, which form document networks. Using these links may alleviate the semantic fuzziness, but links are sparse and noisy and may themselves mislead topics. In this paper, we use community structure to address both problems. Because communities are often believed to be topic related, community structure can alleviate the topical fuzziness of topic embeddings; because a community is high-order network information, it can also overcome the drawbacks caused by the sparsity and noise of networks. We propose a new generative topic embedding model that incorporates documents (with topics) and the network (with communities) together, and uses a probability transition to describe the relationship between topics and communities, which keeps the model robust when topics and communities do not match. We then propose an efficient variational inference algorithm to learn the model. We validate the superiority of the new approach on two tasks: document classification and visualization of topic embeddings.
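
As a rough illustration of the "probability transition" idea mentioned in the abstract, the sketch below maps a document's distribution over communities to a distribution over topics through a row-stochastic matrix. This is only one plausible reading, not the paper's model: the function name `community_to_topic`, the `transition` matrix, and all numbers are hypothetical, and the paper's generative process and variational inference are not reproduced here.

```python
def community_to_topic(community_probs, transition):
    """Map a distribution over communities to a distribution over topics.

    transition[c][k] is an ASSUMED probability of topic k given community c.
    Every row must be a probability distribution (row-stochastic matrix).
    """
    if len(community_probs) != len(transition):
        raise ValueError("one transition row is required per community")
    if abs(sum(community_probs) - 1.0) > 1e-9:
        raise ValueError("community_probs must sum to 1")
    n_topics = len(transition[0])
    for row in transition:
        if len(row) != n_topics or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each transition row must be a distribution over topics")
    # Marginalize over communities: p(k) = sum_c p(c) * p(k | c)
    return [
        sum(community_probs[c] * transition[c][k] for c in range(len(transition)))
        for k in range(n_topics)
    ]


# Hypothetical toy values: 2 communities, 3 topics.
transition = [
    [0.8, 0.1, 0.1],  # community 0 is mostly about topic 0
    [0.2, 0.3, 0.5],  # community 1 spreads over topics 1 and 2
]
doc_communities = [0.5, 0.5]  # a document split evenly between both communities
topic_prior = community_to_topic(doc_communities, transition)
```

Because the rows of `transition` need not be one-hot, a community can spread its mass over several topics, which is one way a mismatch between communities and topics could be tolerated rather than forced into a one-to-one mapping.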


Published in
WWW '19: The World Wide Web Conference, May 2019, 3620 pages
ISBN: 9781450366748
DOI: 10.1145/3308558
Editors: Ling Liu (Georgia Tech, USA), Ryen White (Microsoft Research, USA)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Community structure
Document networks
Topic embedding
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
