ABSTRACT
Given a large-scale linked document collection, such as a collection of blog posts or a research literature archive, two fundamental problems have attracted considerable interest in the research community. One is to identify a set of high-level topics covered by the documents in the collection; the other is to uncover and analyze the social network of the documents' authors. So far, these problems have been treated as separate and studied independently of each other. In this paper, we argue that the two problems are in fact interdependent and should be addressed together. We develop a Bayesian hierarchical approach that performs topic modeling and author community discovery in one unified framework. The effectiveness of our model is demonstrated on two blog data sets from different domains and one research paper citation data set from CiteSeer.
Index Terms
- Topic-link LDA: joint models of topic and author community