DOI: 10.1145/2396761.2398485
short-paper

Incorporating word correlation into tag-topic model for semantic knowledge acquisition

Published: 29 October 2012

Abstract

This paper presents a tag-topic model with a Dirichlet Forest prior (TTM-DF) for semantic knowledge acquisition from blogs. TTM-DF extends the tag-topic model (TTM) by replacing the Dirichlet prior over the topic-word multinomial with a Dirichlet Forest prior. Correlations between words are calculated to generate a set of Must-Links and Cannot-Links, and the structures of the Dirichlet trees are then obtained by encoding these constraints. Words under the same subtree are expected to be more correlated than words under different subtrees. We conduct experiments on a synthetic dataset and a blog dataset. Results on both datasets show that TTM-DF performs much better than TTM: it improves the coherence of the underlying topics and of the tag-topic distributions, and captures semantic knowledge effectively.
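
The link-generation step can be illustrated with a minimal sketch. The abstract does not say how word correlation is measured or how links are selected, so everything below is an assumption made for illustration: the correlation scores are toy values, the two thresholds and the function names `build_links` and `must_link_groups` are hypothetical, and grouping Must-Linked words by transitive closure is one plausible way to prepare subtrees, not necessarily the authors' procedure.

```python
from itertools import combinations


def build_links(vocab, corr, must_threshold=0.8, cannot_threshold=0.1):
    """Threshold pairwise correlation scores into Must-Links and Cannot-Links.

    Both thresholds are hypothetical; the paper's actual criterion is not
    given in the abstract.
    """
    must, cannot = set(), set()
    for a, b in combinations(sorted(vocab), 2):
        # Scores may be stored under either ordering of the pair.
        score = corr.get((a, b), corr.get((b, a)))
        if score is None:
            continue  # no evidence: leave the pair unconstrained
        if score >= must_threshold:
            must.add((a, b))
        elif score <= cannot_threshold:
            cannot.add((a, b))
    return must, cannot


def must_link_groups(vocab, must):
    """Group words connected by Must-Links using a small union-find."""
    parent = {w: w for w in vocab}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for a, b in must:
        parent[find(a)] = find(b)

    groups = {}
    for w in vocab:
        groups.setdefault(find(w), set()).add(w)
    return sorted(sorted(g) for g in groups.values())


# Toy vocabulary and correlation scores (illustrative values only).
vocab = ["camera", "lens", "photo", "stock", "market"]
corr = {
    ("camera", "lens"): 0.9,
    ("lens", "photo"): 0.85,
    ("market", "stock"): 0.95,
    ("camera", "stock"): 0.02,
    ("market", "photo"): 0.05,
    ("camera", "photo"): 0.5,
}

must, cannot = build_links(vocab, corr)
groups = must_link_groups(vocab, must)
print(groups)
```

In this toy run, "camera", "lens", and "photo" end up in one group even though "camera" and "photo" are only moderately correlated, because Must-Links are treated as transitive. Each group would correspond to words placed under a common subtree, while Cannot-Links keep groups such as the photography and finance words apart.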

    Published In

    CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
    October 2012
    2840 pages
    ISBN:9781450311564
    DOI:10.1145/2396761

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. blog
    2. dirichlet forest prior
    3. tag
    4. topic model

    Qualifiers

    • Short-paper

    Conference

    CIKM'12

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2017) A Semantic Graph-Based Approach for Mining Common Topics from Multiple Asynchronous Text Streams. Proceedings of the 26th International Conference on World Wide Web, 1201-1209. DOI: 10.1145/3038912.3052630. Online publication date: 3-Apr-2017
    • (2017) Guided HTM. IEEE Transactions on Knowledge and Data Engineering 29:2, 330-343. DOI: 10.1109/TKDE.2016.2625790. Online publication date: 1-Feb-2017
    • (2016) A Semantic Graph based Topic Model for Question Retrieval in Community Question Answering. Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, 287-296. DOI: 10.1145/2835776.2835809. Online publication date: 8-Feb-2016
    • (2016) A Hybrid Approach for Question Retrieval in Community Question Answering. The Computer Journal. DOI: 10.1093/comjnl/bxw036. Online publication date: 8-Sep-2016
    • (2016) Quality models for venue recommendation in location-based social network. Multimedia Tools and Applications 75:20, 12521-12534. DOI: 10.1007/s11042-014-2339-x. Online publication date: 1-Oct-2016
    • (2016) Probabilistic Topic Modelling with Semantic Graph. Advances in Information Retrieval, 240-251. DOI: 10.1007/978-3-319-30671-1_18. Online publication date: 2016
