DOI: 10.1145/2348283.2348423

Exploiting temporal topic models in social media retrieval

Published: 12 August 2012

ABSTRACT

Much user-generated content on the Web 2.0 centers on real-world incidents such as the Japanese tsunami, or on general concerns such as the recent economic downturn. This type of information is always of interest to users. For instance, when a user reads a news article about a tsunami in Japan, she may want to see related Flickr photos or more tweets about it. Conventional keyword-based search is inadequate here, since it is not always trivial to formulate ad-hoc interests about the event and the material as keywords. In some cases, the user might want to explore emerging topics that dominate different sources. Present systems fail to connect topically related documents across media, and the user has to examine individual sources to infer the topics herself.

In this work, we address a special type of user information need, the temporal topic, which refers to any abstract matter that is active at some points or over some periods of time. A temporal topic can be a real-world event, e.g. the Arab Spring revolution, but can also be a less concrete subject, e.g. the study of vacuum tube computers in the 1950s. Topics can also be recurrent, such as the US presidential campaigns. There are extensive studies on how to detect topics in a collection of documents, but few use temporal topics as part of the user interest to retrieve documents. We believe that temporal topic-based retrieval is one solution to improve the user experience of present IR systems, as well as to benefit other applications (e.g. topic-sensitive online advertisement).

Our research goal can be broken down into three research questions. The first question involves finding latent temporal topics in a social media stream, where documents come with rich meta-data (timestamps, geo-spatial data, etc.). Following mixture models such as LDA, we treat each document as a mixture of different temporal topic models, each of which incorporates time. A temporal topic consists of at least two types of attributes, time and representative words, similar to [4]. The dynamics of temporal topics can be characterized in a timeline fashion [4], or using hierarchical structures [1]. The challenge lies in devising a model flexible enough to handle diverse and rapidly changing data without many assumptions about its parameters. For this, we see Bayesian nonparametrics [3] as one promising solution, and will extend it to the temporal dimension.
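To make the mixture view concrete, the following is a minimal sketch, not the model proposed in this work. It assumes a fixed number of topics (so it is parametric, unlike the Bayesian nonparametric direction mentioned above) and assumes each topic's activity over time follows a Gaussian. The names `TemporalTopic`, `token_likelihood`, and `topic_posterior` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class TemporalTopic:
    """Hypothetical temporal topic: representative words plus a time profile."""
    word_probs: dict   # word -> p(word | topic)
    time_mean: float   # centre of activity, e.g. a day index (assumption)
    time_std: float    # spread of activity in the same unit (assumption)

    def word_likelihood(self, word, floor=1e-6):
        # Unseen words get a small floor probability instead of zero.
        return self.word_probs.get(word, floor)

    def time_density(self, t):
        # Gaussian density over timestamps; the Gaussian form is an assumption.
        z = (t - self.time_mean) / self.time_std
        return math.exp(-0.5 * z * z) / (self.time_std * math.sqrt(2 * math.pi))


def token_likelihood(word, t, mixture, topics):
    """p(word, t | doc) = sum_k mixture[k] * p(word | k) * p(t | k)."""
    return sum(
        weight * topic.word_likelihood(word) * topic.time_density(t)
        for weight, topic in zip(mixture, topics)
    )


def topic_posterior(word, t, mixture, topics):
    """Posterior over topics for one timestamped token (Bayes' rule)."""
    joint = [
        weight * topic.word_likelihood(word) * topic.time_density(t)
        for weight, topic in zip(mixture, topics)
    ]
    total = sum(joint)
    return [j / total for j in joint]


# Toy data: two topics that share the word "japan" but differ in time.
tsunami = TemporalTopic({"tsunami": 0.5, "japan": 0.3, "wave": 0.2}, 70.0, 5.0)
economy = TemporalTopic({"economy": 0.5, "crisis": 0.3, "japan": 0.2}, 200.0, 60.0)
posterior = topic_posterior("japan", 72.0, [0.5, 0.5], [tsunami, economy])
```

In this toy setup the word "japan" alone is ambiguous between the two topics, and the timestamp is what pushes the posterior toward the tsunami topic. That is the role the time attribute plays in a temporal topic.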

The second research question is how to retrieve and rank documents from different social media sites, based on their relevance to one or several given temporal topics. We identify the following challenges. The first is representing temporal topics as queries: although there have been attempts to use keywords and a time window separately [2], we aim to unify time and (topical) words in a single query model. The second challenge is integrating temporal topic models into ranking models. Inspired by our previous work [4], we will use language models to capture the relevance scores between documents and topics, and investigate advanced methods to index the scores effectively.
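One way to picture a query model that unifies words and time is sketched below. It is an illustration under stated assumptions, not the ranking model of this work: the text component is a weighted query likelihood over a Dirichlet-smoothed document language model, and the time component is a linear penalty on the distance between the document timestamp and the topic's time (the log of an exponential decay). The function names, the `time_scale` parameter, and the toy collection statistics are all hypothetical.

```python
import math
from collections import Counter


def dirichlet_lm(doc_tokens, collection_probs, mu=100.0):
    """Return p(word | doc) with Dirichlet smoothing against the collection."""
    counts = Counter(doc_tokens)
    length = len(doc_tokens)

    def prob(word):
        # Words missing from the collection statistics get a tiny background mass.
        background = collection_probs.get(word, 1e-9)
        return (counts[word] + mu * background) / (length + mu)

    return prob


def temporal_topic_score(topic_words, topic_time, doc_tokens, doc_time,
                         collection_probs, mu=100.0, time_scale=7.0):
    """Log-score of a document for a temporal topic (higher is better).

    topic_words: word -> weight, the textual part of the query model.
    topic_time:  the time at which the topic is assumed to be most active.
    """
    p = dirichlet_lm(doc_tokens, collection_probs, mu)
    # Weighted query log-likelihood of the topic words under the document model.
    text_score = sum(w * math.log(p(word)) for word, w in topic_words.items())
    # Assumed temporal component: log of an exponential decay in time distance.
    time_score = -abs(doc_time - topic_time) / time_scale
    return text_score + time_score


# Toy example with made-up collection statistics.
collection = {"tsunami": 0.01, "japan": 0.02, "economy": 0.02, "photo": 0.05}
topic = {"tsunami": 0.6, "japan": 0.4}
on_topic = temporal_topic_score(topic, 70.0, ["tsunami", "japan", "photo"], 71.0, collection)
off_topic = temporal_topic_score(topic, 70.0, ["economy", "photo", "japan"], 71.0, collection)
```

Because the two components are added in log space, a document must match both the words and the time of the topic to rank highly. Other combinations, such as a time-dependent prior, would be equally plausible.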

Our last question involves connecting a given document to documents in other sources (data streams or corpora) that share one of its latent temporal topics. This task not only provides unified insight into different social media sites, but also helps improve the quality of the models by drawing on data from diverse sources. However, formalizing the semantics of "similarity" for documents in different settings based on temporal topics is tricky. One baseline method is to apply Kullback-Leibler divergence to comparable features (TF-IDF, n-grams, photo tags, timestamps, etc.). We can also follow [5] and construct a language model for each candidate document, then estimate how likely it is to generate the document of interest within a given temporal topic.
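The Kullback-Leibler baseline can be sketched as follows. The sketch assumes that each document has already been reduced to a bag of comparable tokens (words or photo tags) and uses additive smoothing so that the divergence stays finite. The helper names and the `alpha` parameter are hypothetical, and other features such as timestamps are left out.

```python
import math
from collections import Counter


def smoothed_distribution(tokens, vocab, alpha=0.1):
    """Additively smoothed distribution of tokens over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


# Toy cross-media example: a tweet, a photo's tags, and an unrelated post.
tweet = ["tsunami", "japan", "wave"]
photo_tags = ["tsunami", "japan", "flood"]
unrelated = ["economy", "bank", "crisis"]

vocab = sorted(set(tweet) | set(photo_tags) | set(unrelated))
p_tweet = smoothed_distribution(tweet, vocab)
p_photo = smoothed_distribution(photo_tags, vocab)
p_other = smoothed_distribution(unrelated, vocab)

# A lower divergence means the candidate is closer to the tweet.
d_photo = kl_divergence(p_tweet, p_photo)
d_other = kl_divergence(p_tweet, p_other)
```

KL divergence is asymmetric, so the direction (document of interest first, candidate second) is a design choice. A symmetric variant such as the Jensen-Shannon divergence could be substituted without changing the rest of the sketch.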

References

  1. A. Ahmed, Q. Ho, J. Eisenstein, E. Xing, A. J. Smola, and C. H. Teo. Unified analysis of streaming news. In WWW, 2011.
  2. P. Haghani, S. Michel, and K. Aberer. Efficient monitoring of personalized hot news over web 2.0 streams. Journal of Computer Science, Vol. 27(1), February 2012.
  3. P. Orbanz and Y. W. Teh. Bayesian nonparametric models. Encyclopedia of Machine Learning, 2010.
  4. T. A. Tran, S. Elbassuoni, N. Preda, and G. Weikum. Cate: context-aware timeline for entity illustration. In WWW, 2011.
  5. M. Tsagkias, M. de Rijke, and W. Weerkamp. Linking online news and social media. In WSDM, 2011.

Published in

SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
August 2012
1236 pages
ISBN: 9781450314725
DOI: 10.1145/2348283

Copyright © 2012 Author

Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 792 of 3,983 submissions, 20%
