Abstract
Although topic models designed for textual collections annotated with geographical metadata have previously been shown to capture the vocabulary preferences of people living in different geographical regions, little is known about their utility for information retrieval in general or microblog retrieval in particular. In this work, we propose simple and scalable geographical latent variable generative models, together with a method that improves the accuracy of retrieval from collections of geo-tagged documents through document expansion based on the topics identified by the proposed models. In particular, we experimentally compare the retrieval effectiveness of four geographical latent variable models: two geographical variants of post-hoc LDA, a latent variable model without hidden topics, and a topic model that can separate background topics from geographically specific topics. Experiments conducted on TREC microblog datasets demonstrate that the proposed method significantly improves search accuracy over both the traditional probabilistic retrieval model and retrieval models utilizing geographical post-hoc variants of LDA.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Kotov, A., Rakesh, V., Agichtein, E., Reddy, C.K. (2015). Geographical Latent Variable Models for Microblog Retrieval. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds) Advances in Information Retrieval. ECIR 2015. Lecture Notes in Computer Science, vol 9022. Springer, Cham. https://doi.org/10.1007/978-3-319-16354-3_70
DOI: https://doi.org/10.1007/978-3-319-16354-3_70
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16353-6
Online ISBN: 978-3-319-16354-3
eBook Packages: Computer Science, Computer Science (R0)