Abstract
Typical pseudo-relevance feedback models assume that the first-pass documents are the most relevant and use those documents to select feedback terms for query expansion. In real applications, however, short documents such as microblogs may not contain enough information about the searched topic, which increases the chance that irrelevant documents are included in the initial set of retrieved documents. This reduces a feedback model's ability to capture information that is relevant to users' needs, so determining the best documents for relevance feedback without requiring extra effort from the user becomes a critical challenge. In this paper, we propose a mechanism that automatically selects useful feedback documents using a topic modeling technique to improve the effectiveness of pseudo-relevance feedback models. The main idea behind the proposed model is to discover the latent topics in the top-ranked documents, which allows the correlation between terms in relevant topics to be exploited. To capture discriminative terms for query expansion, we incorporate topical features into a relevance model that focuses on the temporal information in the selected set of documents. Experimental results on the TREC 2011–2013 microblog datasets show that the proposed model significantly outperforms all state-of-the-art baseline models.
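The baseline behaviour described above can be illustrated with a minimal sketch of relevance-model (RM1-style) term selection. This is not the model proposed in the paper: the topic-based document selection and the temporal weighting are not reproduced here. The function name `expansion_terms`, the toy documents, the first-pass scores, and the hand-picked `selected` indices are all assumptions made for illustration.

```python
from collections import Counter


def expansion_terms(query, docs, scores, k=4, selected=None):
    """Rank candidate expansion terms by sum_d P(w|d) * P(d|q).

    P(w|d) is the unsmoothed term frequency in a document and P(d|q) is the
    first-pass score normalised over the feedback set (both are
    simplifications). `selected` optionally restricts the feedback set to
    the given document indices; here it stands in for whatever document
    selection method is used.
    """
    indices = range(len(docs)) if selected is None else selected
    total = sum(scores[i] for i in indices)
    weights = Counter()
    for i in indices:
        tokens = docs[i].lower().split()
        for term, count in Counter(tokens).items():
            weights[term] += (count / len(tokens)) * (scores[i] / total)
    query_terms = set(query.lower().split())
    candidates = [(t, w) for t, w in weights.items() if t not in query_terms]
    # Highest weight first; ties are broken alphabetically for determinism.
    candidates.sort(key=lambda tw: (-tw[1], tw[0]))
    return [t for t, _ in candidates[:k]]


# Toy first-pass ranking: the third short document is off-topic.
query = "earthquake japan"
docs = [
    "earthquake japan tsunami warning",
    "japan earthquake tsunami relief",
    "japan anime festival tickets",
]
scores = [0.5, 0.3, 0.2]

all_docs_terms = expansion_terms(query, docs, scores)
filtered_terms = expansion_terms(query, docs, scores, selected=[0, 1])
print(all_docs_terms)
print(filtered_terms)
```

With all three documents in the feedback set, a term from the off-topic document enters the expansion list. Restricting the feedback set to the first two documents removes it, which is the effect a document selection step aims for.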
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Albishre, K., Li, Y., Xu, Y. (2018). Query-Based Automatic Training Set Selection for Microblog Retrieval. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10938. Springer, Cham. https://doi.org/10.1007/978-3-319-93037-4_26
DOI: https://doi.org/10.1007/978-3-319-93037-4_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93036-7
Online ISBN: 978-3-319-93037-4
eBook Packages: Computer Science, Computer Science (R0)