Query-Based Automatic Training Set Selection for Microblog Retrieval

Albishre, Khaled; Li, Yuefeng; Xu, Yue

doi:10.1007/978-3-319-93037-4_26

Khaled Albishre¹⁹,²⁰,
Yuefeng Li¹⁹ &
Yue Xu¹⁹

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10938)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Typical pseudo-relevance feedback models assume that the first-pass documents are the most relevant and use those documents to select feedback terms for query expansion. In real applications, however, short documents, such as microblogs, may not have enough information about the searched topic, thus increasing the chance that irrelevant documents will be included in the initial set of retrieved documents. This situation reduces a feedback model’s ability to capture information that is relevant to users’ needs, which makes determining the best documents for relevance feedback without requiring extra effort from the user a critical challenge. In this paper, we propose an innovative mechanism to automatically select useful feedback documents using a topic modeling technique to improve the effectiveness of pseudo-relevance feedback models. The main idea behind the proposed model is to discover the latent topics in the top-ranked documents that allow for the exploitation of the correlation between terms in relevant topics. To capture discriminative terms for query expansion, we incorporated topical features into a relevance model that focuses on the temporal information in the selected set of documents. Experimental results on TREC 2011–2013 microblog datasets illustrate that the proposed model significantly outperforms all state-of-the-art baseline models.
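
The baseline behaviour described in the opening of the abstract can be illustrated with a minimal sketch. This is not the authors' proposed model: it is a simplified relevance-model-style weighting, and the function name `feedback_terms`, the retrieval scores, and the toy tweets are all hypothetical. It only shows how blindly trusting the top-k first-pass documents lets an off-topic short document contaminate the expansion terms.

```python
from collections import defaultdict

def feedback_terms(query, ranked, k=3, n_terms=3):
    """Select expansion terms from the top-k first-pass documents.

    Each candidate term w is weighted by sum_d P(w|d) * score(d), where
    P(w|d) is the term's relative frequency in document d and score(d)
    is the (assumed) first-pass retrieval score.
    """
    query_terms = set(query.lower().split())
    weights = defaultdict(float)
    for text, score in ranked[:k]:           # top-k docs are assumed relevant
        tokens = text.lower().split()
        for tok in tokens:
            if tok not in query_terms:        # expansion terms must be new
                weights[tok] += score / len(tokens)
    best = sorted(weights.items(), key=lambda kv: -kv[1])
    return [term for term, _ in best[:n_terms]]

# Hypothetical first-pass ranking: (tweet text, retrieval score).
query = "earthquake japan"
ranked = [
    ("earthquake japan tsunami warning", 0.9),
    ("japan earthquake tsunami relief", 0.8),
    ("japan anime festival tickets anime", 0.7),  # off-topic, but ranked highly
]

drifted = feedback_terms(query, ranked, k=3)   # includes the off-topic tweet
filtered = feedback_terms(query, ranked, k=2)  # off-topic tweet left out
```

With this toy ranking, using all three tweets pushes the off-topic term "anime" into the expansion set, while leaving the third tweet out of the feedback set keeps the terms on topic. Choosing which documents to trust is the step the paper proposes to automate with topic modeling.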

Author information

Authors and Affiliations

School of EECS, Queensland University of Technology (QUT), Brisbane, QLD, Australia
Khaled Albishre, Yuefeng Li & Yue Xu
Umm Al-Qura University, Makkah, Saudi Arabia
Khaled Albishre

Corresponding author

Correspondence to Khaled Albishre.

Editor information

Editors and Affiliations

Deakin University, Geelong, Victoria, Australia
Dinh Phung
National Chiao Tung University, Hsinchu City, Taiwan
Vincent S. Tseng
Monash University, Clayton, Victoria, Australia
Geoffrey I. Webb
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Bao Ho
University of Melbourne, Melbourne, Victoria, Australia
Mohadeseh Ganji
University of Melbourne, Melbourne, Victoria, Australia
Lida Rashidi

About this paper

Cite this paper

Albishre, K., Li, Y., Xu, Y. (2018). Query-Based Automatic Training Set Selection for Microblog Retrieval. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10938. Springer, Cham. https://doi.org/10.1007/978-3-319-93037-4_26

DOI: https://doi.org/10.1007/978-3-319-93037-4_26
Published: 20 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93036-7
Online ISBN: 978-3-319-93037-4
eBook Packages: Computer Science, Computer Science (R0)
