Query-Based Automatic Training Set Selection for Microblog Retrieval

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10938)

Abstract

Typical pseudo-relevance feedback models assume that the first-pass documents are the most relevant and use those documents to select feedback terms for query expansion. In real applications, however, short documents such as microblogs may not contain enough information about the searched topic, which increases the chance that irrelevant documents are included in the initial set of retrieved documents. This reduces a feedback model’s ability to capture information relevant to users’ needs, so determining the best documents for relevance feedback without requiring extra effort from the user is a critical challenge. In this paper, we propose a mechanism that automatically selects useful feedback documents using a topic modeling technique to improve the effectiveness of pseudo-relevance feedback models. The main idea is to discover the latent topics in the top-ranked documents, which allows the model to exploit the correlation between terms in relevant topics. To capture discriminative terms for query expansion, we incorporate topical features into a relevance model that focuses on the temporal information in the selected set of documents. Experimental results on the TREC 2011–2013 microblog datasets show that the proposed model significantly outperforms all state-of-the-art baseline models.
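The abstract does not give the model's formulas, so the following is only a minimal sketch of the general idea of restricting pseudo-relevance feedback to a selected subset of the first-pass documents. It assumes a plain relevance-model style weighting (each term is weighted by its relative frequency in a document, scaled by that document's normalized retrieval score). The function name `expansion_terms`, the `selected` set (a hypothetical stand-in for the output of the topic-based selection step), and the toy documents are illustrative assumptions, not the authors' implementation; the temporal component is not modeled here.

```python
from collections import Counter


def expansion_terms(ranked_docs, selected, num_terms=5):
    """Pick expansion terms from the selected feedback documents only.

    ranked_docs: list of (doc_id, tokens, retrieval_score) from the first pass.
    selected:    set of doc_ids kept as feedback documents (assumed to come
                 from some selection step, e.g. a topic model; hypothetical).
    """
    # Normalize retrieval scores over the selected documents only.
    total_score = sum(score for doc_id, _, score in ranked_docs
                      if doc_id in selected)
    if total_score == 0:
        return []

    weights = Counter()
    for doc_id, tokens, score in ranked_docs:
        if doc_id not in selected or not tokens:
            continue
        doc_weight = score / total_score
        for term, count in Counter(tokens).items():
            # Relative term frequency in the document, scaled by doc weight.
            weights[term] += doc_weight * (count / len(tokens))

    return [term for term, _ in weights.most_common(num_terms)]


# Toy first-pass ranking: d2 is off-topic for a query about a phone launch.
docs = [
    ("d1", ["apple", "iphone", "launch"], 3.0),
    ("d2", ["apple", "pie", "recipe"], 2.0),
    ("d3", ["iphone", "launch", "event"], 1.0),
]

terms = expansion_terms(docs, selected={"d1", "d3"}, num_terms=2)
```

Leaving `d2` out of `selected` keeps its terms (such as `pie`) from entering the expanded query, which is the effect the selection step is meant to achieve.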

Corresponding author

Correspondence to Khaled Albishre.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Albishre, K., Li, Y., Xu, Y. (2018). Query-Based Automatic Training Set Selection for Microblog Retrieval. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10938. Springer, Cham. https://doi.org/10.1007/978-3-319-93037-4_26

  • DOI: https://doi.org/10.1007/978-3-319-93037-4_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93036-7

  • Online ISBN: 978-3-319-93037-4

  • eBook Packages: Computer Science, Computer Science (R0)
