LTR-expand: query expansion model based on learning to rank association rules

Journal of Intelligent Information Systems

Abstract

Query Expansion (QE) is widely applied to improve the retrieval performance of ad-hoc search, using different techniques and several data sources to find expansion terms. In the Information Retrieval literature, selecting expansion terms remains a challenging task that relies on the extraction of term relationships. In this paper, we propose a new learning to rank-based query expansion model. The main idea is that, given a query and the set of association rules (ARs) related to it, our model ranks these ARs according to their relevance score with respect to the query and then selects the most suitable ones for use in the QE process. Experiments are conducted on three test collections, namely CLEF2003, TREC-Robust and TREC-Microblog, which include long, hard and short queries. The results show that retrieval performance can be significantly improved when the AR ranking method is used, compared to other state-of-the-art expansion models, especially for hard and long queries.
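The rank-then-select idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Rule` type, the three features (premise overlap, rule confidence, novelty of the conclusion) and the fixed weight vector are all hypothetical stand-ins, and a hand-set linear scorer takes the place of a ranker that would normally be learned from training data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Association rule between term sets: premise => conclusion."""
    premise: frozenset
    conclusion: frozenset
    confidence: float  # rule confidence in [0, 1]


def features(query, rule):
    """Hypothetical feature vector for a (query, rule) pair."""
    overlap = len(query & rule.premise) / max(len(rule.premise), 1)
    novelty = len(rule.conclusion - query) / max(len(rule.conclusion), 1)
    return (overlap, rule.confidence, novelty)


def score(query, rule, weights):
    """Linear relevance score; the weights stand in for a learned model."""
    return sum(w * f for w, f in zip(weights, features(query, rule)))


def expand(query_terms, rules, weights, k=2):
    """Rank the rules related to the query and add the conclusions of the top k."""
    query = frozenset(query_terms)
    # A rule is treated as related when its premise shares a term with the query.
    related = [r for r in rules if query & r.premise]
    ranked = sorted(related, key=lambda r: score(query, r, weights), reverse=True)
    expanded = list(query_terms)
    for rule in ranked[:k]:
        for term in sorted(rule.conclusion):
            if term not in expanded:
                expanded.append(term)
    return expanded


# Toy rules and weights, invented for illustration only.
RULES = [
    Rule(frozenset({"query"}), frozenset({"expansion"}), 0.9),
    Rule(frozenset({"query", "ranking"}), frozenset({"retrieval"}), 0.6),
    Rule(frozenset({"football"}), frozenset({"league"}), 0.95),
]
WEIGHTS = (1.0, 1.0, 0.5)

top1 = expand(["query", "model"], RULES, WEIGHTS, k=1)
top2 = expand(["query", "model"], RULES, WEIGHTS, k=2)
```

With these toy values, the first rule scores highest, so `top1` adds only `expansion` to the query, while `top2` also adds `retrieval`. The third rule is never considered because its premise shares no term with the query.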

Notes

  1. By analogy to the itemsets terminology used in data mining for a set of items.

  2. By analogy to the itemset terminology used in data mining.

  3. In this paper, we denote by |X| the cardinality of the set X.

  4. http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/

  5. Also referred to as preference learning in the literature.

  6. www.cs.cornell.edu/people/tj/svm_light/svm_rank.html

  7. http://www.lemurproject.org/

  8. http://www.terrier.org

  9. http://trec.nist.gov/trec_eval/

  10. Available at https://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html#References

  11. This conclusion is consistent with the results obtained with the precision measures P@5, P@10 and NDCG@5.

  12. A topic is considered difficult when the median of the average precision scores of all participants for that topic is below a given threshold (i.e. half of the systems score lower than the threshold), but there exists at least one high outlier. In this context, the most useful metric is geometric mean average precision (GMAP), which uses the geometric mean instead of the arithmetic mean when averaging precision values.

  13. It includes word vectors for a vocabulary of 3 million words and phrases, trained on roughly 100 billion words from a Google News dataset. Each vector has 300 dimensions.
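
Note 12 contrasts the geometric and arithmetic means of per-topic average precision. The sketch below shows the difference on invented scores. The function names and the small floor `eps` (used so that a score of zero does not make the logarithm undefined) are assumptions made for this illustration, not values taken from the paper.

```python
import math


def mean_average_precision(ap_scores):
    """Arithmetic mean of per-topic average precision (MAP)."""
    return sum(ap_scores) / len(ap_scores)


def geometric_mean_average_precision(ap_scores, eps=1e-5):
    """Geometric mean of per-topic average precision (GMAP).

    Scores are floored at eps (an assumed value) before taking logs.
    """
    logs = [math.log(max(ap, eps)) for ap in ap_scores]
    return math.exp(sum(logs) / len(logs))


# Invented per-topic scores: two easy topics and one hard topic.
ap_scores = [0.5, 0.5, 0.02]
map_value = mean_average_precision(ap_scores)
gmap_value = geometric_mean_average_precision(ap_scores)
```

Here `map_value` is 0.34 while `gmap_value` is roughly 0.17: the single hard topic pulls the geometric mean down far more than the arithmetic mean, which is why GMAP is preferred when the focus is on difficult topics.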

Author information

Corresponding author

Correspondence to Ahlem Bouziri.

About this article

Cite this article

Bouziri, A., Latiri, C. & Gaussier, E. LTR-expand: query expansion model based on learning to rank association rules. J Intell Inf Syst 55, 261–286 (2020). https://doi.org/10.1007/s10844-020-00596-8

  • DOI: https://doi.org/10.1007/s10844-020-00596-8
