Abstract
Learning to rank (LTR) is an active research problem in information retrieval (IR). It concerns ranking the documents retrieved for users in search engines, question answering and product recommendation systems. A number of LTR approaches based on machine learning and computational intelligence techniques exist. Most existing LTR methods have limitations: they are too slow, not very effective, or require a large amount of memory. This paper proposes an LTR method that combines a \((1+1)\)-evolutionary strategy with machine learning. Three variants of the method are investigated: ES-Rank, IESR-Rank and IESVM-Rank. They differ in the chromosome initialisation mechanism for the evolutionary process. ES-Rank simply sets all genes in the initial chromosome to the same value. IESR-Rank uses linear regression, and IESVM-Rank uses a support vector machine, for the initialisation process. Experimental results comparing the proposed method with fourteen other approaches from the literature show that IESR-Rank achieves the highest overall performance. Ten problem instances are used, obtained from four datasets: MSLR-WEB10K, LETOR 3 and LETOR 4. Performance is measured at the top-10 query–document pairs retrieved, using five metrics: mean average precision (MAP), root-mean-square error (RMSE), precision (P@10), reciprocal rank (RR@10) and normalized discounted cumulative gain (NDCG@10). The contribution of this paper is an effective and efficient LTR method that combines a list-wise evolutionary technique with point-wise and pair-wise machine learning techniques.
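To make the idea concrete, the following is a minimal sketch of a \((1+1)\)-evolutionary strategy that evolves the weight vector of a linear ranking function, using mean NDCG@10 as a list-wise fitness. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the mutation operator, step size or acceptance rule, so the single-gene Gaussian mutation, the `sigma` value, the strict-improvement acceptance and the toy data below are all hypothetical. The `init` parameter stands in for the different initialisation mechanisms: all genes equal (as in ES-Rank), or weights supplied by a separately trained linear regression or support vector machine model.

```python
import math
import random


def dcg_at_k(labels, k=10):
    """Discounted cumulative gain of a ranked list of relevance labels."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(labels[:k]))


def ndcg_at_k(scores, labels, k=10):
    """NDCG@k of the ranking induced by sorting documents by score (descending)."""
    order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal if ideal > 0 else 0.0


def fitness(weights, queries, k=10):
    """Mean NDCG@k over all queries for a linear scoring function."""
    total = 0.0
    for docs, labels in queries:
        scores = [sum(w * x for w, x in zip(weights, doc)) for doc in docs]
        total += ndcg_at_k(scores, labels, k)
    return total / len(queries)


def one_plus_one_es(queries, n_features, generations=200, sigma=0.5, seed=0, init=None):
    """(1+1)-ES: one parent, one mutated child per generation.

    Assumption: the child replaces the parent only if it is strictly fitter.
    """
    rng = random.Random(seed)
    # Default initialisation: every gene set to the same value (here 0.0).
    parent = list(init) if init is not None else [0.0] * n_features
    best = fitness(parent, queries)
    for _ in range(generations):
        child = list(parent)
        gene = rng.randrange(n_features)       # pick one gene to mutate
        child[gene] += rng.gauss(0.0, sigma)   # Gaussian perturbation (assumed)
        score = fitness(child, queries)
        if score > best:
            parent, best = child, score
    return parent, best


# Hypothetical toy data: each query is (feature vectors, relevance labels).
TOY_QUERIES = [
    ([[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]], [0, 2, 1]),
    ([[0.9, 0.1], [0.2, 0.7], [0.6, 0.3]], [2, 0, 1]),
]

if __name__ == "__main__":
    weights, score = one_plus_one_es(TOY_QUERIES, n_features=2)
    print("evolved weights:", weights)
    print("mean NDCG@10:", score)
```

Because the child is kept only when it improves the fitness, the returned score can never fall below that of the initial chromosome, which is why a good starting point can matter.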
Acknowledgements
Osman Ibrahim would like to thank Minia University in Egypt for supporting the research in this paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by P. Angelov, F. Chao.
About this article
Cite this article
Ibrahim, O.A.S., Landa-Silva, D. An evolutionary strategy with machine learning for learning to rank in information retrieval. Soft Comput 22, 3171–3185 (2018). https://doi.org/10.1007/s00500-017-2988-6
DOI: https://doi.org/10.1007/s00500-017-2988-6