Abstract
Ranking documents in terms of their relevance to a given query is fundamental to many real-life applications such as information retrieval and recommendation systems. Extensive study in these application domains has given rise to the development of many efficient ranking models. While most existing research focuses on developing learning to rank (LTR) models, the quality of the training features, which plays an important role in ranking performance, has not been fully studied. Thus, we propose a new approach that discovers effective features for the LTR problem. In this paper, we present a theoretical analysis on which frequent patterns are potentially effective for improving the performance of LTR and then propose an efficient method that selects frequent patterns for LTR. First, we define a new criterion, namely feature significance (or simply significance). Specifically, we use each feature’s value to rank the training instances and define the ranking effectiveness in terms of a performance measure as the significance of the feature. We show that the significance of an infrequent pattern is limited by using formal connection between pattern support and its significance. Then, we propose a methodology that sets the support value when performing frequent pattern mining. Finally, since frequent patterns are not equally effective for LTR, we further provide a coverage-based significant pattern generation algorithm to discover effective patterns and propose a new ranking approach called Significant Frequent Pattern-based Ranking (SFP-Rank), in which the ranking model is built upon the original features as well as the significant frequent patterns. Our experiments confirm that, by incorporating significant frequent patterns to train the ranking model, the performance of the ranking model can be substantially improved.
Similar content being viewed by others
References
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: VLDB ’94, pp 487–499
AOL Dataset (n.d.), http://zola.di.unipi.it/smalltext/datasets.html
Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison Wesley, Reading, MA
Batal I, Hauskrecht M (2010) Constructing classification features using minimal predictive patterns. In: Proceedings of the 19th ACM international conference on Information and knowledge management, CIKM ’10. ACM, New York, NY, USA, pp 869–878
Burges C, Ragno R, Le Q (2006) Learning to rank with nonsmooth cost functions. In: NIPS ’06, pp 193–200
Burges C, Shaked T, Renshaw E et al (2005) Learning to rank using gradient descent. In: ICML ’05, pp 89–96
Cao H, Jiang D, Pei J et al (2008) Context-aware query suggestion by mining click-through and session data. In: KDD ’08, pp 875–883
Cao Y, Xu J, Liu T-Y et al (2006) Adapting ranking svm to document retrieval. In: SIGIR ’06, pp 186–193
Cao Z, Qin T, Liu T-Y et al (2007) Learning to rank: from pairwise approach to listwise approach. In: ICML ’07, pp 129–136
Cheng H, Yan X, Han J et al (2007) Discriminative frequent pattern analysis for effective classification. In: ICDE ’07, pp 169–178
Cheng H, Yan X, Han J et al (2008) Direct discriminative pattern mining for effective classification. In: ICDE ’08, pp. 169–178
Cossock D, Zhang T (2006) Subset ranking using regression. In: Learning theory, volume 4005 of LNCS’06, pp 605–619
Fagin R, Kumar R, Sivakumar D (2003) Comparing top k lists. In: SODA ’03, pp 28–36
Fayyad UM, Irani KB (1993) Multi-interval discretization of continuous-valued attributes for classification learning. In: UAI ’93, pp 1022–1027
Freund Y, Iyer R, Schapire RE et al (2003) An efficient boosting algorithm for combining preferences. J Mach Learn Res 4:933–969
Geng X, Liu T-Y, Qin T et al (2007) Feature selection for ranking. In: SIGIR ’07, pp 407–414
Grahne G, Zhu J (2003) Efficiently using prefix-trees in mining frequent itemsets. In: FIMI’03
Han J, Cheng H, Xin D et al (2007) Frequent pattern mining: current status and future directions. Data Min. Knowl. Discov. 15(1):55–86
Hong L, Bekkerman R, Adler J et al (2012) Learning to rank social update streams. In: SIGIR ’12, pp 651–660
Jansen BJ, Spink A, Bateman J et al (1998) Real life information retrieval: a study of user queries on the web. SIGIR Forum 32(1):5–17
Jiang D, Leung KW-T, Ng W (2011) Context-aware search personalization with concept preference. In: CIKM ’11, pp 563–572
Joachims T (2006) Training linear svms in linear time. In: KDD ’06, pp 217–226
Karimzadehgan M, Li W, Zhang R et al (2011) A stochastic learning-to-rank algorithm and its application to contextual advertising. In: WWW ’11, pp 377–386
Li P, Burges CJC, Wu Q (2007) Mcrank: learning to rank using multiple classification and gradient boosting. In: NIPS ’07, pp 845–852
Li W, Han J, Pei J (2001) Cmar: Accurate and efficient classification based on multiple class-association rules. In: ICDM ’01, vol 0, pp 369–376
Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: KDD ’98, pp 80–86
Nallapati R (2004) Discriminative models for information retrieval. In: SIGIR ’04, pp 64–71
Qin T, Liu T, Tsai M et al (2006) Learning to search web pages with query-level loss functions. Technical report, Microsoft Research
Qin T, Liu T, Xu J et al (2010) Letor: a benchmark collection for research on learning to rank for information retrieval. Inf Retr 13:346–374
Qin T, Zhang X-D, Wang D-S et al (2007) Ranking with multiple hyperplanes. In: SIGIR ’07, pp 279–286
Sculley D (2010) Combined regression and ranking. In: KDD ’10. ACM, New York, NY, USA, pp 979–988
Song Y, Leung K, Fang Q et al (2013) Fp-rank: an effective ranking approach based on frequent pattern analysis. In: DASFAA ’13
Tan J, Bu Y, Yang B (2009) An efficient close frequent pattern mining algorithm. In: ICICTA ’09, vol 1, pp 528–531
Thomas Fasciano RS, Shin MC (2012) Learning to rank biological motion trajectories. Image Vis Comput 31(6–7):502–510
Tong Y, Chen L, Cheng Y et al (2012) Mining frequent itemsets over uncertain databases. PVLDB’12 5(11):1650–1661
Tong Y, Chen L, Ding B (2012) Discovering threshold-based frequent closed itemsets over probabilistic data. In: ICDE ’12, pp 270–281
Tong Y, Chen L, Yu PS (2012) Ufimt: an uncertain frequent itemset mining toolbox. In: KDD ’12, pp 1508–1511
Tsai M-F, Liu T-Y, Qin T et al (2007) Frank: a ranking method with fidelity loss. In: SIGIR ’07, pp 383–390
Valizadegan H, Jin R, Zhang R et al (2009) Learning to rank by optimizing ndcg measure. In: NIPS ’09
Veloso AA, Almeida HM, Gonçalves MA et al (2008) Learning to rank at query-time using association rules. In: SIGIR ’08, pp 267–274
Verberne S, van Halteren H, Theijssen D et al (2011) Learning to rank for why-question answering. Inf Retr 14:107–132
Volkovs MN, Zemel RS (2009) Boltzrank: learning to maximize expected ranking gain. In: ICML ’09, pp 1089–1096
Wang J, Karypis G (2006) On mining instance-centric classification rules. IEEE Trans. Knowl. Data Eng. 18:1497–1511
Xu J, Li H (2007) Adarank: a boosting algorithm for information retrieval. In: SIGIR ’07, pp 391–398
Yin X, Han J (2003) Cpar: classification based on predictive association rules. In: SDM’03
Yue Y, Finley T, Radlinski F et al (2007) A support vector method for optimizing average precision. In: SIGIR’07, pp 271–278
Acknowledgments
We thank anonymous reviewers for their very useful comments and suggestions. This work is supported by HKUST GRF Grant 617610.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Song, Y., Ng, W., Leung, K.WT. et al. SFP-Rank: significant frequent pattern analysis for effective ranking. Knowl Inf Syst 43, 529–553 (2015). https://doi.org/10.1007/s10115-014-0738-y
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-014-0738-y