
SFP-Rank: significant frequent pattern analysis for effective ranking

  • Regular Paper
Knowledge and Information Systems

Abstract

Ranking documents by their relevance to a given query is fundamental to many real-life applications such as information retrieval and recommendation systems. Extensive study in these application domains has led to the development of many efficient ranking models. While most existing research focuses on developing learning to rank (LTR) models, the quality of the training features, which plays an important role in ranking performance, has not been fully studied. We therefore propose a new approach that discovers effective features for the LTR problem. In this paper, we present a theoretical analysis of which frequent patterns are potentially effective for improving the performance of LTR and then propose an efficient method that selects frequent patterns for LTR. First, we define a new criterion, namely feature significance (or simply significance). Specifically, we use each feature’s value to rank the training instances and define the significance of the feature as the effectiveness of that ranking under a performance measure. By establishing a formal connection between a pattern’s support and its significance, we show that the significance of an infrequent pattern is limited. Then, we propose a methodology for setting the support value when performing frequent pattern mining. Finally, since frequent patterns are not equally effective for LTR, we further provide a coverage-based significant pattern generation algorithm to discover effective patterns and propose a new ranking approach called Significant Frequent Pattern-based Ranking (SFP-Rank), in which the ranking model is built upon the original features as well as the significant frequent patterns. Our experiments confirm that incorporating significant frequent patterns when training the ranking model can substantially improve its performance.
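As an illustration of the significance criterion described in the abstract, the following minimal sketch ranks the instances of one query by a single feature's value and scores the resulting order. The abstract does not name the performance measure or give the formal definition, so the use of NDCG@k, the function names (`dcg`, `ndcg_at_k`, `feature_significance`, `pattern_support`), and the treatment of a pattern as a 0/1 indicator are all assumptions made for this example, not the authors' implementation.

```python
import math
from typing import Sequence


def dcg(labels: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k labels, taken in ranked order."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(labels[:k]))


def ndcg_at_k(labels_in_ranked_order: Sequence[float], k: int) -> float:
    """NDCG@k: DCG of the given order divided by the DCG of the ideal order."""
    ideal = dcg(sorted(labels_in_ranked_order, reverse=True), k)
    return dcg(labels_in_ranked_order, k) / ideal if ideal > 0 else 0.0


def feature_significance(values: Sequence[float],
                         labels: Sequence[float],
                         k: int = 10) -> float:
    """Rank instances by one feature's value (descending) and score the
    ranking with NDCG@k. NDCG is an assumed choice of performance measure."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return ndcg_at_k([labels[i] for i in order], k)


def pattern_support(pattern_hits: Sequence[int]) -> float:
    """Fraction of instances in which a pattern occurs (hits are 0 or 1)."""
    return sum(pattern_hits) / len(pattern_hits)


# Hypothetical relevance labels for four documents of one query.
labels = [2, 0, 1, 0]
informative = [0.9, 0.1, 0.5, 0.2]    # orders documents exactly by relevance
uninformative = [0.1, 0.9, 0.2, 0.8]  # puts irrelevant documents first

sig_good = feature_significance(informative, labels)
sig_bad = feature_significance(uninformative, labels)
support = pattern_support([1, 0, 0, 1])  # pattern present in 2 of 4 instances
```

Under these assumptions, a feature whose values order the documents by relevance receives the maximum score, while a feature that ranks irrelevant documents first scores lower. A pattern that occurs in few instances leaves most instances tied on the indicator value, which gives some intuition for why low support would limit significance.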

Notes

  1. http://learningtorankchallenge.yahoo.com/datasets.php.

Acknowledgments

We thank the anonymous reviewers for their very useful comments and suggestions. This work is supported by HKUST GRF Grant 617610.

Author information

Corresponding author

Correspondence to Qiong Fang.

About this article

Cite this article

Song, Y., Ng, W., Leung, K.WT. et al. SFP-Rank: significant frequent pattern analysis for effective ranking. Knowl Inf Syst 43, 529–553 (2015). https://doi.org/10.1007/s10115-014-0738-y

  • DOI: https://doi.org/10.1007/s10115-014-0738-y
