SFP-Rank: significant frequent pattern analysis for effective ranking

Song, Yuanfeng; Ng, Wilfred; Leung, Kenneth Wai-Ting; Fang, Qiong

doi:10.1007/s10115-014-0738-y

Regular Paper
Published: 25 March 2014

Knowledge and Information Systems, Volume 43, pages 529–553 (2015)

Abstract

Ranking documents by their relevance to a given query is fundamental to many real-life applications, such as information retrieval and recommendation systems. Extensive study in these application domains has led to the development of many efficient ranking models. While most existing research focuses on developing learning to rank (LTR) models, the quality of the training features, which plays an important role in ranking performance, has not been fully studied. We therefore propose a new approach that discovers effective features for the LTR problem. In this paper, we present a theoretical analysis of which frequent patterns are potentially effective for improving LTR performance and then propose an efficient method for selecting frequent patterns for LTR. First, we define a new criterion, namely feature significance (or simply significance). Specifically, we use each feature’s value to rank the training instances and define the resulting ranking effectiveness, in terms of a performance measure, as the significance of the feature. By establishing a formal connection between a pattern’s support and its significance, we show that the significance of an infrequent pattern is limited. Then, we propose a methodology for setting the support value when performing frequent pattern mining. Finally, since frequent patterns are not equally effective for LTR, we provide a coverage-based significant pattern generation algorithm to discover effective patterns and propose a new ranking approach called Significant Frequent Pattern-based Ranking (SFP-Rank), in which the ranking model is built upon the original features as well as the significant frequent patterns. Our experiments confirm that incorporating significant frequent patterns when training the ranking model can substantially improve its performance.
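The notion of feature significance described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: NDCG@k is used as the performance measure, training instances are grouped by query, and the per-query scores are averaged. The function names (`dcg`, `ndcg`, `significance`), the feature names (`bm25`, `len`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def dcg(labels, k):
    """Discounted cumulative gain of the first k relevance labels."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(labels[:k]))


def ndcg(labels_in_ranked_order, k=10):
    """NDCG@k of a ranked list of relevance labels (0.0 if no relevant item)."""
    ideal = dcg(sorted(labels_in_ranked_order, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg(labels_in_ranked_order, k) / ideal


def significance(instances, feature, k=10):
    """Rank each query's instances by one feature and average NDCG@k.

    instances: list of (query_id, feature_dict, relevance_label).
    Assumption: NDCG@k stands in for the performance measure.
    Missing features default to 0.0; ties keep their input order.
    """
    by_query = defaultdict(list)
    for qid, feats, rel in instances:
        by_query[qid].append((feats.get(feature, 0.0), rel))
    scores = []
    for docs in by_query.values():
        ranked = sorted(docs, key=lambda d: d[0], reverse=True)
        scores.append(ndcg([rel for _, rel in ranked], k))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: one query, three documents, two features.
instances = [
    ("q1", {"bm25": 3.0, "len": 1.0}, 2),
    ("q1", {"bm25": 2.0, "len": 3.0}, 1),
    ("q1", {"bm25": 1.0, "len": 2.0}, 0),
]
sig_bm25 = significance(instances, "bm25")
sig_len = significance(instances, "len")
print(sig_bm25, sig_len)
```

In this toy data, `bm25` orders the documents exactly by relevance, so it scores higher than `len`. A frequent pattern could be scored in the same way by treating it as a binary feature that is 1.0 when the pattern occurs in an instance and 0.0 otherwise; this encoding is also an assumption of the sketch.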

Notes

http://learningtorankchallenge.yahoo.com/datasets.php.

Acknowledgments

We thank anonymous reviewers for their very useful comments and suggestions. This work is supported by HKUST GRF Grant 617610.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong, China
Yuanfeng Song, Wilfred Ng, Kenneth Wai-Ting Leung & Qiong Fang

Corresponding author

Correspondence to Qiong Fang.

About this article

Cite this article

Song, Y., Ng, W., Leung, K.WT. et al. SFP-Rank: significant frequent pattern analysis for effective ranking. Knowl Inf Syst 43, 529–553 (2015). https://doi.org/10.1007/s10115-014-0738-y

Received: 18 February 2013
Revised: 18 January 2014
Accepted: 01 February 2014
Published: 25 March 2014
Issue Date: June 2015
DOI: https://doi.org/10.1007/s10115-014-0738-y
