ABSTRACT
Learning ranking (or preference) functions is a major topic in the machine learning community and has produced many applications in information retrieval. SVMs (Support Vector Machines), a classification and regression methodology, have also shown excellent performance in learning ranking functions. They learn ranking functions that generalize well, based on the "large-margin" principle, and systematically support nonlinear ranking through the "kernel trick". In this paper, we propose an SVM selective sampling technique for learning ranking functions. SVM selective sampling (or active learning with SVM) has been studied in the context of classification. Such techniques reduce the labeling effort in learning classification functions by selecting only the most informative samples to be labeled. However, they do not extend to learning ranking functions, because the labeled data in ranking are relative orderings, or partial orders, of the data. Our proposed sampling technique learns an accurate SVM ranking function from fewer partial orders. We apply our sampling technique to a data retrieval application that enables fuzzy search on relational databases by interacting with users to learn their preferences. Experimental results show a significant reduction of the labeling effort in inducing accurate ranking functions.
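To make the idea of selective sampling over partial orders concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not state the selection criterion, so the sketch assumes a simple margin-based heuristic (query the pair whose order the current model finds most ambiguous), a linear ranking function, and a plain subgradient trainer on the pairwise hinge loss in place of a full SVM solver. All function names and parameter values are hypothetical.

```python
import itertools
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_rank_svm(diffs, dim, epochs=200, lam=0.01, lr=0.1):
    """Fit a linear ranking function by subgradient descent on the pairwise hinge loss.

    Each element of `diffs` is (preferred item - other item), so we want w . d >= 1.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for d in diffs:
            margin = dot(w, d)
            # L2 shrinkage plays the role of the large-margin regularizer.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # The pair violates the margin: move w toward the difference vector.
                w = [wi + lr * di for wi, di in zip(w, d)]
    return w


def select_most_ambiguous(w, items, unlabeled_pairs):
    """Return the unlabeled pair with the smallest absolute score difference under w."""
    def ambiguity(pair):
        i, j = pair
        return abs(dot(w, [a - b for a, b in zip(items[i], items[j])]))
    return min(unlabeled_pairs, key=ambiguity)


def active_rank_learning(items, oracle, rounds):
    """Query one pairwise order per round, retraining after each answer."""
    dim = len(items[0])
    unlabeled = list(itertools.combinations(range(len(items)), 2))
    diffs = []
    w = [0.0] * dim
    for _ in range(rounds):
        i, j = select_most_ambiguous(w, items, unlabeled)
        unlabeled.remove((i, j))
        # oracle(i, j) is True when item i should rank above item j.
        if oracle(i, j):
            diffs.append([a - b for a, b in zip(items[i], items[j])])
        else:
            diffs.append([b - a for a, b in zip(items[i], items[j])])
        w = train_rank_svm(diffs, dim)
    return w, len(diffs)


# Toy data: 20 random 2-D items ranked by a hidden linear preference (an assumption).
rng = random.Random(0)
items = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(20)]
true_w = [2.0, -1.0]


def oracle(i, j):
    return dot(true_w, items[i]) > dot(true_w, items[j])


w, n_labels = active_rank_learning(items, oracle, rounds=15)
```

In this sketch the oracle stands in for the user who states which of two results they prefer. Only 15 of the 190 possible pairs are labeled, and each label is a partial order, not a class label.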
Index Terms
- SVM selective sampling for ranking with application to data retrieval