Article

SVM selective sampling for ranking with application to data retrieval

Author: Hwanjo Yu, University of Iowa, Iowa City, IA

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 2005, Pages 354–363, https://doi.org/10.1145/1081870.1081911

Published: 21 August 2005

ABSTRACT

Learning ranking (or preference) functions has been a major issue in the machine learning community and has produced many applications in information retrieval. SVMs (Support Vector Machines), a classification and regression methodology, have also shown excellent performance in learning ranking functions. They effectively learn ranking functions that generalize well, based on the "large-margin" principle, and also systematically support nonlinear ranking through the "kernel trick". In this paper, we propose an SVM selective sampling technique for learning ranking functions. SVM selective sampling (or active learning with SVM) has been studied in the context of classification. Such techniques reduce the labeling effort in learning classification functions by selecting only the most informative samples to be labeled. However, they are not extendable to learning ranking functions, because the labeled data in ranking are relative orderings, that is, partial orders over the data. Our proposed sampling technique effectively learns an accurate SVM ranking function with fewer partial orders. We apply our sampling technique to the data retrieval application, which enables fuzzy search on relational databases by interacting with users to learn their preferences. Experimental results show a significant reduction of the labeling effort in inducing accurate ranking functions.
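
As a rough illustration of the idea described in the abstract, the sketch below learns a linear ranking function from pairwise preferences and then asks for the order of the pair the current model is least sure about. It is not the paper's algorithm: the abstract does not state the selection criterion, so the "closest to a tie" rule, the plain subgradient-descent trainer (linear only, no kernel), and all names and parameters (`train_rank_svm`, `most_ambiguous_pair`, `active_rank`, `lam`, `lr`, `epochs`, the `oracle` callback) are assumptions made for illustration.

```python
import itertools
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def diff(items, i, j):
    return [a - b for a, b in zip(items[i], items[j])]


def train_rank_svm(items, prefs, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Fit a linear ranking function f(x) = w.x from partial orders.

    prefs holds (i, j) pairs meaning items[i] should rank above items[j].
    Each pair becomes a hinge constraint w.(x_i - x_j) >= 1, optimised by
    plain subgradient descent (an assumed trainer, linear kernel only).
    """
    rng = random.Random(seed)
    w = [0.0] * len(items[0])
    for _ in range(epochs):
        order = list(prefs)
        rng.shuffle(order)
        for i, j in order:
            d = diff(items, i, j)
            violated = dot(w, d) < 1.0
            w = [(1.0 - lr * lam) * wk for wk in w]  # shrink: L2 regulariser
            if violated:
                w = [wk + lr * dk for wk, dk in zip(w, d)]  # hinge subgradient
    return w


def most_ambiguous_pair(items, w, candidates):
    """Return the candidate pair whose order the model is least sure of.

    Assumed criterion: smallest |w.(x_i - x_j)|, the pair closest to a tie.
    """
    return min(candidates, key=lambda p: abs(dot(w, diff(items, p[0], p[1]))))


def rank(items, w):
    """Indices of items sorted from highest to lowest score."""
    return sorted(range(len(items)), key=lambda k: dot(w, items[k]), reverse=True)


def active_rank(items, oracle, rounds, seed=0):
    """Selective-sampling loop: query one pair per round, then retrain."""
    unlabeled = list(itertools.combinations(range(len(items)), 2))
    rng = random.Random(seed)
    first = unlabeled.pop(rng.randrange(len(unlabeled)))
    prefs = [oracle(*first)]
    w = train_rank_svm(items, prefs)
    for _ in range(rounds):
        if not unlabeled:
            break
        pair = most_ambiguous_pair(items, w, unlabeled)
        unlabeled.remove(pair)
        prefs.append(oracle(*pair))  # the user states which item is preferred
        w = train_rank_svm(items, prefs)
    return w, prefs


if __name__ == "__main__":
    hidden = [1.0, 2.0]  # hypothetical user preference, unknown to the learner
    data = [[0.0, 0.0], [1.0, 0.0], [0.6, 0.3], [1.0, 1.0]]

    def oracle(i, j):
        return (i, j) if dot(hidden, data[i]) > dot(hidden, data[j]) else (j, i)

    w, prefs = active_rank(data, oracle, rounds=3)
    print("queried orders:", prefs)
    print("ranking:", rank(data, w))
```

In this sketch each answer from the user is a single partial order (one item preferred over another), which is the form of labeled data the abstract describes for ranking. Querying the pair nearest a tie is only one plausible way to pick "informative" samples; a kernelized SVM solver could replace the trainer without changing the loop.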

Index Terms

SVM selective sampling for ranking with application to data retrieval
1. Computing methodologies
2. Information systems
  1. Information systems applications

Published in
KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
August 2005
844 pages
ISBN: 159593135X
DOI: 10.1145/1081870
General Chair: Robert Grossman (University of Illinois at Chicago & Open Data Partners, USA)
Program Chairs: Roberto Bayardo (IBM Almaden Research, USA), Kristin Bennett (RPI, USA)
Copyright © 2005 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
active learning
ranking
selective sampling
support vector machine
Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%