ABSTRACT
Feature weighting or selection is a crucial process for identifying an important subset of features in a data set. Removing irrelevant or redundant features can improve the generalization performance of ranking functions in information retrieval. Owing to fundamental differences between classification and ranking, feature weighting methods developed for classification cannot be readily applied to ranking. A state-of-the-art feature selection method for ranking, called GAS, was recently proposed; it exploits the importance of each feature and the similarity between every pair of features. However, GAS must compute similarity scores for all pairs of features, so it does not scale to high-dimensional data, and its performance degrades on nonlinear ranking functions. This paper proposes two novel algorithms, RankWrapper and RankFilter, which scale to high-dimensional data and also perform reasonably well with nonlinear ranking functions. RankWrapper and RankFilter are designed around the key idea of the Relief algorithm. Relief is a feature selection algorithm for classification that exploits the notions of hits (nearest data points within the same class) and misses (nearest data points from different classes). Because ranking has no such notion of hits or misses, the proposed algorithms instead utilize the ranking distances of nearest data points to identify the key features for ranking. Our extensive experiments show that RankWrapper and RankFilter achieve higher overall accuracy than GAS and than traditional Relief algorithms adapted for ranking, and run substantially faster than GAS on high-dimensional data.
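To make the key idea concrete, the sketch below contrasts the classic Relief update with a hypothetical ranking-oriented variant. This is a minimal illustration, not the authors' implementation: relief_weights follows Kira and Rendell's hit/miss update, while rank_relief_weights is an assumed adaptation in the spirit the abstract describes, where the normalized ranking distance between a point and each of its k nearest neighbors decides whether a feature difference is rewarded (miss-like) or penalized (hit-like). All function and parameter names here are invented for illustration.

```python
import numpy as np

def relief_weights(X, y, n_samples=None, seed=0):
    """Classic Relief for classification (Kira & Rendell, 1992):
    a feature gains weight when it separates a point from its nearest
    miss (different class) and loses weight when it separates the
    point from its nearest hit (same class). Assumes both classes
    are present in y."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = n_samples or n
    w = np.zeros(d)
    for i in rng.choice(n, size=m, replace=False):
        dist = np.abs(X - X[i]).sum(axis=1)      # L1 distance to every point
        dist[i] = np.inf                         # never pick the point itself
        hit = np.argmin(np.where(y == y[i], dist, np.inf))
        miss = np.argmin(np.where(y != y[i], dist, np.inf))
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / m
    return w

def rank_relief_weights(X, scores, k=5, n_samples=None, seed=0):
    """Hypothetical Relief-style update for ranking (illustration only,
    not the paper's exact RankWrapper/RankFilter rule). With no classes,
    the normalized ranking distance to each of the k nearest neighbors
    plays the role of the hit/miss label: feature differences across
    pairs with very different relevance scores raise the weight, and
    differences across similarly ranked pairs lower it."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = n_samples or n
    span = scores.max() - scores.min() or 1.0    # guard against constant scores
    w = np.zeros(d)
    for i in rng.choice(n, size=m, replace=False):
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf
        for j in np.argsort(dist)[:k]:           # k nearest neighbors
            gap = abs(scores[i] - scores[j]) / span  # ranking distance in [0, 1]
            # gap near 1 acts like a miss (+), gap near 0 like a hit (-)
            w += (2.0 * gap - 1.0) * np.abs(X[i] - X[j]) / (m * k)
    return w
```

Note that the dominant cost in both sketches is the nearest-neighbor search; the per-sample work is linear in the number of features, which illustrates why a Relief-style scheme can scale where GAS's similarity computation over all pairs of features cannot.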
REFERENCES
- LETOR: Learning to Rank for Information Retrieval. http://research.microsoft.com/users/LETOR/.
- C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In Proc. Int. Conf. Machine Learning (ICML'05), 2005.
- B. Cao, D. Shen, J.-T. Sun, Q. Yang, and Z. Chen. Feature selection in a kernel space. In Proc. Int. Conf. Machine Learning (ICML'07), 2007.
- Y. Cao, J. Xu, T.-Y. Liu, H. Li, Y. Huang, and H.-W. Hon. Adapting Ranking SVM to document retrieval. In Proc. ACM SIGIR Int. Conf. Information Retrieval (SIGIR'06), 2006.
- M. Dash and H. Liu. Feature selection for classification. Intelligent Data Analysis, 1997.
- J. Elsas, V. Carvalho, and J. Carbonell. Fast learning of document ranking functions with the committee perceptron. In Proc. Int. Conf. Web Search and Web Data Mining (WSDM'08), 2008.
- G. Forman. An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 2003.
- Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 2003.
- X. Geng, T.-Y. Liu, T. Qin, and H. Li. Feature selection for ranking. In Proc. ACM SIGIR Int. Conf. Information Retrieval (SIGIR'07), 2007.
- I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 2003.
- I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3):389--422, 2002.
- M. A. Hall and G. Holmes. Benchmarking attribute selection techniques for discrete class data mining. IEEE Transactions on Knowledge and Data Engineering, 2003.
- R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers, MIT Press, 2000.
- W. Hersh, C. Buckley, T. Leone, and D. Hickam. OHSUMED: An interactive retrieval evaluation and new large test collection for research. In Proc. ACM SIGIR Int. Conf. Information Retrieval (SIGIR'94), 1994.
- T. Joachims. Optimizing search engines using clickthrough data. In Proc. ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD'02), 2002.
- K. Kira and L. A. Rendell. A practical approach to feature selection. In Proc. Int. Conf. Machine Learning (ICML'92), 1992.
- I. Kononenko. Estimating attributes: Analysis and extensions of RELIEF. In Proc. Euro. Conf. Machine Learning (ECML'94), 1994.
- D. Mladenic and M. Grobelnik. Feature selection for unbalanced class distribution and Naive Bayes. In Proc. Int. Conf. Machine Learning (ICML'99), 1999.
- T. Qin, T.-Y. Liu, W. Lai, X.-D. Zhang, D.-S. Wang, and H. Li. Ranking with multiple hyperplanes. In Proc. ACM SIGIR Int. Conf. Information Retrieval (SIGIR'07), 2007.
- M. Robnik-Šikonja and I. Kononenko. Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, 53:23--69, 2003.
- F. Radlinski and T. Joachims. Query chains: Learning to rank from implicit feedback. In Proc. ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD'05), 2005.
- Y. Sun and J. Li. Iterative RELIEF for feature weighting. In Proc. Int. Conf. Machine Learning (ICML'06), 2006.
- J. Xu and H. Li. AdaRank: A boosting algorithm for information retrieval. In Proc. ACM SIGIR Int. Conf. Information Retrieval (SIGIR'07), 2007.
- Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proc. Int. Conf. Machine Learning (ICML'97), 1997.
- H. Yu. SVM selective sampling for ranking with application to data retrieval. In Proc. ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD'05), 2005.
- H. Yu, S.-W. Hwang, and K. C.-C. Chang. Enabling soft queries for data retrieval. Information Systems, 2007.
Index Terms
- Efficient feature weighting methods for ranking