DOI: 10.1145/2063576.2063734
research-article

Correlated multi-label feature selection

Published: 24 October 2011

Abstract

Multi-label learning studies problems in which each instance is associated with a set of labels. It poses two challenges: (1) the labels are interdependent and correlated, and (2) the data are high-dimensional. In this paper, we tackle both challenges at once by learning the label correlation and performing feature selection simultaneously. We place a matrix-variate Normal prior on the weight vectors of the classifier to model the label correlation. Our goal is to find a subset of features on which the label-correlation-regularized label ranking loss is minimized. The resulting multi-label feature selection problem is a mixed integer program, which we reformulate as a quadratically constrained linear program (QCLP). The QCLP can be solved by a cutting plane algorithm; each iteration solves a minimax optimization problem by alternating between dual coordinate descent and projected sub-gradient descent. Experiments on benchmark data sets show that the proposed methods outperform single-label feature selection methods and many other state-of-the-art multi-label learning methods.
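
To make the kind of objective described in the abstract more concrete, here is a minimal sketch of a pairwise label ranking hinge loss combined with a trace regularizer of the form tr(W Ω⁻¹ Wᵀ), which is what a zero-mean matrix-variate Normal prior with identity row covariance contributes up to constants. This is an illustration under assumptions, not the paper's formulation: the function name, the symbols `W`, `Omega`, `mask`, and `lam`, the hinge form of the ranking loss, and the fixed 0/1 feature mask are all hypothetical choices, and the cutting plane solver is not shown.

```python
import numpy as np

def correlated_ranking_objective(X, Y, W, Omega, mask, lam=1.0):
    """Illustrative objective: ranking loss on selected features plus a
    label-correlation regularizer. All names here are assumptions.

    X: (n, d) features; Y: (n, k) binary {0, 1} label matrix;
    W: (d, k) weights, one column per label;
    Omega: (k, k) positive-definite label covariance;
    mask: (d,) 0/1 indicator of selected features.
    """
    Xs = X * mask                  # zero out unselected features
    scores = Xs @ W                # (n, k) score for every label
    loss = 0.0
    for s, y in zip(scores, Y):
        pos, neg = s[y == 1], s[y == 0]
        if pos.size and neg.size:
            # Hinge penalty on every (relevant, irrelevant) label pair:
            # a relevant label should outscore an irrelevant one by 1.
            margins = 1.0 - (pos[:, None] - neg[None, :])
            loss += np.maximum(margins, 0.0).mean()
    # tr(W Omega^{-1} W^T), computed without forming Omega^{-1} explicitly.
    reg = np.trace(W @ np.linalg.solve(Omega, W.T))
    return loss / len(X) + lam * reg

# Tiny usage example with two instances, two features, two labels.
X = np.eye(2)
Y = np.array([[1, 0], [0, 1]])
W = np.eye(2)
value = correlated_ranking_objective(X, Y, W, np.eye(2), np.ones(2))
```

In this sketch the mask is given; in the setting the abstract describes, choosing the mask is the integer part of the problem, and the label covariance would be learned rather than fixed.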

    Published In

    CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
    October 2011
    2712 pages
    ISBN:9781450307178
    DOI:10.1145/2063576
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cutting plane
    2. dual coordinate descent
    3. feature selection
    4. label correlation
    5. multi-label learning

    Qualifiers

    • Research-article

    Conference

    CIKM '11

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2024) Learning Accurate Label-Specific Features From Partially Multilabeled Data. IEEE Transactions on Neural Networks and Learning Systems 35:8 (10436-10450). DOI: 10.1109/TNNLS.2023.3241921. Online publication date: Aug-2024
    • (2024) Low-Rank Multilabel Learning Based on Nonlinear Mapping. IEEE Transactions on Artificial Intelligence 5:6 (2693-2707). DOI: 10.1109/TAI.2023.3329079. Online publication date: Jun-2024
    • (2024) Learning correlation information for multi-label feature selection. Pattern Recognition 145 (109899). DOI: 10.1016/j.patcog.2023.109899. Online publication date: Jan-2024
    • (2024) Multi-label feature selection based on nonlinear mapping. Information Sciences (121168). DOI: 10.1016/j.ins.2024.121168. Online publication date: Jul-2024
    • (2024) Sparse semi-supervised multi-label feature selection based on latent representation. Complex & Intelligent Systems 10:4 (5139-5151). DOI: 10.1007/s40747-024-01439-7. Online publication date: 17-Apr-2024
    • (2024) Weakly supervised multi-label feature selection based on shared subspace. International Journal of Machine Learning and Cybernetics. DOI: 10.1007/s13042-024-02426-7. Online publication date: 11-Nov-2024
    • (2023) Online Multi-Label Streaming Feature Selection Based on Label Group Correlation and Feature Interaction. Entropy 25:7 (1071). DOI: 10.3390/e25071071. Online publication date: 17-Jul-2023
    • (2023) Online Multi-Label Streaming Feature Selection With Label Correlation. IEEE Transactions on Knowledge and Data Engineering 35:3 (2901-2915). DOI: 10.1109/TKDE.2021.3113514. Online publication date: 1-Mar-2023
    • (2023) MLNet: Enhancing Joint Predictive Modeling of Chronic Diseases Using Deep Learning. 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (3165-3172). DOI: 10.1109/BIBM58861.2023.10385782. Online publication date: 5-Dec-2023
    • (2023) Multi-label feature selection via joint label enhancement and pairwise label correlations. International Journal of Machine Learning and Cybernetics 14:11 (3943-3964). DOI: 10.1007/s13042-023-01874-x. Online publication date: 1-Jul-2023
