
DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification

Published: 02 February 2017

Abstract

Extreme multi-label classification refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. Datasets in extreme classification follow a power-law distribution, i.e., a large fraction of labels have very few positive instances in the data. Most state-of-the-art approaches for extreme multi-label classification attempt to capture correlations among labels by embedding the label matrix into a low-dimensional linear subspace. However, in the presence of power-law-distributed, extremely large, and diverse label spaces, structural assumptions such as low rank are easily violated.
In this work, we present DiSMEC, a large-scale distributed framework for learning one-versus-rest linear classifiers coupled with explicit capacity control of the model size. Unlike most state-of-the-art methods, DiSMEC makes no low-rank assumption on the label matrix. Using a double layer of parallelization, DiSMEC can learn classifiers for datasets consisting of hundreds of thousands of labels within a few hours. The explicit capacity-control mechanism filters out spurious parameters, keeping the model compact without losing prediction accuracy. We conduct extensive empirical evaluation on publicly available real-world datasets with up to 670,000 labels. We compare DiSMEC with recent state-of-the-art approaches, including SLEEC, a leading approach for learning sparse local embeddings, and FastXML, a tree-based approach optimizing a ranking-based loss function. On some of the datasets, DiSMEC significantly boosts prediction accuracy: 10% better than SLEEC and 15% better than FastXML, in absolute terms.
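The one-versus-rest scheme with explicit capacity control described in the abstract can be sketched as follows. This is a minimal single-machine illustration, not the authors' distributed implementation: the plain gradient-descent logistic solver, the learning rate, and the pruning threshold `prune_eps` are all illustrative assumptions, and the per-label loop is where DiSMEC's parallelization would apply.

```python
import numpy as np

def train_one_vs_rest(X, Y, epochs=100, lr=0.1, l2=1e-3, prune_eps=1e-2):
    """Train one binary linear classifier per label, then zero out
    near-zero weights (illustrative capacity control)."""
    n, d = X.shape
    num_labels = Y.shape[1]
    W = np.zeros((num_labels, d))
    for l in range(num_labels):            # independently trainable -> parallelizable
        y = 2.0 * Y[:, l] - 1.0            # map {0, 1} labels to {-1, +1}
        w = np.zeros(d)
        for _ in range(epochs):            # gradient descent on L2-regularized logistic loss
            margin = y * (X @ w)
            grad = -(X.T @ (y / (1.0 + np.exp(margin)))) / n + l2 * w
            w -= lr * grad
        w[np.abs(w) < prune_eps] = 0.0     # prune spurious parameters to keep the model compact
        W[l] = w
    return W

def predict_topk(W, x, k=3):
    """Rank labels for one example by linear score; return top-k label indices."""
    scores = W @ x
    return np.argsort(-scores)[:k]
```

Because each label's classifier is trained independently, the outer loop distributes trivially across machines and cores (the "double layer" of parallelization), and pruning keeps the stored weight matrix sparse.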

    Published In

    WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining
    February 2017
    868 pages
    ISBN:9781450346757
    DOI:10.1145/3018661

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. extreme classification
    2. large-scale classification
    3. multi-label learning

    Qualifiers

    • Research-article

    Conference

    WSDM 2017

    Acceptance Rates

    WSDM '17 Paper Acceptance Rate 80 of 505 submissions, 16%;
    Overall Acceptance Rate 498 of 2,863 submissions, 17%


