
DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification

Published: 02 February 2017

Abstract

Extreme multi-label classification refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. Datasets in extreme classification follow a power-law distribution, i.e., a large fraction of labels have very few positive instances in the data. Most state-of-the-art approaches for extreme multi-label classification attempt to capture correlations among labels by embedding the label matrix into a low-dimensional linear subspace. However, in the presence of power-law-distributed, extremely large, and diverse label spaces, structural assumptions such as low rank are easily violated.
In this work, we present DiSMEC, a large-scale distributed framework for learning one-versus-rest linear classifiers coupled with explicit capacity control of the model size. Unlike most state-of-the-art methods, DiSMEC makes no low-rank assumption on the label matrix. Using a double layer of parallelization, DiSMEC can learn classifiers for datasets consisting of hundreds of thousands of labels within a few hours. The explicit capacity-control mechanism filters out spurious parameters, keeping the model compact without losing prediction accuracy. We conduct extensive empirical evaluation on publicly available real-world datasets with up to 670,000 labels. We compare DiSMEC with recent state-of-the-art approaches, including SLEEC, a leading approach for learning sparse local embeddings, and FastXML, a tree-based approach optimizing a ranking-based loss function. On some of the datasets, DiSMEC significantly boosts prediction accuracy: 10% better than SLEEC and 15% better than FastXML, in absolute terms.
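The one-versus-rest scheme with explicit capacity control described in the abstract can be sketched as follows. This is a minimal single-machine illustration, not the authors' distributed implementation: the plain gradient-descent logistic solver, the learning rate, and the pruning threshold `prune_eps` are all illustrative assumptions, and the per-label loop is where DiSMEC's parallelization would apply.

```python
import numpy as np

def train_one_vs_rest(X, Y, epochs=100, lr=0.1, l2=1e-3, prune_eps=1e-2):
    """Train one binary linear classifier per label, then zero out
    near-zero weights (illustrative capacity control)."""
    n, d = X.shape
    num_labels = Y.shape[1]
    W = np.zeros((num_labels, d))
    for l in range(num_labels):            # independently trainable -> parallelizable
        y = 2.0 * Y[:, l] - 1.0            # map {0, 1} labels to {-1, +1}
        w = np.zeros(d)
        for _ in range(epochs):            # gradient descent on L2-regularized logistic loss
            margin = y * (X @ w)
            grad = -(X.T @ (y / (1.0 + np.exp(margin)))) / n + l2 * w
            w -= lr * grad
        w[np.abs(w) < prune_eps] = 0.0     # prune spurious parameters to keep the model compact
        W[l] = w
    return W

def predict_topk(W, x, k=3):
    """Rank labels for one example by linear score; return top-k label indices."""
    scores = W @ x
    return np.argsort(-scores)[:k]
```

Because each label's classifier is trained independently, the outer loop distributes trivially across machines and cores (the "double layer" of parallelization), and pruning keeps the stored weight matrix sparse.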

    Published In

    WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining
    February 2017
    868 pages
    ISBN:9781450346757
    DOI:10.1145/3018661

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. extreme classification
    2. large-scale classification
    3. multi-label learning

    Qualifiers

    • Research-article

    Conference

    WSDM 2017

    Acceptance Rates

    WSDM '17 Paper Acceptance Rate 80 of 505 submissions, 16%;
    Overall Acceptance Rate 498 of 2,863 submissions, 17%


