short-paper

Long-tail Mixup for Extreme Multi-label Classification

Authors:
Sangwoo Han (Algorima Inc., Seoul, Republic of Korea)
Eunseong Choi (Sungkyunkwan University, Suwon, Republic of Korea)
Chan Lim (Sungkyunkwan University, Suwon, Republic of Korea)
Hyunjung Shim (Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea)
Jongwuk Lee (Sungkyunkwan University, Suwon, Republic of Korea)

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, October 2022, Pages 3998–4002. https://doi.org/10.1145/3511808.3557632

Published: 17 October 2022

ABSTRACT

Extreme multi-label classification (XMC) aims to find multiple relevant labels for a given sample from a huge label set at industrial scale. The XMC problem inherently poses two challenges, scalability and label sparsity: the number of labels is very large, and the labels follow a long-tail distribution. To address these problems, we propose TailMix, a novel Mixup-based augmentation method for long-tail labels. Building upon a partition-based model, TailMix utilizes the context vectors generated by the label attention layer. It first selects two context vectors using the inverse propensity scores of labels and a label proximity graph that represents label co-occurrence. Using the two context vectors, it then generates new augmented samples carrying the long-tail label, which improves accuracy on long-tail labels. Despite its simplicity, experimental results show that TailMix consistently outperforms other augmentation methods on three benchmark datasets, especially for long-tail labels in terms of two metrics, PSP@k and PSN@k.
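The two ingredients named in the abstract, inverse propensity scores and Mixup of context vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the propensity model follows the form commonly attributed to Jain et al. (2016), the constants `a=0.55` and `b=1.5` are widely used defaults rather than values taken from this paper, and the names `tail_mix_sketch` and `neighbors` (a plain dictionary standing in for the label proximity graph), the Beta-distributed mixing weight, and the selection rule are all assumptions made for illustration.

```python
import math
import random


def inverse_propensity(label_counts, num_samples, a=0.55, b=1.5):
    """Inverse propensity 1/p_l per label, with p_l = 1 / (1 + c * (n_l + b)^(-a)).

    a and b are dataset-dependent constants; the defaults here are assumptions.
    """
    c = (math.log(num_samples) - 1.0) * (b + 1.0) ** a
    return [1.0 + c * (n + b) ** (-a) for n in label_counts]


def mixup(u, v, lam):
    """Convex combination lam * u + (1 - lam) * v of two equal-length vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(u, v)]


def tail_mix_sketch(context, positives, inv_prop, neighbors, rng, alpha=0.4):
    """Pick a rare positive label and a co-occurring label, then mix their vectors.

    context:   dict label -> context vector (list of floats) for one sample
    positives: labels relevant to the sample
    inv_prop:  inverse propensity scores indexed by label
    neighbors: dict label -> co-occurring labels (hypothetical proximity graph)
    """
    # Sample the target with probability proportional to its inverse propensity,
    # so rare (tail) labels are selected more often than frequent ones.
    weights = [inv_prop[label] for label in positives]
    target = rng.choices(positives, weights=weights, k=1)[0]

    # Candidate partners: graph neighbours of the target that have a context vector.
    candidates = [l for l in neighbors.get(target, []) if l in context and l != target]
    if not candidates:
        return target, list(context[target])  # nothing to mix with

    partner = rng.choice(candidates)
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)  # keep the target's vector dominant
    return target, mixup(context[target], context[partner], lam)


# Usage: label 0 is frequent (1000 samples), label 1 is rare (5 samples).
inv_prop = inverse_propensity([1000, 5], num_samples=10000)
context = {0: [1.0, 0.0], 1: [0.0, 1.0]}
target, mixed = tail_mix_sketch(
    context, positives=[0, 1], inv_prop=inv_prop,
    neighbors={0: [1], 1: [0]}, rng=random.Random(0),
)
```

In this sketch the rare label receives the larger inverse propensity and is therefore the more likely target; clamping the mixing weight to at least 0.5 is one simple way to keep the augmented vector close to the target label's original context vector.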

Index Terms

Long-tail Mixup for Extreme Multi-label Classification
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Neural networks

Published in
CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
October 2022
5274 pages
ISBN: 9781450392365
DOI: 10.1145/3511808
General Chairs: Mohammad Al Hasan (Indiana University Purdue University, Indianapolis, USA), Li Xiong (Emory University, Atlanta, USA)
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data augmentation
extreme multi-label classification
neural networks
Qualifiers
- short-paper
Acceptance Rates
CIKM '22 Paper Acceptance Rate: 621 of 2,257 submissions, 28%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
