Research Article
DOI: 10.1145/3442381.3450139

Convex Surrogates for Unbiased Loss Functions in Extreme Classification With Missing Labels

Published: 03 June 2021

Abstract

Extreme Classification (XC) refers to supervised learning where each training/test instance is labeled with a small subset of relevant labels chosen from a large set of possible target labels. The XC framework has been widely employed in web applications such as automatic labeling of web encyclopedias, prediction of related searches, and recommendation systems.
While most state-of-the-art models in XC achieve high overall accuracy by performing well on the frequently occurring labels, they perform poorly on the large number of infrequent (tail) labels. This arises from two statistical challenges: (i) missing labels, since it is virtually impossible to manually assign every relevant label to an instance, and (ii) a highly imbalanced data distribution in which a large fraction of the labels are tail labels. In this work, we consider common loss functions that decompose over labels and derive unbiased estimates that compensate for missing labels, following Natarajan et al. [26]. This turns out to be disadvantageous from an optimization perspective, as important properties such as convexity and lower-boundedness are lost. To circumvent this problem, we exploit the fact that typical loss functions in XC are convex surrogates of the 0-1 loss, and propose to switch to convex surrogates of its unbiased version. These surrogates are further adapted to the label imbalance by combining them with label-frequency-based rebalancing.
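To see why the unbiased correction can hurt optimization, consider a small numeric sketch (an illustration, not the paper's exact formulation): under one-sided missing-label noise where a true positive is observed with propensity p, a Natarajan-style correction of a loss l for an observed positive is l~(+1, v) = (l(+1, v) - (1 - p) l(-1, v)) / p. Applied to the squared hinge surrogate, the subtracted term bends the corrected loss downward for large scores, destroying both convexity and lower-boundedness:

```python
import numpy as np

def sq_hinge(y, v):
    # squared hinge loss, a convex surrogate of the 0-1 loss
    return np.maximum(0.0, 1.0 - y * v) ** 2

def unbiased_pos(v, p):
    # correction for an observed positive when a true positive is
    # observed with propensity p (missing-label noise only):
    #   l~(+1, v) = ( l(+1, v) - (1 - p) * l(-1, v) ) / p
    return (sq_hinge(1, v) - (1 - p) * sq_hinge(-1, v)) / p

v = np.linspace(-3.0, 3.0, 601)
f = unbiased_pos(v, p=0.5)

# discrete second difference: negative values indicate non-convexity
second_diff = f[:-2] - 2.0 * f[1:-1] + f[2:]
print(second_diff.min() < 0)   # True: the corrected loss is non-convex
print(f.min())                  # -16.0 at v = 3: unbounded below as v grows
```

For v > 1 the positive branch of the hinge vanishes while the subtracted negative branch keeps growing, so the corrected loss tends to minus infinity; this is the loss of convexity and lower-boundedness the abstract refers to.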
We show that the proposed loss functions can be easily incorporated into various frameworks for extreme classification, including (i) linear classifiers, such as DiSMEC, on sparse input representations, (ii) an attention-based deep architecture, AttentionXML, learned on dense GloVe embeddings, and (iii) an XLNet-based transformer model for extreme classification, APLC-XLNet. Our results demonstrate consistent improvements over the respective vanilla baselines on the propensity-scored metrics for precision and nDCG.
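The propensity-scored precision metric mentioned above weights each correctly predicted label by the inverse of its propensity, so a hit on a rare tail label counts for more than a hit on a head label. A minimal unnormalized sketch (benchmark implementations, e.g. those accompanying the Extreme Classification Repository [5], additionally normalize by the best achievable score; the function and variable names here are illustrative):

```python
import numpy as np

def psp_at_k(scores, y_true, propensities, k):
    """Unnormalized propensity-scored precision@k (PSP@k).

    scores       : model scores, one per label
    y_true       : 0/1 observed relevance, one per label
    propensities : estimated probability that each label, if relevant,
                   was actually observed (small for tail labels)
    """
    topk = np.argsort(-scores)[:k]  # indices of the k top-scored labels
    return np.sum(y_true[topk] / propensities[topk]) / k

scores = np.array([0.9, 0.8, 0.1])
y_true = np.array([1, 0, 1])
props  = np.array([1.0, 0.5, 0.2])
print(psp_at_k(scores, y_true, props, k=2))  # 0.5: only label 0 is a hit
```

Note that ranking label 2 (propensity 0.2) into the top-2 instead of label 1 would yield (1/1.0 + 1/0.2) / 2 = 3.0, showing how the metric rewards recovering tail labels.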

References

[1]
Rahul Agrawal, Archit Gupta, Yashoteja Prabhu, and Manik Varma. 2013. Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages. In Proceedings of the 22nd International Conference on World Wide Web (Rio de Janeiro, Brazil) (WWW ’13). Association for Computing Machinery, New York, NY, USA, 13–24. https://doi.org/10.1145/2488388.2488391
[2]
Rohit Babbar and Bernhard Schölkopf. 2017. DiSMEC: Distributed Sparse Machines for Extreme Multi-Label Classification. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (Cambridge, United Kingdom) (WSDM ’17). Association for Computing Machinery, New York, NY, USA, 721–729. https://doi.org/10.1145/3018661.3018741
[3]
Rohit Babbar and Bernhard Schölkopf. 2019. Data scarcity, robustness and extreme multi-label classification. Machine Learning 108, 8-9 (15 Sept. 2019), 1329–1351. https://doi.org/10.1007/s10994-019-05791-5
[4]
Robert Bamler and Stephan Mandt. 2020. Extreme Classification via Adversarial Softmax Approximation. In International Conference on Learning Representations.
[5]
K. Bhatia, K. Dahiya, H. Jain, A. Mittal, Y. Prabhu, and M. Varma. 2016. The extreme classification repository: Multi-label datasets and code. http://manikvarma.org/downloads/XC/XMLRepository.html
[6]
Kush Bhatia, Kunal Dahiya, Himanshu Jain, Yashoteja Prabhu, and Manik Varma. 2016. The Extreme Classification Repository: Multi-label Datasets and Code. http://manikvarma.org/downloads/XC/XMLRepository.html.
[7]
Kush Bhatia, Himanshu Jain, Purushottam Kar, Manik Varma, and Prateek Jain. 2015. Sparse Local Embeddings for Extreme Multi-label Classification. In Advances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett (Eds.), Vol. 28. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2015/file/35051070e572e47d2c26c241ab88307f-Paper.pdf
[8]
Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. 2019. Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2019/file/621461af90cadfdaf0e8d4cc25129f91-Paper.pdf
[9]
Wei-Cheng Chang, Hsiang-Fu Yu, Kai Zhong, Yiming Yang, and Inderjit Dhillon. 2019. A modular deep learning approach for extreme multi-label text classification. arXiv preprint arXiv:1905.02331 (2019).
[10]
Yu-Ting Chou, Gang Niu, Hsuan-Tien Lin, and Masashi Sugiyama. 2020. Unbiased risk estimators can mislead: A case study of learning with complementary labels. In International Conference on Machine Learning. PMLR, 1929–1938.
[11]
Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. 2019. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 9268–9277.
[12]
Ofer Dekel and Ohad Shamir. 2010. Multiclass-multilabel classification with more classes than examples. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. 137–144.
[13]
Jia Deng, Alexander C Berg, Kai Li, and Li Fei-Fei. 2010. What does classifying more than 10,000 image categories tell us? In ECCV.
[14]
Emily Denton, Jason Weston, Manohar Paluri, Lubomir Bourdev, and Rob Fergus. 2015. User Conditional Hashtag Prediction for Images. In KDD.
[15]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[16]
Chuan Guo, Ali Mousavi, Xiang Wu, Daniel N Holtmann-Rice, Satyen Kale, Sashank Reddi, and Sanjiv Kumar. 2019. Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces. In Advances in Neural Information Processing Systems. 4944–4954.
[17]
Himanshu Jain, Venkatesh Balasubramanian, Bhanu Chunduri, and Manik Varma. 2019. Slice: Scalable linear extreme classifiers trained on 100 million labels for related searches. In WSDM. 528–536.
[18]
Himanshu Jain, Yashoteja Prabhu, and Manik Varma. 2016. Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking and Other Missing Label Applications. In KDD.
[19]
Ankit Jalan and Purushottam Kar. 2019. Accelerating extreme classification via adaptive feature agglomeration. arXiv preprint arXiv:1905.11769 (2019).
[20]
Kalina Jasinska-Kobus, Marek Wydmuch, Krzysztof Dembczyński, Mikhail Kuznetsov, and Róbert Busa-Fekete. 2020. Probabilistic Label Trees for Extreme Multi-Label Classification. CoRR abs/2009.11218 (2020).
[21]
Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, and Yannis Kalantidis. 2020. Decoupling Representation and Classifier for Long-Tailed Recognition. In International Conference on Learning Representations. https://openreview.net/forum?id=r1gRTCVFvB
[22]
Sujay Khandagale, Han Xiao, and Rohit Babbar. 2020. Bonsai: diverse and shallow trees for extreme multi-label classification. Machine Learning (2020), 1–21.
[23]
Ryuichi Kiryo, Gang Niu, Marthinus C du Plessis, and Masashi Sugiyama. 2017. Positive-unlabeled learning with non-negative risk estimator. arXiv preprint arXiv:1703.00593 (2017).
[24]
Jingzhou Liu, Wei-Cheng Chang, Yuexin Wu, and Yiming Yang. 2017. Deep Learning for Extreme Multi-label Text Classification. In SIGIR. ACM, 115–124.
[25]
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[26]
Nagarajan Natarajan, Inderjit S. Dhillon, Pradeep Ravikumar, and Ambuj Tewari. 2017. Cost-sensitive learning with noisy labels. The Journal of Machine Learning Research 18, 1 (2017), 5666–5698.
[27]
Ioannis Partalas, Aris Kosmopoulos, Nicolas Baskiotis, Thierry Artieres, George Paliouras, Eric Gaussier, Ion Androutsopoulos, Massih-Reza Amini, and Patrick Gallinari. 2015. LSHTC: A benchmark for large-scale text classification. arXiv preprint arXiv:1503.08581 (2015).
[28]
Yashoteja Prabhu, Anil Kag, Shrutendra Harsola, Rahul Agrawal, and Manik Varma. 2018. Parabel: Partitioned label trees for extreme classification with application to dynamic search advertising. In Proceedings of the 2018 World Wide Web Conference. 993–1002.
[29]
Yashoteja Prabhu and Manik Varma. 2014. FastXML: A fast, accurate and stable tree-classifier for extreme multi-label learning. In KDD. ACM, 263–272.
[30]
Filip Radlinski, Paul N Bennett, Ben Carterette, and Thorsten Joachims. 2009. Redundancy, diversity and interdependent document relevance. In ACM SIGIR Forum.
[31]
Sashank J Reddi, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Jiecao Chen, and Sanjiv Kumar. 2019. Stochastic Negative Mining for Learning with Large Output Spaces. In The 22nd International Conference on Artificial Intelligence and Statistics. 1940–1949.
[32]
Guy Shani and Asela Gunawardana. 2013. Tutorial on application-oriented evaluation of recommendation systems. AI Communications (2013).
[33]
Yukihiro Tagami. 2017. AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification. In KDD. ACM.
[34]
Tong Wei and Yu-Feng Li. 2019. Does Tail Label Help for Large-Scale Multi-Label Learning? IEEE Transactions on Neural Networks and Learning Systems (2019).
[35]
Marek Wydmuch, Kalina Jasinska, Mikhail Kuznetsov, Róbert Busa-Fekete, and Krzysztof Dembczynski. 2018. A no-regret generalization of hierarchical softmax to extreme multi-label classification. In Advances in Neural Information Processing Systems. 6355–6366.
[36]
Chang Xu, Dacheng Tao, and Chao Xu. 2016. Robust Extreme Multi-label Learning. In KDD.
[37]
Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. CoRR abs/1906.08237 (2019). arXiv:1906.08237 http://arxiv.org/abs/1906.08237
[38]
Hui Ye, Zhiyu Chen, Da-Han Wang, and Brian Davison. 2020. Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification. In International Conference on Machine Learning. PMLR, 10809–10819.
[39]
Ian EH Yen, Xiangru Huang, Wei Dai, Pradeep Ravikumar, Inderjit Dhillon, and Eric Xing. 2017. Ppdsparse: A parallel primal-dual sparse method for extreme classification. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 545–553.
[40]
Ian En-Hsu Yen, Xiangru Huang, Pradeep Ravikumar, Kai Zhong, and Inderjit S. Dhillon. 2016. PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification. In ICML.
[41]
Ronghui You, Zihan Zhang, Ziye Wang, Suyang Dai, Hiroshi Mamitsuka, and Shanfeng Zhu. 2019. AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification. In Advances in Neural Information Processing Systems. 5812–5822.
[42]
Hsiang-Fu Yu, Prateek Jain, Purushottam Kar, and Inderjit Dhillon. 2014. Large-scale Multi-label Learning with Missing Labels. In Proceedings of the 31st International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 32), Eric P. Xing and Tony Jebara (Eds.). PMLR, Bejing, China, 593–601. http://proceedings.mlr.press/v32/yu14.html



    Published In

    WWW '21: Proceedings of the Web Conference 2021
    April 2021
    4054 pages
    ISBN:9781450383127
    DOI:10.1145/3442381


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. Extreme classification
    2. Imbalanced classification
    3. Loss functions
    4. Missing labels

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

WWW '21: The Web Conference 2021
April 19 - 23, 2021
Ljubljana, Slovenia

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Cited By

    • (2024) On the Value of Head Labels in Multi-Label Text Classification. ACM Transactions on Knowledge Discovery from Data 18(5), 1–21. https://doi.org/10.1145/3643853
    • (2024) Gandalf: Learning Label-label Correlations in Extreme Multi-label Classification via Label Features. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1360–1371. https://doi.org/10.1145/3637528.3672063
    • (2024) MatchXML: An Efficient Text-Label Matching Framework for Extreme Multi-Label Text Classification. IEEE Transactions on Knowledge and Data Engineering 36(9), 4781–4793. https://doi.org/10.1109/TKDE.2024.3374750
    • (2023) Generalized test utilities for long-tail performance in extreme multi-label classification. In Proceedings of the 37th International Conference on Neural Information Processing Systems, 22269–22303. https://doi.org/10.5555/3666122.3667100
    • (2023) Deep Encoders with Auxiliary Parameters for Extreme Classification. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 358–367. https://doi.org/10.1145/3580305.3599301
    • (2023) InceptionXML: A Lightweight Framework with Synchronized Negative Sampling for Short Text Extreme Classification. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 760–769. https://doi.org/10.1145/3539618.3591699
    • (2023) NGAME: Negative Mining-aware Mini-batching for Extreme Classification. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 258–266. https://doi.org/10.1145/3539597.3570392
    • (2023) XRR. Information Sciences 622, 115–132. https://doi.org/10.1016/j.ins.2022.11.158
    • (2023) Fast block-wise partitioning for extreme multi-label classification. Data Mining and Knowledge Discovery 37(6), 2192–2215. https://doi.org/10.1007/s10618-023-00945-5
    • (2022) CascadeXML. In Proceedings of the 36th International Conference on Neural Information Processing Systems, 2074–2087. https://doi.org/10.5555/3600270.3600421
