DOI: 10.1145/3652583.3658056

Learning from Reduced Labels for Long-Tailed Data

Published: 07 June 2024

Abstract

Long-tailed data is prevalent in real-world classification tasks, and learning from it relies heavily on supervised information, which makes the annotation process exceptionally labor-intensive and time-consuming. Unfortunately, although weakly supervised learning is a common approach to mitigating labeling costs, existing methods struggle to adequately preserve supervised information for tail samples, resulting in a decline in accuracy for the tail classes. To alleviate this problem, we introduce a novel weakly supervised labeling setting called Reduced Label. The proposed labeling setting not only avoids the decline of supervised information for the tail samples, but also decreases the labeling costs associated with long-tailed data. Additionally, we propose a straightforward and highly efficient unbiased framework with strong theoretical guarantees to learn from these Reduced Labels. Extensive experiments conducted on benchmark datasets, including ImageNet, validate the effectiveness of our approach, which surpasses the performance of state-of-the-art weakly supervised methods. Source code is available at https://github.com/WilsonMqz/LTRL
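
To make the setting concrete, the following is a minimal, hypothetical Python sketch of what a reduced-label annotation scheme might look like. The head/tail split, the annotate_reduced helper, and the membership-style query are illustrative assumptions made for this sketch, not the paper's actual protocol; the precise definition of Reduced Labels is given in the paper and the linked repository.

    from typing import FrozenSet, Optional, Tuple

    def annotate_reduced(true_label: int,
                         head_classes: FrozenSet[int]) -> Tuple[bool, Optional[int]]:
        # Hypothetical reduced-label query: tail samples keep their exact
        # label (preserving supervision where data is scarcest), while head
        # samples are answered with a cheap yes/no membership check against
        # the head-class set instead of an exact class.
        if true_label in head_classes:
            return True, None          # head sample: only "label is a head class"
        return False, true_label       # tail sample: exact label retained

    # Example with 10 classes where classes 0-2 are the (assumed) head classes:
    heads = frozenset({0, 1, 2})
    print(annotate_reduced(1, heads))  # -> (True, None)
    print(annotate_reduced(7, heads))  # -> (False, 7)

Under this reading, annotation cost drops because the many head samples need only a binary answer, while the few tail samples, whose supervision the abstract emphasizes preserving, are still labeled exactly.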


    Published In

    ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval
    May 2024
    1379 pages
    ISBN: 9798400706196
    DOI: 10.1145/3652583

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. deep models
    2. long-tailed
    3. reduced labels
    4. weak labels
    5. weakly supervised learning

    Qualifiers

    • Research-article

    Conference

    ICMR '24

    Acceptance Rates

    Overall Acceptance Rate 254 of 830 submissions, 31%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 61
    • Downloads (Last 12 months): 61
    • Downloads (Last 6 weeks): 8
    Reflects downloads up to 13 Feb 2025
