ABSTRACT
Neural network models trained with supervised learning have become dominant. Although high performance can be achieved when training data are ample, performance on labels with few training instances can be poor. This performance drop caused by imbalanced data is known as the long-tail (LT) issue, and it affects many neural network models deployed in practice. In this talk, we first review machine learning approaches that address the long-tail issue. We then report on our effort to apply a recent LT-addressing method to the item categorization (IC) task, which aims to classify product description texts into leaf nodes of a category taxonomy tree. In particular, we adopted a method that decouples the classification task into (a) learning representations using the K-positive contrastive loss (KCL) and (b) training a classifier on a balanced data set. Using SimCSE as our self-supervised learning backbone, we demonstrate that this method works on the IC text classification task. In addition, we identified a shortcoming of the KCL: false negative (FN) instances may harm the representation-learning step. After eliminating FN instances, IC performance (measured by macro-F1) improved further.
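The two ideas above (KCL and FN elimination) can be sketched in a few lines. The following is an illustrative NumPy toy, not the authors' implementation: the batch-level positive sampling and the `fn_mask` argument (which drops suspected false-negative pairs from the contrastive denominator) are our own simplifying assumptions for exposition.

```python
import numpy as np

def k_positive_contrastive_loss(embeddings, labels, k=2, temperature=0.1,
                                fn_mask=None, seed=0):
    """Sketch of a K-positive contrastive loss over one batch.

    Each anchor attracts at most k randomly sampled same-class positives,
    which balances the positive signal across head and tail classes.
    fn_mask[i, j] = True removes pair (i, j) from the contrastive
    denominator (a stand-in for the FN-elimination step).
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature            # cosine similarities / tau
    rng = np.random.default_rng(seed)
    n, total = len(labels), 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        chosen = rng.choice(positives, size=min(k, len(positives)),
                            replace=False)
        # Contrast against all other instances, minus masked false negatives.
        contrast = [j for j in range(n)
                    if j != i and (fn_mask is None or not fn_mask[i, j])]
        log_denom = np.log(np.exp(sim[i, contrast]).sum())
        total += -np.mean([sim[i, j] - log_denom for j in chosen])
    return total / n
```

In the decoupled recipe, a loss of this shape trains the encoder (stage a); the encoder is then frozen and a linear classifier is fit on class-balanced data (stage b).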
REFERENCES
- Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. 2019. Learning imbalanced datasets with label-distribution-aware margin loss. arXiv preprint arXiv:1906.07413 (2019).
- Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. 2002. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, Vol. 16 (2002), 321--357.
- Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning. PMLR, 1597--1607.
- Tsai-Shien Chen, Wei-Chih Hung, Hung-Yu Tseng, Shao-Yi Chien, and Ming-Hsuan Yang. 2021. Incremental False Negative Detection for Contrastive Learning. arXiv preprint arXiv:2106.03719 (2021).
- Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. 2019. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9268--9277.
- Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. SimCSE: Simple Contrastive Learning of Sentence Embeddings. In Empirical Methods in Natural Language Processing (EMNLP).
- Tri Huynh, Simon Kornblith, Matthew R Walter, Michael Maire, and Maryam Khademi. 2020. Boosting contrastive self-supervised learning with false negative cancellation. arXiv preprint arXiv:2011.11765 (2020).
- Bingyi Kang, Yu Li, Sa Xie, Zehuan Yuan, and Jiashi Feng. 2021. Exploring Balanced Feature Spaces for Representation Learning. In International Conference on Learning Representations. https://openreview.net/forum?id=OqtLIabPTit
- Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, and Yannis Kalantidis. 2019. Decoupling representation and classifier for long-tailed recognition. arXiv preprint arXiv:1910.09217 (2019).
- Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. 2020. Supervised contrastive learning. arXiv preprint arXiv:2004.11362 (2020).
- Mengmeng Li, Tian Gan, Meng Liu, Zhiyong Cheng, Jianhua Yin, and Liqiang Nie. 2019. Long-tail hashtag recommendation for micro-videos with graph convolutional network. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 509--518.
- Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2017. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision. 2980--2988.
- Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, and Stella X. Yu. 2019. Large-Scale Long-Tailed Recognition in an Open World. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Lin Xiao, Xiangliang Zhang, Liping Jing, Chi Huang, and Mingyang Song. 2021. Does Head Label Help for Long-Tailed Multi-Label Text Classification. arXiv preprint arXiv:2101.09704 (2021).
- Yuzhe Yang and Zhi Xu. 2020. Rethinking the value of labels for improving class-imbalanced learning. arXiv preprint arXiv:2006.07529 (2020).
- Boyan Zhou, Quan Cui, Xiu-Shen Wei, and Zhao-Min Chen. 2020a. BBN: Bilateral-branch network with cumulative learning for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9719--9728.
- Xiangzeng Zhou, Pan Pan, Yun Zheng, Yinghui Xu, and Rong Jin. 2020b. Large scale long-tailed product recognition system at Alibaba. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 3353--3356.
Index Terms
- Utilizing Contrastive Learning To Address Long Tail Issue in Product Categorization