Abstract
Supervised learning often requires a large number of labeled examples, which becomes a critical bottleneck when manually annotating class labels is costly. To mitigate this issue, the pairwise comparison (Pcomp) classification framework was proposed, in which training examples are only weakly annotated with pairwise comparisons, i.e., which of two examples is more likely to be positive. The previous study solves Pcomp problems by minimizing the classification error, which may lead to a less robust model due to its sensitivity to the class distribution. In this paper, we propose a robust learning framework for Pcomp data together with a pairwise surrogate loss called Pcomp-AUC. It provides an unbiased estimator that equivalently maximizes the AUC without access to the precise class labels. Theoretically, we prove the consistency of the proposed method with respect to AUC and further provide its estimation error bound. Empirical studies on multiple datasets validate the effectiveness of the proposed method.
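To make the setting concrete, below is a minimal illustrative sketch in PyTorch of training a scorer with a generic pairwise surrogate loss on Pcomp pairs, where each pair (x_a, x_b) indicates that x_a is more likely to be positive than x_b. This is not the paper's unbiased Pcomp-AUC estimator; the network, the logistic surrogate, and the synthetic data are hypothetical placeholders chosen only to illustrate the pairwise view of AUC maximization.

```python
# Illustrative sketch only: a generic pairwise ranking surrogate trained on
# Pcomp pairs (x_a, x_b), where x_a is annotated as "more likely positive".
# NOT the paper's Pcomp-AUC estimator; all names and data are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Scorer(nn.Module):
    """A small scoring network f(x) used as the ranking function."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def pairwise_surrogate_loss(score_a, score_b):
    # Logistic surrogate of the 0-1 ranking loss 1[f(x_a) <= f(x_b)]:
    # minimizing it pushes f(x_a) above f(x_b), the pairwise form of AUC.
    return F.softplus(score_b - score_a).mean()

def train_step(model, optimizer, x_a, x_b):
    optimizer.zero_grad()
    loss = pairwise_surrogate_loss(model(x_a), model(x_b))
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = Scorer(in_dim=10)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Synthetic Pcomp pairs: x_a drawn nearer a "positive" mode than x_b.
    x_a = torch.randn(128, 10) + 1.0
    x_b = torch.randn(128, 10) - 1.0
    for epoch in range(5):
        print(train_step(model, optimizer, x_a, x_b))
```

Minimizing this surrogate drives the score of the more-likely-positive example above its partner's score; AUC is exactly the probability of such a correct ordering over positive-negative pairs, which is why pairwise surrogate losses are a natural route to AUC maximization from comparison supervision.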
Acknowledgements
This research was supported by the Natural Science Foundation of Jiangsu Province, China (BK20222012, BK20211517), the National Key R&D Program of China (2020AAA0107000), and the National Natural Science Foundation of China (Grant No. 62222605).
Author information
Haochen Shi received the BS and MS degrees in signal and information processing from Nanjing University of Aeronautics and Astronautics, China in 2016 and 2018, respectively, and an MS degree in electronics from Queen's University Belfast, UK in 2019. She is now a PhD student at Nanjing University of Aeronautics and Astronautics, China. Her main research interests include machine learning, image processing, and pattern recognition.
Mingkun Xie received the BS degree in 2018. He is currently a PhD student in the MIIT Key Laboratory of Pattern Analysis and Machine Intelligence at Nanjing University of Aeronautics and Astronautics, China. He has served as a PC member for NeurIPS, ICML, and ICLR, and as a reviewer for TNNLS and MLJ. His research interests are mainly in machine learning; in particular, he is interested in multi-label learning and weakly supervised learning.
Shengjun Huang received the BS and PhD degrees in computer science from Nanjing University, China in 2008 and 2014, respectively. He is now a professor in the College of Computer Science and Technology of Nanjing University of Aeronautics and Astronautics, China. His main research interests include machine learning and data mining. He was selected for the Young Elite Scientists Sponsorship Program by CAST in 2016, and won the China Computer Federation Outstanding Doctoral Dissertation Award in 2015, the KDD Best Poster Award in 2012, and the Microsoft Fellowship Award in 2011. He is a Junior Associate Editor of Frontiers of Computer Science.
Cite this article
Shi, H., Xie, M. & Huang, S. Robust AUC maximization for classification with pairwise confidence comparisons. Front. Comput. Sci. 18, 184317 (2024). https://doi.org/10.1007/s11704-023-2709-5