Abstract
Semi-supervised learning (SSL) aims to improve performance by exploiting unlabeled data when labels are scarce. Conventional SSL studies typically assume closed environments, where important factors (e.g., label, feature, distribution) are consistent between labeled and unlabeled data. However, more practical tasks involve open environments, where these factors are inconsistent between labeled and unlabeled data. It has been reported that exploiting inconsistent unlabeled data causes severe performance degradation, even below the simple supervised learning baseline. Since manually verifying the quality of unlabeled data is impractical, it is important to study robust SSL with inconsistent unlabeled data in open environments. This paper briefly introduces some advances in this line of research, focusing on techniques concerning label, feature, and data distribution inconsistency in SSL, and presents the evaluation benchmarks. Open research problems are also discussed for reference purposes.
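To make the robustness concern concrete, the sketch below illustrates one common safeguard used in robust SSL pipelines: confidence-thresholded pseudo-label selection, which discards low-confidence unlabeled samples that may come from unseen classes or shifted distributions. This is a minimal illustrative example, not code from the paper; the function name `select_pseudo_labels` and the threshold `tau` are our own choices.

```python
import numpy as np

def select_pseudo_labels(probs, tau=0.95):
    """Keep only unlabeled samples whose maximum predicted probability
    exceeds tau; low-confidence samples (possibly from unseen classes
    or shifted distributions) are excluded from further training.

    Returns the pseudo-labels of the kept samples and their indices.
    """
    probs = np.asarray(probs, dtype=float)
    conf = probs.max(axis=1)          # per-sample confidence
    labels = probs.argmax(axis=1)     # per-sample predicted class
    mask = conf >= tau                # selection by confidence threshold
    return labels[mask], np.flatnonzero(mask)

# Example: four unlabeled samples, three known classes.
probs = [
    [0.98, 0.01, 0.01],   # confident -> kept
    [0.40, 0.35, 0.25],   # ambiguous -> rejected
    [0.05, 0.94, 0.01],   # just below tau -> rejected
    [0.01, 0.01, 0.98],   # confident -> kept
]
labels, kept = select_pseudo_labels(probs, tau=0.95)
print(labels, kept)   # -> [0 2] [0 3]
```

Methods surveyed in this line of work refine this basic filtering idea, e.g., by scoring unlabeled samples for out-of-distribution detection or by adaptively tuning the threshold per class.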
Acknowledgements
This research was supported by the Key Program of Jiangsu Science Foundation (BK20243012) and the National Natural Science Foundation of China (NSFC) (Grant Nos. 62306133, 62176118).
Ethics declarations
Competing interests: The authors declare that they have no competing interests or financial conflicts to disclose.
Additional information
Lan-Zhe Guo is an assistant professor in the School of Intelligence Science and Technology at Nanjing University, China. His research interests are mainly in semi-supervised learning and robust machine learning. He has published over 30 papers in top-tier conferences and journals such as ICML, NeurIPS, ICLR, and TPAMI, and has received the Outstanding Doctoral Dissertation Award from CAAI.
Lin-Han Jia is currently working toward a PhD degree in the School of Computer Science at Nanjing University, China. His research interests are mainly in weakly supervised learning and optimization.
Jie-Jing Shao is currently working toward a PhD degree in the School of Computer Science at Nanjing University, China. His research interests are mainly in weakly supervised learning and reinforcement learning.
Yu-Feng Li is a professor in the School of Artificial Intelligence at Nanjing University, China. His research interests are mainly in weakly supervised learning, statistical learning, and optimization. He has received the PAKDD Early-Career Research Award. He served as journal track co-chair of ACML 2021/2022, and as Area Chair/SPC of top-tier conferences such as ICML, NeurIPS, ICLR, and AAAI.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Guo, LZ., Jia, LH., Shao, JJ. et al. Robust semi-supervised learning in open environments. Front. Comput. Sci. 19, 198345 (2025). https://doi.org/10.1007/s11704-024-40646-w