
Towards Practical Large Scale Non-Linear Semi-Supervised Learning with Balancing Constraints

Authors:

Zhengqing Gao,

Huimin Wu,

Martin Takáč,

Bin Gu

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management

Pages 3072 - 3081

https://doi.org/10.1145/3511808.3557150

Published: 17 October 2022


Abstract

Semi-Supervised Support Vector Machine (S³VM) is one of the most popular methods for semi-supervised learning because it can make full use of plentiful, easily accessible unlabeled data. A balancing constraint is normally enforced in S³VM (denoted as BCS³VM) to avoid the harmful solution that assigns all or most of the unlabeled examples to the same label. Traditionally, non-linear BCS³VM is solved by the sequential minimal optimization algorithm. Recently, a novel incremental learning algorithm (IL-BCS³VM) was proposed to scale up BCS³VM further. However, IL-BCS³VM needs to compute the inverse of the linear system related to the support matrix, which keeps the algorithm from being scalable enough. To make BCS³VM more practical for large-scale problems, we propose in this paper a new scalable BCS³VM with accelerated triply stochastic gradients (denoted as TSG-BCS³VM). Specifically, so that the balancing constraint can handle different proportions of positive and negative samples among labeled and unlabeled data, we propose a soft balancing constraint for S³VM. To make the algorithm scalable, we generate triply stochastic gradients by sampling labeled samples, unlabeled samples, and random features to update the solution, and we apply Quasi-Monte Carlo (QMC) sampling to the random features to accelerate TSG-BCS³VM further. Our theoretical analysis shows that the convergence rate is O(1/√T) for both diminishing and constant learning rates, where T is the number of iterations, which is much better than previous results thanks to the QMC method. Empirical results on a variety of benchmark datasets show that our algorithm not only generalizes well but also scales better than existing BCS³VM algorithms.
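To illustrate the random-feature ingredient mentioned in the abstract, the following is a minimal sketch of QMC random Fourier features for a Gaussian kernel. It is not the TSG-BCS³VM algorithm itself: the choice of a Halton sequence, the Gaussian kernel with bandwidth `sigma`, the cosine/sine feature map, and all function names are assumptions made for this example.

```python
from statistics import NormalDist

import numpy as np


def first_primes(k):
    """Return the first k primes (used as Halton bases)."""
    primes = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, frac = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * frac
        frac /= base
    return result


def halton(n_points, dim):
    """First n_points of the Halton sequence in (0, 1)^dim, starting at index 1."""
    bases = first_primes(dim)
    return np.array(
        [[radical_inverse(i, b) for b in bases] for i in range(1, n_points + 1)]
    )


def qmc_gaussian_frequencies(n_features, dim, sigma):
    """Map low-discrepancy points to N(0, I / sigma^2) via the inverse normal CDF."""
    uniform_points = halton(n_features, dim)
    inverse_cdf = np.vectorize(NormalDist().inv_cdf)
    return inverse_cdf(uniform_points) / sigma


def feature_map(X, W):
    """Cosine/sine features whose inner product approximates
    exp(-||x - y||^2 / (2 sigma^2))."""
    projections = X @ W.T
    features = np.hstack([np.cos(projections), np.sin(projections)])
    return features / np.sqrt(W.shape[0])
```

With these hypothetical helpers, `feature_map(x, W) @ feature_map(y, W).T` approximates the kernel value between `x` and `y`. Replacing the Halton points with i.i.d. uniform draws gives the standard Monte Carlo random features that the QMC variant is meant to improve on.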

