DOI: 10.1145/3511808.3557150

Towards Practical Large Scale Non-Linear Semi-Supervised Learning with Balancing Constraints

Published: 17 October 2022

Abstract

The Semi-Supervised Support Vector Machine (S3VM) is one of the most popular methods for semi-supervised learning because it can make full use of plentiful, easily accessible unlabeled data. A balancing constraint is normally enforced in S3VM (the resulting model is denoted BCS3VM) to avoid the harmful solution that assigns all or most of the unlabeled examples to the same label. Traditionally, non-linear BCS3VM is solved by the sequential minimal optimization algorithm. Recently, a novel incremental learning algorithm (IL-BCS3VM) was proposed to scale up BCS3VM further. However, IL-BCS3VM must invert the linear system associated with the support matrix, which limits its scalability. To make BCS3VM more practical for large-scale problems, we propose a new scalable BCS3VM with accelerated triply stochastic gradients (denoted TSG-BCS3VM). Specifically, so that the balancing constraint can handle different proportions of positive and negative samples in the labeled and unlabeled data, we propose a soft balancing constraint for S3VM. To make the algorithm scalable, we generate triply stochastic gradients by sampling labeled samples, unlabeled samples, and random features to update the solution, and we apply Quasi-Monte Carlo (QMC) sampling to the random features to accelerate TSG-BCS3VM further. Our theoretical analysis shows that the convergence rate is O(1/√T) for both diminishing and constant learning rates, where T is the number of iterations, which is much better than previous results thanks to the QMC method. Empirical results on a variety of benchmark datasets show that our algorithm not only generalizes well but also scales better than existing BCS3VM algorithms.
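
To illustrate the QMC sampling of random features mentioned above, here is a minimal sketch of a quasi-Monte Carlo Fourier feature map for a Gaussian kernel. It is not the paper's implementation. The abstract does not state which kernel or low-discrepancy sequence is used, so the Halton sequence, the Gaussian kernel, and the bandwidth `sigma` are assumptions. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def van_der_corput(index, base):
    """Return the radical inverse of a positive integer in the given base."""
    value, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        value += digit * scale
        scale /= base
    return value


def qmc_gaussian_frequencies(num_features, dim, sigma=1.0):
    """Map Halton points through the Gaussian inverse CDF to get frequencies.

    Assumption: a Gaussian kernel with bandwidth sigma, so frequencies follow
    N(0, I / sigma^2). This sketch lists only six prime bases (dim <= 6).
    """
    primes = [2, 3, 5, 7, 11, 13]
    if dim > len(primes):
        raise ValueError("this sketch only lists six prime bases")
    inv_cdf = NormalDist().inv_cdf
    # Start at index 1 so every coordinate lies strictly inside (0, 1).
    return [
        [inv_cdf(van_der_corput(i, primes[d])) / sigma for d in range(dim)]
        for i in range(1, num_features + 1)
    ]


def feature_map(x, frequencies):
    """Fourier feature map built from one cosine/sine pair per frequency."""
    scale = 1.0 / math.sqrt(len(frequencies))
    features = []
    for w in frequencies:
        angle = sum(wj * xj for wj, xj in zip(w, x))
        features.append(scale * math.cos(angle))
        features.append(scale * math.sin(angle))
    return features


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel, used here only as a reference value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def approx_kernel(x, y, frequencies):
    """Inner product of feature maps, which approximates the kernel value."""
    zx, zy = feature_map(x, frequencies), feature_map(y, frequencies)
    return sum(a * b for a, b in zip(zx, zy))
```

A stochastic-gradient learner would use `feature_map` in place of explicit kernel evaluations. The cosine/sine pairing makes the approximate kernel of a point with itself exactly 1, and comparing `approx_kernel` with `gaussian_kernel` on a few points is a quick sanity check of the chosen number of features.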

    Published In

    CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
    October 2022
    5274 pages
    ISBN: 9781450392365
    DOI: 10.1145/3511808
    General Chairs: Mohammad Al Hasan, Li Xiong
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. balancing constraint
    2. semi-supervised support vector machine

    Qualifiers

    • Research-article

    Conference

    CIKM '22
    Acceptance Rates

    CIKM '22 Paper Acceptance Rate 621 of 2,257 submissions, 28%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
