Abstract
In human learning, training samples are often obtained successively, so many learning tasks are both online and semi-supervised: observations arrive in sequence and the corresponding labels are presented only sporadically. In this paper, we propose a novel manifold regularized model in a reproducing kernel Hilbert space (RKHS) to solve online semi-supervised learning (OS2L) problems. The proposed algorithm, named Model-Based Online Manifold Regularization (MOMR), is derived by solving a constrained optimization problem. Unlike the stochastic gradient algorithm used to solve the online version of the primal problem of the Laplacian support vector machine (LapSVM), the proposed algorithm obtains an exact solution iteratively by solving the Lagrange dual problem. To improve computational efficiency, a fast algorithm is also presented, which introduces an approximation technique to compute the derivative of the manifold term in the proposed model. Furthermore, several buffering strategies are introduced to improve the scalability of the proposed algorithms, and theoretical results establish their reliability. Finally, experiments show that the proposed algorithms perform comparably to the standard batch manifold regularization algorithm.
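To make the objective described above concrete, the following is a minimal sketch of a manifold-regularized loss of the kind the paper optimizes: a hinge loss on the sporadically labeled points, an RKHS norm penalty, and a graph-Laplacian smoothness term over all observed points. The RBF kernel, the function names, and the toy weight matrix are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def manifold_objective(alpha, X, y, labeled, W, lam1, lam2):
    """Hinge loss on labeled points + lam1 * ||f||_K^2 + lam2 * f^T L f,
    where f(x_i) = sum_j alpha_j K(x_j, x_i) and L = D - W."""
    K = rbf_kernel(X, X)
    f = K @ alpha                      # function values at all seen points
    hinge = np.maximum(0.0, 1.0 - y[labeled] * f[labeled]).sum()
    L = np.diag(W.sum(axis=1)) - W     # unnormalized graph Laplacian
    return hinge + lam1 * alpha @ K @ alpha + lam2 * f @ L @ f
```

With `alpha = 0` the RKHS and manifold terms vanish and the objective reduces to one unit of hinge loss per labeled point, which gives a quick sanity check on an implementation.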
Acknowledgment
This work is partly supported by NSFC grants 61375005, U1613213, 61210009, 61627808, 61603389, 61602483, MOST grants 2015BAK35B00, 2015BAK35B01, Guangdong Science and Technology Department grant 2016B090910001, and BNSF grant 4174107.
Ethics declarations
Conflict of Interest
We declare that we have no conflict of interest.
Human and Animal Rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Appendix
In this Appendix, we give the derivation of Eq. 13.
For simplicity, we define D and W as
Substituting (10), (23), and (24) into (12) and letting L = D − W, we have
where \(\alpha = [\alpha_1, \ldots, \alpha_{t+1}]^T\), \(\tilde{\alpha}^{t} = [\alpha^{t}_{1}, \ldots, \alpha^{t}_{t}, 0]^{T}\), \(K\) is a \((t+1) \times (t+1)\) Gram matrix with \(K_{ij} = K(x_i, x_j)\), \(J = Ke\), \(e = [0, \ldots, 0, 1]^T\) is a \((t+1)\)-dimensional vector, and \(c_0\) is a constant.
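The matrices entering this derivation can be illustrated concretely: \(W\) is a pairwise similarity matrix, \(D\) its diagonal degree matrix, and \(L = D - W\) the graph Laplacian. The sketch below builds them for toy data, along with a matrix of the form \(K + \lambda_1 K + \lambda_2 K L K\) that appears later in the derivation; the RBF similarity and the variable names are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W with D = diag(row sums of W)."""
    return np.diag(W.sum(axis=1)) - W

rng = np.random.RandomState(0)
X = rng.randn(5, 2)                      # 5 toy points in the plane
d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K = np.exp(-d2)                          # Gram matrix, K_ij = K(x_i, x_j)
W = np.exp(-d2)                          # toy similarity (edge) weights
np.fill_diagonal(W, 0.0)                 # no self-edges
L = graph_laplacian(W)
lam1, lam2 = 0.1, 0.1
A = K + lam1 * K + lam2 * K @ L @ K      # quadratic-form matrix, as in the text
```

Useful invariants to check: the rows of \(L\) sum to zero, \(f^T L f \ge 0\) for any \(f\) (the Laplacian is positive semidefinite), and \(A\) is symmetric.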
Note that \(L(\alpha, \xi_{t+1}, \gamma_{t+1}, \beta_{t+1})\) attains its minimum with respect to \(\alpha\) and \(\xi_{t+1}\) if and only if the following conditions are satisfied:
Therefore, we have
According to the above identity, we formulate a reduced Lagrangian:
Taking the derivative of Eq. 29 with respect to \(\alpha\), we have:
Note that \(\partial L_R / \partial \alpha = 0\). Therefore, we have:
Substituting (31) back into the reduced Lagrangian (29), we get:
where \(A = K + \lambda_1 K + \lambda_2 K L K\).
Let \(\overline{\gamma}_{t+1}\) be the stationary point of the objective function of Eq. 32.
Therefore,
Assume that the optimal solution of Eq. 32 is \(\gamma^{*}_{t+1}\). Since the objective function of Eq. 32 is quadratic, its optimum over the interval \([0, C]\) is attained at \(0\), at \(C\), or at \(\overline{\gamma}_{t+1}\). Hence
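This case analysis, maximizing a one-dimensional quadratic over \([0, C]\), amounts to projecting the stationary point onto the interval when the quadratic is concave (as a Lagrange dual objective typically is). A minimal sketch under that concavity assumption:

```python
def clip_quadratic_argmax(stationary, C):
    """Maximizer over [0, C] of a concave 1-D quadratic whose unconstrained
    maximizer is `stationary`: the quadratic increases up to the stationary
    point and decreases after it, so project the point onto the interval."""
    return min(max(stationary, 0.0), C)
```

If the stationary point falls inside \([0, C]\) it is returned unchanged; otherwise the nearer endpoint is optimal, matching the three cases above.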
Furthermore, if \(\delta_{t+1} = 0\), the solution of the proposed model can be obtained by a similar process. Thus, the classifier obtained at time \(t + 1\) is:
where
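By the representer theorem, a classifier learned in an RKHS takes the form of a kernel expansion over the observed points, \(f(x) = \sum_i \alpha_i K(x_i, x)\). The sketch below evaluates such a classifier; the RBF kernel and the function names are illustrative assumptions rather than the paper's exact expression.

```python
import numpy as np

def predict(x, X_seen, alpha, gamma=1.0):
    """Evaluate f(x) = sum_i alpha_i K(x_i, x) with an RBF kernel;
    sign(f(x)) gives the predicted label."""
    k = np.exp(-gamma * ((X_seen - x) ** 2).sum(axis=1))
    return float(alpha @ k)
```

In an online setting, `X_seen` and `alpha` grow by one entry per round, which is why the buffering strategies discussed in the paper matter for scalability.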
Cite this article
Ding, S., Xi, X., Liu, Z. et al. A Novel Manifold Regularized Online Semi-supervised Learning Model. Cogn Comput 10, 49–61 (2018). https://doi.org/10.1007/s12559-017-9489-x