Abstract
We focus on developing a novel scalable graph-based semi-supervised learning (SSL) method for input data consisting of a small amount of labeled data and a large amount of unlabeled data. Because labeled data are scarce while unlabeled data are abundant, existing SSL methods usually either deliver suboptimal performance due to an improperly constructed graph, or are impractical due to the high computational complexity of solving large-scale optimization problems. In this paper, we address both problems by constructing a novel graph of the input data for graph-based SSL. A density-based approach is proposed to learn a latent graph from the input data; based on this latent graph, a novel graph construction approach builds the graph of the input data via an efficient formula. With this formula, two transductive graph-based SSL methods are devised whose computational complexity is linear in the number of input data points. Extensive experiments on synthetic and real datasets demonstrate that the proposed methods are not only scalable to large-scale data, but also achieve good classification performance, especially when the number of labeled data points is extremely small.
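For context, the generic transductive graph-based SSL pipeline the abstract refers to can be sketched as: build a similarity graph from all data points, then propagate the few known labels along the graph edges. Below is a minimal, illustrative sketch in the spirit of label propagation with local and global consistency (Zhou et al., NIPS 2003); it is not the paper's latent-sparse-graph method, and the helper names `rbf_graph` and `propagate` are hypothetical.

```python
import numpy as np

# Illustrative transductive graph-based SSL: dense RBF graph + label propagation.
# NOT the paper's method -- a generic baseline sketch only.

def rbf_graph(X, sigma=1.0):
    # Dense RBF affinity matrix with zero diagonal.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def propagate(W, y, alpha=0.9):
    # y: length-n label vector, -1 for unlabeled, class index otherwise.
    n = len(y)
    classes = np.unique(y[y >= 0])
    Y = np.zeros((n, len(classes)))
    for j, c in enumerate(classes):
        Y[y == c, j] = 1.0
    d = W.sum(1)
    S = W / np.sqrt(np.outer(d, d))                 # symmetric normalization D^{-1/2} W D^{-1/2}
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)   # closed-form fixed point of propagation
    return classes[F.argmax(1)]

# Two well-separated Gaussian blobs, one labeled point per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
y = -np.ones(40, dtype=int)
y[0], y[20] = 0, 1
pred = propagate(rbf_graph(X), y)
print((pred[:20] == 0).all() and (pred[20:] == 1).all())
```

Note the closed-form solve is O(n^3) in the number of points; avoiding exactly this cost via an efficient graph-construction formula is the scalability contribution the abstract claims.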
Acknowledgements
L. Wang was supported in part by NSF DMS-2009689. R. Chan was supported in part by HKRGC GRF Grants CUHK14301718, CityU11301120, and CRF Grant C1013-21GF. T. Zeng was supported in part by the National Key R&D Program of China under Grant 2021YFE0203700.
Electronic supplementary material
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Wang, Z., Wang, L., Chan, R., Zeng, T. (2023). Exploring Latent Sparse Graph for Large-Scale Semi-supervised Learning. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science(), vol 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26411-5
Online ISBN: 978-3-031-26412-2