Abstract
We focus on developing a novel scalable graph-based semi-supervised learning (SSL) method for input data consisting of a small amount of labeled data and a large amount of unlabeled data. Because labeled data are scarce while unlabeled data are abundant, existing SSL methods usually either deliver suboptimal performance due to an improperly constructed graph, or are impractical due to the high computational complexity of solving large-scale optimization problems. In this paper, we address both problems by constructing a novel graph of the input data for graph-based SSL. A density-based approach is proposed to learn a latent graph from the input data; based on this latent graph, a novel graph construction approach builds the graph of the input data via an efficient formula. With this formula, two transductive graph-based SSL methods are devised whose computational complexity is linear in the number of input data points. Extensive experiments on synthetic and real datasets demonstrate that the proposed methods are not only scalable to large-scale data, but also achieve good classification performance, especially when the number of labeled data points is extremely small.
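For context, the generic transductive graph-based SSL pipeline the abstract refers to can be sketched as: build a similarity graph from all data points, then propagate the few known labels along the graph edges. Below is a minimal, illustrative sketch in the spirit of label propagation with local and global consistency (Zhou et al., NIPS 2003); it is not the paper's latent-sparse-graph method, and the helper names `rbf_graph` and `propagate` are hypothetical.

```python
import numpy as np

# Illustrative transductive graph-based SSL: dense RBF graph + label propagation.
# NOT the paper's method -- a generic baseline sketch only.

def rbf_graph(X, sigma=1.0):
    # Dense RBF affinity matrix with zero diagonal.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def propagate(W, y, alpha=0.9):
    # y: length-n label vector, -1 for unlabeled, class index otherwise.
    n = len(y)
    classes = np.unique(y[y >= 0])
    Y = np.zeros((n, len(classes)))
    for j, c in enumerate(classes):
        Y[y == c, j] = 1.0
    d = W.sum(1)
    S = W / np.sqrt(np.outer(d, d))                 # symmetric normalization D^{-1/2} W D^{-1/2}
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)   # closed-form fixed point of propagation
    return classes[F.argmax(1)]

# Two well-separated Gaussian blobs, one labeled point per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
y = -np.ones(40, dtype=int)
y[0], y[20] = 0, 1
pred = propagate(rbf_graph(X), y)
print((pred[:20] == 0).all() and (pred[20:] == 1).all())
```

Note the closed-form solve is O(n^3) in the number of points; avoiding exactly this cost via an efficient graph-construction formula is the scalability contribution the abstract claims.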
Acknowledgements
L. Wang was supported in part by NSF DMS-2009689. R. Chan was supported in part by HKRGC GRF Grants CUHK14301718, CityU11301120, and CRF Grant C1013-21GF. T. Zeng was supported in part by the National Key R&D Program of China under Grant 2021YFE0203700.
Electronic supplementary material
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Wang, Z., Wang, L., Chan, R., Zeng, T. (2023). Exploring Latent Sparse Graph for Large-Scale Semi-supervised Learning. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science(), vol 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26411-5
Online ISBN: 978-3-031-26412-2