A Consistent Diffusion-Based Algorithm for Semi-Supervised Graph Learning

Bonald, Thomas; De Lara, Nathan

doi:10.1007/978-3-031-53468-3_23


Part of the book series: Studies in Computational Intelligence (SCI, volume 1141)

Included in the following conference series:

International Conference on Complex Networks and Their Applications


Abstract

The task of semi-supervised classification is to assign labels to all nodes of a graph based on the labels known for a few nodes, called the seeds. One of the most popular algorithms relies on the principle of heat diffusion: the labels of the seeds are spread by thermo-conductance, and the temperature of each node at equilibrium is used as a score function for each label. In this paper, we prove that this algorithm is not consistent unless the temperatures of the nodes at equilibrium are centered before scoring. This crucial step not only makes the algorithm provably consistent on a block model but also brings significant performance gains on real graphs.
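The diffusion classifier described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a dense symmetric adjacency matrix, a hypothetical `seeds` dictionary mapping seed nodes to labels indexed from 0, and a graph in which every connected component contains at least one seed (otherwise the linear system is singular). For each label, seeds of that label are held at temperature 1 and all other seeds at 0; following the appendix, centering subtracts, for each label, the mean temperature over all nodes. The function names `diffusion_scores` and `predict` are hypothetical.

```python
import numpy as np


def diffusion_scores(adjacency, seeds, n_labels, center=True):
    """Heat-diffusion score of each node for each label (dense sketch).

    adjacency: symmetric (n, n) array of non-negative edge weights.
    seeds: dict {node index: label}, labels in 0..n_labels-1.
    Assumes every connected component contains at least one seed.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    seed_idx = np.array(sorted(seeds))
    free_idx = np.setdiff1d(np.arange(n), seed_idx)
    scores = np.zeros((n, n_labels))
    for label in range(n_labels):
        # Dirichlet boundary: seeds of this label at temperature 1, other seeds at 0.
        boundary = np.array([1.0 if seeds[i] == label else 0.0 for i in seed_idx])
        scores[seed_idx, label] = boundary
        # Equilibrium: each free node's temperature is the weighted mean of its
        # neighbours' temperatures, i.e. L_ff T_f = -L_fs T_s.
        rhs = -laplacian[np.ix_(free_idx, seed_idx)] @ boundary
        scores[free_idx, label] = np.linalg.solve(
            laplacian[np.ix_(free_idx, free_idx)], rhs)
    if center:
        # Centering: subtract each label's mean temperature over all nodes.
        scores -= scores.mean(axis=0)
    return scores


def predict(adjacency, seeds, n_labels, center=True):
    labels = diffusion_scores(adjacency, seeds, n_labels, center).argmax(axis=1)
    for node, label in seeds.items():
        labels[node] = label  # seeds keep their known labels
    return labels


# Usage: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
print(predict(A, {0: 0, 5: 1}, n_labels=2))  # [0 0 0 1 1 1]
```

On this symmetric toy graph, centering does not change the prediction (both labels have mean temperature 1/2); its effect shows up when seeds are unbalanced across blocks, which is the situation analysed in the appendix.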


Notes

1. The paper [14] had more than 4,000 citations in 2023, according to Google Scholar.
2. https://perso.telecom-paris.fr/bonald/notebooks/diffusion.ipynb
3. https://snap.stanford.edu/
4. https://netset.telecom-paris.fr/

References

[1] Airoldi, E.M., Blei, D.M., Fienberg, S.E., Xing, E.P.: Mixed membership stochastic blockmodels. J. Mach. Learn. Res. (2008)
[2] Berberidis, D., Nikolakopoulos, A.N., Giannakis, G.B.: AdaDIF: Adaptive diffusions for efficient semi-supervised learning over graphs. In: International Conference on Big Data. IEEE (2018)
[3] Chung, F.R.: Spectral graph theory. American Mathematical Soc. (1997)
[4] Donnat, C., Zitnik, M., Hallac, D., Leskovec, J.: Learning structural node embeddings via diffusion wavelets. In: International Conference on Knowledge Discovery & Data Mining. ACM (2018)
[5] Kondor, R.I., Lafferty, J.: Diffusion kernels on graphs and other discrete structures. In: Proceedings of the 19th International Conference on Machine Learning (2002)
[6] Li, Q., An, S., Li, L., Liu, W.: Semi-supervised learning on graph with an alternating diffusion process. CoRR (2019)
[7] Ma, H., King, I., Lyu, M.R.: Mining web graphs for recommendations. IEEE Transactions on Knowledge and Data Engineering (2011)
[8] Newman, M.E.J., Girvan, M.: Mixing patterns and community structure in networks. In: Pastor-Satorras, R., Rubi, M., Diaz-Guilera, A. (eds.) Statistical Mechanics of Complex Networks, pp. 66–87. Springer, Berlin, Heidelberg (2003). https://doi.org/10.1007/978-3-540-44943-0_5
[9] Rossi, E., Kenlay, H., Gorinova, M.I., Chamberlain, B.P., Dong, X., Bronstein, M.M.: On the unreasonable effectiveness of feature propagation in learning on graphs with missing node features. In: Proceedings of Machine Learning Research (2022)
[10] Thanou, D., Dong, X., Kressner, D., Frossard, P.: Learning heat diffusion graphs. IEEE Transactions on Signal and Information Processing over Networks (2017)
[11] Tremblay, N., Borgnat, P.: Graph wavelets for multiscale community mining. IEEE Transactions on Signal Processing (2014)
[12] Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. (1977)
[13] Zhu, X.: Semi-supervised learning with graphs. Ph.D. thesis, Carnegie Mellon University (2005)
[14] Zhu, X., Ghahramani, Z., Lafferty, J.D.: Semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of the 20th International Conference on Machine Learning (2003)


Author information

Authors and Affiliations

Institut Polytechnique de Paris, Paris, France
Thomas Bonald & Nathan De Lara


Corresponding author

Correspondence to Thomas Bonald.

Editor information

Editors and Affiliations

University of Burgundy, Dijon Cedex, France
Hocine Cherifi
Thomas J. Watson College of Engineering and Applied Sciences, Binghamton University, Binghamton, NY, USA
Luis M. Rocha
IUT Lumière - Université Lyon 2, University of Lyon, Bron, France
Chantal Cherifi
Department of Economics, Yildiz Technical University, Istanbul, Türkiye
Murat Donduran

Appendices

Appendix

A Proof of Lemma 1

Proof

In view of (2), we have:

$$\begin{aligned} (n_1(p-q) + nq) T_1 &= s_1 p + (n_1-s_1)pT_1 + \sum _{j\ne 1} (n_j - s_j) qT_j,\\ (n_k(p-q) + nq) T_k& = s_1 q + (n_k-s_k)pT_k + \sum _{j\ne k} (n_j - s_j) qT_j, \end{aligned}$$

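for $ k=2,\ldots ,K$. Completing each sum over $j\ne 1$ (respectively $j\ne k$) into a sum over all $K$ blocks and moving the terms in $T_1$ (respectively $T_k$) to the left-hand side, we deduce: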

$$\begin{aligned} (s_1(p-q) + nq) T_1 &= s_1 p + Uq,\\ (s_k(p-q) + nq) T_k &= s_1 q + Uq\quad \quad \forall k=2,\ldots ,K, \end{aligned}$$

with

$$ U = \sum _{j=1}^K (n_j - s_j) T_j. $$

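The proof then follows by substituting $U = n\bar{T} - s_1$, which holds because $\bar{T}$, the average temperature over all $n$ nodes (the $s_1$ seeds of label 1 being at temperature 1 and all other seeds at temperature 0), satisfies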

$$ n \bar{T} = s_1 + \sum _{j=1}^K (n_j - s_j) T_j = s_1 + U. $$
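The algebra of this proof can be checked numerically. The sketch below, which is an illustration rather than part of the paper, builds the expected adjacency matrix of the block model (weight $p$ inside a block, $q$ across blocks, self-loops included so that the degree of a node of block $k$ is exactly $n_k(p-q)+nq$), solves the Dirichlet problem for label 1 (block index 0 in the code) and tests the two reduced equations and the identity $n\bar{T} = s_1 + U$. The helper `block_temperatures`, the convention that the first $s_k$ nodes of block $k$ are its seeds, and the parameter values are all assumptions made for the example.

```python
import numpy as np


def block_temperatures(sizes, seeds_per_block, p, q):
    """Dirichlet problem for label 1 (block index 0) on the expected block model.

    Returns (T, T_bar): T[k] is the temperature of the non-seed nodes of
    block k and T_bar is the mean temperature over all n nodes.
    """
    blocks = np.repeat(np.arange(len(sizes)), sizes)
    n = blocks.size
    # Expected adjacency: weight p inside a block, q across blocks (self-loops
    # included, matching the degree n_k (p - q) + n q).
    adjacency = np.where(blocks[:, None] == blocks[None, :], p, q)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # Hypothetical convention: the first s_k nodes of block k are its seeds.
    starts = np.concatenate([[0], np.cumsum(sizes)[:-1]])
    is_seed = np.zeros(n, dtype=bool)
    for start, s in zip(starts, seeds_per_block):
        is_seed[start:start + s] = True
    free = ~is_seed
    # Seeds of block 0 at temperature 1, all other seeds at 0.
    temperature = np.where(is_seed & (blocks == 0), 1.0, 0.0)
    rhs = -laplacian[np.ix_(free, is_seed)] @ temperature[is_seed]
    temperature[free] = np.linalg.solve(laplacian[np.ix_(free, free)], rhs)
    T = np.array([temperature[free & (blocks == k)].mean() for k in range(len(sizes))])
    return T, temperature.mean()


sizes, seeds, p, q = [40, 30, 30], [5, 2, 3], 0.3, 0.1
n, s1 = sum(sizes), seeds[0]
T, T_bar = block_temperatures(sizes, seeds, p, q)
U = sum((nk - sk) * Tk for nk, sk, Tk in zip(sizes, seeds, T))
# The two reduced equations of the proof and the identity n * T_bar = s_1 + U.
print(np.isclose((s1 * (p - q) + n * q) * T[0], s1 * p + U * q))
print(all(np.isclose((seeds[k] * (p - q) + n * q) * T[k], s1 * q + U * q)
          for k in range(1, len(sizes))))
print(np.isclose(n * T_bar, s1 + U))
```

A dense solve is used only because the example graph is tiny; the identities hold for any block sizes with $n_k > s_k$.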

B Proof of Theorem 1

Proof

Let $\varDelta ^{(1)}_k = T_k - \bar{T}$ be the deviation of the temperature of the non-seed nodes of block $k$ from the mean temperature $\bar{T}$, for the Dirichlet problem associated with label 1. In view of Lemma 1, we have:

$$\begin{aligned} (s_1(p-q) + nq) \varDelta ^{(1)}_1 &= s_1 (p-q) (1-\bar{T}),\\ (s_k(p-q) + nq)\varDelta ^{(1)}_k &= -s_k(p-q) \bar{T} \quad \quad \forall k=2,\ldots ,K. \end{aligned}$$

For $p>q$, using the fact that $\bar{T} \in (0,1)$ and that each block contains at least one seed ($s_1, s_k > 0$), we get $\varDelta ^{(1)}_1 > 0$ and $\varDelta ^{(1)}_k<0$ for all $k=2,\ldots ,K$. By symmetry, for each label $l = 1,\ldots ,K$, $\varDelta ^{(l)}_l > 0$ and $\varDelta ^{(l)}_k<0$ for all $k\ne l$. We deduce that each free (non-seed) node $i$ of block $k$ is correctly classified: $\hat{y}_i=\arg \max _{l}\varDelta ^{(l)}_k = k$.
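The role of centering in this argument can be illustrated on the expected block model. The sketch below compares raw equilibrium temperatures with centered ones ($T_k - \bar{T}$ for each label) when seeds are unbalanced across blocks. It is an illustration under stated assumptions, not the paper's experiment: the helper `block_model_predictions`, the convention that the first $s_k$ nodes of block $k$ are seeds labeled $k$, labels indexed from 0, and the parameter values are all hypothetical choices.

```python
import numpy as np


def block_model_predictions(sizes, seeds_per_block, p, q, center):
    """Predicted and true labels of the free nodes of the expected block model."""
    blocks = np.repeat(np.arange(len(sizes)), sizes)
    n, K = blocks.size, len(sizes)
    # Expected adjacency: weight p inside a block, q across blocks.
    adjacency = np.where(blocks[:, None] == blocks[None, :], p, q)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # Hypothetical convention: the first s_k nodes of block k are seeds labeled k.
    starts = np.concatenate([[0], np.cumsum(sizes)[:-1]])
    is_seed = np.zeros(n, dtype=bool)
    for start, s in zip(starts, seeds_per_block):
        is_seed[start:start + s] = True
    free = ~is_seed
    scores = np.zeros((n, K))
    for label in range(K):
        # Dirichlet problem for this label: its seeds at 1, all other seeds at 0.
        temperature = np.where(is_seed & (blocks == label), 1.0, 0.0)
        rhs = -laplacian[np.ix_(free, is_seed)] @ temperature[is_seed]
        temperature[free] = np.linalg.solve(laplacian[np.ix_(free, free)], rhs)
        # Optional centering: subtract the mean temperature over all nodes.
        scores[:, label] = temperature - temperature.mean() if center else temperature
    return scores[free].argmax(axis=1), blocks[free]


# Unbalanced seeds: 20 seeds in block 0, a single seed in block 1.
sizes, seeds, p, q = [50, 50], [20, 1], 0.2, 0.1
for center in (False, True):
    pred, truth = block_model_predictions(sizes, seeds, p, q, center)
    print("centered" if center else "raw", (pred == truth).mean())
```

With these parameters, the closed forms of Lemma 1 give raw temperatures of about 0.93 for label 1 versus 0.07 for label 2 on the free nodes of the second block, so the raw scores assign every free node to the label with many seeds; the centered deviations are about $-0.009$ versus $+0.009$ there, so centering recovers both blocks, as the proof predicts.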


About this paper

Cite this paper

Bonald, T., De Lara, N. (2024). A Consistent Diffusion-Based Algorithm for Semi-Supervised Graph Learning. In: Cherifi, H., Rocha, L.M., Cherifi, C., Donduran, M. (eds) Complex Networks & Their Applications XII. COMPLEX NETWORKS 2023. Studies in Computational Intelligence, vol 1141. Springer, Cham. https://doi.org/10.1007/978-3-031-53468-3_23


DOI: https://doi.org/10.1007/978-3-031-53468-3_23
Published: 20 February 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53467-6
Online ISBN: 978-3-031-53468-3
eBook Packages: Engineering, Engineering (R0)
