
An Information Theoretic Perspective for Heterogeneous Subgraph Federated Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13943)

Abstract

Mining graph data has gained wide attention in modern applications. With the explosive growth of graph data, it is common for graphs to be collected and stored in separate, distinct systems. These local graphs cannot be shared directly due to privacy and bandwidth concerns, so a federated learning approach is needed to collaboratively train a powerful, generalizable model. However, the local subgraphs are usually heterogeneously distributed, and such heterogeneity poses challenges for subgraph federated learning. In this work, we analyze subgraph federated learning and find that sub-optimal objectives under the FedAVG training setting degrade the performance of GNNs. To this end, we propose InfoFedSage, a federated subgraph learning framework guided by the Information Bottleneck principle to alleviate the non-IID issue. Experiments on public datasets demonstrate the effectiveness of InfoFedSage under heterogeneous subgraph federated learning.
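The page does not reproduce the method details, but the high-level idea stated above, FedAvg-style training in which each client's local objective is regularized by an Information Bottleneck term, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `IBEncoder`, `local_update`, and `fedavg_round` names, the Gaussian reparameterized embedding, and the `beta` coefficient are hypothetical stand-ins in the spirit of the standard variational information bottleneck.

```python
# Minimal sketch (assumption, not the authors' code): FedAvg where each client
# trains a small encoder/classifier with a variational IB penalty.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class IBEncoder(nn.Module):
    """Encodes node features into a stochastic embedding z ~ N(mu, sigma^2)."""

    def __init__(self, in_dim, hid_dim, num_classes):
        super().__init__()
        self.backbone = nn.Linear(in_dim, hid_dim)   # stand-in for a GNN layer
        self.mu = nn.Linear(hid_dim, hid_dim)
        self.logvar = nn.Linear(hid_dim, hid_dim)
        self.classifier = nn.Linear(hid_dim, num_classes)

    def forward(self, x):
        h = F.relu(self.backbone(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.classifier(z), mu, logvar


def local_update(model, x, y, beta=1e-3, lr=1e-2, epochs=1):
    """One client's local step: cross-entropy + beta * KL (IB compression term)."""
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        logits, mu, logvar = model(x)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        loss = F.cross_entropy(logits, y) + beta * kl
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model.state_dict()


def fedavg_round(global_model, client_data, beta=1e-3):
    """Standard FedAvg aggregation (uniform weights) over IB-regularized clients."""
    states = [local_update(global_model, x, y, beta) for x, y in client_data]
    avg = {k: torch.stack([s[k] for s in states]).mean(0) for k in states[0]}
    global_model.load_state_dict(avg)
    return global_model


if __name__ == "__main__":
    # Toy usage: two "clients" with random node features and labels.
    torch.manual_seed(0)
    clients = [(torch.randn(32, 16), torch.randint(0, 3, (32,))) for _ in range(2)]
    model = IBEncoder(in_dim=16, hid_dim=8, num_classes=3)
    for _ in range(3):
        model = fedavg_round(model, clients)
    print("finished 3 federated rounds")
```

In this sketch the server simply averages client weights as in FedAvg, while the KL term encourages each client's node embeddings to discard client-specific noise; the actual InfoFedSage objective and architecture are described in the full paper.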

J. Guo and S. Li—Equal contribution.



Author information

Corresponding author

Correspondence to Shangyang Li.

A Appendix

A.1 Proof for Proposition 4.1

We first state a lemma.

Lemma A.1. Given that X has n states (\(x_1, x_2, \cdots , x_n\)), where \(x_1\) can be divided into k sub-states (\(x_{11}, x_{12}, \cdots , x_{1 k}\)), and Y has m states (\(y_1, y_2, \cdots , y_m\)), we have

$$\begin{aligned} \begin{aligned}&I\left( x_{11}, x_{12}, \cdots , x_{1 k}, x_{2}, \cdots , x_{n} ; Y\right) \\&=\,p\left( x_{1}\right) \cdot I\left( x_{11}, x_{12}, \cdots , x_{1 k} ; Y\right) +I\left( x_{1}, x_{2}, \cdots , x_{n} ; Y\right) \end{aligned} \end{aligned}$$
(15)

Proof. The mutual information between (\(x_{11}, x_{12}, \cdots , x_{1 k}\)) and Y is:

$$\begin{aligned} \begin{aligned}&I\left( x_{11}, x_{12}, \cdots , x_{1 k} ; Y\right) \\&=\,H\left( x_{11}, x_{12}, \cdots , x_{1 k}\right) -H\left( x_{11}, x_{12}, \cdots , x_{1 k} \mid Y\right) \\ \end{aligned} \end{aligned}$$
(16)

Then, we have

$$\begin{aligned} \begin{aligned}&I\left( x_{11}, x_{12}, \cdots , x_{1 k}, x_{2}, \cdots , x_{n} ; Y\right) \\&= -\,\sum _{t=1}^{k} p\left( x_{1 t}\right) \log \frac{p\left( x_{1 t}\right) }{p\left( x_{1}\right) }+\sum _{t=1}^{k} \sum _{j=1}^{m} p\left( x_{1 t} y_{j}\right) \log \frac{p\left( x_{1 t} y_{j}\right) }{p\left( x_{1} y_{j}\right) }\\&+\,I\left( x_{1}, x_{2}, \cdots , x_{n} ; Y\right) \\&=\, p\left( x_{1}\right) \cdot I\left( x_{11}, x_{12}, \cdots , x_{1 k} ; Y\right) +I\left( x_{1}, x_{2}, \cdots , x_{n} ; Y\right) \end{aligned} \end{aligned}$$
(17)

Since \(p\left( x_{1}\right) \cdot I\left( x_{11}, x_{12}, \cdots , x_{1 k} ; Y\right) \ge 0\), we obtain Corollary A.1:

$$\begin{aligned} I\left( x_{11}, x_{12}, \cdots , x_{1 k}, x_{2}, \cdots , x_{n} ; Y\right) \ge I\left( x_{1}, x_{2}, \cdots , x_{n} ; Y\right) \end{aligned}$$
(18)
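As a sanity check (not from the paper), the decomposition in Lemma A.1 and the inequality in Corollary A.1 can be verified numerically on a small random joint distribution, interpreting \(I(x_{11}, \cdots , x_{1k} ; Y)\) as the mutual information under the conditional distribution given \(x_1\), which is how Eq. (17) expands it.

```python
# Numeric illustration (assumption: the conditional-distribution reading above).
import numpy as np


def mutual_information(joint):
    """I(A; B) in nats for a joint probability table p(a, b)."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])))


rng = np.random.default_rng(0)

# Refined variable: x_1 split into 2 sub-states (rows 0-1), plus x_2, x_3 (rows 2-3).
joint_refined = rng.random((4, 3))
joint_refined /= joint_refined.sum()

# Coarse variable: merge the two sub-states of x_1 into a single row.
joint_coarse = np.vstack([joint_refined[:2].sum(axis=0), joint_refined[2:]])

# Conditional joint of (sub-state of x_1, Y) given x_1.
p_x1 = joint_refined[:2].sum()
joint_sub_given_x1 = joint_refined[:2] / p_x1

lhs = mutual_information(joint_refined)
rhs = p_x1 * mutual_information(joint_sub_given_x1) + mutual_information(joint_coarse)

print(f"I(refined; Y)              = {lhs:.6f}")
print(f"p(x1)*I(sub; Y) + I(coarse; Y) = {rhs:.6f}")   # matches lhs -> Lemma A.1
print(f"I(coarse; Y)               = {mutual_information(joint_coarse):.6f}")  # <= lhs -> Corollary A.1
```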

We restate Proposition 4.1: For \(Z_X' = Z_{X_1} \cup \cdots \cup Z_{X_m}\) and \(Y' = Y_{1} \cup \cdots \cup Y_{m}\), we have

$$\begin{aligned} \begin{aligned} {\mathcal {L}}_{\text {FedAVG}} \ge -\frac{1}{m}\sum _{i=1}^{m} I(Z_{X_i}, Y_{i}) \ge -I(Z_X', Y'). \end{aligned} \end{aligned}$$
(19)

Proof. We first consider the first inequality. By definition, the mutual information can be written as

$$\begin{aligned} \begin{aligned} I(Z_{X_i}, Y_i)=\sum p(y_i, z_{X_i}) \log p(y_i \mid z_{X_i})+H(Y_i) \end{aligned} \end{aligned}$$
(20)

Notice that the label entropy \(H(Y_i)\) does not depend on the optimization procedure and is non-negative. Therefore, we have

$$\begin{aligned} \begin{aligned} {\mathcal {L}}_{\textrm{FedAVG}} =&-\frac{1}{m}\sum _{i=1}^{m} I(Z_{X_i}, Y_i) + \frac{1}{m} \sum _{i=1}^{m} H(Y_i) \ge -\frac{1}{m} \sum _{i=1}^{m} I(Z_{X_i}, Y_{i}). \end{aligned} \end{aligned}$$
(21)

Next, we consider the second inequality. Let \(i^* = \arg \max _{i} I(Z_{X_i}, Y_{i})\). Since \(Z_X'\) refines \(Z_{X_{i^*}}\) and \(Y'\) refines \(Y_{i^*}\), Corollary A.1 gives

$$\begin{aligned} \begin{aligned} {\mathcal {L}}_{\textrm{FedAVG}} \ge&-\frac{1}{m}\sum _{i=1}^{m} I(Z_{X_i}, Y_{i}) \\ \ge&-\max _{i} I(Z_{X_i}, Y_{i}) = -I(Z_{X_{i^*}}, Y_{i^*}) \ge -I(Z_X', Y_{i^*}) \ge -I(Z_X', Y') \end{aligned} \end{aligned}$$
(22)
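The first inequality can also be made concrete numerically: for any predictor \(q(y \mid z)\), the expected cross-entropy is at least \(H(Y \mid Z) = H(Y) - I(Z ; Y)\), with equality when \(q(y \mid z) = p(y \mid z)\). The following small numpy check (an illustration, not from the paper) verifies this for a random joint distribution and an arbitrary predictor.

```python
# Toy check: E[-log q(y|z)] >= H(Y|Z) = H(Y) - I(Z; Y), matching Eq. (20)-(21).
import numpy as np

rng = np.random.default_rng(1)

# Random joint distribution p(z, y) over 4 representation states and 3 labels.
p_zy = rng.random((4, 3))
p_zy /= p_zy.sum()
p_z = p_zy.sum(axis=1, keepdims=True)
p_y = p_zy.sum(axis=0)
p_y_given_z = p_zy / p_z

entropy_y = -np.sum(p_y * np.log(p_y))
cond_entropy = -np.sum(p_zy * np.log(p_y_given_z))   # H(Y|Z)
mi = entropy_y - cond_entropy                        # I(Z; Y)

# An arbitrary (mis-specified) predictor q(y|z).
q = rng.random((4, 3))
q /= q.sum(axis=1, keepdims=True)
cross_entropy = -np.sum(p_zy * np.log(q))            # E[-log q(y|z)]

print(f"cross-entropy = {cross_entropy:.4f}")
print(f"H(Y) - I(Z;Y) = {entropy_y - mi:.4f}")       # the lower bound H(Y|Z)
assert cross_entropy >= entropy_y - mi - 1e-12
```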


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Guo, J., Li, S., Zhang, Y. (2023). An Information Theoretic Perspective for Heterogeneous Subgraph Federated Learning. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13943. Springer, Cham. https://doi.org/10.1007/978-3-031-30637-2_50


  • DOI: https://doi.org/10.1007/978-3-031-30637-2_50


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30636-5

  • Online ISBN: 978-3-031-30637-2

