Degree-Normalization Improves Random-Walk-Based Embedding Accuracy in PPI Graphs

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2023)

Abstract

Among the many proposed solutions in graph embedding, traditional random-walk-based embedding methods have shown promise in several fields. However, when the graph contains high-degree nodes, random walks tend to step through them and neglect low- and middle-degree nodes. As a result, random-walk samples represent the topology of neighbourhoods around high-degree nodes in fine detail, but give only a coarse-grained representation of neighbourhoods around middle- and low-degree nodes. This in turn affects the performance of the downstream predictive models, which tend to overfit high-degree nodes and/or edges that have a high-degree node as an endpoint. We propose a solution to this problem based on degree normalization. Experiments with popular random-walk (RW)-based embedding methods applied to edge prediction on eight protein-protein interaction (PPI) graphs from the STRING database show the effectiveness of the proposed approach: degree normalization not only improves predictions but also yields more stable results, suggesting that it has a regularization effect that leads to more robust convergence.
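
The degree bias described in the abstract can be reproduced on a toy graph. The sketch below is illustrative only: this preview does not state the exact normalization used in the paper, so the inverse-degree weighting of neighbours is an assumption, and the names `ring_with_hub` and `visit_shares` are hypothetical helpers rather than part of GRAPE. It compares how often a classic random walk and an inverse-degree-weighted walk visit a single hub.

```python
import random
from collections import Counter


def ring_with_hub(n_ring=12):
    """Toy graph: nodes 1..n_ring form a ring, node 0 is a hub linked to all of them."""
    adj = {0: list(range(1, n_ring + 1))}
    for i in range(1, n_ring + 1):
        left = n_ring if i == 1 else i - 1
        right = 1 if i == n_ring else i + 1
        adj[i] = [left, right, 0]
    return adj


def visit_shares(adj, steps, normalize, seed=0):
    """Run one long walk and return the fraction of steps that land on each node.

    normalize=False: every neighbour is equally likely (classic walk).
    normalize=True: neighbour v is chosen with weight 1 / degree(v)
    (an assumed, illustrative normalization, not necessarily the paper's).
    """
    rng = random.Random(seed)
    node = 1
    counts = Counter()
    for _ in range(steps):
        neighbours = adj[node]
        if normalize:
            weights = [1.0 / len(adj[v]) for v in neighbours]
        else:
            weights = [1.0] * len(neighbours)
        node = rng.choices(neighbours, weights=weights, k=1)[0]
        counts[node] += 1
    return {v: counts[v] / steps for v in adj}


adj = ring_with_hub()
classic = visit_shares(adj, 20000, normalize=False)
normalized = visit_shares(adj, 20000, normalize=True)
print(f"hub share, classic walk:    {classic[0]:.3f}")
print(f"hub share, normalized walk: {normalized[0]:.3f}")
```

On this graph the hub has degree 12 and every ring node has degree 3, so a long classic walk spends about a quarter of its steps on the hub, while the inverse-degree-weighted walk spends about a tenth. The remaining steps are spread over the ring nodes, which is the kind of rebalancing the abstract argues for.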

Notes

  1. The bias-correction scheme proposed here is available in Rust with Python bindings as part of the GRAPE library for graph machine learning [3]. Besides novel implementations of DeepWalk, Node2Vec, and Walklets, GRAPE integrates the random forest implementation from sklearn [9].

References

  1. Ben-Israel, A., Greville, T.N.E.: Generalized Inverses: Theory and Applications, vol. 15. Springer, Heidelberg (2003). https://doi.org/10.1007/b97366

  2. Campbell, S.L., Meyer, C.D.: Generalized Inverses of Linear Transformations. SIAM (2009)

  3. Cappelletti, L., et al.: GRAPE: fast and scalable graph processing and embedding. arXiv preprint arXiv:2110.06196 (2022)

  4. Chicco, D., Jurman, G.: The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 21(1), 1–13 (2020)

  5. Cuzzocrea, A., Cappelletti, L., Valentini, G.: A neural model for the prediction of pathogenic genomic variants in Mendelian diseases. In: Proceedings of the 1st International Conference on Advances in Signal Processing and Artificial Intelligence (ASPAI 2019), Barcelona, Spain, pp. 34–38 (2019)

  6. Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864 (2016)

  7. Li, M.M., Huang, K., Zitnik, M.: Graph representation learning in biomedicine and healthcare. Nat. Biomed. Eng. 6(12), 1353–1369 (2022)

  8. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)

  9. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  10. Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710 (2014)

  11. Perozzi, B., Kulkarni, V., Chen, H., Skiena, S.: Don’t walk, skip! Online learning of multi-scale network embeddings. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, pp. 258–265 (2017)

  12. Petrini, A., et al.: parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants. GigaScience 9(5), giaa052 (2020)

  13. Radhakrishna Rao, C., Mitra, S.K., et al.: Generalized inverse of a matrix and its applications. In: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 601–620. University of California Press, Oakland (1972)

  14. Szklarczyk, D., et al.: The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49(D1), D605–D612 (2021)

  15. Yi, H.-C., You, Z.-H., Huang, D.-S., Kwoh, C.K.: Graph representation learning in bioinformatics: trends, methods and applications. Brief. Bioinform. 23(1), bbab340 (2021)

  16. Yue, X., et al.: Graph embedding on biomedical networks: methods, applications and evaluations. Bioinformatics 36(4), 1241–1251 (2019)

Acknowledgment

This research was supported by the “National Center for Gene Therapy and Drugs based on RNA Technology”, PNRR-NextGenerationEU program [G43C22001320007].

Author information

Corresponding author

Correspondence to Giorgio Valentini.

A Additional Results

(See Figs. 4 and 5).

Fig. 4. Average (and standard deviation) of the F1 scores of classic (orange bars) versus degree-normalized (blue bars) DeepWalk, Node2Vec, and Walklets across the 10 test (left) and train (right) holdouts. (Color figure online)

Fig. 5. Average (and standard deviation) of the AUROC scores of classic (orange bars) versus degree-normalized (blue bars) DeepWalk, Node2Vec, and Walklets across the 10 test (left) and train (right) holdouts. (Color figure online)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cappelletti, L. et al. (2023). Degree-Normalization Improves Random-Walk-Based Embedding Accuracy in PPI Graphs. In: Rojas, I., Valenzuela, O., Rojas Ruiz, F., Herrera, L.J., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2023. Lecture Notes in Computer Science, vol 13920. Springer, Cham. https://doi.org/10.1007/978-3-031-34960-7_26

  • DOI: https://doi.org/10.1007/978-3-031-34960-7_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34959-1

  • Online ISBN: 978-3-031-34960-7

  • eBook Packages: Computer Science (R0)
