Degree-Normalization Improves Random-Walk-Based Embedding Accuracy in PPI Graphs

Cappelletti, Luca; Taverni, Stefano; Fontana, Tommaso; Joachimiak, Marcin P.; Reese, Justin; Robinson, Peter; Casiraghi, Elena; Valentini, Giorgio

doi:10.1007/978-3-031-34960-7_26

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 13920)

Included in the following conference series:

International Work-Conference on Bioinformatics and Biomedical Engineering

Abstract

Among the many graph embedding methods proposed to date, traditional random-walk-based (RW-based) embedding methods have shown promise in several fields. However, when the graph contains high-degree nodes, random walks tend to step through those nodes and often neglect low- and middle-degree ones. As a result, random-walk samples provide a very accurate topological representation of the neighbourhoods surrounding high-degree nodes but only a coarse-grained representation of the neighbourhoods surrounding middle- and low-degree nodes. This in turn affects the performance of the subsequent predictive models, which tend to overfit high-degree nodes and/or edges that have a high-degree node as one of their endpoints. We propose a solution to this problem that relies on degree normalization. Experiments with popular RW-based embedding methods applied to edge prediction problems on eight protein-protein interaction (PPI) graphs from the STRING database show the effectiveness of the proposed approach: degree normalization not only improves predictions but also provides more stable results, suggesting that our proposal has a regularization effect leading to a more robust convergence.
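The degree bias described in the abstract can be illustrated on a toy graph. The sketch below is not the authors' bias-correction schema, whose details are not given in this preview; it assumes one simple form of degree normalization, in which the probability of stepping to a neighbour is weighted by the inverse of that neighbour's degree. The wheel graph and the function names `wheel_graph` and `walk_visit_share` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def wheel_graph(n_rim):
    """Toy graph: hub node 0 is linked to every rim node; rim nodes form a cycle."""
    adj = {0: list(range(1, n_rim + 1))}
    for i in range(1, n_rim + 1):
        left = n_rim if i == 1 else i - 1
        right = 1 if i == n_rim else i + 1
        adj[i] = [0, left, right]
    return adj


def walk_visit_share(adj, steps, normalize, seed=0):
    """Return the fraction of steps one long random walk spends on each node.

    normalize=False: the next node is chosen uniformly among the neighbours.
    normalize=True:  each neighbour is weighted by 1 / degree(neighbour)
                     (an assumed normalization, used here only as an example).
    """
    rng = random.Random(seed)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    visits = Counter()
    node = 1  # start on a rim node
    for _ in range(steps):
        nbrs = adj[node]
        if normalize:
            weights = [1.0 / degree[v] for v in nbrs]
        else:
            weights = [1.0] * len(nbrs)
        node = rng.choices(nbrs, weights=weights, k=1)[0]
        visits[node] += 1
    return {v: visits[v] / steps for v in adj}


adj = wheel_graph(6)  # hub has degree 6, each rim node has degree 3
plain = walk_visit_share(adj, 20000, normalize=False)
normalized = walk_visit_share(adj, 20000, normalize=True)
print(f"hub share, plain walk:      {plain[0]:.3f}")
print(f"hub share, normalized walk: {normalized[0]:.3f}")
```

On this graph the plain walk visits the hub in proportion to its degree (a long-run share of 6/24 = 1/4), whereas under the assumed inverse-degree weighting the long-run hub share drops to 1/6, much closer to the 5/36 share of each rim node. Sampled shares fluctuate around these values.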

Notes

1. The bias-correction schema proposed here is made available in Rust with Python bindings as part of the GRAPE library for graph machine learning [3]. Besides novel implementations of DeepWalk, Node2Vec, and Walklets, GRAPE integrates the random forest implementation from sklearn [9].

References

1. Ben-Israel, A., Greville, T.N.E.: Generalized Inverses: Theory and Applications, vol. 15. Springer, Heidelberg (2003). https://doi.org/10.1007/b97366
2. Campbell, S.L., Meyer, C.D.: Generalized Inverses of Linear Transformations. SIAM (2009)
3. Cappelletti, L., et al.: GRAPE: fast and scalable graph processing and embedding. arXiv preprint arXiv:2110.06196 (2022)
4. Chicco, D., Jurman, G.: The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 21(1), 1–13 (2020)
5. Cuzzocrea, A., Cappelletti, L., Valentini, G.: A neural model for the prediction of pathogenic genomic variants in Mendelian diseases. In: Proceedings of the 1st International Conference on Advances in Signal Processing and Artificial Intelligence (ASPAI 2019), Barcelona, Spain, pp. 34–38 (2019)
6. Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864 (2016)
7. Li, M.M., Huang, K., Zitnik, M.: Graph representation learning in biomedicine and healthcare. Nat. Biomed. Eng. 6(12), 1353–1369 (2022)
8. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
9. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
10. Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710 (2014)
11. Perozzi, B., Kulkarni, V., Chen, H., Skiena, S.: Don’t walk, skip! Online learning of multi-scale network embeddings. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, pp. 258–265 (2017)
12. Petrini, A., et al.: parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants. GigaScience 9(5), giaa052 (2020)
13. Radhakrishna Rao, C., Mitra, S.K., et al.: Generalized inverse of a matrix and its applications. In: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 601–620. University of California Press, Oakland (1972)
14. Szklarczyk, D., et al.: The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49(D1), D605–D612 (2021)
15. Yi, H.-C., You, Z.-H., Huang, D.-S., Kwoh, C.K.: Graph representation learning in bioinformatics: trends, methods and applications. Brief. Bioinform. 23(1), bbab340 (2021)
16. Yue, X., et al.: Graph embedding on biomedical networks: methods, applications and evaluations. Bioinformatics 36(4), 1241–1251 (2019)

Acknowledgment

This research was supported by the “National Center for Gene Therapy and Drugs based on RNA Technology”, PNRR-NextGenerationEU program [G43C22001320007].

Author information

Authors and Affiliations

AnacletoLab, Department of Computer Science “Giovanni degli Antoni”, Università degli Studi di Milano, 20133, Milan, Italy
Luca Cappelletti, Stefano Taverni, Tommaso Fontana, Elena Casiraghi & Giorgio Valentini
Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Marcin P. Joachimiak, Justin Reese & Elena Casiraghi
The Jackson Laboratory for Genomic Medicine, Farmington, USA
Peter Robinson
CINI, Infolife National Laboratory, Rome, Italy
Elena Casiraghi & Giorgio Valentini
ELLIS, European Laboratory for Learning and Intelligent Systems, Tuebingen, Germany
Giorgio Valentini

Corresponding author

Correspondence to Giorgio Valentini.

Editor information

Editors and Affiliations

University of Granada, Granada, Spain
Ignacio Rojas
University of Granada, Granada, Spain
Olga Valenzuela
University of Granada, Granada, Spain
Fernando Rojas Ruiz
University of Granada, Granada, Spain
Luis Javier Herrera
University of Granada, Granada, Spain
Francisco Ortuño

A Additional Results

(See Figs. 4 and 5).

About this paper

Cite this paper

Cappelletti, L. et al. (2023). Degree-Normalization Improves Random-Walk-Based Embedding Accuracy in PPI Graphs. In: Rojas, I., Valenzuela, O., Rojas Ruiz, F., Herrera, L.J., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2023. Lecture Notes in Computer Science, vol 13920. Springer, Cham. https://doi.org/10.1007/978-3-031-34960-7_26

DOI: https://doi.org/10.1007/978-3-031-34960-7_26
Published: 29 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34959-1
Online ISBN: 978-3-031-34960-7
eBook Packages: Computer Science, Computer Science (R0)
