Abstract
Diffusion State Distance (DSD) is a data-dependent metric that compares data points through a data-driven diffusion process, and it provides a powerful tool for learning the underlying structure of high-dimensional data. Because finding the exact k-nearest neighbors in the DSD metric is computationally expensive, in this paper we propose a new random-walk-based algorithm that, empirically, finds approximate k-nearest neighbors accurately and efficiently. Numerical results on real-world protein-protein interaction networks illustrate the efficiency and robustness of the proposed algorithm, and the resulting approximate k-nearest neighbors perform well when used to predict proteins' functional labels.
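The abstract does not spell out the algorithm, so the following is only a minimal sketch under our own assumptions, not the authors' method. It assumes the finite-step form of DSD: with random-walk transition matrix P = D^{-1}A, the diffusion state vector of node u is He^k(u) = sum_{t=0}^{k} e_u^T P^t (expected visits to each node by a k-step walk from u), and DSD(u, v) = ||He^k(u) - He^k(v)||_1. The exact search evaluates DSD against every node. The approximate search collects candidates with a few short random walks from the query and evaluates DSD only on those candidates. The class `DSDIndex` and the parameters `num_walks` and `walk_len` are hypothetical names chosen for illustration.

```python
import numpy as np


def transition_matrix(adj):
    """Row-normalize a dense adjacency matrix into P = D^{-1} A."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    if np.any(deg == 0):
        raise ValueError("every node needs at least one neighbor")
    return adj / deg


def diffusion_state(P, u, k):
    """He^k(u) = sum_{t=0}^{k} e_u^T P^t, built with k vector-matrix products."""
    term = np.zeros(P.shape[0])
    term[u] = 1.0
    h = term.copy()
    for _ in range(k):
        term = term @ P  # walk distribution after one more step
        h += term
    return h


class DSDIndex:
    """Hypothetical helper: caches He^k vectors so each is computed at most once."""

    def __init__(self, adj, k=5):
        self.adj = np.asarray(adj, dtype=float)
        self.P = transition_matrix(self.adj)
        self.k = k
        self.neighbors = [np.flatnonzero(row) for row in self.adj]
        self._cache = {}

    def state(self, u):
        if u not in self._cache:
            self._cache[u] = diffusion_state(self.P, u, self.k)
        return self._cache[u]

    def dsd(self, u, v):
        """L1 distance between the two diffusion state vectors."""
        return float(np.abs(self.state(u) - self.state(v)).sum())

    def exact_knn(self, u, n_neighbors):
        """Baseline: compute DSD from u to every other node."""
        others = [v for v in range(len(self.neighbors)) if v != u]
        return sorted(others, key=lambda v: (self.dsd(u, v), v))[:n_neighbors]

    def approx_knn(self, u, n_neighbors, num_walks=50, walk_len=3, seed=0):
        """Assumed scheme: rank only nodes visited by short random walks from u."""
        rng = np.random.default_rng(seed)
        candidates = set()
        for _ in range(num_walks):
            node = u
            for _ in range(walk_len):
                node = int(rng.choice(self.neighbors[node]))
                candidates.add(node)
        candidates.discard(u)
        return sorted(candidates, key=lambda v: (self.dsd(u, v), v))[:n_neighbors]
```

For example, on two triangles joined by the edge 2-3, `DSDIndex(A, k=5).approx_knn(0, 3)` ranks only the walk-visited nodes. Because it selects from a subset of the nodes, its i-th returned distance can never be smaller than the i-th exact distance. The design trades exhaustive DSD evaluations for a small candidate set. A production version would also use a sparse P instead of the dense matrix assumed here.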
The work of Cowen, Hu, and Wu was partially supported by the National Science Foundation under grants DMS-1812503, CCF-1934553, and OAC-2018149.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Cowen, L.J., Hu, X., Lin, J., Shen, Y., Wu, K. (2022). Random-Walk Based Approximate k-Nearest Neighbors Algorithm for Diffusion State Distance. In: Lirkov, I., Margenov, S. (eds) Large-Scale Scientific Computing. LSSC 2021. Lecture Notes in Computer Science, vol 13127. Springer, Cham. https://doi.org/10.1007/978-3-030-97549-4_1
DOI: https://doi.org/10.1007/978-3-030-97549-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-97548-7
Online ISBN: 978-3-030-97549-4
eBook Packages: Computer Science, Computer Science (R0)