Random-Walk Based Approximate k-Nearest Neighbors Algorithm for Diffusion State Distance

  • Conference paper
Large-Scale Scientific Computing (LSSC 2021)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13127)

Abstract

Diffusion State Distance (DSD) is a data-dependent metric that compares data points using a data-driven diffusion process and provides a powerful tool for learning the underlying structure of high-dimensional data. Finding the exact nearest neighbors in the DSD metric is computationally expensive, so in this paper we propose a new random-walk based algorithm that, empirically, finds approximate k-nearest neighbors accurately and efficiently. Numerical results on real-world protein-protein interaction networks illustrate the efficiency and robustness of the proposed algorithm. The resulting sets of approximate k-nearest neighbors also perform well when used to predict proteins’ functional labels.

The work of Cowen, Hu, and Wu was partially supported by the National Science Foundation under grants DMS-1812503, CCF-1934553, and OAC-2018149.
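
To make the cost of the exact search concrete, the following is a minimal sketch of brute-force k-nearest neighbors under DSD on a small unweighted graph. It is not the random-walk algorithm proposed in the paper. It assumes the finite-step definition of DSD from Cao et al. (PLoS One, 2013), in which each node is represented by the vector of expected visit counts of a random walk of a fixed number of steps, and two nodes are compared in the L1 norm. The function names `dsd_matrix` and `exact_knn`, the parameter `steps`, and the toy graph are illustrative assumptions.

```python
import numpy as np


def dsd_matrix(adj, steps):
    """Pairwise finite-step DSD for a graph with no isolated nodes.

    Assumed definition: He(u) = sum_{t=0..steps} e_u^T P^t, where P is the
    random-walk transition matrix, and DSD(u, v) = ||He(u) - He(v)||_1.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    P = adj / adj.sum(axis=1, keepdims=True)  # row-stochastic transition matrix

    He = np.eye(n)   # t = 0 term: the walk starts at its own node
    Pt = np.eye(n)   # running power P^t
    for _ in range(steps):
        Pt = Pt @ P
        He += Pt

    # All-pairs L1 distances between rows of He.
    # This broadcast uses O(n^3) memory, so it only suits toy graphs.
    return np.abs(He[:, None, :] - He[None, :, :]).sum(axis=2)


def exact_knn(dist, k):
    """Indices of the k nearest neighbors of every node, excluding itself."""
    d = np.array(dist, dtype=float)
    np.fill_diagonal(d, np.inf)  # a node is never its own neighbor
    return np.argsort(d, axis=1, kind="stable")[:, :k]


# Toy example: a path graph 0 - 1 - 2 - 3.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
D = dsd_matrix(adj, steps=5)
neighbors = exact_knn(D, k=2)
print(neighbors)
```

Even in this sketch the exact search needs every pairwise distance before any neighbor can be ranked, which is the cost that an approximate method aims to avoid on large networks.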

Author information

Correspondence to Xiaozhe Hu.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Cowen, L.J., Hu, X., Lin, J., Shen, Y., Wu, K. (2022). Random-Walk Based Approximate k-Nearest Neighbors Algorithm for Diffusion State Distance. In: Lirkov, I., Margenov, S. (eds) Large-Scale Scientific Computing. LSSC 2021. Lecture Notes in Computer Science, vol 13127. Springer, Cham. https://doi.org/10.1007/978-3-030-97549-4_1

  • DOI: https://doi.org/10.1007/978-3-030-97549-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-97548-7

  • Online ISBN: 978-3-030-97549-4

  • eBook Packages: Computer Science, Computer Science (R0)
