Skip to main content

SimSky: An Accuracy-Aware Algorithm for Single-Source SimRank Search

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD 2023)

Abstract

SimRank is a popular node-pair similarity search model based on graph topology. It has received sustained attention due to its wide range of applications in real-world scenarios. Considerable effort has been devoted to devising fast algorithms for SimRank computation through either iterative approaches or random walk based methods. In this paper, we propose an efficient accuracy-aware algorithm for computing single-source SimRank similarity. First, we devise an algorithm, ApproxDiag, to approximate the diagonal correction matrix. Next, we propose an efficient algorithm, named SimSky, which utilizes two Krylov subspaces for transforming high-dimensional single-source SimRank search into low-dimensional matrix-vector multiplications. Extensive experiments on various real datasets demonstrate the superior search quality of SimSky compared to other competitors.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    d denotes the average node degree.

  2. 2.

    https://github.com/AnonSimRank/SimSky.

  3. 3.

    http://snap.stanford.edu/data/index.html.

References

  1. Bekas, C., Kokiopoulou, E., Saad, Y.: An estimator for the diagonal of a matrix. Appl. Numer. Math. 57(11–12), 1214–1229 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  2. Boley, D.L.: Krylov space methods on state-space control models. Circuits Syst. Signal Process. 13, 733–758 (1994)

    Article  MathSciNet  MATH  Google Scholar 

  3. Fouss, F., Pirotte, A., Renders, J.M., Saerens, M.: Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Trans. Knowl. Data Eng. 19(3), 355–369 (2007)

    Article  Google Scholar 

  4. Fujiwara, Y., Nakatsuji, M., Shiokawa, H., Onizuka, M.: Efficient search algorithm for SimRank. In: 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp. 589–600. IEEE (2013)

    Google Scholar 

  5. Jeh, G., Widom, J.: SimRank: a measure of structural-context similarity. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 538–543 (2002)

    Google Scholar 

  6. Jeh, G., Widom, J.: Scaling personalized web search. In: Proceedings of the 12th International Conference on World Wide Web, pp. 271–279 (2003)

    Google Scholar 

  7. Kusumoto, M., Maehara, T., Kawarabayashi, K.i.: Scalable similarity search for SimRank. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 325–336 (2014)

    Google Scholar 

  8. Liben-Nowell, D., Kleinberg, J.: The link prediction problem for social networks. J. Am. Soc. Inform. Sci. Technol. 58(7), 1019–1031 (2007)

    Article  Google Scholar 

  9. Lu, J., Gong, Z., Yang, Y.: A matrix sampling approach for efficient SimRank computation. Inf. Sci. 556, 1–26 (2021)

    Article  MathSciNet  MATH  Google Scholar 

  10. Rothe, S., Schütze, H.: CoSimRank: a flexible & efficient graph-theoretic similarity measure. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 1392–1402 (2014)

    Google Scholar 

  11. Saad, Y.: Analysis of some Krylov subspace approximations to the matrix exponential operator. SIAM J. Numer. Anal. 29(1), 209–228 (1992)

    Article  MathSciNet  MATH  Google Scholar 

  12. Tian, B., Xiao, X.: SLING: a near-optimal index structure for SimRank. In: Proceedings of the 2016 International Conference on Management of Data, pp. 1859–1874 (2016)

    Google Scholar 

  13. Wang, H., Wei, Z., Yuan, Y., Du, X., Wen, J.R.: Exact single-source SimRank computation on large graphs. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 653–663 (2020)

    Google Scholar 

  14. Yu, W., Iranmanesh, S., Haldar, A., Zhang, M., Ferhatosmanoglu, H.: Rolesim*: scaling axiomatic role-based similarity ranking on large graphs. World Wide Web 25(2), 785–829 (2022). https://doi.org/10.1007/s11280-021-00925-z

  15. Yu, W., Lin, X., Zhang, W., Pei, J., McCann, J.A.: Simrank*: effective and scalable pairwise similarity search based on graph topology. VLDB J. 28(3), 401–426 (2019)

    Article  Google Scholar 

  16. Yu, W., McCann, J.A.: Efficient partial-pairs SimRank search on large networks. Proc. VLDB Endow. 8(5), 569–580 (2015)

    Article  Google Scholar 

  17. Yu, W., McCann, J.A., Zhang, C., Ferhatosmanoglu, H.: Scaling high-quality pairwise link-based similarity retrieval on billion-edge graphs. ACM Trans. Inf. Syst. 40(4), 78:1–78:45 (2022). https://doi.org/10.1145/3495209

  18. Yu, W., McCann, J.A.: High quality graph-based similarity search. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 83–92 (2015)

    Google Scholar 

  19. Yu, W., Wang, F.: Fast exact CoSimRank search on evolving and static graphs. In: Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, 23–27 April 2018, pp. 599–608. ACM (2018). https://doi.org/10.1145/3178876.3186126

  20. Yu, W., Yang, J., Zhang, M., Wu, D.: CoSimHeat: an effective heat kernel similarity measure based on billion-scale network topology. In: WWW 2022: The ACM Web Conference 2022, Virtual Event, Lyon, France, 25–29 April 2022, pp. 234–245. ACM (2022). https://doi.org/10.1145/3485447.3511952

Download references

Acknowledgments

This work has been supported by the National Natural Science Foundation of China under Grant No. 61972203.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Weiren Yu .

Editor information

Editors and Affiliations

Ethics declarations

Ethical Statement

We acknowledge the importance of ethical considerations in the design of our ApproxDiag and SimSky algorithms. All the datasets used in this paper are publicly-available online, and do not have any privacy issues. We ensure that our algorithms do not lead to any potential negative influences. We declare that we allow our algorithms to be used for the benefit of society.

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Yan, L., Yu, W. (2023). SimSky: An Accuracy-Aware Algorithm for Single-Source SimRank Search. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science(), vol 14171. Springer, Cham. https://doi.org/10.1007/978-3-031-43418-1_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-43418-1_14

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43417-4

  • Online ISBN: 978-3-031-43418-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics