Skip to main content

Delusive PageRank in Incomplete Graphs

  • Conference paper
  • First Online:

Part of the book series: Studies in Computational Intelligence ((SCI,volume 812))

Abstract

Most real-world graphs collected from the Web like Web graphs and social network graphs are incomplete. This leads to inaccurate estimates of graph properties based on link analysis such as PageRank. In this paper we focus on studying such deviations in ordering/ranking imposed by PageRank over incomplete graphs. We first show that deviations in rankings induced by PageRank are indeed possible. We measure how much a ranking, induced by PageRank, on an input graph could deviate from the original unseen graph. More importantly, we are interested in conceiving a measure that approximates the rank correlation among them without any knowledge of the original graph. To this extent we formulate the HAK measure that is based on computing the impact redistribution of PageRank according to the local graph structure. Finally, we perform extensive experiments on both real-world Web and social network graphs with more than 100M vertices and 10B edges as well as synthetic graphs to showcase the utility of HAK.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   169.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
USD   219.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

  1. 1.

    The archive has been generously provided to us by the Internet Archive.

  2. 2.

    http://alexandria-project.eu

References

  1. Ainsworth, S.G., Alsum, A., SalahEldeen, H., Weigle, M.C., Nelson, M.L.: How much of the web is archived? In: Proceeding ACM/IEEE- JCDL 2011

    Google Scholar 

  2. Archiveteam. Friendster Social Network Dataset: Friends, : published under, vol. CC0, p. 1.0. Universal (2011)

    Google Scholar 

  3. Boldi, P., Santini, M., Vigna, S.: Do your worst to make the best: paradoxical effects in pagerank incremental computations. In: WAW (2004)

    Google Scholar 

  4. Boldi, P., Vigna, S.: The WebGraph framework I: Compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pp. 595–601. ACM Press, USA (2004)

    Google Scholar 

  5. Bollobás, B., Borgs, C., Chayes, J., Riordan, O.: Directed scale-free graphs. In: Proceedings of ACM-SIAM Symposium on Discrete Algorithms, SODA 2003 (2003)

    Google Scholar 

  6. Costa, M., Gomes, D., Silva, M.J.: The evolution of web archiving. Int. J. Digit. Libr. 18(3), 191–205 (2016)

    Google Scholar 

  7. Dasgupta, A., Kumar, R., Sarlos, T.: On estimating the average degree. In: Proceedings of conference on World wide web, pp. 795–806. ACM (2014)

    Google Scholar 

  8. Erdős, P., Rényi, A.: On random graphs. Publ. Math. Debr. 6, 290–297 (1959)

    Google Scholar 

  9. Gilbert, E.N.: Random graphs. Ann. Math. Stat. 30(4), 1141–1144 (1959)

    Google Scholar 

  10. Hagberg, A.A., Schult, D.A., Swart, P.J.: Exploring network structure, dynamics, and function using NetworkX. In: SciPy2008 (2008)

    Google Scholar 

  11. Haveliwala, T.H.: Topic-sensitive pagerank. In: Proceedings of the 11th international conference on WorldWide Web, pp. 517–526. ACM (2002)

    Google Scholar 

  12. Holzmann, H., Nejdl, W., Anand, A.: Exploring web archives through temporal anchor texts. In: Proceedings of ACM Web Science Conference - WebSci 2017 (2017)

    Google Scholar 

  13. Holzmann, H., Nejdl, W., Anand, A.: The dawn of today’s popular domains: a study of the archived german web over 18 years. In: Digital Libraries (JCDL) (2016)

    Google Scholar 

  14. Hübler, C., Kriegel, H.-P., Borgwardt, K., Ghahramani, Z.: Metropolis algorithms for representative subgraph sampling. In: Eighth IEEE International Conference on Data Mining, 2008. ICDM 2008, pp. 283–292. IEEE (2008)

    Google Scholar 

  15. Huurdeman, H.C., Ben-David, A., Kamps, J., Samar, T., de Vries, A.P.: Finding pages on the unarchived web. In: IEEE/ACM JCDL (2014)

    Google Scholar 

  16. Kendall, Maurice G.: A new measure of rank correlation. Biometrika 30(1/2), 81–93 (1938)

    Google Scholar 

  17. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment

    Google Scholar 

  18. Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph evolution: densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1(1) (2007)

    Google Scholar 

  19. Li, R.-H., Yu, J.X., Qin, L., Mao, R., Jin, T.: On random walk based graph sampling. In: Data Engineering (ICDE), 2015 (2015)

    Google Scholar 

  20. Maiya, A.S., Berger-Wolf, T.Y.: Benefits of bias: towards better characterization of networksampling. In: Proceedings of the 17th ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining, pp. 105–113. ACM (2011)

    Google Scholar 

  21. Ng, A.Y., Zheng, A.X., Jordan, M.I.: Link analysis, eigenvectors and stability. In: Proceedings of International Joint Conference on Artificial Intelligence (2001)

    Google Scholar 

  22. Lawrence, P., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab (1999)

    Google Scholar 

  23. Smith, J.A., Moody, J.: Structural effects of network sampling coverage I: Nodes missing at random. In: Social Networks, vol. 35, pp. 652–668. Elsevier, Amsterdam (2013)

    Google Scholar 

  24. Smith, J.A., Moody, J., Morgan, J.H.: Network sampling coverage II: the effect of non-random missing data on network measurement. In: Social Networks, vol. 48, pp. 78–99. Elsevier, Amsterdam (2017)

    Google Scholar 

  25. The Internet Archive. The Internet Archive, 1996–2017

    Google Scholar 

  26. Vattani, A., Chakrabarti, D., Gurevich, M.: Preserving personalized pagerank in subgraphs. In: Proceedings of ICML (2011)

    Google Scholar 

  27. Wang, D.J., Shi, X., McFarland, D.A., Leskovec, J.: Measurement error in network data: a re-classification. In: Social Networks, vol. 34, pp. 396–409. Elsevier, Amsterdam (2012)

    Google Scholar 

  28. Wang, T., Chen, Y., Zhang, Z., Sun, P., Deng, B., Li, X.: Unbiased sampling in directed social graph. In: ACM SIGCOMM Computer Communication Review, vol. 40, pp. 401–402. ACM (2010)

    Google Scholar 

  29. Zhou, Z., Zhang, N., Gong, Z., Das, G.: Faster random walks by rewiring online social networks on-the-fly. ACM Trans. Database Syst. (TODS) 40(4), 1–36 (2016)

    Google Scholar 

Download references

Acknowledgements

This work is partially funded by ALEXANDRIA (ERC 339233).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Megha Khosla .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Holzmann, H., Anand, A., Khosla, M. (2019). Delusive PageRank in Incomplete Graphs. In: Aiello, L., Cherifi, C., Cherifi, H., Lambiotte, R., Lió, P., Rocha, L. (eds) Complex Networks and Their Applications VII. COMPLEX NETWORKS 2018. Studies in Computational Intelligence, vol 812. Springer, Cham. https://doi.org/10.1007/978-3-030-05411-3_9

Download citation

Publish with us

Policies and ethics