Bavarian: Betweenness Centrality Approximation with Variance-aware Rademacher Averages

Published: 06 March 2023

Abstract

“[A]llain Gersten, Hopfen, und Wasser” — 1516 Reinheitsgebot
We present Bavarian, a collection of sampling-based algorithms for approximating the Betweenness Centrality (BC) of all vertices in a graph. Our algorithms use Monte-Carlo Empirical Rademacher Averages (MCERAs), a concept from statistical learning theory, to efficiently compute tight bounds on the maximum deviation of the estimates from the exact values. The MCERAs provide a sample-dependent approximation guarantee much stronger than the state of the art, thanks to their use of variance-aware probabilistic tail bounds. The flexibility of the MCERAs allows us to introduce a unifying framework that can be instantiated with existing sampling-based estimators of BC, thus allowing a fair comparison between them, decoupled from the sample-complexity results with which they were originally introduced. Additionally, we prove novel sample-complexity results showing that, for all estimators, the sample size sufficient to achieve a desired approximation guarantee depends on the vertex-diameter of the graph, an easy-to-bound characteristic quantity. We also present progressive-sampling algorithms and extensions to other centrality measures, such as percolation centrality. Our extensive experimental evaluation of Bavarian shows the improvement over the state of the art made possible by the MCERAs (a 2–4× reduction in the error bound), and it allows us to assess the trade-offs between sample size and accuracy guarantees offered by the different estimators.
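
As a rough illustration of the central quantity named in the abstract, the sketch below computes an n-trial Monte-Carlo Empirical Rademacher Average for a finite family of functions evaluated on a sample: draw a vector of random signs, take the largest sign-weighted sample mean over the family, and average over the trials. The function name `mcera`, the `values` layout (one row per vertex, one column per sample), and the toy numbers are assumptions made for illustration only. This is not the authors' implementation, and it omits the variance-aware tail bounds that turn the MCERA into a deviation guarantee.

```python
import random


def mcera(values, n_trials, rng):
    """Monte-Carlo Empirical Rademacher Average of a finite function family.

    values[f][i] is the value of function f on the i-th sample. In a BC
    setting, row f could hold a per-vertex estimator evaluated on each
    sample (a hypothetical layout, assumed here for illustration).
    """
    m = len(values[0])  # sample size
    total = 0.0
    for _ in range(n_trials):
        # One vector of independent Rademacher signs, shared by all functions.
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        # Supremum over the family of the sign-weighted sample mean.
        total += max(
            sum(s * v for s, v in zip(sigma, row)) / m
            for row in values
        )
    return total / n_trials


# Toy usage: 3 hypothetical vertices, 4 samples, values in [0, 1].
toy_values = [
    [0.0, 1.0, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
estimate = mcera(toy_values, n_trials=100, rng=random.Random(42))
print(estimate)
```

Sharing one sign vector across all functions in each trial is what makes the inner maximum a supremum over the family. Drawing fresh signs per function would compute a different quantity.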

Cited By

  • (2024) Keyword-Based Betweenness Centrality Maximization in Attributed Graphs. Databases Theory and Applications, 209-223. DOI: 10.1007/978-981-96-1242-0_16. Online publication date: 17-Dec-2024
  • (2023) SILVAN: Estimating Betweenness Centralities with Progressive Sampling and Non-uniform Rademacher Bounds. ACM Transactions on Knowledge Discovery from Data 18:3, 1-55. DOI: 10.1145/3628601. Online publication date: 9-Dec-2023
  • (2023) Estimation and update of betweenness centrality with progressive algorithm and shortest paths approximation. Scientific Reports 13:1. DOI: 10.1038/s41598-023-44392-0. Online publication date: 10-Oct-2023

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 17, Issue 6
July 2023
392 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3582889

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 March 2023
Online AM: 20 December 2022
Accepted: 09 December 2022
Revised: 24 September 2022
Received: 02 December 2021
Published in TKDD Volume 17, Issue 6

Author Tags

  1. Concentration bounds
  2. dynamic graphs
  3. percolation centrality
  4. random sampling
  5. sample complexity
  6. statistical learning theory

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation
  • DARPA/ARFL
