research-article

Graph cluster randomization: network exposure to multiple universes

Authors:
Johan Ugander (Cornell University, Ithaca, NY, USA)
Brian Karrer (Facebook, Menlo Park, CA, USA)
Lars Backstrom (Facebook, Menlo Park, CA, USA)
Jon Kleinberg (Cornell University, Ithaca, NY, USA)

KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2013
Pages 329–337
https://doi.org/10.1145/2487575.2487695

Published: 11 August 2013


ABSTRACT

A/B testing is a standard approach for evaluating the effect of online experiments; the goal is to estimate the "average treatment effect" of a new feature or condition by exposing a sample of the overall population to it. A drawback of A/B testing is that it is poorly suited to experiments involving social interference, in which the treatment of individuals spills over to neighboring individuals along an underlying social network. In this work, we propose a novel methodology using graph clustering to analyze average treatment effects under social interference. To begin, we characterize graph-theoretic conditions under which individuals can be considered to be "network exposed" to an experiment. We then show how graph cluster randomization admits an efficient exact algorithm to compute the probabilities for each vertex being network exposed under several of these exposure conditions. Using these probabilities as inverse weights, a Horvitz-Thompson estimator can then provide an effect estimate that is unbiased, provided that the exposure model has been properly specified.
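As an illustration of the pipeline described above, the following is a minimal sketch, not the authors' implementation. It assumes one particular exposure condition (a vertex is network exposed to an arm when it and all of its neighbors are assigned to that arm) and independent cluster-level coin flips with treatment probability `p`. The function names, the adjacency-dict representation, and the `cluster_of` mapping are all hypothetical choices made for this example.

```python
def exposure_probabilities(adj, cluster_of, p=0.5):
    """Exact exposure probabilities under independent cluster-level assignment.

    Assumed exposure condition (an illustration, not taken from the abstract):
    a vertex is exposed to an arm when it and all of its neighbors are in that arm.
    Returns {vertex: (P(exposed to treatment), P(exposed to control))}.
    """
    probs = {}
    for v, nbrs in adj.items():
        # Distinct clusters that must all land in the same arm.
        touched = {cluster_of[v]} | {cluster_of[u] for u in nbrs}
        k = len(touched)
        probs[v] = (p ** k, (1 - p) ** k)
    return probs


def horvitz_thompson_ate(outcomes, assignment, adj, cluster_of, p=0.5):
    """Inverse-probability-weighted estimate of the average treatment effect.

    outcomes:   {vertex: observed response}
    assignment: {cluster: 1 for treatment, 0 for control}
    """
    probs = exposure_probabilities(adj, cluster_of, p)
    total = 0.0
    for v, nbrs in adj.items():
        arms = {assignment[cluster_of[u]] for u in [v, *nbrs]}
        pi_treat, pi_control = probs[v]
        if arms == {1}:        # network exposed to treatment
            total += outcomes[v] / pi_treat
        elif arms == {0}:      # network exposed to control
            total -= outcomes[v] / pi_control
        # Vertices with a mixed neighborhood contribute nothing.
    return total / len(adj)
```

Averaging the estimate over every possible cluster assignment, weighted by its probability, recovers the true effect whenever responses depend only on the assumed exposure condition; that is the sense in which the estimator is unbiased.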

Given an estimator that is unbiased, we focus on minimizing the variance. First, we develop simple sufficient conditions for the variance of the estimator to be asymptotically small in n, the size of the graph. However, for general randomization schemes, this variance can be lower bounded by an exponential function of the degrees of a graph. In contrast, we show that if a graph satisfies a restricted-growth condition on the growth rate of neighborhoods, then there exists a natural clustering algorithm, based on vertex neighborhoods, for which the variance of the estimator can be upper bounded by a linear function of the degrees. Thus we show that proper cluster randomization can lead to exponentially lower estimator variance when experimentally measuring average treatment effects under interference.
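The abstract does not spell out the neighborhood-based clustering algorithm, so the sketch below is only one plausible reading: greedily pick centers that are pairwise at distance at least 3, so that every vertex lies within distance 2 of some center, then assign each vertex to its nearest center. The function names and the adjacency-dict input are assumptions made for this example, and vertex labels are assumed to be sortable.

```python
from collections import deque


def bfs_distances(adj, source, max_depth):
    """Hop distances from source to every vertex within max_depth hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == max_depth:
            continue  # do not expand beyond the radius
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def neighborhood_clustering(adj):
    """Cluster vertices around greedily chosen centers (hypothetical sketch).

    Returns {vertex: center}. Each new center is uncovered when chosen, so it
    is at distance >= 3 from all earlier centers; every vertex ends up within
    distance 2 of at least one center.
    """
    covered = set()
    centers = []
    for v in sorted(adj):
        if v not in covered:
            centers.append(v)
            covered.update(bfs_distances(adj, v, 2))

    cluster_of, best = {}, {}
    for c in centers:
        for u, d in bfs_distances(adj, c, 2).items():
            # Keep the strictly closer center; ties go to the earlier one.
            if u not in best or d < best[u]:
                best[u] = d
                cluster_of[u] = c
    return cluster_of
```

Because each cluster is built around a small ball, the number of distinct clusters touching a vertex's neighborhood stays limited when neighborhoods grow slowly, which is the intuition behind controlling the inverse exposure probabilities that drive the estimator's variance.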


Index Terms

Graph cluster randomization: network exposure to multiple universes
1. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms
  2. Probability and statistics
    1. Statistical paradigms
      1. Exploratory data analysis
2. Theory of computation
  1. Randomness, geometry and discrete structures

Published in
KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2013
1534 pages
ISBN: 9781450321747
DOI: 10.1145/2487575
Editors: Rayid Ghani (University of Chicago), Ted E. Senator (SAIC), Paul Bradley (MethodCare, Inc.), Rajesh Parekh (Groupon), Jingrui He (Stevens Institute of Technology)
General Chairs: Robert L. Grossman (University of Chicago and Open Data Group), Ramasamy Uthurusamy (General Motors Corporation, retired)
Program Chairs: Inderjit S. Dhillon (University of Texas), Yehuda Koren (Google)
Copyright © 2013 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
a/b testing
bucket testing
causal inference
graph clustering
interference
network effects
social networks
Acceptance Rates
KDD '13 paper acceptance rate: 125 of 726 submissions, 17%
Overall acceptance rate: 1,133 of 8,635 submissions, 13%