Statistical Algorithms and a Lower Bound for Detecting Planted Cliques

Authors: Vitaly Feldman, Elena Grigorescu, Santosh S. Vempala, Ying Xiao

Journal of the ACM (JACM), Volume 64, Issue 2

Article No.: 8, Pages 1 - 37

https://doi.org/10.1145/3046674

Published: 15 April 2017

Abstract

We introduce a framework for proving lower bounds on computational problems over distributions against algorithms that can be implemented using access to a statistical query oracle. For such algorithms, access to the input distribution is limited to obtaining an estimate of the expectation of any given function on a sample drawn randomly from the input distribution rather than directly accessing samples. Most natural algorithms of interest in theory and in practice, for example, moments-based methods, local search, standard iterative methods for convex optimization, MCMC, and simulated annealing, can be implemented in this framework. Our framework is based on, and generalizes, the statistical query model in learning theory [Kearns 1998].
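As an illustration of the oracle access described above, the following is a minimal sketch of a simulated statistical query oracle. It assumes queries map a sample into [0, 1] and that answers only need to be within an additive tolerance of the true expectation; the names `make_stat_oracle`, `draw`, and `tolerance` are hypothetical and do not come from the article. The simulation answers from fresh samples, whereas an oracle in a lower-bound argument may return any value within the tolerance.

```python
import math
import random
from typing import Callable, List

Sample = List[int]
Query = Callable[[Sample], float]


def make_stat_oracle(
    draw: Callable[[random.Random], Sample],
    tolerance: float,
    rng: random.Random,
) -> Callable[[Query], float]:
    """Build a simulated oracle that estimates E[query(x)] for x ~ draw.

    Assumption: queries take values in [0, 1]. With n = ceil(8 / tolerance^2)
    samples, Hoeffding's inequality bounds the probability that the empirical
    mean is off by more than `tolerance` by 2 * exp(-16).
    """
    n_samples = math.ceil(8.0 / tolerance ** 2)

    def stat(query: Query) -> float:
        total = 0.0
        for _ in range(n_samples):
            value = query(draw(rng))
            if not 0.0 <= value <= 1.0:
                raise ValueError("query must map samples into [0, 1]")
            total += value
        return total / n_samples

    return stat


# Example: uniform distribution over 8-bit vectors, query = first coordinate.
rng = random.Random(0)
stat = make_stat_oracle(lambda r: [r.randint(0, 1) for _ in range(8)], 0.1, rng)
estimate = stat(lambda x: float(x[0]))  # close to the true expectation 0.5
```

The algorithm only ever sees the values returned by `stat`, never the underlying samples, which is the restriction the framework places on statistical algorithms.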

Our main application is a nearly optimal lower bound on the complexity of any statistical query algorithm for detecting planted bipartite clique distributions (or planted dense subgraph distributions) when the planted clique has size O(n^{1/2 − δ}) for any constant δ > 0. The assumed hardness of variants of these problems has been used to prove hardness of several other problems and as a guarantee for security in cryptographic applications. Our lower bounds provide concrete evidence of hardness, thus supporting these assumptions.
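The abstract does not define the planted bipartite clique distribution. The sketch below assumes one common formulation: for a hidden set S of k indices out of n, each sample is a uniformly random vector in {0,1}^n, except that with probability k/n all coordinates in S are set to 1. The function names and parameters are hypothetical, chosen only for illustration.

```python
import random
from typing import List, Set


def sample_planted_row(n: int, planted: Set[int], rng: random.Random) -> List[int]:
    """Draw one vector from the assumed planted distribution over {0,1}^n."""
    k = len(planted)
    x = [rng.randint(0, 1) for _ in range(n)]  # start from a uniform vector
    if rng.random() < k / n:  # with probability k/n, plant ones on S
        for i in planted:
            x[i] = 1
    return x


def coordinate_mean(n: int, planted: Set[int], i: int) -> float:
    """Exact E[x_i] under the assumed distribution.

    For i in S: (k/n) * 1 + (1 - k/n) * 1/2 = 1/2 + k / (2n); otherwise 1/2.
    """
    k = len(planted)
    return 0.5 + k / (2 * n) if i in planted else 0.5


# Example: n = 10 with planted set {0, ..., 4}, so k / n = 1/2.
rng = random.Random(1)
planted = set(range(5))
rows = [sample_planted_row(10, planted, rng) for _ in range(4000)]
empirical = sum(row[0] for row in rows) / len(rows)
```

Under this assumption a single-coordinate query shifts the expectation by only k/(2n) relative to the uniform distribution, which hints at why queries with coarse tolerance carry little information when k is much smaller than n.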

References

[1]

N. Alon, A. Andoni, T. Kaufman, K. Matulef, R. Rubinfeld, and N. Xie. 2007. Testing k-wise and almost k-wise independence. In STOC. 496--505.

[2]

Noga Alon, Michael Krivelevich, and Benny Sudakov. 1998. Finding a large hidden clique in a random graph. In SODA. 594--598.

[3]

Brendan P. W. Ames and Stephen A. Vavasis. 2011. Nuclear norm minimization for the planted clique and biclique problems. Math. Program. 129, 1 (2011), 69--89.

[4]

Benny Applebaum, Boaz Barak, and Avi Wigderson. 2010. Public-key cryptography from different assumptions. In STOC. 171--180.

[5]

Sanjeev Arora, Boaz Barak, Markus Brunnermeier, and Rong Ge. 2010. Computational complexity and information asymmetry in financial products (extended abstract). In ICS. 49--65.

[6]

P. Bartlett and S. Mendelson. 2002. Rademacher and Gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res. 3 (2002), 463--482.

[7]

Alexandre Belloni, Robert M. Freund, and Santosh Vempala. 2009. An efficient rescaled perceptron algorithm for conic systems. Math. Oper. Res. 34, 3 (2009), 621--641.

[8]

Shai Ben-David and Eli Dichterman. 1998. Learning with restricted focus of attention. J. Comput. Syst. Sci. 56, 3 (1998), 277--298.

[9]

Quentin Berthet and Philippe Rigollet. 2013. Complexity theoretic lower bounds for sparse principal component detection. In COLT. 1046--1066.

[10]

Aditya Bhaskara, Moses Charikar, Eden Chlamtac, Uriel Feige, and Aravindan Vijayaraghavan. 2010. Detecting high log-densities: An O(n^{1/4}) approximation for densest k-subgraph. In STOC. 201--210.

[11]

Aditya Bhaskara, Moses Charikar, Aravindan Vijayaraghavan, Venkatesan Guruswami, and Yuan Zhou. 2012. Polynomial integrality gaps for strong SDP relaxations of densest k-subgraph. In SODA. 388--405.

[12]

A. Blum, C. Dwork, F. McSherry, and K. Nissim. 2005. Practical privacy: The SuLQ framework. In PODS. 128--138.

[13]

Avrim Blum, Alan M. Frieze, Ravi Kannan, and Santosh Vempala. 1998. A polynomial-time algorithm for learning noisy linear threshold functions. Algorithmica 22, 1/2 (1998), 35--52.

[14]

Avrim Blum, Merrick L. Furst, Jeffrey C. Jackson, Michael J. Kearns, Yishay Mansour, and Steven Rudich. 1994. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In STOC. 253--262.

[15]

Guy Bresler, David Gamarnik, and Devavrat Shah. 2014. Structure learning of antiferromagnetic Ising models. In NIPS. 2852--2860.

[16]

S. Brubaker and S. Vempala. 2009. Random tensors and planted cliques. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Vol. 5687. 406--419.

[17]

T. T. Cai, T. Liang, and A. Rakhlin. 2015. Computational and statistical boundaries for submatrix localization in a large noisy matrix. ArXiv E-prints (Feb. 2015).

[18]

C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, and K. Olukotun. 2006. Map-reduce for machine learning on multicore. In NIPS. 281--288.

[19]

Amin Coja-Oghlan. 2010. Graph partitioning via adaptive spectral techniques. Combin. Probab. Comput. 19, 2 (2010), 227--284.

[20]

Y. Dekel, O. Gurel-Gurevich, and Y. Peres. 2011. Finding hidden cliques in linear time with high probability. In ANALCO. 67--75.

[21]

A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. Ser. B 39, 1 (1977), 1--38.

[22]

Yash Deshpande and Andrea Montanari. 2015a. Finding hidden cliques of size √(N/e) in nearly linear time. Found. Comput. Math. 15, 4 (Aug. 2015), 1069--1128.

[23]

Yash Deshpande and Andrea Montanari. 2015b. Improved sum-of-squares lower bounds for hidden clique and hidden submatrix problems. In COLT. 523--562.

[24]

Shaddin Dughmi. 2014. On the hardness of signaling. In FOCS. 354--363.

[25]

John Dunagan and Santosh Vempala. 2008. A simple polynomial-time rescaling algorithm for solving linear programs. Math. Program. 114, 1 (2008), 101--114.

[26]

Uriel Feige. 2002. Relations between average case complexity and approximation complexity. In IEEE Conference on Computational Complexity. 5.

[27]

U. Feige and R. Krauthgamer. 2000. Finding and certifying a large hidden clique in a semirandom graph. Random Struct. Algor. 16, 2 (2000), 195--208.

[28]

Uriel Feige and Robert Krauthgamer. 2003. The probable value of the Lovász--Schrijver relaxations for maximum independent set. SIAM J. Comput. 32, 2 (2003), 345--370.

[29]

U. Feige and D. Ron. 2010. Finding hidden cliques in linear time. In AofA. 189--204.

[30]

V. Feldman. 2008. Evolvability from learning algorithms. In STOC. 619--628.

[31]

V. Feldman. 2012. A complete characterization of statistical query learning with applications to evolvability. J. Comput. Syst. Sci. 78, 5 (2012), 1444--1459.

[32]

Vitaly Feldman. 2014. Open problem: The statistical query complexity of learning sparse halfspaces. In COLT. 1283--1289.

[33]

Vitaly Feldman. 2016. A general characterization of the statistical query complexity. CoRR abs/1608.02198 (2016). Retrieved from http://arxiv.org/abs/1608.02198.

[34]

Vitaly Feldman, Cristobal Guzman, and Santosh Vempala. 2015. Statistical query algorithms for stochastic convex optimization. CoRR abs/1512.09170 (2015). Extended abstract in SODA 2017.

[35]

Vitaly Feldman, Will Perkins, and Santosh Vempala. 2013. On the complexity of random satisfiability problems with planted solutions. CoRR abs/1311.4821 (2013). Extended abstract in STOC 2015.

[36]

Alan M. Frieze and Ravi Kannan. 2008. A new approach to the planted clique problem. In FSTTCS. 187--198.

[37]

C. Gao, Z. Ma, and H. H. Zhou. 2014. Sparse CCA: Adaptive estimation and computational barriers. ArXiv E-prints (Sept. 2014).

[38]

A. E. Gelfand and A. F. M. Smith. 1990. Sampling based approaches to calculating marginal densities. J. Am. Statist. Assoc. 85 (1990), 398--409.

[39]

Bruce E. Hajek, Yihong Wu, and Jiaming Xu. 2015. Computational lower bounds for community detection on random graphs. In COLT. 899--928. Retrieved from http://jmlr.org/proceedings/papers/v40/Hajek15.html.

[40]

Johan Håstad. 2001. Some optimal inapproximability results. J. ACM 48, 4 (July 2001), 798--859.

[41]

W. K. Hastings. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 1 (1970), 97--109.

[42]

Elad Hazan and Robert Krauthgamer. 2011. How hard is it to approximate the best Nash equilibrium? SIAM J. Comput. 40, 1 (2011), 79--91.

[43]

Mark Jerrum. 1992. Large cliques elude the Metropolis process. Random Struct. Algor. 3, 4 (1992), 347--360.

[44]

Ari Juels and Marcus Peinado. 2000. Hiding cliques for cryptographic security. Des. Codes Cryptogr. 20, 3 (2000), 269--280.

[45]

Ravi Kannan. 2008. Personal communication.

[46]

R. Karp. 1979. Probabilistic analysis of graph-theoretic algorithms. In Proceedings of Computer Science and Statistics 12th Annual Symposium on the Interface. 173.

[47]

Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. 2011. What can we learn privately? SIAM J. Comput. 40, 3 (June 2011), 793--826.

[48]

M. Kearns. 1998. Efficient noise-tolerant learning from statistical queries. J. ACM 45, 6 (1998), 983--1006.

[49]

Subhash Khot. 2004. Ruling out PTAS for graph min-bisection, densest subgraph and bipartite clique. In FOCS. 136--145.

[50]

Scott Kirkpatrick, D. Gelatt Jr., and Mario P. Vecchi. 1983. Optimization by simulated annealing. Science 220, 4598 (1983), 671--680.

[51]

Ludek Kucera. 1995. Expected complexity of graph partitioning problems. Discr. Appl. Math. 57, 2--3 (1995), 193--212.

[52]

Zongming Ma and Yihong Wu. 2015. Computational barriers in minimax submatrix detection. Annals of Statistics 43, 3 (2015), 1089--1116.

[53]

F. McSherry. 2001. Spectral partitioning of random graphs. In FOCS. 529--537.

[54]

R. Meka, A. Potechin, and A. Wigderson. 2015. Sum-of-squares lower bounds for planted clique. In STOC. 87--96.

[55]

Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. 1953. Equation of state calculations by fast computing machines. J. Chem. Phys. 21 (1953), 1087--1092.

[56]

L. Minder and D. Vilenchik. 2009. Small clique detection and approximate Nash equilibria. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Vol. 5687. 673--685.

[57]

K. Pearson. 1900. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philos. Mag. Ser. 5 50, 302 (1900), 157--175.

[58]

Bart Selman, Henry Kautz, and Bram Cohen. 1995. Local search strategies for satisfiability testing. In DIMACS Series in Discrete Mathematics and Theoretical Computer Science. 521--532.

[59]

R. Servedio. 2000. Computational sample complexity and attribute-efficient learning. J. Comput. Syst. Sci. 60, 1 (2000), 161--178.

[60]

Jacob Steinhardt and John C. Duchi. 2015. Minimax rates for memory-bounded sparse linear regression. In COLT. 1564--1587. Retrieved from http://jmlr.org/proceedings/papers/v40/Steinhardt15.html.

[61]

J. Steinhardt, G. Valiant, and S. Wager. 2016. Memory, communication, and statistical queries. In COLT. 1490--1516.

[62]

Balázs Szörényi. 2009. Characterizing statistical query learning: Simplified notions and proofs. In ALT. 186--200.

[63]

M. Tanner and W. Wong. 1987. The calculation of posterior distributions by data augmentation (with discussion). J. Amer. Stat. Assoc. 82 (1987), 528--550.

[64]

Leslie G. Valiant. 1984. A theory of the learnable. Commun. ACM 27, 11 (1984), 1134--1142.

[65]

V. Vapnik and A. Chervonenkis. 1971. On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16, 2 (1971), 264--280.

[66]

V. Černý. 1985. Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. J. Optim. Theory Appl. 45, 1 (Jan. 1985), 41--51.

[67]

T. Wang, Q. Berthet, and R. J. Samworth. 2014. Statistical and computational trade-offs in estimation of sparse principal components. ArXiv E-prints (Aug. 2014).

[68]

Ke Yang. 2001. On learning correlated Boolean functions using statistical queries. In ALT. 59--76.

[69]

Ke Yang. 2005. New lower bounds for statistical query learning. J. Comput. Syst. Sci. 70, 4 (2005), 485--509.

[70]

Andrew Yao. 1977. Probabilistic computations: Toward a unified measure of complexity. In FOCS. 222--227.

[71]

Yuchen Zhang, John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. 2013. Information-theoretic lower bounds for distributed statistical estimation with communication constraints. In NIPS. 2328--2336.

Index Terms

Statistical Algorithms and a Lower Bound for Detecting Planted Cliques
1. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Random graphs
2. Theory of computation
  1. Models of computation
  2. Theory and algorithms for application domains
    1. Machine learning theory
      1. Models of learning
      2. Sample complexity and generalization bounds

Published In

Journal of the ACM Volume 64, Issue 2

April 2017

277 pages

ISSN:0004-5411

EISSN:1557-735X

DOI:10.1145/3080497

Editor:
Éva Tardos
Cornell University

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 April 2017

Accepted: 01 January 2017

Revised: 01 August 2016

Received: 01 June 2015

Published in JACM Volume 64, Issue 2

Qualifiers

Research-article
Research
Refereed

Funding Sources

Computing Research Association for the CIFellows Project. Research, NSF
National Science Foundation
Simons Postdoctoral Fellowship

Cited By

Mossel E, Niles-Weed J, Sohn Y, Sun N, Zadik I (2025). Sharp thresholds in inference of planted subgraphs. The Annals of Applied Probability 35:1. Online publication date: 1-Feb-2025.
https://doi.org/10.1214/24-AAP2120

Blanc G (2025). Subsampling Suffices for Adaptive Data Analysis. Journal of the ACM 72:1 (1-45). Online publication date: 25-Jan-2025.
https://dl.acm.org/doi/10.1145/3698104

Chen Z, Mossel E, Zadik I (2025). Almost-Linear Planted Cliques Elude the Metropolis Process. Random Structures & Algorithms 66:2. Online publication date: 21-Feb-2025.
https://doi.org/10.1002/rsa.21274

Luo Y, Gao C (2024). Computational lower bounds for graphon estimation via low-degree polynomials. The Annals of Statistics 52:5. Online publication date: 1-Oct-2024.
https://doi.org/10.1214/24-AOS2437

Han Y, Chen R, Yang D, Zhang C (2024). Tensor factor model estimation by iterative projection. The Annals of Statistics 52:6. Online publication date: 1-Dec-2024.
https://doi.org/10.1214/24-AOS2412

Gamarnik D, Zadik I (2024). The landscape of the planted clique problem: Dense subgraphs and the overlap gap property. The Annals of Applied Probability 34:4. Online publication date: 1-Aug-2024.
https://doi.org/10.1214/23-AAP2003

Hirahara S, Shimizu N, Mohar B, Shinkar I, O'Donnell R (2024). Planted Clique Conjectures Are Equivalent. Proceedings of the 56th Annual ACM Symposium on Theory of Computing (358-366). Online publication date: 10-Jun-2024.
https://dl.acm.org/doi/10.1145/3618260.3649751

Rotenberg A, Huleihel W, Shayevitz O (2024). Planted Bipartite Graph Detection. IEEE Transactions on Information Theory 70:6 (4319-4334). Online publication date: Jun-2024.
https://doi.org/10.1109/TIT.2024.3382228

Pensia A, Jog V, Loh P (2024). Communication-Constrained Hypothesis Testing: Optimality, Robustness, and Reverse Data Processing Inequalities. IEEE Transactions on Information Theory 70:1 (389-414). Online publication date: 1-Jan-2024.
https://dl.acm.org/doi/10.1109/TIT.2023.3334024

Błasiok J, Buhai R, Kothari P, Steurer D (2024). Semirandom Planted Clique and the Restricted Isometry Property. 2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS) (959-969). Online publication date: 27-Oct-2024.
https://doi.org/10.1109/FOCS61266.2024.00064
