Statistical Algorithms and a Lower Bound for Detecting Planted Cliques

Published: 15 April 2017

Abstract

We introduce a framework for proving lower bounds on computational problems over distributions against algorithms that can be implemented using access to a statistical query oracle. For such algorithms, access to the input distribution is limited to obtaining an estimate of the expectation of any given function on a sample drawn randomly from the input distribution rather than directly accessing samples. Most natural algorithms of interest in theory and in practice, for example, moments-based methods, local search, standard iterative methods for convex optimization, MCMC, and simulated annealing, can be implemented in this framework. Our framework is based on, and generalizes, the statistical query model in learning theory [Kearns 1998].
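As a rough illustration of the restricted access described above, the sketch below simulates a statistical query oracle over a small finite distribution. The explicit tolerance `tau`, the function names, and the toy distribution are assumptions made for this example (the tolerance follows the usual statistical query convention of answering within some additive error); they are not taken from the abstract.

```python
import random
from typing import Callable, Sequence, Tuple

Point = Tuple[int, ...]


def make_stat_oracle(
    support: Sequence[Point],
    probs: Sequence[float],
    tau: float,
    rng: random.Random,
) -> Callable[[Callable[[Point], float]], float]:
    """Build an oracle that answers expectation queries up to additive error tau.

    Hypothetical helper: the distribution is given explicitly so the true
    expectation can be computed; the algorithm only ever sees the noisy answer.
    """

    def oracle(h: Callable[[Point], float]) -> float:
        # Exact expectation of the query function under the distribution.
        expectation = sum(p * h(x) for x, p in zip(support, probs))
        # Any value within tau of the expectation is a legal answer;
        # here the perturbation is drawn uniformly from [-tau, tau].
        return expectation + rng.uniform(-tau, tau)

    return oracle


# Toy distribution over two bits (an assumption for illustration only).
support = [(0, 0), (0, 1), (1, 0), (1, 1)]
probs = [0.1, 0.2, 0.3, 0.4]
oracle = make_stat_oracle(support, probs, tau=0.05, rng=random.Random(0))

# Query: the mean of the first coordinate. The algorithm never touches samples.
answer = oracle(lambda x: float(x[0]))
print(answer)
```

The true mean of the first coordinate in this toy distribution is 0.7, so the printed answer lies within 0.05 of that value. An algorithm in this model is judged by how many such queries it needs and how small a tolerance it requires.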
Our main application is a nearly optimal lower bound on the complexity of any statistical query algorithm for detecting planted bipartite clique distributions (or planted dense subgraph distributions) when the planted clique has size O(n^{1/2 − δ}) for any constant δ > 0. The assumed hardness of variants of these problems has been used to prove hardness of several other problems and as a guarantee for security in cryptographic applications. Our lower bounds provide concrete evidence of hardness, thus supporting these assumptions.
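To make the object of the lower bound more concrete, the sketch below samples from one common formulation of a planted bipartite clique distribution: each sample is a row of n uniform random bits, and with probability k/n the k coordinates of a hidden set are all forced to 1. This formulation, the function name, and the parameters are assumptions for illustration; the abstract itself does not spell out the definition.

```python
import random
from typing import List, Sequence


def sample_planted_biclique_row(
    n: int, planted: Sequence[int], rng: random.Random
) -> List[int]:
    """Draw one row of a bipartite adjacency matrix (hypothetical helper).

    Every entry starts as an independent uniform bit. With probability k/n,
    where k = len(planted), the row belongs to the clique and its planted
    coordinates are set to 1.
    """
    k = len(planted)
    row = [rng.randint(0, 1) for _ in range(n)]
    if rng.random() < k / n:
        for i in planted:
            row[i] = 1
    return row


rng = random.Random(1)
hidden_set = [0, 1, 2]  # assumed planted coordinates, k = 3
rows = [sample_planted_biclique_row(8, hidden_set, rng) for _ in range(5)]
for row in rows:
    print(row)
```

Detection then means deciding whether the rows come from such a distribution for some hidden set or from the purely uniform one (the case of an empty hidden set in this sketch).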

References

[1]
N. Alon, A. Andoni, T. Kaufman, K. Matulef, R. Rubinfeld, and N. Xie. 2007. Testing k-wise and almost k-wise independence. In STOC. 496--505.
[2]
Noga Alon, Michael Krivelevich, and Benny Sudakov. 1998. Finding a large hidden clique in a random graph. In SODA. 594--598.
[3]
Brendan P. W. Ames and Stephen A. Vavasis. 2011. Nuclear norm minimization for the planted clique and biclique problems. Math. Program. 129, 1 (2011), 69--89.
[4]
Benny Applebaum, Boaz Barak, and Avi Wigderson. 2010. Public-key cryptography from different assumptions. In STOC. 171--180.
[5]
Sanjeev Arora, Boaz Barak, Markus Brunnermeier, and Rong Ge. 2010. Computational complexity and information asymmetry in financial products (extended abstract). In ICS. 49--65.
[6]
P. Bartlett and S. Mendelson. 2002. Rademacher and Gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res. 3 (2002), 463--482.
[7]
Alexandre Belloni, Robert M. Freund, and Santosh Vempala. 2009. An efficient rescaled perceptron algorithm for conic systems. Math. Oper. Res. 34, 3 (2009), 621--641.
[8]
Shai Ben-David and Eli Dichterman. 1998. Learning with restricted focus of attention. J. Comput. Syst. Sci. 56, 3 (1998), 277--298.
[9]
Quentin Berthet and Philippe Rigollet. 2013. Complexity theoretic lower bounds for sparse principal component detection. In COLT. 1046--1066.
[10]
Aditya Bhaskara, Moses Charikar, Eden Chlamtac, Uriel Feige, and Aravindan Vijayaraghavan. 2010. Detecting high log-densities: An O(n^{1/4}) approximation for densest k-subgraph. In STOC. 201--210.
[11]
Aditya Bhaskara, Moses Charikar, Aravindan Vijayaraghavan, Venkatesan Guruswami, and Yuan Zhou. 2012. Polynomial integrality gaps for strong SDP relaxations of densest k-subgraph. In SODA. 388--405.
[12]
A. Blum, C. Dwork, F. McSherry, and K. Nissim. 2005. Practical privacy: The SuLQ framework. In PODS. 128--138.
[13]
Avrim Blum, Alan M. Frieze, Ravi Kannan, and Santosh Vempala. 1998. A polynomial-time algorithm for learning noisy linear threshold functions. Algorithmica 22, 1/2 (1998), 35--52.
[14]
Avrim Blum, Merrick L. Furst, Jeffrey C. Jackson, Michael J. Kearns, Yishay Mansour, and Steven Rudich. 1994. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In STOC. 253--262.
[15]
Guy Bresler, David Gamarnik, and Devavrat Shah. 2014. Structure learning of antiferromagnetic Ising models. In NIPS. 2852--2860.
[16]
S. Brubaker and S. Vempala. 2009. Random tensors and planted cliques. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Vol. 5687. 406--419.
[17]
T. T. Cai, T. Liang, and A. Rakhlin. 2015. Computational and statistical boundaries for submatrix localization in a large noisy matrix. ArXiv E-prints (Feb. 2015).
[18]
C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, and K. Olukotun. 2006. Map-reduce for machine learning on multicore. In NIPS. 281--288.
[19]
Amin Coja-Oghlan. 2010. Graph partitioning via adaptive spectral techniques. Combin. Probab. Comput. 19, 2 (2010), 227--284.
[20]
Y. Dekel, O. Gurel-Gurevich, and Y. Peres. 2011. Finding hidden cliques in linear time with high probability. In ANALCO. 67--75.
[21]
A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. Ser. B 39, 1 (1977), 1--38.
[22]
Yash Deshpande and Andrea Montanari. 2015a. Finding hidden cliques of size √(N/e) in nearly linear time. Found. Comput. Math. 15, 4 (Aug. 2015), 1069--1128.
[23]
Yash Deshpande and Andrea Montanari. 2015b. Improved sum-of-squares lower bounds for hidden clique and hidden submatrix problems. In COLT. 523--562.
[24]
Shaddin Dughmi. 2014. On the hardness of signaling. In FOCS. 354--363.
[25]
John Dunagan and Santosh Vempala. 2008. A simple polynomial-time rescaling algorithm for solving linear programs. Math. Program. 114, 1 (2008), 101--114.
[26]
Uriel Feige. 2002. Relations between average case complexity and approximation complexity. In IEEE Conference on Computational Complexity. 5.
[27]
U. Feige and R. Krauthgamer. 2000. Finding and certifying a large hidden clique in a semirandom graph. Random Struct. Algor. 16, 2 (2000), 195--208.
[28]
Uriel Feige and Robert Krauthgamer. 2003. The probable value of the Lovász--Schrijver relaxations for maximum independent set. SICOMP 32, 2 (2003), 345--370.
[29]
U. Feige and D. Ron. 2010. Finding hidden cliques in linear time. In AofA. 189--204.
[30]
V. Feldman. 2008. Evolvability from learning algorithms. In STOC. 619--628.
[31]
V. Feldman. 2012. A complete characterization of statistical query learning with applications to evolvability. J. Comput. Syst. Sci. 78, 5 (2012), 1444--1459.
[32]
Vitaly Feldman. 2014. Open problem: The statistical query complexity of learning sparse halfspaces. In COLT. 1283--1289.
[33]
Vitaly Feldman. 2016. A general characterization of the statistical query complexity. CoRR abs/1608.02198 (2016). Retrieved from http://arxiv.org/abs/1608.02198.
[34]
Vitaly Feldman, Cristobal Guzman, and Santosh Vempala. 2015. Statistical query algorithms for stochastic convex optimization. CoRR abs/1512.09170 (2015). Extended abstract in SODA 2017.
[35]
Vitaly Feldman, Will Perkins, and Santosh Vempala. 2013. On the complexity of random satisfiability problems with planted solutions. CoRR abs/1311.4821 (2013). Extended abstract in STOC 2015.
[36]
Alan M. Frieze and Ravi Kannan. 2008. A new approach to the planted clique problem. In FSTTCS. 187--198.
[37]
C. Gao, Z. Ma, and H. H. Zhou. 2014. Sparse CCA: Adaptive estimation and computational barriers. ArXiv E-prints (Sept. 2014).
[38]
A. E. Gelfand and A. F. M. Smith. 1990. Sampling based approaches to calculating marginal densities. J. Am. Statist. Assoc. 85 (1990), 398--409.
[39]
Bruce E. Hajek, Yihong Wu, and Jiaming Xu. 2015. Computational lower bounds for community detection on random graphs. In COLT. 899--928. Retrieved from http://jmlr.org/proceedings/papers/v40/Hajek15.html.
[40]
Johan Håstad. 2001. Some optimal inapproximability results. J. ACM 48, 4 (July 2001), 798--859.
[41]
W. K. Hastings. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 1 (1970), 97--109.
[42]
Elad Hazan and Robert Krauthgamer. 2011. How hard is it to approximate the best Nash equilibrium? SIAM J. Comput. 40, 1 (2011), 79--91.
[43]
Mark Jerrum. 1992. Large cliques elude the Metropolis process. Rand. Struct. Algor. 3, 4 (1992), 347--360.
[44]
Ari Juels and Marcus Peinado. 2000. Hiding cliques for cryptographic security. Des. Codes Cryptogr. 20, 3 (2000), 269--280.
[45]
Ravi Kannan. 2008. Personal communication.
[46]
R. Karp. 1979. Probabilistic analysis of graph-theoretic algorithms. In Proceedings of Computer Science and Statistics 12th Annual Symposium on the Interface. 173.
[47]
Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. 2011. What can we learn privately? SIAM J. Comput. 40, 3 (June 2011), 793--826.
[48]
M. Kearns. 1998. Efficient noise-tolerant learning from statistical queries. J. ACM 45, 6 (1998), 983--1006.
[49]
Subhash Khot. 2004. Ruling out PTAS for graph min-bisection, densest subgraph and bipartite clique. In FOCS. 136--145.
[50]
Scott Kirkpatrick, D. Gelatt Jr., and Mario P. Vecchi. 1983. Optimization by simulated annealing. Science 220, 4598 (1983), 671--680.
[51]
Ludek Kucera. 1995. Expected complexity of graph partitioning problems. Discr. Appl. Math. 57, 2--3 (1995), 193--212.
[52]
Zongming Ma and Yihong Wu. 2015. Computational barriers in minimax submatrix detection. Annals of Statistics 43, 3 (2015), 1089--1116.
[53]
F. McSherry. 2001. Spectral partitioning of random graphs. In FOCS. 529--537.
[54]
R. Meka, A. Potechin, and A. Wigderson. 2015. Sum-of-squares lower bounds for planted clique. In STOC. 87--96.
[55]
Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. 1953. Equation of state calculations by fast computing machines. J. Chem. Phys. 21 (1953), 1087--1092.
[56]
L. Minder and D. Vilenchik. 2009. Small clique detection and approximate Nash equilibria. 5687 (2009), 673--685.
[57]
K. Pearson. 1900. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philos. Mag. Ser. 5 50, 302 (1900), 157--175.
[58]
Bart Selman, Henry Kautz, and Bram Cohen. 1995. Local search strategies for satisfiability testing. In DIMACS Series in Discrete Mathematics and Theoretical Computer Science. 521--532.
[59]
R. Servedio. 2000. Computational sample complexity and attribute-efficient learning. J. Comput. Syst. Sci. 60, 1 (2000), 161--178.
[60]
Jacob Steinhardt and John C. Duchi. 2015. Minimax rates for memory-bounded sparse linear regression. In COLT. 1564--1587. Retrieved from http://jmlr.org/proceedings/papers/v40/Steinhardt15.html.
[61]
J. Steinhardt, G. Valiant, and S. Wager. 2016. Memory, communication, and statistical queries. In COLT. 1490--1516.
[62]
Balázs Szörényi. 2009. Characterizing statistical query learning: Simplified notions and proofs. In ALT. 186--200.
[63]
M. Tanner and W. Wong. 1987. The calculation of posterior distributions by data augmentation (with discussion). J. Amer. Stat. Assoc. 82 (1987), 528--550.
[64]
Leslie G. Valiant. 1984. A theory of the learnable. Commun. ACM 27, 11 (1984), 1134--1142.
[65]
V. Vapnik and A. Chervonenkis. 1971. On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16, 2 (1971), 264--280.
[66]
V. Černý. 1985. Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. J. Optim. Theory Appl. 45, 1 (Jan. 1985), 41--51.
[67]
T. Wang, Q. Berthet, and R. J. Samworth. 2014. Statistical and computational trade-offs in estimation of sparse principal components. ArXiv E-prints (Aug. 2014).
[68]
Ke Yang. 2001. On learning correlated Boolean functions using statistical queries. In ALT. 59--76.
[69]
Ke Yang. 2005. New lower bounds for statistical query learning. J. Comput. Syst. Sci. 70, 4 (2005), 485--509.
[70]
Andrew Yao. 1977. Probabilistic computations: Toward a unified measure of complexity. In FOCS. 222--227.
[71]
Yuchen Zhang, John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. 2013. Information-theoretic lower bounds for distributed statistical estimation with communication constraints. In NIPS. 2328--2336.

Published In

Journal of the ACM  Volume 64, Issue 2
April 2017
277 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/3080497
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 April 2017
Accepted: 01 January 2017
Revised: 01 August 2016
Received: 01 June 2015

Author Tags

  1. Learning theory
  2. lower bounds
  3. planted clique
  4. statistical algorithms
  5. statistical dimension

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2025) Sharp thresholds in inference of planted subgraphs. The Annals of Applied Probability 35:1. DOI: 10.1214/24-AAP2120. Online publication date: 1-Feb-2025.
  • (2025) Subsampling Suffices for Adaptive Data Analysis. Journal of the ACM 72:1 (1-45). DOI: 10.1145/3698104. Online publication date: 25-Jan-2025.
  • (2025) Almost-Linear Planted Cliques Elude the Metropolis Process. Random Structures & Algorithms 66:2. DOI: 10.1002/rsa.21274. Online publication date: 21-Feb-2025.
  • (2024) Computational lower bounds for graphon estimation via low-degree polynomials. The Annals of Statistics 52:5. DOI: 10.1214/24-AOS2437. Online publication date: 1-Oct-2024.
  • (2024) Tensor factor model estimation by iterative projection. The Annals of Statistics 52:6. DOI: 10.1214/24-AOS2412. Online publication date: 1-Dec-2024.
  • (2024) The landscape of the planted clique problem: Dense subgraphs and the overlap gap property. The Annals of Applied Probability 34:4. DOI: 10.1214/23-AAP2003. Online publication date: 1-Aug-2024.
  • (2024) Planted Clique Conjectures Are Equivalent. Proceedings of the 56th Annual ACM Symposium on Theory of Computing (358-366). DOI: 10.1145/3618260.3649751. Online publication date: 10-Jun-2024.
  • (2024) Planted Bipartite Graph Detection. IEEE Transactions on Information Theory 70:6 (4319-4334). DOI: 10.1109/TIT.2024.3382228. Online publication date: Jun-2024.
  • (2024) Communication-Constrained Hypothesis Testing: Optimality, Robustness, and Reverse Data Processing Inequalities. IEEE Transactions on Information Theory 70:1 (389-414). DOI: 10.1109/TIT.2023.3334024. Online publication date: 1-Jan-2024.
  • (2024) Semirandom Planted Clique and the Restricted Isometry Property. 2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS) (959-969). DOI: 10.1109/FOCS61266.2024.00064. Online publication date: 27-Oct-2024.
