
Finding Hidden Cliques of Size \(\sqrt{N/e}\) in Nearly Linear Time


Abstract

Consider an Erdős–Rényi random graph in which each edge is present independently with probability \(1/2\), except for a subset \(\mathsf{C}_N\) of the vertices that form a clique (a completely connected subgraph). We consider the problem of identifying the clique, given a realization of such a random graph. The algorithm of Dekel et al. (ANALCO. SIAM, pp 67–75, 2011) provably identifies the clique \(\mathsf{C}_N\) in linear time, provided \(|\mathsf{C}_N|\ge 1.261\sqrt{N}\). Spectral methods can be shown to fail on cliques smaller than \(\sqrt{N}\). In this paper we describe a nearly linear-time algorithm that succeeds with high probability for \(|\mathsf{C}_N|\ge (1+{\varepsilon })\sqrt{N/e}\) for any \({\varepsilon }>0\). This is the first algorithm that provably improves over spectral methods. We further generalize the hidden clique problem to other background graphs (the standard case corresponding to the complete graph on \(N\) vertices). For large-girth regular graphs of degree \((\varDelta +1)\) we prove that so-called local algorithms succeed if \(|\mathsf{C}_N|\ge (1+{\varepsilon })N/\sqrt{e\varDelta }\) and fail if \(|\mathsf{C}_N|\le (1-{\varepsilon })N/\sqrt{e\varDelta }\).
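The following sketch is not part of the paper; it simply makes the model in the abstract concrete by drawing a \(G(N,1/2)\) background graph and planting a clique of size \(\lceil (1+{\varepsilon })\sqrt{N/e}\rceil \) on a uniformly random vertex subset. Function and variable names are illustrative.

```python
import numpy as np

def planted_clique_instance(N, eps=0.1, seed=0):
    """Adjacency matrix of G(N, 1/2) with a clique of size
    ceil((1 + eps) * sqrt(N / e)) planted on a random vertex subset."""
    rng = np.random.default_rng(seed)
    K = int(np.ceil((1 + eps) * np.sqrt(N / np.e)))
    clique = rng.choice(N, size=K, replace=False)
    # Symmetric 0/1 adjacency matrix with i.i.d. fair-coin edges.
    A = rng.integers(0, 2, size=(N, N))
    A = np.triu(A, 1)
    A = A + A.T
    # Force every edge inside the hidden clique to be present.
    A[np.ix_(clique, clique)] = 1
    np.fill_diagonal(A, 0)
    return A, np.sort(clique)

A, clique = planted_clique_instance(2000)
print(len(clique), A.shape)
```

Recovering \(\mathsf{C}_N\) from such a realization is the problem studied in the paper.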


Notes

  1. The problem is somewhat more subtle because \(|\mathsf{C}_N|\ll N\); see next section.

  2. If \(Q_1\) is singular with respect to \(Q_0\), the problem is simpler but requires a bit more care.

References

  1. Louigi Addario-Berry, Nicolas Broutin, Luc Devroye, and Gábor Lugosi. On combinatorial testing problems. The Annals of Statistics, 38(5):3063–3092, 2010.

  2. Noga Alon, Michael Krivelevich, and Benny Sudakov. Finding a large hidden clique in a random graph. In Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms, pages 594–598. Society for Industrial and Applied Mathematics, 1998.

  3. Noga Alon, Michael Krivelevich, and Van H Vu. On the concentration of eigenvalues of random symmetric matrices. Israel Journal of Mathematics, 131(1):259–267, 2002.

  4. Brendan PW Ames and Stephen A Vavasis. Nuclear norm minimization for the planted clique and biclique problems. Mathematical programming, 129(1):69–89, 2011.

  5. Dana Angluin. Local and global properties in networks of processors. In Proceedings of the twelfth annual ACM symposium on Theory of computing, pages 82–93. ACM, 1980.

  6. Ery Arias-Castro, Emmanuel J Candès, and Arnaud Durand. Detection of an anomalous cluster in a network. The Annals of Statistics, 39(1):278–304, 2011.

  7. Ery Arias-Castro, David L Donoho, and Xiaoming Huo. Near-optimal detection of geometric objects by fast multiscale methods. Information Theory, IEEE Transactions on, 51(7):2402–2425, 2005.

  8. M. Bayati and A. Montanari. The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. on Inform. Theory, 57:764–785, 2011.

  9. Mohsen Bayati, Marc Lelarge, and Andrea Montanari. Universality in polytope phase transitions and message passing algorithms. arXiv preprint arXiv:1207.7321, 2012.

  10. Quentin Berthet and Philippe Rigollet. Computational lower bounds for sparse PCA. arXiv preprint arXiv:1304.0828, 2013.

  11. Shankar Bhamidi, Partha S Dey, and Andrew B Nobel. Energy landscape for large average submatrix detection problems in Gaussian random matrices. arXiv preprint arXiv:1211.2284, 2012.

  12. Patrick Billingsley. Probability and measure. John Wiley & Sons, 2008.

  13. Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):11, 2011.

  14. Alexandre d’Aspremont, Francis Bach, and Laurent El Ghaoui. Optimal solutions for sparse principal component analysis. The Journal of Machine Learning Research, 9:1269–1294, 2008.

  15. Alexandre d’Aspremont, Laurent El Ghaoui, Michael I Jordan, and Gert RG Lanckriet. A direct formulation for sparse PCA using semidefinite programming. SIAM review, 49(3):434–448, 2007.

  16. Chandler Davis and W. M. Kahan. The rotation of eigenvectors by a perturbation. III. SIAM Journal on Numerical Analysis, 7(1):1–46, 1970.

  17. Yael Dekel, Ori Gurel-Gurevich, and Yuval Peres. Finding hidden cliques in linear time with high probability. In ANALCO, pages 67–75. SIAM, 2011.

  18. Amir Dembo. Probability Theory. http://www.stanford.edu/~montanar/TEACHING/Stat310A/lnotes.pdf, 2013.

  19. David L Donoho, Arian Maleki, and Andrea Montanari. Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences, 106(45):18914–18919, 2009.

  20. Uriel Feige and Dorit Ron. Finding hidden cliques in linear time. DMTCS Proceedings, (01):189–204, 2010.

  21. Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh Vempala, and Ying Xiao. Statistical algorithms and a lower bound for planted clique. arXiv preprint arXiv:1201.1214, 2012.

  22. Zoltán Füredi and János Komlós. The eigenvalues of random symmetric matrices. Combinatorica, 1(3):233–241, 1981.

  23. Geoffrey R Grimmett and Colin JH McDiarmid. On colouring random graphs. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 77, pages 313–324. Cambridge Univ Press, 1975.

  24. Dongning Guo and Chih-Chun Wang. Asymptotic mean-square optimality of belief propagation for sparse linear systems. In Information Theory Workshop, 2006. ITW’06 Chengdu. IEEE, pages 194–198. IEEE, 2006.

  25. Mark Jerrum. Large cliques elude the metropolis process. Random Structures & Algorithms, 3(4):347–359, 1992.

  26. Iain M Johnstone and Arthur Yu Lu. On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104(486), 2009.

  27. Antti Knowles and Jun Yin. The isotropic semicircle law and deformation of Wigner matrices. arXiv preprint arXiv:1110.6449, 2011.

  28. Daphne Koller and Nir Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009.

  29. Nathan Linial. Locality in distributed graph algorithms. SIAM Journal on Computing, 21(1):193–201, 1992.

  30. Marc Mezard and Andrea Montanari. Information, physics, and computation. Oxford University Press, 2009.

  31. Andrea Montanari. Graphical Models Concepts in Compressed Sensing. In Y.C. Eldar and G. Kutyniok, editors, Compressed Sensing: Theory and Applications. Cambridge University Press, 2012.

  32. Andrea Montanari and David Tse. Analysis of belief propagation for non-linear problems: the example of CDMA (or: how to prove Tanaka’s formula). In Information Theory Workshop, 2006. ITW’06 Punta del Este. IEEE, pages 160–164. IEEE, 2006.

  33. Moni Naor and Larry Stockmeyer. What can be computed locally? SIAM Journal on Computing, 24(6):1259–1277, 1995.

  34. Sundeep Rangan and Alyson K Fletcher. Iterative estimation of constrained rank-one matrices in noise. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, pages 1246–1250. IEEE, 2012.

  35. Tom Richardson and Rüdiger Leo Urbanke. Modern coding theory. Cambridge University Press, 2008.

  36. Andrey A Shabalin, Victor J Weigman, Charles M Perou, and Andrew B Nobel. Finding large average submatrices in high dimensional data. The Annals of Applied Statistics, pages 985–1012, 2009.

  37. Xing Sun and Andrew B Nobel. On the size and recovery of submatrices of ones in a random binary matrix. J. Mach. Learn. Res, 9:2431–2453, 2008.

  38. Jukka Suomela. Survey of local algorithms. ACM Computing Surveys (CSUR), 45(2):24, 2013.

  39. Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Y.C. Eldar and G. Kutyniok, editors, Compressed Sensing: Theory and Applications, pages 210–268. Cambridge University Press, 2012.

  40. Martin J Wainwright and Michael I Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2):1–305, 2008.

  41. Hui Zou, Trevor Hastie, and Robert Tibshirani. Sparse principal component analysis. Journal of computational and graphical statistics, 15(2):265–286, 2006.


Acknowledgments

This work was partially supported by the NSF CAREER award CCF-0743978, NSF Grant DMS-0806211, and Grants AFOSR/DARPA FA9550-12-1-0411 and FA9550-13-1-0036.

Author information

Correspondence to Andrea Montanari.

Additional information

Communicated by Andrew Odlyzko.

Appendices

Appendix 1: Some Tools in Probability Theory

This appendix contains some useful facts in probability theory.

Lemma 7.1

Let \(h:\mathbb {R}\rightarrow \mathbb {R}\) be a bounded function with the first three derivatives uniformly bounded. Let \(X_{n, k}\) be mutually independent random variables for \(1\le k\le n\) with zero mean and variance \(v_{n, k}\). Define

$$\begin{aligned} v_n&\equiv \sum _{k=1}^n v_{n, k} \\ \delta _n({\varepsilon })&\equiv \sum _{k=1}^n {\mathbb E}[X^2_{n, k}\mathbb {I}_{|X_{n, k}|\ge {\varepsilon }}]\\ S_n&\equiv \sum _{k=1}^n X_{n, k}. \end{aligned}$$

Also, let \(G_n \sim \mathsf{N}(0, v_n)\). Then, for every \(n\) and \({\varepsilon }>0\)

$$\begin{aligned} |{\mathbb E}h(S_n) - {\mathbb E}h(G_n)|\le \left( \frac{{\varepsilon }}{6} + \frac{\sqrt{{\varepsilon }^2 + \delta _n({\varepsilon })}}{2}\right) v_n||h'''||_\infty + \delta _n({\varepsilon })||h'' ||_\infty . \end{aligned}$$

Proof

The lemma is proved using a standard swapping trick. The proof can be found in Amir Dembo’s lecture notes [18]. \(\square \)
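As a quick numerical sanity check (not part of the original text), one can evaluate both sides of the lemma for Rademacher summands \(X_{n,k}=\pm 1/\sqrt{n}\) and \(h=\cos \), where \({\mathbb E}h(S_n)=\cos (1/\sqrt{n})^n\) and \({\mathbb E}h(G_n)=e^{-1/2}\) are available in closed form; for \({\varepsilon }>1/\sqrt{n}\) one has \(\delta _n({\varepsilon })=0\) and \(v_n=1\).

```python
import numpy as np

# Sanity check of Lemma 7.1 for X_{n,k} = +/- 1/sqrt(n) and h = cos,
# for which ||h''||_inf = ||h'''||_inf = 1, v_n = 1, and delta_n(eps) = 0
# as soon as eps > 1/sqrt(n).
for n in [10, 100, 1000]:
    lhs = abs(np.cos(1.0 / np.sqrt(n)) ** n - np.exp(-0.5))  # |E h(S_n) - E h(G_n)|
    eps = 2.0 / np.sqrt(n)                                   # any eps > 1/sqrt(n)
    rhs = (eps / 6 + eps / 2) * 1.0 * 1.0                    # (eps/6 + sqrt(eps^2 + 0)/2) * v_n * ||h'''||
    print(n, round(lhs, 6), round(rhs, 6), lhs <= rhs)
```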

Lemma 7.2

Given a random variable \(X\) such that \({\mathbb E}(X) = \mu \), suppose \(X\) satisfies

$$\begin{aligned} {\mathbb E}(e^{\lambda X}) \le e^{\mu \lambda + \rho \lambda ^2/2} \end{aligned}$$

for all \(\lambda >0\) and some constant \(\rho >0\). Then we have for all \(s > 0\)

$$\begin{aligned} {\mathbb E}(|X|^s)&\le 2\,s!\,e^{(s + \lambda \mu )/2}\lambda ^{-s}, \\ \text {where } \lambda&= \frac{1}{2\rho }\left( \sqrt{\mu ^2+ 4s\rho } -\mu \right) . \end{aligned}$$

Further, if \(\mu = 0\), then we have for \(t < 1/(e\rho )\)

$$\begin{aligned} {\mathbb E}\left( e^{tX^2} \right) \le \frac{1}{1-e\rho t}. \end{aligned}$$

Proof

By an application of the Markov inequality and the given condition on \(X\),

$$\begin{aligned} {\mathbb P}(X\ge t)&\le e^{-\lambda t}{\mathbb E}(e^{\lambda X}) \\&\le e^{-\lambda t + \mu \lambda + \rho \lambda ^2/2} \end{aligned}$$

for all \(\lambda > 0\). By a symmetric argument,

$$\begin{aligned} {\mathbb P}(X \le -t) \le e^{-\lambda t + \mu \lambda + \rho \lambda ^2/2}. \end{aligned}$$

By the standard integration formula, we have

$$\begin{aligned} {\mathbb E}(|X|^s)&= \int _0^\infty \! st^{s-1}{\mathbb P}(|X|\ge t)\,\mathrm{d}t \\&= \int _0^\infty \!st^{s-1}{\mathbb P}(X\ge t)\,\mathrm{d}t + \int _0^\infty \! st^{s-1}{\mathbb P}(X\le -t)\,\mathrm{d}t\\&\le 2e^{\mu \lambda + \rho \lambda ^2/2} \int _0^\infty \! st^{s-1}e^{-\lambda t}\,\mathrm{d}t \\&= 2 s!\, e^{\mu \lambda +\rho \lambda ^2/2}\lambda ^{-s}. \end{aligned}$$

Optimizing over \(\lambda \) yields the desired result.

If \(\mu = 0\), the optimization yields \(\lambda = \sqrt{s/\rho }\). Using this, the Taylor expansion of \(g(x) = e^{x^2}\), and monotone convergence we obtain

$$\begin{aligned} {\mathbb E}\left( e^{tX^2} \right)&= \sum _{k=0}^\infty \frac{t^k}{k!} {\mathbb E}(X^{2k})\\&\le \sum _{k=0}^\infty (e\rho t)^k \frac{(2k)!}{k!(2k)^k} \\&\le \sum _{k=0}^\infty (e\rho t)^k \\&= \frac{1}{1-e\rho t}. \end{aligned}$$

Notice that here we remove the factor of \(2\) in the inequality since this is not required for even moments of \(X\). \(\square \)
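As an illustration of the second claim (not from the paper), take \(X\) standard Gaussian, so that \({\mathbb E}(e^{\lambda X}) = e^{\lambda ^2/2}\) and \(\rho = 1\); in this case \({\mathbb E}(e^{tX^2}) = 1/\sqrt{1-2t}\) exactly, which can be compared with the bound \(1/(1-et)\) for \(t<1/e\).

```python
import numpy as np

# Compare E(e^{t X^2}) for X ~ N(0, 1) (rho = 1) with the Lemma 7.2 bound 1/(1 - e*t).
rng = np.random.default_rng(0)
X = rng.standard_normal(10**6)
for t in [0.05, 0.10, 0.20]:                 # all below 1/e, and small enough for a stable Monte Carlo
    mc = np.mean(np.exp(t * X**2))           # Monte Carlo estimate of E(e^{t X^2})
    exact = 1.0 / np.sqrt(1.0 - 2.0 * t)     # closed form in the Gaussian case
    bound = 1.0 / (1.0 - np.e * t)           # Lemma 7.2 bound with rho = 1
    print(t, round(mc, 4), round(exact, 4), round(bound, 4), exact <= bound)
```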

The following lemma is standard; see, for instance, [3, 39].

Lemma 7.3

Let \(M \in \mathbb {R}^{N\times N}\) be a symmetric matrix with entries \(M_{ij}\) (for \(i\ge j\)) that are centered subgaussian random variables of scale factor \(\rho \). Then, uniformly in \(N\),

$$\begin{aligned} {\mathbb P}\left( ||M||_2 \ge t \right) \le (5\lambda )^N e^{-N(\lambda - 1)}, \end{aligned}$$

where \(\lambda = t^2/(16N\rho e)\) and \(||{M}||_2\) denotes the spectral norm (or largest singular value) of \(M\).

Proof

Divide \(M\) into its upper and lower triangular portions \(M^u\) and \(M^l\) so that \(M = M^u+M^l\). We deal with each separately. Let \(m_i\) denote the \(i{\text {th}}\) row of \(M^l\). For a unit vector \(x\), since the \(M_{ij}\) are all independent and subgaussian with scale \(\rho \), it is easy to see that the \(\langle m_j, x\rangle \) are also subgaussian with the same scale. We now bound the squared exponential moment of \(||{M^lx}||\) as follows. For small enough \(c\ge 0\),

$$\begin{aligned} {\mathbb E}\left( e^{c||{M^lx}||^2}\right)&= {\mathbb E}\left( \prod _{j=1}^N e^{c\langle m_j, x\rangle ^2} \right) \nonumber \\&= \prod _{j=1}^N{\mathbb E}\left( e^{c\langle m_j, x\rangle ^2} \right) \nonumber \\&\le \left( 1 - e\rho c \right) ^{-N}. \end{aligned}$$
(7.1)

Using this, we obtain for any unit vector \(x\)

$$\begin{aligned} {\mathbb P}(||{M^l x}|| \ge t ) \le \left( \frac{t^2}{N\rho e}\right) ^N e^{-N(t^2/N\rho e - 1)}, \end{aligned}$$

where we used the Markov inequality and Eq. (7.1) with an appropriate \(c\). Let \(\Upsilon \) be a maximal \(1/2\)-net of the unit sphere. From a volume-packing argument we have that \(|\Upsilon | \le 5^N\). Then from the fact that \(g(x) = M^lx\) is \(||{M^l}||\)-Lipschitz in \(x\) we obtain

$$\begin{aligned} {\mathbb P}\left( ||{M^l}||_2 \ge t \right)&\le {\mathbb P}\left( \max _{x\in \Upsilon }||{M^lx}|| \ge t/2\right) \\&\le |\Upsilon | {\mathbb P}(||{M^lx}|| \ge t/2). \end{aligned}$$

The same inequality holds for \(M^u\). Now, using the fact that \(||{\cdot }||_2\) is a convex function and that \(M^u\) and \(M^l\) are independent we obtain

$$\begin{aligned} {\mathbb P}\left( ||{M}||_2 \ge t \right)&\le {\mathbb P}\left( ||{M^u}||_2 \ge t/2 \right) + {\mathbb P}\left( ||{M^l}||_2 \ge t/2 \right) \\&\le 2 \left( 5^N \left( \frac{t^2}{16N\rho e}\right) ^N e^{-N(t^2/16N\rho e - 1)} \right) . \end{aligned}$$

Substituting for \(\lambda \) yields the result. \(\square \)
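The following sketch (not from the paper) illustrates the scale at which Lemma 7.3 operates for Gaussian entries (\(\rho =1\)): the spectral norm concentrates around \(2\sqrt{N}\), while the bound becomes nontrivial once \(t\) is a sufficiently large constant multiple of \(\sqrt{N}\).

```python
import numpy as np

# Empirical spectral norm of a symmetric matrix with N(0, 1) entries above the
# diagonal, compared with the tail bound (5*lam)^N * exp(-N*(lam - 1)),
# lam = t^2 / (16*N*rho*e), from Lemma 7.3 (here rho = 1).
rng = np.random.default_rng(0)
N = 500
norms = []
for _ in range(20):
    G = rng.standard_normal((N, N))
    M = np.triu(G, 1)
    M = M + M.T                                   # symmetric, zero diagonal
    norms.append(np.linalg.norm(M, 2) / np.sqrt(N))
print("typical ||M||_2 / sqrt(N):", round(float(np.mean(norms)), 3))   # close to 2

t = 16.0 * np.sqrt(N)                             # a regime where the bound is meaningful
lam = t**2 / (16 * N * np.e)
log_bound = N * (np.log(5 * lam) - (lam - 1))
print("log tail bound at t = 16*sqrt(N):", round(log_bound, 1))        # large and negative
```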

Appendix 2: Additional Proofs

In this section we provide, for the sake of completeness, proofs of some known results. We begin with Proposition 1.1.

1.1 Proof of Proposition 1.1

We assume that the set \(\mathsf{C}_N\) is generated as follows: let \(X_i\in \{0, 1\}\) be the label of index \(i\in [N]\). The \(X_i\) are i.i.d. Bernoulli with parameter \(\kappa /\sqrt{N}\), and \(\mathsf{C}_N = \{i : X_i = 1\}\). The model in which \(\mathsf{C}_N\) is chosen uniformly at random among sets of size \(\kappa \sqrt{N}\) is similar, and asymptotically in \(N\) there is no difference. Notice that since \(e_{\mathsf{C}_N} = u_{\mathsf{C}_N}/N^{1/4}\), the norm \(\Vert e_{\mathsf{C}_N}\Vert ^2\) concentrates sharply around \(\kappa \), and we are interested in the regime \(\kappa =\varTheta (1)\).

We begin with the first part of the proposition, where \(\kappa = 1+{\varepsilon }\). Let \(W_N = W/\sqrt{N},\,Z_N = Z/\sqrt{N}\), and \(e_{\mathsf{C}_N} = u_{\mathsf{C}_N}/N^{1/4}\). Since this normalization does not make a difference to the eigenvectors of \(W\) and \(Z\), we obtain from the eigenvalue equation \(W_N v_1 = \lambda _1 v_1\) that

$$\begin{aligned} e_{\mathsf{C}_N}\langle e_{\mathsf{C}_N}, v_1\rangle + Z_Nv_1 = \lambda _1v_1. \end{aligned}$$
(8.1)

Multiplying by \(v_1\) on either side yields

$$\begin{aligned} \langle e_{\mathsf{C}_N}, v_1\rangle ^2&= \lambda _1 - \langle v_1, Z_Nv_1\rangle \\&\ge \lambda _1 - \Vert Z_N\Vert _2. \end{aligned}$$

The fact that \(Z_N = Z/\sqrt{N}\) is a standard Wigner matrix with subgaussian entries [3] yields that \(\Vert Z_N\Vert _2 \le 2 + \delta \), with a probability of at least \(1 - C_1e^{-c_1N}\) for some constants \(C_1(\delta ), c_1(\delta )>0\). Further, by Theorem 2.7 of [27], we have that \(\lambda _1 \ge 2 + \min ({\varepsilon },{\varepsilon }^2)\), with a probability of at least \(1 - N^{-c_2\log \log N}\) for some constant \(c_2\) and every \(N\) sufficiently large. It follows from this and the union bound that for \(N\) large enough, we have

$$\begin{aligned} \langle e_{\mathsf{C}_N}, v_1\rangle ^2 \ge \min ({\varepsilon },{\varepsilon }^2)/2, \end{aligned}$$

with a probability of at least \(1 - N^{-c_4}\) for some constant \(c_4>0\). The first claim then follows.

For the second claim, we start with the same eigenvalue Eq. (8.1). Let \(\varphi _1\) be the eigenvector corresponding to the largest eigenvalue of \(Z_N\). Multiplying Eq. (8.1) by \(\varphi _1\) on either side we obtain

$$\begin{aligned} \langle e_{\mathsf{C}_N}, v_1\rangle \langle e_{\mathsf{C}_N}, \varphi _1\rangle + \theta _1\langle v_1, \varphi _1\rangle = \lambda _1 \langle v_1,\varphi _1\rangle , \end{aligned}$$

where \(\theta _1\) is the eigenvalue of \(Z_N\) corresponding to \(\varphi _1\). With this and Cauchy–Schwarz we obtain

$$\begin{aligned} |\langle e_{\mathsf{C}_N}, v_1\rangle | \le \frac{|\lambda _1 - \theta _1|}{|\langle \varphi _1, e_{\mathsf{C}_N}\rangle |}. \end{aligned}$$

Let \(\phi = (\log N)^{\log \log N}\). Then, using Theorem 2.7 of [27], for any \(\delta > 0\), there exists a constant \(C_1\) such that \(|\lambda _1 - \theta _1| \le N^{-1 + \delta }\), with a probability of at least \(1 - N^{-c_3\log \log N}\).

Since \(\varphi _1\) is independent of \(e_{\mathsf{C}_N}\), we observe that

$$\begin{aligned} {\mathbb E}_e\left( \sum _{i=1}^N \varphi _1^i e_{\mathsf{C}_N}^i \right)&= N^{-3/4}(1-{\varepsilon }) \sum _{i=1}^{N}\varphi ^i_1\\ {\mathbb E}_{e}\left( \sum _{i=1}^N (\varphi _1^i e_{\mathsf{C}_N}^i)^2 \right)&= \frac{1-{\varepsilon }}{N}, \end{aligned}$$

where \(\varphi _1^i\) (\(e_{\mathsf{C}_N}^i\)) denotes the \(i{\mathrm{th}}\) entry of \(\varphi _1\) (resp. \(e_{\mathsf{C}_N}\)) and \({\mathbb E}_e(\cdot )\) is the expectation with respect to \(e_{\mathsf{C}_N}\) holding \(Z_N\) (hence \(\varphi _1\)) constant. Using Theorem 2.5 of [27], it follows that there exist constants \(c_4, c_5, c_6, c_7\) such that the following two events happen with a probability of at least \(1- N^{-c_4\log \log N}\). First, the first expectation mentioned previously is at most \((1-{\varepsilon })\phi ^{c_5}N^{-7/4}\). Second,

$$\begin{aligned} \left[ {\mathbb E}_e\left( \sum _{i=1}^N (\varphi _1^ie_{\mathsf{C}_N}^i)^2 \right) \right] ^{-1/2} \max _i |e_{\mathsf{C}_N}^i\varphi _1^i| \le \frac{(1-{\varepsilon })\phi ^{c_7}}{N^{1/4}}. \end{aligned}$$

Now, using the Berry–Esseen central limit theorem for \(\langle \varphi _1, e_{\mathsf{C}_N}\rangle \):

$$\begin{aligned} {\mathbb P}\left( |\langle \varphi _1, e_{\mathsf{C}_N}\rangle |\le cN^{-1/2-\delta }\right) \le \frac{1}{N^\delta } \end{aligned}$$

for an appropriate constant \(c = c({\varepsilon })\) and \(\delta \in (0,1/4)\). Using this and the earlier bound for \(|\lambda _1-\theta _1|\) we obtain that

$$\begin{aligned} |\langle e_{\mathsf{C}_N}, v_1\rangle | \le cN^{-1/2+3\delta }, \end{aligned}$$

with a probability of at least \(1 - c'N^{-\delta }\), for some \(c'\) and sufficiently large \(N\). The claim then follows using the union bound and the same argument for the first \(\ell \) eigenvectors.
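The transition at \(\kappa = 1\) described in Proposition 1.1 is easy to observe numerically. The sketch below (not from the paper) uses the Gaussian surrogate \(W_N = e_{\mathsf{C}_N}e_{\mathsf{C}_N}^{\mathsf{T}} + Z_N\) from the proof above, with the hidden-set labels drawn i.i.d. Bernoulli\((\kappa /\sqrt{N})\); names are illustrative.

```python
import numpy as np

def overlap(N, kappa, seed=0):
    """<e_C, v1>^2 for W_N = e_C e_C^T + Z_N, with Z_N a Wigner matrix and
    e_C = u_C / N^{1/4}, u_C the 0/1 indicator of the hidden set."""
    rng = np.random.default_rng(seed)
    x = (rng.random(N) < kappa / np.sqrt(N)).astype(float)  # i.i.d. Bernoulli(kappa/sqrt(N)) labels
    e = x / N**0.25                                         # ||e||^2 concentrates around kappa
    G = rng.standard_normal((N, N)) / np.sqrt(N)
    Z = np.triu(G, 1)
    Z = Z + Z.T                                             # Wigner matrix, entries N(0, 1/N)
    evals, evecs = np.linalg.eigh(np.outer(e, e) + Z)
    v1 = evecs[:, -1]                                       # principal eigenvector
    return float(np.dot(e, v1) ** 2)

N = 2000
print("kappa = 1.3:", round(overlap(N, 1.3), 3))   # bounded away from 0 (first claim)
print("kappa = 0.7:", round(overlap(N, 0.7), 3))   # close to 0, vanishing with N (second claim)
```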

1.2 Proof of Proposition 4.2

For any fixed \(t\), let \({\mathcal {E}}_N^t\) denote the set of vertices in \(G_N\) such that their \(t\)-neighborhoods are not a tree, i.e.,

$$\begin{aligned} {\mathcal {E}}^t_N = \left\{ i\in [N]:\mathsf{Ball}_{G_N}(i; t) \text { is not a tree}\right\} . \end{aligned}$$

For notational simplicity, we will omit the subscript \(G_N\) in the neighborhood of \(i\). The relative size \({\varepsilon }^t_N = |{\mathcal {E}}^t_N|/N\) vanishes asymptotically in \(N\) since the sequence \(\{G_N\}_{N\ge 1}\) is locally treelike. We let \(\mathsf{F}_{BP}(W_{\mathsf{Ball}(i;t)})\) denote the decision according to belief propagation at the \(i{\mathrm{th}}\) vertex.

From Proposition 4.1, Eqs. (4.1), (4.2), (4.5), and (5.1), and induction we observe that for any \(i\in [N]\backslash {\mathcal {E}}^t_N\)

$$\begin{aligned} \frac{{\mathbb P}(X_i = 1|W_{\mathsf{Ball}(i;t)})}{{\mathbb P}(X_i = 0 |W_{\mathsf{Ball}(i;t)})} \mathop {=}\limits ^{\mathrm{d}}\frac{\tilde{\gamma }^t(X_i)}{\sqrt{\varDelta }}. \end{aligned}$$

We also have that

$$\begin{aligned} |{\hat{\mathsf{C}}}_N\triangle \mathsf{C}_N| = \sum _{i=1}^N \mathbb {I}(\mathsf{F}_{BP}(W_{\mathsf{Ball}(i;t)}) \ne X_i). \end{aligned}$$

Using both of these identities, the fact that \({\varepsilon }^t_N\rightarrow 0\), and the linearity of expectation, we have the first claim:

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{{\mathbb E}|{\hat{\mathsf{C}}}_N\triangle \mathsf{C}_N|}{N} = \frac{\tilde{\kappa }}{\sqrt{\varDelta }} {\mathbb P}\left( \gamma ^t(1) < \sqrt{\varDelta } \right) + \left( 1- \frac{\tilde{\kappa }}{\sqrt{\varDelta }} \right) {\mathbb P}\left( \gamma ^t(0) \ge \sqrt{\varDelta } \right) . \end{aligned}$$

For any other decision rule \(\mathsf{F}(W_{\mathsf{Ball}(i;t)})\), we have that

$$\begin{aligned} \frac{{\mathbb E}[ |{\hat{\mathsf{C}}}_N\triangle \mathsf{C}_N| ]}{N}&\ge (1-{\varepsilon }^t_N) {\mathbb P}( \mathsf{F}(W_{\tilde{\mathsf{T}}\mathsf{ree}(t)}) \ne X_\circ ) \\&\ge (1-{\varepsilon }^t_N) {\mathbb P}( \mathsf{F}_{BP}(W_{\tilde{\mathsf{T}}\mathsf{ree}(t)}) \ne X_\circ ) \end{aligned}$$

since BP computes the correct posterior marginal on the root of the tree \(\tilde{\mathsf{T}}\mathsf{ree}(t)\) and maximizing the posterior marginal minimizes the misclassification error. The second claim follows by taking the limits.

1.3 Equivalence of i.i.d. and Uniform Set Model

In Sect. 2 the hidden set \(\mathsf{C}_N\) was assumed to be uniformly random given its size. However, in Sect. 4 we considered a slightly different model to choose \(\mathsf{C}_N\), wherein \(X_i\) are i.i.d. Bernoulli random variables with parameter \(\tilde{\kappa }/\sqrt{\varDelta }\). This leads to a set, \(\mathsf{C}_N = \{i: X_i = 1\}\), that has a random size, sharply concentrated around \(N\tilde{\kappa }/\sqrt{\varDelta }\). The uniform set model can be obtained from the i.i.d. model by simply conditioning on the size \(|\mathsf{C}_N|\). In the limit of large \(N\) it is well known that these two models are “equivalent.” However, for completeness, we provide a proof that the results of Proposition 4.2 do not change when conditioned on the size \(|\mathsf{C}_N| = \sum _{i = 1}^{N}X_i\):

$$\begin{aligned} {\mathbb E}\left[ |{\hat{\mathsf{C}}}_N\triangle \mathsf{C}_N|\,\bigg |\, |\mathsf{C}_N| = \frac{N\tilde{\kappa }}{\sqrt{\varDelta }} \right] = \sum _{i=1}^N {\mathbb P}\bigg ( \mathsf{F}(W_{\mathsf{Ball}(i;t)}) \ne X_i \,\bigg |\, \sum _{j=1}^N X_j = \frac{N\tilde{\kappa }}{\sqrt{\varDelta }} \bigg ). \end{aligned}$$

Let \(S\) be the event \(\{\sum _{i=1}^N X_i=N\tilde{\kappa }/\sqrt{\varDelta }\}\). Notice that \(\mathsf{F}(W_{\mathsf{Ball}(i;t)})\) is a function of \(\{X_j, j\in \mathsf{Ball}(i;t)\}\), which is a discrete vector of dimension \(K_t \le (\varDelta +1)^{t+1}\). A straightforward direct calculation yields that \((X_j, j\in \mathsf{Ball}(i;t))|S \) converges in distribution to \((X_j, j\in \mathsf{Ball}(i;t))\) asymptotically in \(N\). This implies that \(W_{\mathsf{Ball}(i;t)} |S\) converges in distribution to \(W_{\mathsf{Ball}(i;t)}\). Further, using the locally treelike property of \(G_N\) one obtains

$$\begin{aligned} \lim _{N\rightarrow \infty } \frac{1}{N}{\mathbb E}\left[ |{\hat{\mathsf{C}}}_N\triangle \mathsf{C}_N|\,\bigg |\, |\mathsf{C}_N| = \frac{N\tilde{\kappa }}{\sqrt{\varDelta }} \right] = {\mathbb P}\left( \mathsf{F}(W_{\tilde{\mathsf{T}}\mathsf{ree}(t)})\ne X_\circ \right) , \end{aligned}$$

as required.

1.4 Importance of Message-Passing Modification in Algorithm 1

In this section we provide a simple counterexample that demonstrates the importance of the message-passing or so-called cavity modification we employ, i.e., to remove the contribution of the incoming message \(\theta ^t_{j\rightarrow i}\) in the computation of \(\theta ^{t+1}_{i\rightarrow j}\), cf. Eq. (2.2). This modification is crucial for Lemma 2.2 to hold. Indeed, our counterexample will demonstrate that state evolution does not hold without this modification.

For the sake of simplicity, we consider a pure-noise data matrix \(W\): \(W_{ij} \sim \mathsf{N}(0, 1)\) i.i.d. for \(i<j\), \(W_{ii} = 0\), and \(W=W^{\mathsf{T}}\). In our notation, this corresponds to \(\lambda =0, \kappa =0\); in other words, our observations contain no signal. Further, we consider the initial conditions \(\theta ^0_i = \theta ^0_{i\rightarrow j} = 1\) for all \(i, j\) distinct and the simple function \(f(z; t) = z^3\) for each iteration \(t\). We stress that we make these choices to simplify calculations as much as possible: it should be clear from our subsequent argument that the same phenomenon takes place generically.

State evolution reads, for \(t\ge 0\),

$$\begin{aligned} \mu _{t+1}&= 0,\\ \tau _{t+1}^2&= {\mathbb E}\{(\mu _t + \tau _t Z)^6\}, \end{aligned}$$

where \(Z\sim \mathsf{N}(0, 1)\). The initial condition is \(\mu _0 = 1, \tau _0 = 0\). Lemma 2.2 establishes that, for \(\theta ^t_{i}\) given by the message-passing iteration (2.2), (2.3), we have, for any bounded Lipschitz function \(\psi \),

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N}\sum _{i=1}^N\psi (\theta ^t_i) = {\mathbb E}[\psi (\tau _tZ)]\,, \end{aligned}$$
(8.2)

in probability. In other words, \(\theta ^t_i\) is approximately Gaussian with mean zero and variance \(\tau _t^2\).

In particular, taking \(\psi _M(x) = x\) for \(|x|\le M\), \(\psi _M(x) = M\) for \(x>M\), and \(\psi _M(x) = -M\) for \(x<-M\), and using dominated convergence, we obtain

$$\begin{aligned} \lim _{N\rightarrow \infty }{\mathbb E}[\psi _M(\theta _1^t)] = \lim _{N\rightarrow \infty }\frac{1}{N}\sum _{i=1}^N{\mathbb E}[\psi _M(\theta ^t_i)]= {\mathbb E}[\psi _M(\tau _tZ)] = 0\,. \end{aligned}$$
(8.3)

Consider now the iteration without cavity modification (denoted subsequently by \(\vartheta ^t_i\)):

$$\begin{aligned} \vartheta ^{t+1}_i = \sum _{\ell \ne i}A_{i\ell } f(\vartheta ^t_\ell ). \end{aligned}$$
(8.4)

Without loss of generality, we will focus on coordinate 1 of the iterate \(\vartheta ^t\):

$$\begin{aligned} \vartheta ^{1}_1&= \sum _{\ell \ne 1}A_{1\ell }, \\ \vartheta ^{2}_1&= \sum _{\ell \ne 1}A_{1\ell } (\vartheta ^1_\ell )^3 \\&= \sum _{\ell \ne 1} A_{1\ell } \bigg (\sum _{i \ne \ell } A_{i\ell }\bigg )^3\\&= \sum _{\ell \ne 1} A_{1\ell }\bigg (A_{1\ell } + \sum _{i\ne 1, \ell } A_{i\ell }\bigg )^3. \end{aligned}$$

We can explicitly compute the expectation of \(\vartheta ^{2}_1\) as follows. Since \(A_{1\ell }\sim \mathsf{N}(0, 1/N)\), an application of Stein’s lemma gives

$$\begin{aligned} {\mathbb E}\{\vartheta ^2_1\}&= \sum _{\ell \ne 1}\frac{3}{N}{\mathbb E}\bigg \{(A_{1\ell } + \sum _{i\ne 1,\ell }A_{i\ell })^2\bigg \} \\&= \sum _{\ell \ne 1}\frac{3(N-1)}{N^2} \\&= \frac{3(N-1)^2}{N^2}. \end{aligned}$$

In particular, in the limit of large \(N\),

$$\begin{aligned} \lim _{N\rightarrow \infty }{\mathbb E}\left\{ \vartheta ^2_1\right\} = 3. \end{aligned}$$
(8.5)

This appears to contradict the state evolution prediction, which would suggest that \(\vartheta ^t_i\) is approximately \(\mathsf{N}(0,\tau _t^2)\) and, hence, that \({\mathbb E}\vartheta ^t_i\rightarrow 0\), as would follow by formally setting \(M=\infty \) in Eq. (8.3). The reader will notice that a more careful argument is needed to reach a contradiction since we cannot set \(M=\infty \) in Eq. (8.3) (Lemma 2.2 applies only to bounded Lipschitz functions). We proceed to present the required additional steps.

We first show that

$$\begin{aligned} \lim _{N\rightarrow \infty }{\mathbb E}\big \{(\vartheta ^2_1)^2\big \}\le C \end{aligned}$$
(8.6)

for a constant \(C<\infty \). Note that

$$\begin{aligned} {\mathbb E}\{(\vartheta ^2_1)^2\}&= {\mathbb E}\bigg \{\sum _{T_1, T_2}A(T_1)A(T_2) \bigg \}\\&= \sum _{T_1, T_2}{\mathbb E}\{A(T_1)A(T_2)\}, \end{aligned}$$

where the sum is over labeled trees \(T_1, T_2\) with two generations and five vertices: a root labeled \(1\) with a single child, which in turn has three children. Let \(\mathbf{G}\) denote the graph formed from the pair \(T_1, T_2\) by identifying vertices of the same label. Then \({\mathbb E}\{A(T_1)A(T_2)\}\) is nonzero only if every edge in \(\mathbf{G}\) is covered at least twice. We prove that the total contribution of such terms is bounded. Let \(e(\mathbf{G})\) denote the number of edges (with multiplicity) in \(\mathbf{G}\) and \(n(\mathbf{G})\) its number of vertices. Since \(\mathbf{G}\) is connected through the root and every edge has multiplicity at least two, we have that \(n(\mathbf{G}) - 1 \le e(\mathbf{G})/2\). Using Lemma 7.2 we have \({\mathbb E}\{A(T_1)A(T_2)\} \le {\mathbb E}\{|A(T_1)A(T_2)|\} = O(N^{-e(\mathbf{G})/2})\). Further, the number of contributing terms is \(O(N^{n(\mathbf{G})-1})\). It follows that the total nonzero contribution is bounded by a constant, and hence \(\lim _{N\rightarrow \infty }{\mathbb E}\{(\vartheta ^2_1)^2\} \le C\) for an appropriate constant \(C\).

Note that \(|\psi _M(x)-x| =(|x|-M)_+\le x^2/M\). We then have

$$\begin{aligned} \limsup _{N\rightarrow \infty }{\mathbb E}\big \{\big |\psi _M(\vartheta ^2_1)-\vartheta ^2_1\big |\big \} \le \frac{1}{M}\lim _{N\rightarrow \infty }{\mathbb E}\big \{(\vartheta ^2_1)^2\big \}\le \frac{C}{M}. \end{aligned}$$
(8.7)

Hence,

$$\begin{aligned} \liminf _{N\rightarrow \infty }{\mathbb E}[\psi _M(\vartheta _1^2)] \ge \lim _{N\rightarrow \infty }{\mathbb E}\{\vartheta ^2_1\} - \limsup _{N\rightarrow \infty }{\mathbb E}\big \{\big |\psi _M(\vartheta ^2_1)-\vartheta ^2_1\big |\big \} \ge 3-\frac{C}{M}. \end{aligned}$$
(8.8)

By choosing \(M\ge C\) we conclude that the state evolution prediction does not hold for the naive iteration (8.4).

It is a useful exercise to repeat the preceding calculation for the message-passing sequence \(\theta ^t_i\) (with the cavity modification). The final result confirms the state evolution prediction as in Lemma 2.2. For higher iterations, the effect of the cavity modification is analogous but somewhat more subtle, as indicated by our proof via the moment method.
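The following Monte Carlo sketch (not from the paper) confirms Eq. (8.5) numerically for the naive iteration (8.4): the empirical mean of \(\vartheta ^2_i\) is close to \(3\) rather than the value \(0\) that state evolution would predict. The cavity-modified iteration is not reproduced here, since its exact form (2.2) is given in the main text.

```python
import numpy as np

# Monte Carlo check of Eq. (8.5): iterate (8.4) without the cavity modification,
# with f(z) = z^3 and A symmetric, zero diagonal, A_{ij} ~ N(0, 1/N) for i < j.
rng = np.random.default_rng(0)
N, reps = 1000, 50
means = []
for _ in range(reps):
    G = rng.standard_normal((N, N)) / np.sqrt(N)
    A = np.triu(G, 1)
    A = A + A.T                        # A_{ii} = 0, symmetric
    theta0 = np.ones(N)
    theta1 = A @ theta0**3             # first step of the naive iteration (8.4)
    theta2 = A @ theta1**3             # second step
    means.append(theta2.mean())        # by symmetry, estimates E{vartheta^2_1}
print("empirical E{vartheta^2_1}:", round(float(np.mean(means)), 3))   # close to 3, not 0
```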

About this article


Cite this article

Deshpande, Y., Montanari, A. Finding Hidden Cliques of Size \(\sqrt{N/e}\) in Nearly Linear Time. Found Comput Math 15, 1069–1128 (2015). https://doi.org/10.1007/s10208-014-9215-y

