Regularity lemmas for clustering graphs

doi:10.1016/j.aam.2019.101961

Advances in Applied Mathematics

Volume 126, May 2021, 101961


Abstract

For a graph G with a positive clustering coefficient C, it is proved that for any positive constant ϵ, the vertex set of G can be partitioned into finitely many parts, say $S_{1}, S_{2}, \dots, S_{m}$ , such that all but an ϵ fraction of the triangles in G are contained in the projections of tripartite subgraphs induced by $(S_{i}, S_{j}, S_{k})$ which are ϵ-Δ-regular, where the size m of the partition depends only on ϵ and C. The notion of ϵ-Δ-regular, which is a variation of ϵ-regular for the original regularity lemma, concerns triangle density instead of edge density. Several generalizations and variations of the regularity lemma for clustering graphs are derived.

Introduction

One of the celebrated results of Szemerédi [19] is the so-called regularity lemma, which asserts that for any graph on n vertices, the vertex set can be partitioned into finitely many parts so that all but at most $ϵ n^{2}$ edges are contained in the union of bipartite subgraphs between pairs of parts that are random-like in the sense of being ϵ-regular. A bipartite graph is said to be ϵ-regular if the edge density of any induced sub-bipartite graph on at least ϵn vertices differs from the edge density of the bipartite graph by at most ϵ. The regularity lemma has been a powerful tool in graph theory with numerous applications [11], [14], [17], because any graph (with more than $ϵ n^{2}$ edges) can be approximated by a finite graph in the sense that each vertex of the finite graph can be replaced by a subset of vertices and the bipartite subgraphs between any two subsets are quasirandom.
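As a small illustration of the ϵ-regular condition, the following sketch checks it by brute force on a tiny bipartite graph. It assumes the common textbook form of the definition (every pair $X \subseteq A$, $Y \subseteq B$ with $|X| \geq ϵ|A|$ and $|Y| \geq ϵ|B|$ has density within ϵ of the density of $(A, B)$), which is phrased slightly differently from the informal statement above. The function names are hypothetical, and the exhaustive search over subsets is only feasible for very small parts.

```python
from itertools import chain, combinations


def nonempty_subsets(xs):
    """Yield every nonempty subset of xs as a tuple."""
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(1, len(xs) + 1))


def density(X, Y, E):
    """Edge density e(X, Y) / (|X| |Y|) for nonempty X and Y."""
    return sum((x, y) in E for x in X for y in Y) / (len(X) * len(Y))


def is_eps_regular(A, B, edges, eps):
    """Brute-force check of the (assumed) textbook eps-regularity of the pair (A, B).

    `edges` is an iterable of pairs (a, b) with a in A and b in B.
    """
    E = set(edges)
    d = density(A, B, E)
    for X in nonempty_subsets(A):
        if len(X) < eps * len(A):
            continue  # X is too small to be tested
        for Y in nonempty_subsets(B):
            if len(Y) < eps * len(B):
                continue  # Y is too small to be tested
            if abs(density(X, Y, E) - d) > eps:
                return False
    return True
```

For example, a perfect matching between two 2-element parts fails the check for small ϵ, since a single matched pair has density 1 while the whole pair has density 1/2.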

A major deficiency of the regularity lemma is that it is useful only for graphs with a positive edge density, since the error bound of the approximation is of order $ϵ n^{2}$ . There have been numerous attempts to extend the regularity lemma to sparse graphs, mostly with either additional assumptions [13] or weakened conditions [9], [18].

In this paper, we give a regularity lemma for clustering graphs without any restriction on edge density. We note that many information networks and social network graphs contain a large number of triangles and thus have nontrivial clustering coefficients [16], [20]. Such a clustering effect is one of the main characteristics of the so-called “small world phenomenon” that appears in a variety of real-world graphs [15]. There are many research papers concerning finding dense subgraphs [2], [3] or partitioning into dense clique-like subgraphs [12] for such small-world graphs.

In this paper, we focus on graphs with nontrivial clustering coefficients (or triangle density). Let $t_{G}$ denote the number of triangles in G and $p_{G}$ denote the number of paths of two edges. The clustering coefficient $C_{G}$ is defined to be (see [16]) $C_{G} = \frac{3 t_{G}}{p_{G}} . \qquad (1)$ If $p_{G} = 0$ , we define $C_{G} = 0$ . We say G is a clustering graph if its clustering coefficient $C_{G}$ is a positive constant independent of the number of vertices of G.
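The definition can be made concrete with a short sketch that computes the clustering coefficient of a small simple graph directly from the two counts: triangles, and paths of two edges (a centre vertex together with an unordered pair of its neighbours). The function name and the edge-list input format are assumptions made for illustration, and the triple enumeration is intended only for small examples.

```python
from itertools import combinations


def clustering_coefficient(edges):
    """Return 3 * t_G / p_G for a simple undirected graph given as an edge list.

    Returns 0.0 when the graph has no path of two edges, matching the
    convention stated in the text.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # p_G: a path of two edges is a centre vertex plus an unordered pair of neighbours.
    paths = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())

    # t_G: unordered vertex triples that are pairwise adjacent.
    triangles = sum(
        1
        for a, b, c in combinations(list(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )

    return 0.0 if paths == 0 else 3 * triangles / paths
```

For instance, a triangle with one pendant edge attached has one triangle and five paths of two edges, so its clustering coefficient is 3/5.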

Theorem 1

For any $ϵ > 0$ and any graph G with clustering coefficient C, the vertex set of G can be partitioned into $S_{1}, S_{2}, \dots, S_{m}$ for some m depending only on ϵ and C, such that all but $ϵ t_{G}$ triangles in G are contained in the projections of tripartite subgraphs with vertex set $(S_{i}, S_{j}, S_{k})$ that are ϵ-Δ-regular.

The detailed definitions of the various terms above will be given in Section 2. The proof of the regularity lemma for clustering graphs is quite similar to the previous proofs of the original regularity lemma [4], [14], [19], except that it uses an index function involving clustering coefficients. In Section 3 we give a proof of the regularity lemma for tripartite graphs with nontrivial clustering coefficient. The proof is self-contained and relatively short. In Section 4 we then consider a strong version of ϵ-Δ-regular for tripartite graphs. In Section 5 we give a proof of Theorem 1 and a weighted version of the regularity lemma, both of which are straightforward applications of the regularity lemma for tripartite graphs with nontrivial clustering coefficients. In Section 6, we consider several generalizations of the regularity lemma. We will give a regularity lemma for graphs which are dense in 4-cycles and, in general, for graphs which contain a relatively large number of copies of any specified graph (in comparison with its subgraphs). Some remarks and problems are mentioned in Section 7.

Section snippets

Preliminaries

We consider a tripartite graph $H$ with the vertex set as the disjoint union $V_{1} ⊔ V_{2} ⊔ V_{3}$ . Any triangle in $H$ has one vertex in each $V_{i}$ for $i = 1, 2, 3$ . Let $t_{H}$ denote the number of triangles in $H$ . Let $p_{H}$ denote the number of triples $(v_{1}, v_{2}, v_{3})$ with $v_{i} \in V_{i}$ such that $\{v_{1}, v_{2}\}$ and $\{v_{2}, v_{3}\}$ are edges in $H$ . The clustering coefficient of a tripartite graph is defined to be $c_{H} = \frac{t_{H}}{p_{H}} .$
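A minimal sketch of the two counts for a tripartite graph follows. It enumerates triples through their middle vertex in $V_{2}$, so a triple is counted in $p_{H}$ when both edges at the middle vertex are present, and additionally in $t_{H}$ when the edge between its endpoints is present. The function names are hypothetical, and returning 0 when $p_{H} = 0$ is an assumption borrowed from the convention used for general graphs.

```python
def tripartite_counts(V1, V2, V3, edges):
    """Return (t_H, p_H) for a tripartite graph on the disjoint parts V1, V2, V3."""
    E = {frozenset(e) for e in edges}
    p = t = 0
    for v2 in V2:
        left = [v1 for v1 in V1 if frozenset((v1, v2)) in E]
        right = [v3 for v3 in V3 if frozenset((v2, v3)) in E]
        # Every (v1, v2, v3) with both edges at v2 present is counted in p_H.
        p += len(left) * len(right)
        # It is a triangle exactly when the edge {v1, v3} is present as well.
        t += sum(1 for v1 in left for v3 in right if frozenset((v1, v3)) in E)
    return t, p


def tripartite_clustering_coefficient(V1, V2, V3, edges):
    """Return t_H / p_H, using 0.0 when p_H = 0 (an assumed convention)."""
    t, p = tripartite_counts(V1, V2, V3, edges)
    return 0.0 if p == 0 else t / p
```

With $V_{1} = \{a\}$, $V_{2} = \{b\}$, $V_{3} = \{c_{1}, c_{2}\}$ and edges $ab$, $bc_{1}$, $bc_{2}$, $ac_{1}$, there are two counted triples and one triangle, giving a coefficient of 1/2.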

For a graph $G = (V, E)$ , it is helpful to consider the associated tripartite graph $G$ which has vertex set as the disjoint union $V_{1} ⊔ V_{2} ⊔ V_{3}$ where $V_{i}$ is a

A regularity lemma for tripartite graphs

We first prove the following version of the regularity lemma for tripartite clustering graphs.

Theorem 2

For any $ϵ > 0$ and any tripartite graph $H$ with clustering coefficient c, the vertex set $V_{1} ⊔ V_{2} ⊔ V_{3}$ of $H$ can be partitioned into $S_{1}, S_{2}, \dots, S_{m}$ for some m depending only on ϵ and c, such that all but $ϵ t_{H}$ triangles in $H$ are contained in the ϵ-Δ-regular tripartite subgraphs with vertex set $S_{i} ⊔ S_{j} ⊔ S_{k}$ .

Proof

For a partition $P$ consisting of partitions $P_{i}$ of $V_{i}$ , for $i = 1, 2, 3$ , we define the index function $I (P)$ : $I (P) = I (P_{1}, P_{2}, P_{3})$

A strong regularity lemma for tripartite graphs

For a tripartite graph $H$ with vertex set $T_{1} ⊔ T_{2} ⊔ T_{3}$ , we consider some variations of the clustering coefficient. Recall that $c_{H} = c_{H}^{(1)} = c (T_{1}, T_{2}, T_{3}) = \frac{t (T_{1}, T_{2}, T_{3})}{p (T_{1}, T_{2}, T_{3})} .$ We define $p_{H}^{(2)} = p (T_{2}, T_{3}, T_{1})$ , $c_{H}^{(2)} = c (T_{2}, T_{3}, T_{1})$ , and $p_{H}^{(3)} = p (T_{3}, T_{1}, T_{2})$ , $c_{H}^{(3)} = c (T_{3}, T_{1}, T_{2})$ . For $j = 1, 2, 3$ , we say a tripartite graph with vertex set $T_{1} ⊔ T_{2} ⊔ T_{3}$ is ϵ-$Δ^{(j)}$-regular if for any $S_{i} \subset T_{i}$ for $i = 1, 2, 3$ , with $p^{(j)} (S_{1}, S_{2}, S_{3}) \geq ϵ p^{(j)} (T_{1}, T_{2}, T_{3})$ , we have $| c^{(j)} (S_{1}, S_{2}, S_{3}) - c^{(j)} (T_{1}, T_{2}, T_{3}) | \leq ϵ .$
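The condition for $j = 1$ can be checked by exhaustive search on very small examples, as in the following sketch. It tests every choice of subsets $S_{i} \subseteq T_{i}$ that meets the threshold on $p$ and compares the resulting coefficient with that of the whole graph. The function names are hypothetical, treating a graph with $p (T_{1}, T_{2}, T_{3}) = 0$ as regular is an assumption, and the running time is exponential in the part sizes.

```python
from itertools import chain, combinations


def subsets(xs):
    """Yield every subset of xs (including the empty set) as a tuple."""
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))


def counts(S1, S2, S3, E):
    """Return (t, p) for the triple of vertex sets (S1, S2, S3), with middle part S2."""
    p = t = 0
    for v2 in S2:
        left = [v1 for v1 in S1 if frozenset((v1, v2)) in E]
        right = [v3 for v3 in S3 if frozenset((v2, v3)) in E]
        p += len(left) * len(right)
        t += sum(frozenset((v1, v3)) in E for v1 in left for v3 in right)
    return t, p


def is_delta1_regular(T1, T2, T3, edges, eps):
    """Brute-force check of the eps-Delta^(1)-regular condition."""
    E = {frozenset(e) for e in edges}
    t_all, p_all = counts(T1, T2, T3, E)
    if p_all == 0:
        return True  # assumption: nothing to test when there are no counted triples
    c_all = t_all / p_all
    for S1 in subsets(T1):
        for S2 in subsets(T2):
            for S3 in subsets(T3):
                t, p = counts(S1, S2, S3, E)
                # Only sub-triples carrying at least an eps share of p are tested.
                if p > 0 and p >= eps * p_all and abs(t / p - c_all) > eps:
                    return False
    return True
```

The cases $j = 2, 3$ would follow by calling the same function with the parts rotated, for example `is_delta1_regular(T2, T3, T1, edges, eps)` for $j = 2$.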

We say a tripartite graph with vertex set $T_{1} ⊔ T_{2} ⊔ T_{3}$ is strongly ϵ

Regularity lemmas for triangle-dense graphs

In a graph $G = (V, E)$ , we consider the associated tripartite graph $G = G (V_{1}, V_{2}, V_{3})$ , where the $V_{i}$ are copies of V. For any three subsets $S_{1}, S_{2}, S_{3} \subseteq V$ , not necessarily distinct, we consider the associated induced subgraph of $G$ , denoted by $G (T_{1}, T_{2}, T_{3})$ , where $T_{i}$ is the copy of $S_{i}$ in $V_{i}$ . For a triple $(v_{1}, v_{2}, v_{3})$ where $v_{i} \in S_{i}$ , we note that $v_{1}, v_{2}, v_{3}$ form a triangle in G if and only if $(v_{1}, v_{2}, v_{3})$ forms a triangle in $G (T_{1}, T_{2}, T_{3})$ . In other words, the set of triangles in $G$ are in one-to-one correspondence with

Several regularity lemmas for general clustering graphs

Many information networks are bipartite and therefore do not have a nontrivial clustering coefficient as defined in (1). Nevertheless, some of these graphs contain a relatively large number of 4-cycles $C_{4}$ . For a graph G, the $C_{4}$-clustering coefficient of G is defined by $C (G; C_{4}) = \frac{4 N (G; C_{4})}{N (G; P_{4})}$ , where $N (G; H)$ denotes the number of subgraphs of G isomorphic to H. The usual clustering coefficient is just $C (G; C_{3})$ .
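The $C_{4}$-clustering coefficient can be computed from two standard counting identities, sketched below. The sketch assumes that $P_{4}$ denotes the path on four vertices (three edges), by analogy with the two-edge paths used for triangles, and it returns 0 when there are no such paths, which is also an assumption. Each 4-cycle is found twice through its two pairs of opposite vertices, and each 3-edge path is found once through its middle edge. The function name is hypothetical.

```python
from itertools import combinations


def c4_clustering_coefficient(edges):
    """Return 4 * N(G; C4) / N(G; P4) for a simple undirected graph.

    Assumes P4 is the path on four vertices; returns 0.0 if there is none.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # 4-cycles: a pair of opposite vertices plus two of their common neighbours.
    # Every 4-cycle has two such opposite pairs, hence the division by 2.
    twice_cycles = 0
    for u, w in combinations(list(adj), 2):
        k = len(adj[u] & adj[w])
        twice_cycles += k * (k - 1) // 2
    cycles = twice_cycles // 2

    # 3-edge paths: fix the middle edge {u, v}, pick one further neighbour at each
    # end, and discard the choices where both ends are the same vertex (a triangle).
    paths = 0
    for u, v in {frozenset(e) for e in edges}:
        paths += (len(adj[u]) - 1) * (len(adj[v]) - 1) - len(adj[u] & adj[v])

    return 0.0 if paths == 0 else 4 * cycles / paths
```

Under these assumptions a single 4-cycle has coefficient 1, and attaching one pendant edge to it adds two 3-edge paths without adding a cycle, lowering the coefficient to 2/3.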

Before we define ϵ- $C_{4}$ -regular, we consider the 4-partite graph with vertex set $V_{1} ⊔$

Problems and remarks

A natural question is to derive a reasonable upper bound for the size of the ϵ-Δ-regular partition for clustering graphs. A crude upper bound as mentioned in the proof of Theorem 1 is of tower type, namely, a tower of 2's of height proportional to $1 / ϵ^{5}$ where C is the clustering coefficient and ϵ is the desired accuracy. For the original regularity lemma, Gowers [10] gave a lower bound for the size of the partition as a tower of 2's of height $1 / ϵ^{1 / 16}$ . With a slightly different definition of

References (20)

[1] N. Alon et al., The algorithmic aspects of the regularity lemma, J. Algorithms (1994)
[2] R. Andersen, A local algorithm for finding dense subgraphs
[3] M. Charikar, Greedy approximation algorithms for finding dense components in a graph
[4] F. Chung, Regularity lemmas for hypergraphs and quasi-randomness, Random Structures Algorithms (1991)
[5] F. Chung et al., Quasi-random hypergraphs, Random Structures Algorithms (1990)
[6] F. Chung et al., Sparse quasi-random graphs, Combinatorica (2002)
[7] F. Chung et al., Quasi-random graphs, Combinatorica (1989)
[8] J. Fox et al., A tight lower bound for Szemerédi's regularity lemma, Combinatorica (2017)
[9] A. Frieze et al., A simple algorithm for constructing Szemerédi's regularity partition, Electron. J. Combin. (1999)
[10] W.T. Gowers, Lower bounds of tower type for Szemerédi's uniformity lemma, Geom. Funct. Anal. GAFA (1997)

There are more references available in the full text version of this article.
