Abstract
The number of triangles is a fundamental metric for analyzing the structure and function of a network. In this paper, for the first time, we investigate the triangle minimization problem in a network under edge (node) attack, where the attacker aims to minimize the number of triangles in the network by removing \(k\) edges (nodes). We show that the triangle minimization problem under edge (node) attack is a submodular function maximization problem, which can be solved efficiently. Specifically, we propose a degree-based edge (node) removal algorithm and a near-optimal greedy edge (node) removal algorithm for approximately solving the triangle minimization problem under edge (node) attack. In addition, we introduce two pruning strategies and an approximate marginal gain evaluation technique to further speed up the greedy edge (node) removal algorithm. We conduct extensive experiments over 12 real-world datasets to evaluate the proposed algorithms, and the results demonstrate the effectiveness, efficiency and scalability of our algorithms.
References
Albert R et al (2000) Error and attack tolerance of complex networks. Nature 406:378–382
Alon N et al (1997) Finding and counting given length cycles. Algorithmica 17(3):209–223
Avron H (2010) Counting triangles in large graphs using randomized matrix trace estimation. In: Proceedings of KDD-LDMTA’10
Bar-Yossef Z et al (2002) Reductions in streaming algorithms, with an application to counting triangles in graphs. In: SODA
Barabasi A-L, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Becchetti L et al (2008) Efficient semi-streaming algorithms for local triangle counting in massive graphs. In: KDD
Brin S, Page L (1997) PageRank: bringing order to the web. Tech. rep, Stanford Digital Library Project
Buriol LS et al (2006) Counting triangles in data streams. In: PODS
Callaway DS et al (2000) Network robustness and fragility: percolation on random graphs. Phys Rev Lett 85(25):5468–5471
Chu S, Cheng J (2011) Triangle listing in massive networks and its applications. In: KDD
Cohen R et al (2000) Resilience of the internet to random breakdowns. Phys Rev Lett 85(21):5626–5628
Coleman JS (1988) Social capital in the creation of human capital. Am J Sociol 94:95–120
Durand M, Flajolet P (2003) Loglog counting of large cardinalities (extended abstract). In: ESA, pp 605–617
Feige U (1998) A threshold of ln n for approximating set cover. J ACM 45(4):634–652
Flajolet P et al (2003) Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In: ESA, pp 605–617
Flajolet P, Martin GN (1985) Probabilistic counting algorithms for data base applications. J Comput Syst Sci 31(2):182–209
Godsil C, Royle GF (2001) Algebraic graph theory. Springer, Berlin
Hanneman RA, Riddle M (2005) Introduction to social network methods. University of California, Riverside. http://faculty.ucr.edu/~hanneman/nettext/
Hochbaum DS (1996) Approximation algorithms for NP-hard problems. PWS Publishing Company, Boston, MA
Itai A, Rodeh M (1978) Finding a minimum circuit in a graph. SIAM J Comput 7(4):413–423
Jowhari H, Ghodsi M (2005) New streaming algorithms for counting triangles in graphs. In: COCOON
Kempe D et al (2003) Maximizing the spread of influence through a social network. In: KDD
Krause A, Guestrin C (2007) Near-optimal observation selection using submodular functions. In: AAAI
Krause A, Horvitz E (2008) A utility-theoretic approach to privacy and personalization. In: AAAI
Krause A et al (2008) Near-optimal sensor placements in Gaussian processes: theory, efficient algorithms and empirical studies. J Mach Learn Res 9:235–284
Latapy M (2008) Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor Comput Sci 407:1–3
Leskovec J (2010) Stanford network analysis project
Leskovec J et al (2007) Cost-effective outbreak detection in networks. In: KDD
Li R-H, Yu JX (2011) Scalable diversified ranking on large graphs. In: ICDM
Li R-H, Yu JX (2013) Scalable diversified ranking on large graphs. IEEE Trans Knowl Data Eng 25(9):2133–2146
Li R-H et al (2014a) Random-walk domination in large graphs. In: ICDE
Li R-H et al (2012) Measuring robustness of complex networks under MVC attack. In: CIKM
Li R-H et al (2014b) Measuring the impact of MVC attack in large complex networks. Inf Sci 278:685–702
Lin H, Bilmes J (2010) Multi-document summarization via budgeted maximization of submodular functions. In: HLT-NAACL
Lin H, Bilmes J (2011) A class of submodular functions for document summarization. In: ACL
McPherson M et al (2001) Birds of a feather: homophily in social networks. Annu Rev Sociol 27:415–444
Minoux M (1978) Accelerated greedy algorithms for maximizing submodular set functions. Lecture Notes in Control and Information Sciences. Springer, Berlin
Nemhauser GL et al (1978) An analysis of approximations for maximizing submodular set functions-I. Math Program 14:265–294
Palmer CR et al (2002) ANF: a fast and scalable tool for data mining in massive graphs. In: KDD, pp 81–90
Schank T (2007) Algorithmic aspects of triangle-based network analysis. PhD Thesis, University Karlsruhe (TH)
Schank T, Wagner D (2005) Finding, counting and listing all triangles in large graphs, an experimental study. In: WEA
Schneider CM et al (2011) Mitigation of malicious attacks on networks. PNAS 108(10):3838–3841
Seshadhri C et al (2012) Fast triangle counting through wedge sampling. CoRR abs/1202.5230
Suri S, Vassilvitskii S (2011) Counting triangles and the curse of the last reducer. In: WWW
Tong H et al (2012) Gelling, and melting, large graphs by edge manipulation. In: CIKM
Tong H et al (2010) On the vulnerability of large graphs. In: ICDM
Tsourakakis CE et al (2009) DOULION: counting triangles in massive graphs with a coin. In: KDD
Vazirani VV (2001) Approximation algorithms. Springer, Berlin
Vondrak J (2010) Submodularity and curvature: the optimal algorithm. RIMS Kokyuroku Bessatsu B23:253–266
Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442
Zafarani R, Liu H (2009) Social Computing Data Repository at ASU
Acknowledgments
We thank the anonymous reviewers for their helpful comments. This work was supported in part by (1) NSFC Grant 61402292 and the Natural Science Foundation of SZU (Grant No. 201438), and (2) the Research Grants Council of the Hong Kong SAR, China, Grants 14209314 and 418512.
Appendix
Proof of Theorem 2.1
First, it is known that the set cover with frequency constraint (SCFC) problem is NP-hard [19, 48]. Given a ground set \(\mathcal {U}\), a collection of \(n\) subsets \(\mathcal {S}=\{S_1, S_2, \cdots , S_n\}\) with \(\bigcup _i S_i=\mathcal {U}\), and a frequency parameter \(t\) (\(t < n\)), the SCFC problem is to find the minimum number of subsets in \(\mathcal {S}\) that cover all elements in \(\mathcal {U}\). Here, the frequency parameter \(t\) means that every element in \(\mathcal {U}\) is included in \(t\) subsets in \(\mathcal {S}\). Consider a special case of the SCFC problem with the additional constraint that the intersection of any three subsets in \(\mathcal {S}\) has at most one element (i.e., \(|S_i \cap S_j \cap S_k| \le 1\) for any pairwise distinct \(i, j, k\)). For convenience, we refer to this problem as the intersection-bounded SCFC (IBSCFC) problem. Below, we show that the IBSCFC problem is also NP-hard. Suppose to the contrary that there is a polynomial-time algorithm \(\mathcal {A}\) that solves the IBSCFC problem. For any three subsets with \(|S_i \cap S_j \cap S_k| > 1\) in an SCFC instance, we can discard the “redundant common elements” from \(S_i, S_j, S_k\) so that \(| \tilde{S}_i \cap \tilde{S}_j \cap \tilde{S}_k|=1\), where \(\tilde{S}_i\) denotes the subset \(S_i\) after discarding the redundant common elements (i.e., only one common element of \(S_i, S_j, S_k\) is left). The SCFC instance then becomes an IBSCFC instance, and we invoke algorithm \(\mathcal {A}\) to solve it. Importantly, the optimal solution (the IDs of the selected subsets) obtained by algorithm \(\mathcal {A}\) is also an optimal solution for the SCFC instance. The reason is as follows. For any \(S_i, S_j, S_k\) with \(|S_i \cap S_j \cap S_k| > 1\), the redundant common elements appear only in these three subsets (by the frequency constraint with \(t = 3\), each element is included in three subsets), so discarding them does not affect the optimal solution.
Moreover, the optimal solution obtained by algorithm \(\mathcal {A}\) must contain at least one subset from \(S_i, S_j, S_k\), because these three subsets still share one common element, which must be covered by a subset in the optimal solution. This process would therefore yield a polynomial-time algorithm for the SCFC problem, which is a contradiction.
Second, we consider the maximum coverage version of the IBSCFC problem, called IBMCFC, where the goal is to find \(k\) subsets in \(\mathcal {S}\) that maximize the cardinality of their union. This problem is also NP-hard. If it were not, there would be a polynomial-time algorithm \(\mathcal {B}\) that solves the IBMCFC problem. Since \(\bigcup _i S_i=\mathcal {U}\), we could invoke \(\mathcal {B}\) at most \(n\) times (enumerating \(k\) from \(1\) to \(n\)) and stop at the smallest \(k\) whose solution covers \(\mathcal {U}\), which gives an optimal solution of the IBSCFC problem. That is, there would be a polynomial-time algorithm for the IBSCFC problem, which is a contradiction.
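The enumeration argument above can be illustrated with a minimal Python sketch. The function names are hypothetical, and `max_coverage_oracle` is only a brute-force stand-in for the assumed algorithm \(\mathcal {B}\); it is exponential and serves purely to make the wrapper runnable on a toy instance.

```python
from itertools import combinations


def max_coverage_oracle(subsets, k):
    """Brute-force stand-in for the assumed algorithm B (hypothetical name).

    Returns the indices of k subsets whose union is as large as possible.
    """
    best = max(
        combinations(range(len(subsets)), k),
        key=lambda ids: len(set().union(*(subsets[i] for i in ids))),
    )
    return list(best)


def min_set_cover_via_oracle(universe, subsets):
    """Call the max-coverage oracle for k = 1..n; stop at the first full cover."""
    for k in range(1, len(subsets) + 1):
        ids = max_coverage_oracle(subsets, k)
        covered = set().union(*(subsets[i] for i in ids))
        if covered == universe:
            return ids  # the smallest k that covers everything is optimal
    raise ValueError("the subsets do not cover the universe")


# Toy instance: every element lies in three subsets (0-based indices).
universe = {"u1", "u2"}
subsets = [{"u1"}, {"u1"}, {"u1", "u2"}, {"u2"}, {"u2"}]
cover = min_set_cover_via_oracle(universe, subsets)
```

On this toy instance the wrapper stops at \(k = 1\), because the third subset alone covers both elements. The wrapper makes at most \(n\) oracle calls, which is the source of the polynomial overhead claimed in the argument.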
Third, to prove the theorem, we show a reduction from the IBMCFC problem. Specifically, for each subset \(S_i\), we create an edge \(e_i\) with \(2|S_i|\) stubs, which are used to combine the end nodes of different edges. Each end node of an edge is associated with \(|S_i|\) stubs, and these stubs are labeled by the IDs of the elements in \(S_i\). Then, for any three subsets \(S_i\), \(S_j\), and \(S_k\) (with \(i, j, k\) pairwise distinct) such that \(|S_i \cap S_j \cap S_k| =1\), we combine the end nodes of their corresponding edges that carry the same stub label so that the three edges form a triangle. As an example, let \(\mathcal {U}=\{u_1, u_2\}\), \(S_1=\{u_1\}\), \(S_2=\{u_1\}\), \(S_3=\{u_1, u_2\}\), \(S_4=\{u_2\}\), \(S_5=\{u_2\}\). Clearly, each element in \(\mathcal {U}\) is in three subsets, and any three subsets have at most one common element. For each subset, we create an edge with stubs as shown in the left part of Fig. 5, and we then construct the graph shown in the right part of Fig. 5. By this construction, each triangle corresponds to an element in \(\mathcal {U}\), and each edge \(e_i\) in the resulting graph corresponds to a subset \(S_i\) in \(\mathcal {S}\). Removing an edge \(e_i\) destroys exactly the triangles whose elements belong to \(S_i\), so the optimal solution of the triangle minimization problem (by edge removal) in the resulting graph is the optimal solution of the IBMCFC problem. Since IBMCFC is NP-hard, the triangle minimization problem by edge removal is also NP-hard. This completes the proof. \(\square \)
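As a sanity check on the example above, the following Python sketch builds one graph consistent with the construction and compares triangle destruction with set coverage. Fig. 5 is not reproduced here, so the node labels `a`, `b`, `c`, `d` and the helper names are assumptions made for illustration: the edges for \(S_1, S_2, S_3\) form the triangle for \(u_1\), the edges for \(S_3, S_4, S_5\) form the triangle for \(u_2\), and the two triangles share \(e_3\).

```python
from itertools import combinations

# Subsets from the example; edge e_i stands for subset S_i.
subsets = {
    "e1": {"u1"},
    "e2": {"u1"},
    "e3": {"u1", "u2"},
    "e4": {"u2"},
    "e5": {"u2"},
}

# Assumed layout (hypothetical node labels): two triangles sharing edge e3.
edges = {
    "e1": ("a", "c"),
    "e2": ("b", "c"),
    "e3": ("a", "b"),
    "e4": ("a", "d"),
    "e5": ("b", "d"),
}


def count_triangles(edge_list):
    """Count triangles by checking every node triple (fine for tiny graphs)."""
    present = {frozenset(e) for e in edge_list}
    nodes = sorted({v for e in edge_list for v in e})
    return sum(
        1
        for triple in combinations(nodes, 3)
        if all(frozenset(pair) in present for pair in combinations(triple, 2))
    )


def destroyed_triangles(removed):
    """Number of triangles that disappear when the named edges are removed."""
    total = count_triangles(list(edges.values()))
    kept = [e for name, e in edges.items() if name not in removed]
    return total - count_triangles(kept)


def coverage(chosen):
    """Size of the union of the subsets matching the chosen edge names."""
    return len(set().union(*(subsets[name] for name in chosen)))
```

In this layout, removing \(e_3\) alone destroys both triangles, just as \(S_3\) alone covers both elements, and for every choice of one or two edges the number of destroyed triangles equals the coverage of the corresponding subsets.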
Proof of Theorem 2.2
As in the proof of Theorem 2.1, we show a reduction from the IBMCFC problem. Following the notation used in the proof of Theorem 2.1, we create a graph \(G\) for an instance of the triangle minimization problem by node removal as follows. For each \(S_i\) in \(\mathcal {S}\), we create a node \(v_i\). For each pair \(S_i\) and \(S_j\) (\(i \ne j\)), we create an edge \((v_i, v_j)\) if and only if \(S_i \cap S_j \ne \emptyset \). By this construction, each node corresponds to a subset, and each triangle corresponds to an element. One can easily check that the optimal solution of the triangle minimization problem by node removal is the optimal solution of the IBMCFC problem. Thus, the theorem is established.
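The node-based construction can be sketched in Python on the example instance from the proof of Theorem 2.1. The helper names are hypothetical, and the triangle listing is a brute-force check that is only meant for tiny inputs.

```python
from itertools import combinations

# Example instance reused from the proof of Theorem 2.1; node i stands for S_i.
subsets = {
    1: {"u1"},
    2: {"u1"},
    3: {"u1", "u2"},
    4: {"u2"},
    5: {"u2"},
}


def intersection_graph(sets_by_node):
    """Connect v_i and v_j exactly when S_i and S_j share an element."""
    return {
        frozenset((i, j))
        for i, j in combinations(sets_by_node, 2)
        if sets_by_node[i] & sets_by_node[j]
    }


def list_triangles(nodes, edge_set):
    """List all node triples whose three pairs are edges."""
    return [
        triple
        for triple in combinations(sorted(nodes), 3)
        if all(frozenset(pair) in edge_set for pair in combinations(triple, 2))
    ]


def triangles_after_node_removal(removed):
    """Triangles that remain after deleting the given nodes."""
    kept = [v for v in subsets if v not in removed]
    return list_triangles(kept, intersection_graph(subsets))


graph_edges = intersection_graph(subsets)
all_triangles = list_triangles(subsets, graph_edges)
```

On this instance the graph has six edges and two triangles, one per element, and removing the node for \(S_3\) leaves no triangle, which mirrors the fact that \(S_3\) covers both elements.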
Cite this article
Li, RH., Yu, J.X. Triangle minimization in large networks. Knowl Inf Syst 45, 617–643 (2015). https://doi.org/10.1007/s10115-014-0800-9
DOI: https://doi.org/10.1007/s10115-014-0800-9