Triangle minimization in large networks

Li, Rong-Hua; Yu, Jeffrey Xu

doi:10.1007/s10115-014-0800-9

Regular Paper
Published: 04 December 2014

Volume 45, pages 617–643, (2015)

Knowledge and Information Systems

Rong-Hua Li¹ &
Jeffrey Xu Yu²

Abstract

The number of triangles is a fundamental metric for analyzing the structure and function of a network. In this paper, for the first time, we investigate the triangle minimization problem in a network under edge (node) attack, where the attacker aims to minimize the number of triangles in the network by removing \(k\) edges (nodes). We show that the triangle minimization problem under edge (node) attack is a submodular function maximization problem, which can be solved efficiently. Specifically, we propose a degree-based edge (node) removal algorithm and a near-optimal greedy edge (node) removal algorithm for approximately solving the triangle minimization problem under edge (node) attack. In addition, we introduce two pruning strategies and an approximate marginal gain evaluation technique to further speed up the greedy edge (node) removal algorithm. We conduct extensive experiments over 12 real-world datasets to evaluate the proposed algorithms, and the results demonstrate the effectiveness, efficiency and scalability of our algorithms.
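To make the edge-attack setting concrete, the following is a minimal sketch of a greedy edge-removal heuristic in the spirit of the abstract. It is not the authors' implementation: the function names, the adjacency-set representation, and the use of the common-neighbor count as the marginal gain of an edge are assumptions made for illustration, and none of the pruning or approximation techniques mentioned above are included.

```python
def count_triangles(adj):
    """Count triangles in an undirected graph given as {node: set(neighbors)}."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Each common neighbor of u and v closes one triangle on edge (u, v).
                total += len(adj[u] & adj[v])
    # Every triangle is counted once per edge, i.e. three times.
    return total // 3


def greedy_edge_removal(edges, k):
    """Remove up to k edges, each time picking the edge in the most triangles.

    Returns (removed_edges, remaining_triangle_count). Node labels are
    assumed to be mutually comparable (e.g. integers).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    removed = []
    for _ in range(k):
        best_edge, best_gain = None, 0
        for u in adj:
            for v in adj[u]:
                if u < v:
                    gain = len(adj[u] & adj[v])  # triangles destroyed by removing (u, v)
                    if gain > best_gain:
                        best_edge, best_gain = (u, v), gain
        if best_edge is None:  # no triangles left
            break
        u, v = best_edge
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append(best_edge)
    return removed, count_triangles(adj)
```

For two triangles that share the edge (2, 3), a budget of one removal is enough to eliminate both, since that edge has the largest marginal gain.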

Notes

http://news.cnet.com/delete-10-facebook-friends-get-a-free-whopper/.

References

Albert R et al (2000) Error and attack tolerance of complex networks. Nature 406:378–382
Alon N et al (1997) Finding and counting given length cycles. Algorithmica 17(3):209–223
Avron H (2010) Counting triangles in large graphs using randomized matrix trace estimation. In: Proceedings of KDD-LDMTA’10
Bar-Yossef Z et al (2002) Reductions in streaming algorithms, with an application to counting triangles in graphs. In: SODA
Barabasi A-L, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Becchetti L et al (2008) Efficient semi-streaming algorithms for local triangle counting in massive graphs. In: KDD
Brin S, Page L (1997) PageRank: bringing order to the web. Tech. rep, Stanford Digital Library Project
Buriol LS et al (2006) Counting triangles in data streams. In: PODS
Callaway DS et al (2000) Network robustness and fragility: percolation on random graphs. Phys Rev Lett 85(25):5468–5471
Chu S, Cheng J (2011) Triangle listing in massive networks and its applications. In: KDD
Cohen R et al (2000) Resilience of the internet to random breakdowns. Phys Rev Lett 85(21):5626–5628
Coleman JS (1988) Social capital in the creation of human capital. Am J Sociol 94:95–120
Durand M, Flajolet P (2003) Loglog counting of large cardinalities (extended abstract). In: ESA, pp 605–617
Feige U (1998) A threshold of ln n for approximating set cover. J ACM 45(4):634–652
Flajolet P et al (2003) Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In: ESA, pp 605–617
Flajolet P, Martin GN (1985) Probabilistic counting algorithms for data base applications. J Comput Syst Sci 31(2):182–209
Godsil C, Royle GF (2001) Algebraic graph theory. Springer, Berlin
Hanneman RA, Riddle M (2005) Introduction to social network methods. University of California, Riverside. http://faculty.ucr.edu/~hanneman/nettext/
Hochbaum DS (1996) Approximation algorithms for NP-hard problems. PWS Publishing Company, Boston, MA
Itai A, Rodeh M (1978) Finding a minimum circuit in a graph. SIAM J Comput 7(4):413–423
Jowhari H, Ghodsi M (2005) New streaming algorithms for counting triangles in graphs. In: COCOON
Kempe D et al (2003) Maximizing the spread of influence through a social network. In: KDD
Krause A, Guestrin C (2007) Near-optimal observation selection using submodular functions. In: AAAI
Krause A, Horvitz E (2008) A utility-theoretic approach to privacy and personalization. In: AAAI
Krause A et al (2008) Near-optimal sensor placements in Gaussian processes: theory, efficient algorithms and empirical studies. J Mach Learn Res 9:235–284
Latapy M (2008) Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor Comput Sci 407:1–3
Leskovec J (2010) Stanford network analysis project
Leskovec J et al (2007) Cost-effective outbreak detection in networks. In: KDD
Li R-H, Yu JX (2011) Scalable diversified ranking on large graphs. In: ICDM
Li R-H, Yu JX (2013) Scalable diversified ranking on large graphs. IEEE Trans Knowl Data Eng 25(9):2133–2146
Li R-H et al (2014a) Random-walk domination in large graphs. In: ICDE
Li R-H et al (2012) Measuring robustness of complex networks under MVC attack. In: CIKM
Li R-H et al (2014b) Measuring the impact of MVC attack in large complex networks. Inf Sci 278:685–702
Lin H, Bilmes J (2010) Multi-document summarization via budgeted maximization of submodular functions. In: HLT-NAACL
Lin H, Bilmes J (2011) A class of submodular functions for document summarization. In: ACL
McPherson M et al (2001) Birds of a feather: homophily in social networks. Annu Rev Sociol 27:415–444
Minoux M (1978) Accelerated greedy algorithms for maximizing submodular set functions. Lecture Notes in Control and Information Sciences. Springer, Berlin
Nemhauser GL et al (1978) An analysis of approximations for maximizing submodular set functions-I. Math Program 14:265–294
Palmer CR et al (2002) ANF: a fast and scalable tool for data mining in massive graphs. In: KDD, pp 81–90
Schank T (2007) Algorithmic aspects of triangle-based network analysis. PhD Thesis, University Karlsruhe (TH)
Schank T, Wagner D (2005) Finding, counting and listing all triangles in large graphs, an experimental study. In: WEA
Schneider CM et al (2011) Mitigation of malicious attacks on networks. PNAS 108(10):3838–3841
Seshadhri C et al (2012) Fast triangle counting through wedge sampling. CoRR abs/1202.5230
Suri S, Vassilvitskii S (2011) Counting triangles and the curse of the last reducer. In: WWW
Tong H et al (2012) Gelling, and melting, large graphs by edge manipulation. In: CIKM
Tong H et al (2010) On the vulnerability of large graphs. In: ICDM
Tsourakakis CE et al (2009) DOULION: counting triangles in massive graphs with a coin. In: KDD
Vazirani VV (2001) Approximation algorithms. Springer, Berlin
Vondrak J (2010) Submodularity and curvature: the optimal algorithm. RIMS Kokyuroku Bessatsu B23:253–266
Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442
Zafarani R, Liu H (2009) Social Computing Data Repository at ASU

Acknowledgments

We thank anonymous reviewers for their helpful comments. The work was supported in part by (1) NSFC Grant 61402292, Natural Science Foundation of SZU (Grant No. 201438) and (2) Research Grants Council of the Hong Kong SAR, China, 14209314 and 418512.

Author information

Authors and Affiliations

Guangdong Province Key Laboratory of Popular High Performance Computers, Shenzhen University, Shenzhen, China
Rong-Hua Li
The Chinese University of Hong Kong, Shatin, Hong Kong
Jeffrey Xu Yu

Authors

Rong-Hua Li
Jeffrey Xu Yu

Corresponding author

Correspondence to Rong-Hua Li.

Appendix

Proof of Theorem 2.1

First, the set cover with frequency constraint (SCFC) problem is known to be NP-hard [19, 48]. Given a ground set \(\mathcal {U}\), a collection of \(n\) subsets \(\mathcal {S}=\{S_1, S_2, \cdots , S_n\}\) with \(\bigcup _i S_i=\mathcal {U}\), and a frequency parameter \(t\) (\(t < n\)), the SCFC problem is to find the minimum number of subsets in \(\mathcal {S}\) that cover all elements in \(\mathcal {U}\). Here, the frequency parameter \(t\) means that every element in \(\mathcal {U}\) is included in \(t\) subsets in \(\mathcal {S}\). Consider a special case of the SCFC problem with the additional constraint that the intersection of any three subsets in \(\mathcal {S}\) has at most one element (i.e., \(|S_i \cap S_j \cap S_k| \le 1\) for any pairwise distinct \(i, j, k\)). For convenience, we refer to this problem as the intersection-bounded SCFC (IBSCFC) problem.

Below, we show that the IBSCFC problem is also NP-hard. Suppose to the contrary that there is a polynomial-time algorithm \(\mathcal {A}\) that solves the IBSCFC problem. For any \(S_i, S_j, S_k\) with \(|S_i \cap S_j \cap S_k| > 1\) in the SCFC problem, we discard the “redundant-common elements” in the subsets \(S_i, S_j, S_k\) so that \(| \tilde{S}_i \cap \tilde{S}_j \cap \tilde{S}_k|=1\), where \(\tilde{S}_i\) denotes the subset \(S_i\) after discarding the redundant-common elements (i.e., only one common element of \(S_i, S_j, S_k\) is left). The SCFC instance then becomes an IBSCFC instance, and we invoke algorithm \(\mathcal {A}\) to solve it. Importantly, the optimal solution (the IDs of the selected subsets) obtained by algorithm \(\mathcal {A}\) is also an optimal solution for the SCFC problem, for two reasons. First, for any \(S_i, S_j, S_k\) with \(|S_i \cap S_j \cap S_k| > 1\), the redundant-common elements appear only in these three subsets (by our constraint, each element is included in three subsets), so discarding them does not affect the optimal solution. Second, the optimal solution obtained by algorithm \(\mathcal {A}\) must contain at least one subset from \(S_i, S_j, S_k\), because these three subsets have one common element left, which must be covered by a subset in the optimal solution. This process yields a polynomial-time algorithm for the SCFC problem, which is a contradiction.

Second, we consider the maximum coverage version of the IBSCFC problem, called IBMCFC, where the goal is to find \(k\) subsets in \(\mathcal {S}\) that maximize the cardinality of their union. This problem is also NP-hard. Suppose otherwise; then there is a polynomial-time algorithm \(\mathcal {B}\) that solves the IBMCFC problem. Since \(\bigcup _i S_i=\mathcal {U}\), we can invoke \(\mathcal {B}\) at most \(n\) times, enumerating \(k\) from \(1\) to \(n\), and stop at the smallest \(k\) whose solution covers \(\mathcal {U}\); this gives an optimal solution of the IBSCFC problem. That is, there would be a polynomial-time algorithm for the IBSCFC problem, which is a contradiction.
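The enumeration over \(k\) in this step can be sketched as follows. The exhaustive `max_coverage_oracle` is only a stand-in for the hypothetical polynomial-time algorithm \(\mathcal {B}\) (it takes exponential time), and all function names are assumptions introduced for illustration.

```python
from itertools import combinations


def max_coverage_oracle(subsets, k):
    """Stand-in for the hypothetical algorithm B: pick k subsets with the largest union.

    Exhaustive search, used here only so the sketch is runnable.
    """
    best = max(
        combinations(range(len(subsets)), k),
        key=lambda idx: len(set().union(*(subsets[i] for i in idx))),
    )
    return list(best)


def min_cover_via_max_coverage(universe, subsets):
    """Solve set cover by calling the max-coverage oracle for k = 1, 2, ..., n.

    The first k whose best union equals the universe gives a minimum cover.
    """
    target = set(universe)
    for k in range(1, len(subsets) + 1):
        chosen = max_coverage_oracle(subsets, k)
        covered = set().union(*(subsets[i] for i in chosen))
        if covered == target:
            return chosen
    return None  # only reached if the subsets do not cover the universe
```

On the instance used later in this proof (\(S_3=\{u_1, u_2\}\) and four singleton subsets), the loop stops at \(k=1\) and returns the index of \(S_3\).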

Third, to prove the theorem, we give a reduction from the IBMCFC problem. For each subset \(S_i\), we create an edge \(e_i\) with \(2|S_i|\) stubs, which are used to merge the end nodes of different edges. Each end node of an edge is associated with \(|S_i|\) stubs, and these stubs are labeled by the IDs of the elements in \(S_i\). Then, for any three pairwise distinct subsets \(S_i\), \(S_j\), and \(S_k\) with \(|S_i \cap S_j \cap S_k| =1\), we merge the end nodes of their corresponding edges that carry the same stub label so that the three edges form a triangle. As an example, let \(\mathcal {U}=\{u_1, u_2\}\), \(S_1=\{u_1\}\), \(S_2=\{u_1\}\), \(S_3=\{u_1, u_2\}\), \(S_4=\{u_2\}\), \(S_5=\{u_2\}\). Clearly, each element in \(\mathcal {U}\) is in three subsets, and any three subsets have at most one common element. For each subset, we create an edge with stubs as shown in the left part of Fig. 5, and we then construct the graph shown in the right part of Fig. 5. By this construction, each triangle corresponds to an element in \(\mathcal {U}\), and each edge \(e_i\) in the resulting graph corresponds to a subset \(S_i\) in \(\mathcal {S}\). As a result, an optimal solution of the triangle minimization problem (by edge removal) in the resulting graph is an optimal solution of the IBMCFC problem. Since IBMCFC is NP-hard, the triangle minimization problem by edge removal is also NP-hard. This completes the proof. \(\square \)
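The correspondence claimed in this reduction can be checked by brute force on the five-subset example. The sketch below does not implement the stub construction; the node labels `a`, `b`, `c`, `d` and the mapping `edge_of` are a hand-built embedding assumed to be consistent with the example (Fig. 5 is not reproduced here), and the function names are illustrative.

```python
from itertools import combinations

# The example instance: subset ID -> elements.
subsets = {1: {"u1"}, 2: {"u1"}, 3: {"u1", "u2"}, 4: {"u2"}, 5: {"u2"}}

# Assumed embedding: e1, e2, e3 form the triangle for u1 and
# e3, e4, e5 form the triangle for u2, so the two triangles share e3.
edge_of = {1: ("a", "b"), 2: ("a", "c"), 3: ("b", "c"), 4: ("b", "d"), 5: ("c", "d")}


def triangles(edges):
    """Return the set of triangles (as frozensets of nodes) in an edge list."""
    edge_set = {frozenset(e) for e in edges}
    nodes = sorted({x for e in edge_set for x in e})
    return {
        frozenset(t)
        for t in combinations(nodes, 3)
        if all(frozenset(p) in edge_set for p in combinations(t, 2))
    }


def correspondence_holds():
    """Check: triangles destroyed by removing edges == elements covered by the subsets."""
    total = len(triangles(edge_of.values()))
    for k in range(len(subsets) + 1):
        for chosen in combinations(subsets, k):
            remaining = [edge_of[i] for i in subsets if i not in chosen]
            destroyed = total - len(triangles(remaining))
            covered = len(set().union(*(subsets[i] for i in chosen)))
            if destroyed != covered:
                return False
    return True
```

Under this embedding, removing the single edge for \(S_3\) destroys both triangles, matching the fact that \(S_3\) alone covers \(\mathcal {U}\).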

Proof of Theorem 2.2

Similar to the proof of Theorem 2.1, we give a reduction from the IBMCFC problem. Following the notation used in the proof of Theorem 2.1, we create a graph \(G\) for the instance of the triangle minimization problem by node removal as follows. For each \(S_i\) in \(\mathcal {S}\), we create a node \(v_i\). For each pair \(S_i\) and \(S_j\) (\(i \ne j\)), we create an edge \((v_i, v_j)\) if and only if \(S_i \cap S_j \ne \emptyset \). By this construction, each node corresponds to a subset, and each triangle corresponds to an element. One can easily check that an optimal solution of the triangle minimization problem by node removal is an optimal solution of the IBMCFC problem. Thus, the theorem is established.
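The node-based construction can be sketched directly: one node per subset, and an edge whenever two subsets intersect. The function names are assumptions for illustration, and the check below is run only on the five-subset example from the proof of Theorem 2.1.

```python
from itertools import combinations


def intersection_graph(subsets):
    """Build {subset ID: set of adjacent subset IDs}; IDs are adjacent iff the subsets intersect."""
    adj = {i: set() for i in subsets}
    for i, j in combinations(subsets, 2):
        if subsets[i] & subsets[j]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def triangles_after_removal(adj, removed):
    """Count triangles among the nodes that are not in `removed`."""
    keep = [v for v in adj if v not in removed]
    return sum(
        1
        for a, b, c in combinations(keep, 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


example = {1: {"u1"}, 2: {"u1"}, 3: {"u1", "u2"}, 4: {"u2"}, 5: {"u2"}}
graph = intersection_graph(example)
```

On this example the graph consists of two triangles, one per element, that share the node for \(S_3\); removing that node leaves no triangle, while removing the node for \(S_1\) leaves one.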

About this article

Cite this article

Li, RH., Yu, J.X. Triangle minimization in large networks. Knowl Inf Syst 45, 617–643 (2015). https://doi.org/10.1007/s10115-014-0800-9

Received: 06 November 2013
Revised: 14 July 2014
Accepted: 29 October 2014
Published: 04 December 2014
Issue Date: December 2015
DOI: https://doi.org/10.1007/s10115-014-0800-9
