Abstract
How can we effectively detect fake reviews or fraudulent connections on a website? How can we spot communities that suddenly appear based on users’ interaction? And how can we efficiently find the minimum cut in a big graph? All of these are related to the problem of finding dense subgraphs, an important primitive problem in graph data analysis with extensive applications across various domains.
We formulate the problem of detecting the densest subgraph in large real-world graphs, and we theoretically compare and contrast several closely related problems. We propose a unified framework for densest subgraph detection (GenDS) and devise a simple and computationally efficient algorithm, SpecGreedy, that solves it by combining the spectral properties of the graph with a greedy approach. We conduct thorough experiments on 40 real-world networks from various domains, with up to 1.47 billion edges, and demonstrate that our algorithm yields a speedup of up to \(58.6 \times \) while achieving solutions of better or approximately equal quality compared to the baselines. Moreover, SpecGreedy scales linearly with the graph size and proves effective in applications such as finding collaborations that appear suddenly in a big, time-evolving co-authorship network.
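For readers unfamiliar with the greedy side of this line of work, the following is a minimal sketch of classical greedy peeling for the average-degree density \(|E(S)|/|S|\), in the style of Charikar's approximation algorithm from the reference list. It is not SpecGreedy and uses no spectral information; the function name `greedy_densest_subgraph` and the edge-list input format are assumptions made for illustration only.

```python
import heapq
from collections import defaultdict


def greedy_densest_subgraph(edges):
    """Greedy peeling sketch (hypothetical helper, not the paper's SpecGreedy).

    Repeatedly removes a minimum-degree node and remembers the intermediate
    subgraph with the highest density |E(S)| / |S|.
    Node labels must be mutually comparable (e.g. all ints) for heap ties.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops in this sketch
            adj[u].add(v)
            adj[v].add(u)

    n = len(adj)
    if n == 0:
        return set(), 0.0

    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    num_edges = sum(degree.values()) // 2

    heap = [(d, u) for u, d in degree.items()]
    heapq.heapify(heap)

    removed = set()
    order = []                    # nodes in the order they are peeled
    best_density = num_edges / n  # density of the full graph
    best_prefix = 0               # how many peeled nodes give the best subgraph

    while heap:
        d, u = heapq.heappop(heap)
        if u in removed or d != degree[u]:
            continue  # stale heap entry, skip it
        removed.add(u)
        order.append(u)
        num_edges -= degree[u]    # edges from u to still-present nodes
        for v in adj[u]:
            if v not in removed:
                degree[v] -= 1
                heapq.heappush(heap, (degree[v], v))
        remaining = n - len(order)
        if remaining > 0:
            density = num_edges / remaining
            if density > best_density:
                best_density = density
                best_prefix = len(order)

    best_nodes = set(adj) - set(order[:best_prefix])
    return best_nodes, best_density


# Toy graph: a 4-clique on nodes 0..3 with a short tail 3-4-5 attached.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
nodes, density = greedy_densest_subgraph(toy_edges)
print(sorted(nodes), density)
```

On the toy graph the peeling first strips the tail nodes and keeps the 4-clique, whose density is 6 edges over 4 nodes. The lazy-deletion heap keeps the sketch short; a bucket queue over degrees would be the usual choice when linear running time matters.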
W. Feng, S. Liu, H. Shen and X. Cheng are also with the CAS Key Laboratory of Network Data Science & Technology, CAS, and the University of Chinese Academy of Sciences, Beijing 100049, China.
Notes
- 1.
In the other setting with \(\mathbf {Q}= \mathbf {D}\), this problem is also equivalent to setting \(\mathbf {P}= -\mathbf {D}^{-1/2} \mathbf {L}\mathbf {D}^{-1/2}\), i.e., the negative of the normalized Laplacian matrix of \(\mathcal {G}\), and \(\mathbf {Q}= \mathbf {I}\).
- 2.
- 3.
For the proof details of the theorem, see [10].
- 4.
If \(\mathbf {A}_r\) is the symmetric matrix as in Eq. (9), \(|{L}| = |{R}| = n\) and \(\varDelta _{{L}} = \varDelta _{{R}} = \nicefrac {1}{\sqrt{n}}\).
References
Akoglu, L., Tong, H., Koutra, D.: Graph based anomaly detection and description: a survey. Data Min. Knowl. Discov. 29(3), 626–688 (2015). https://doi.org/10.1007/s10618-014-0365-y
Andersen, R., Chellapilla, K.: Finding dense subgraphs with size bounds. In: Avrachenkov, K., Donato, D., Litvak, N. (eds.) WAW 2009. LNCS, vol. 5427, pp. 25–37. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-95995-3_3
Andersen, R., Cioaba, S.M.: Spectral densest subgraph and independence number of a graph. J. UCS 13(11), 1501–1513 (2007)
Asahiro, Y., Iwama, K., Tamaki, H., Tokuyama, T.: Greedily finding a dense subgraph. J. Algorithms 34(2), 203–221 (2000)
Boob, D., et al.: Flowless: Extracting densest subgraphs without flow computations. In: WWW 2020 (2020)
Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: a recursive model for graph mining. In: SDM, pp. 442–446. SIAM (2004)
Charikar, M.: Greedy approximation algorithms for finding dense components in a graph. In: Jansen, K., Khuller, S. (eds.) APPROX 2000. LNCS, vol. 1913, pp. 84–95. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44436-X_10
Chen, J., Saad, Y.: Dense subgraph extraction with application to community detection. In: IEEE TKDE (2010)
Chu, L., Wang, Z., Pei, J., Wang, J., Zhao, Z., Chen, E.: Finding gangs in war from signed networks. In: KDD, pp. 1505–1514. ACM (2016)
Chung, F.R.K.: Spectral graph theory. American Mathematical Society (1996)
Dax, A.: From eigenvalues to singular values: a review. APM 3, 17 (2013)
Eikmeier, N., Gleich, D.F.: Revisiting power-law distributions in spectra of real world networks. In: KDD, pp. 817–826 (2017)
Goldberg, A.V.: Finding a maximum density subgraph. UCB (1984)
Golub, G.H., Van Loan, C.F.: Matrix Computations, vol. 3. JHU Press, Baltimore (2012)
Hooi, B., Song, H.A., Beutel, A., Shah, N., Shin, K., Faloutsos, C.: FRAUDAR: bounding graph fraud in the face of camouflage. In: SIGKDD, pp. 895–904 (2016)
Lee, V.E., Ruan, N., Jin, R., Aggarwal, C.: A survey of algorithms for dense subgraph discovery. In: Aggarwal, C., Wang, H. (eds.) Managing and Mining Graph Data. Advances in Database Systems, vol. 40, pp. 303–336. Springer, Boston (2010). https://doi.org/10.1007/978-1-4419-6045-0_10
Leskovec, J., Chakrabarti, D., Kleinberg, J., Faloutsos, C., Ghahramani, Z.: Kronecker graphs: an approach to modeling networks. JMLR 11, 985–1042 (2010)
Li, Z., Zhang, S., Wang, R.-S., Zhang, X.-S., Chen, L.: Erratum: quantitative function for community detection. Phys. Rev. E 91(1), 019901 (2015)
Liu, S., Hooi, B., Faloutsos, C.: A contrast metric for fraud detection in rich graphs. TKDE 31(12), 2235–2248 (2018)
Liu, Y., Zhu, L., Szekely, P.A., Galstyan, A., Koutra, D.: Coupled clustering of time-series and networks. In: SDM, pp. 531–539. SIAM (2019)
Miyauchi, A., Kakimura, N.: Finding a dense subgraph with sparse cut. In: CIKM (2018)
Papailiopoulos, D., Mitliagkas, I., Dimakis, A., Caramanis, C.: Finding dense subgraphs via low-rank bilinear optimization. In: ICML, pp. 1890–1898 (2014)
Pavan, M., Pelillo, M.: Dominant sets and pairwise clustering. IEEE Trans. Pattern Anal. Mach. Intell. 29(1), 167–172 (2006)
Prakash, B.A., Sridharan, A., Seshadri, M., Machiraju, S., Faloutsos, C.: EigenSpokes: surprising patterns and scalable community chipping in large graphs. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds.) PAKDD 2010. LNCS (LNAI), vol. 6119, pp. 435–448. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13672-6_42
Shen, H.-W., Cheng, X.-Q.: Spectral methods for the detection of network community structure: a comparative analysis. JSTAT 2010(10), P10020 (2010)
Tsourakakis, C.E.: Fast counting of triangles in large real networks without counting: algorithms and laws. In: ICDM, pp. 608–617. IEEE (2008)
Tsourakakis, C.E., Chen, T., Kakimura, N., Pachocki, J.: Novel dense subgraph discovery primitives: risk aversion and exclusion queries. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds.) ECML PKDD 2019. LNCS (LNAI), vol. 11906, pp. 378–394. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-46150-8_23
Wan, H., Zhang, Y., Zhang, J., Tang, J.: AMiner: search and mining of academic social networks. Data Intell. 1(1), 58–76 (2019)
Wang, Z., Chu, L., Pei, J., Al-Barakati, A., Chen, E.: Tradeoffs between density and size in extracting dense subgraphs: a unified framework. In: ASONAM (2016)
Wong, S.W., Pastrello, C., Kotlyar, M., Faloutsos, C., Jurisica, I.: SDREGION: fast spotting of changing communities in biological networks. In: SIGKDD (2018)
Yang, Y., Chu, L., Zhang, Y., Wang, Z., Pei, J., Chen, E.: Mining density contrast subgraphs. In: ICDE, pp. 221–232. IEEE (2018)
Yin, H., Benson, A.R., Leskovec, J., Gleich, D.F.: Local higher-order graph clustering. In: KDD, pp. 555–564 (2017)
Acknowledgments
This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, Grant No. XDA19020400, NSF of China Nos. 61772498, U1911401, 61872206 and 91746301, the National Science Foundation under Grant No. IIS 1845491, and Army Young Investigator Award No. W911NF1810397.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Feng, W., Liu, S., Koutra, D., Shen, H., Cheng, X. (2021). SpecGreedy: Unified Dense Subgraph Detection. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12457. Springer, Cham. https://doi.org/10.1007/978-3-030-67658-2_11
DOI: https://doi.org/10.1007/978-3-030-67658-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67657-5
Online ISBN: 978-3-030-67658-2
eBook Packages: Computer Science, Computer Science (R0)