Identifying Relevant Subgraphs in Large Networks

Liu, Zheng; Guo, Shuting; Li, Tao; Chen, Wenyan

doi:10.1007/978-3-319-45835-9_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9865)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Structural relationships between objects are modeled as graphs in many applications. In this paper, we study the problem of identifying relevant subgraphs in large networks. Relevant subgraphs in large networks contain network elements that are maintained by network administrators. We formalize the problem and propose a framework consisting of two major phases. The relevance scores of all vertex pairs are computed in the offline phase, while relevant subgraphs are identified in the online phase. We analyze the relevance score measure carefully and design an efficient algorithm for relevant subgraph identification that repeatedly expands candidate subgraphs and merges overlapping ones. Our experiments on real data sets show that the relevant subgraphs we identify are of high quality and can be found efficiently, which makes them useful to network administrators during network operation and maintenance.

Notes

1. Information Technology Infrastructure Library, http://www.axelos.com/itil.
2. ITU Telecommunication Standardization Sector, http://www.itu.int/en/ITU-T.
3. TM Forum, https://www.tmforum.org.

Acknowledgments

This work was supported in part by Nanjing University of Posts and Telecommunications under Grants No. NY215045 and NY214135, and Ministry of Education/China Mobile joint research grant under Project No. 5–10.

Author information

Authors and Affiliations

School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, China
Zheng Liu, Shuting Guo, Tao Li & Wenyan Chen

Corresponding author

Correspondence to Zheng Liu.

Editor information

Editors and Affiliations

University of Tsukuba, Tsukuba, Japan
Atsuyuki Morishima
East China Normal University, Shanghai, China
Rong Zhang
University of New South Wales, Sydney, New South Wales, Australia
Wenjie Zhang
The University of New South Wales, School of Computer Science and Engineering, Sydney, Australia
Lijun Chang
Advanced Digital Sciences Center, Illinois at Singapore Pte Ltd, Fusionopolies Way, Singapore
Tom Z. J Fu
Pivotal Inc, 15/F, Hyundai Motor Tower, Chaoyang, China
Kuien Liu
Advanced Digital Sciences Center, Illinois at Singapore Pte Ltd, Singapore
Xiaoyan Yang
South China Normal University, School of Computer Science, Guangzhou, China
Jia Zhu
Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Zhiwei Zhang

Appendices

Appendix

A Relationship Between Expected f-Distance and Random Walk with Restart

The vertex relevance matrix obtained with the expected f-distance differs little from the one obtained with random walk with restart. The proof is presented below. Based on the iterative form of the definition of random walk with restart, the vertex relevance score matrix ${\varPi '}^l$ of graph $G$ can be expressed as follows.

$$ {\varPi '}^l = (1-c) {\varPi '}^{l-1} P + cI , \tag{6} $$

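where $c$ is the restart probability, $P$ is the transition matrix of $G$, and $I$ is the identity matrix. Unrolling this recursion from the initial value ${\varPi '}^0 = I$, which is the starting point implied by the third line below, we have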

$$\begin{aligned}
{\varPi '}^l&= (1-c) {\varPi '}^{l-1} P + cI \\
&= (1-c) ( (1-c) {\varPi '}^{l-2} P + cI)P + cI \\
&= (1-c)^l P^l + c \sum _{\gamma =1}^{l-1}(1-c)^\gamma P^\gamma + cI \\
&= c(1-c)^l P^l + c \sum _{\gamma =1}^{l-1}(1-c)^\gamma P^\gamma + (1-c)^{l+1}P^l + cI \\
&= \sum _{\gamma =1}^{l}c(1-c)^\gamma P^\gamma + (1-c)^{l+1}P^l + cI \\
&= \varPi ^l + (1-c)^{l+1}P^l + cI .
\end{aligned} \tag{7}$$
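
As a quick sanity check of Eq. (7), consider the smallest case $l = 1$. This sketch assumes the recursion starts from ${\varPi '}^0 = I$, which is consistent with the derivation but is not stated explicitly in this appendix. Both sides then reduce to the same matrix:

$$\begin{aligned}
{\varPi '}^1 &= (1-c)\, I P + cI = (1-c)P + cI , \\
\varPi ^1 + (1-c)^{2}P + cI &= c(1-c)P + (1-c)^{2}P + cI = (1-c)P + cI .
\end{aligned}$$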

The last line of Eq. (7) contains three terms. The first term is the vertex relevance matrix $\varPi ^l$ using the expected f-distance. The third term, $cI$, affects only the diagonal entries of the vertex relevance matrix and is ignored, since we do not consider vertex self-relevance. The difference between random walk with restart and the expected f-distance is therefore the second term, $(1-c)^{l+1}P^l$. Because $P^l$ is again a transition matrix, its entries lie in $[0, 1]$, so for $0< c < 1$ this term vanishes geometrically as $l$ goes to infinity, and the vertex relevance matrices using the expected f-distance and random walk with restart are then the same except for the diagonal entries. Even when $l$ is small, the corresponding entries of the two matrices do not differ much, since $(1-c)^{l+1}P^l$ is very small compared with $\varPi ^l = \sum _{\gamma =1}^{l}c(1-c)^\gamma P^\gamma $.
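
The identity in Eq. (7) can also be checked numerically. The following is a minimal sketch, not the authors' implementation: the toy adjacency matrix, the value of `c`, the function names, and the initial value ${\varPi '}^0 = I$ are all assumptions made for illustration.

```python
import numpy as np


def transition_matrix(adj):
    """Row-normalize an adjacency matrix into a random-walk transition matrix P."""
    adj = np.asarray(adj, dtype=float)
    return adj / adj.sum(axis=1, keepdims=True)


def rwr_iterate(P, c, l):
    """Apply the recursion of Eq. (6) l times, starting from the assumed value I."""
    identity = np.eye(P.shape[0])
    pi_prime = identity.copy()
    for _ in range(l):
        pi_prime = (1 - c) * (pi_prime @ P) + c * identity
    return pi_prime


def expected_f_distance(P, c, l):
    """Compute Pi^l = sum_{gamma=1..l} c (1-c)^gamma P^gamma."""
    n = P.shape[0]
    total = np.zeros((n, n))
    power = np.eye(n)
    for gamma in range(1, l + 1):
        power = power @ P  # power now holds P^gamma
        total += c * (1 - c) ** gamma * power
    return total


def residual_term(P, c, l):
    """Second term of Eq. (7): (1-c)^(l+1) P^l."""
    return (1 - c) ** (l + 1) * np.linalg.matrix_power(P, l)


# Hypothetical undirected toy graph on 4 vertices (every vertex has degree >= 1).
ADJ = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]
P = transition_matrix(ADJ)
c = 0.2  # assumed restart probability

for l in (1, 3, 10, 50):
    lhs = rwr_iterate(P, c, l)
    rhs = expected_f_distance(P, c, l) + residual_term(P, c, l) + c * np.eye(4)
    assert np.allclose(lhs, rhs)  # Eq. (7) holds for this l
    print(l, residual_term(P, c, l).max())  # largest entry of the residual
```

The printed values show how the largest entry of the second term shrinks as $l$ grows. How quickly it becomes negligible depends on the chosen $c$, so it is worth running this check with the restart probability actually used.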

Cite this paper

Liu, Z., Guo, S., Li, T., Chen, W. (2016). Identifying Relevant Subgraphs in Large Networks. In: Morishima, A., et al. Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9865. Springer, Cham. https://doi.org/10.1007/978-3-319-45835-9_13

DOI: https://doi.org/10.1007/978-3-319-45835-9_13
Published: 22 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45834-2
Online ISBN: 978-3-319-45835-9
eBook Packages: Computer Science, Computer Science (R0)
