Identifying Relevant Subgraphs in Large Networks

  • Conference paper
Web Technologies and Applications (APWeb 2016)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9865))

Abstract

Structural relationships between objects are modeled as graphs in many applications. In this paper, we study the problem of identifying relevant subgraphs in large networks. Relevant subgraphs contain network elements that are maintained by network administrators. We formalize the problem and propose a framework consisting of two major phases: the relevance scores of all vertex pairs are computed in the offline phase, and relevant subgraphs are identified in the online phase. We analyze the relevance score measure and design an efficient algorithm for relevant subgraph identification that repeatedly expands candidate subgraphs and merges overlapping ones. Our experiments on real data sets show that the relevant subgraphs are of high quality and can be found efficiently, which makes them useful to network administrators during network operation and maintenance.

Notes

  1. Information Technology Infrastructure Library, http://www.axelos.com/itil.

  2. ITU Telecommunication Standardization Sector, http://www.itu.int/en/ITU-T.

  3. TM Forum, https://www.tmforum.org.

Acknowledgments

This work was supported in part by Nanjing University of Posts and Telecommunications under Grants No. NY215045 and NY214135, and Ministry of Education/China Mobile joint research grant under Project No. 5–10.

Author information

Corresponding author

Correspondence to Zheng Liu.

Appendix

A Relationship Between Expected f-Distance and Random Walk with Restart

The vertex relevance matrix based on the expected f-distance differs only slightly from the one based on random walk with restart, as the following derivation shows. From the iterative definition of random walk with restart, the vertex relevance score matrix \({\varPi '}^l\) of graph \(G_i\) can be expressed as follows.

$$\begin{aligned} {\varPi '}^l&= (1-c) {\varPi '}^{l-1} P + cI , \end{aligned}$$
(6)

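where c is the restart probability, P is the transition matrix of \(G_i\), and I is the identity matrix. Unrolling this recursion down to the initial matrix, taken to be \({\varPi '}^0 = I\) (the starting point under which the third line below holds), we have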

$$\begin{aligned}
{\varPi '}^l&= (1-c) {\varPi '}^{l-1} P + cI \\
&= (1-c) \bigl( (1-c) {\varPi '}^{l-2} P + cI \bigr) P + cI \\
&= (1-c)^l P^l + c \sum _{\gamma =1}^{l-1}(1-c)^\gamma P^\gamma + cI \\
&= c(1-c)^l P^l + c \sum _{\gamma =1}^{l-1}(1-c)^\gamma P^\gamma + (1-c)^{l+1}P^l + cI \\
&= \sum _{\gamma =1}^{l}c(1-c)^\gamma P^\gamma + (1-c)^{l+1}P^l + cI \\
&= \varPi ^l + (1-c)^{l+1}P^l + cI .
\end{aligned}$$
(7)

The last line of Eq. (7) contains three terms. The first term is the vertex relevance matrix \(\varPi ^l\) based on the expected f-distance. The third term, cI, affects only the diagonal entries of the vertex relevance matrix and is ignored, since we do not consider vertex self-relevance. The difference between random walk with restart and the expected f-distance is therefore the second term, \((1-c)^{l+1}P^l\). As l goes to infinity this term vanishes, so the two vertex relevance matrices coincide except on the diagonal. For finite l, the corresponding off-diagonal entries differ only by \((1-c)^{l+1}P^l\), which shrinks geometrically in l and becomes small compared with \(\varPi ^l = \sum _{\gamma =1}^{l}c(1-c)^\gamma P^\gamma \).
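The identity in Eq. (7) can be checked numerically. The sketch below is only an illustration, not the authors' implementation: the 4-vertex toy graph, the values c = 0.15 and l = 5, the starting matrix \({\varPi '}^0 = I\), and all helper names (`matmul`, `rwr_iterative`, `truncated_sum`, and so on) are assumptions introduced here. It iterates Eq. (6) l times and compares the result with \(\varPi ^l + (1-c)^{l+1}P^l + cI\).

```python
# Numerical check of Eq. (7) on a hypothetical toy graph (pure Python, no dependencies).

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(s, a):
    return [[s * x for x in row] for row in a]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def rwr_iterative(P, c, l):
    """Apply Eq. (6) l times, starting from Pi'^0 = I (an assumption of this sketch)."""
    n = len(P)
    pi = identity(n)
    for _ in range(l):
        pi = add(scale(1 - c, matmul(pi, P)), scale(c, identity(n)))
    return pi

def truncated_sum(P, c, l):
    """Return (Pi^l, P^l), where Pi^l = sum_{gamma=1}^{l} c (1-c)^gamma P^gamma."""
    n = len(P)
    total = [[0.0] * n for _ in range(n)]
    power = identity(n)
    for gamma in range(1, l + 1):
        power = matmul(power, P)  # power now holds P^gamma
        total = add(total, scale(c * (1 - c) ** gamma, power))
    return total, power

# Hypothetical undirected toy graph; P is its row-normalized adjacency matrix.
adjacency = [[0, 1, 1, 0],
             [1, 0, 1, 0],
             [1, 1, 0, 1],
             [0, 0, 1, 0]]
P = [[x / sum(row) for x in row] for row in adjacency]
c, l = 0.15, 5

lhs = rwr_iterative(P, c, l)                    # left-hand side of Eq. (7)
pi_l, p_l = truncated_sum(P, c, l)
residual = scale((1 - c) ** (l + 1), p_l)       # second term of Eq. (7)
rhs = add(add(pi_l, residual), scale(c, identity(len(P))))
gap = max_abs_diff(lhs, rhs)                    # zero up to floating-point rounding

# Because P is row-stochastic, every row of the residual sums to (1-c)^(l+1)
# and every row of Pi^l sums to (1-c)(1-(1-c)^l).
residual_row_sum = sum(residual[0])
pi_row_sum = sum(pi_l[0])
```

Comparing `residual_row_sum` with `pi_row_sum` for different choices of `c` and `l` gives a rough sense of how quickly the second term of Eq. (7) becomes negligible relative to \(\varPi ^l\); the two row sums depend only on c and l, not on the particular graph, as long as P is row-stochastic.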

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Liu, Z., Guo, S., Li, T., Chen, W. (2016). Identifying Relevant Subgraphs in Large Networks. In: Morishima, A., et al. Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9865. Springer, Cham. https://doi.org/10.1007/978-3-319-45835-9_13

  • DOI: https://doi.org/10.1007/978-3-319-45835-9_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45834-2

  • Online ISBN: 978-3-319-45835-9

  • eBook Packages: Computer Science, Computer Science (R0)
