Graph Clustering Based on Optimization of a Macroscopic Structure of Clusters

  • Conference paper
Discovery Science (DS 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6926)


Abstract

A graph is a flexible data structure for many kinds of data, such as the Web, social networking services (SNSs), and molecular architectures. Graphs are used not only for data that is naturally expressed as a graph but also for data without an explicit graph structure, by extracting implicit relationships hidden in the data, e.g., co-occurrence relationships among words in text or similarity relationships among pixels of an image. This extraction lets us apply many sophisticated graph methods to a wide range of problems. In graph analysis, one of the most important problems is graph clustering: dividing all vertices of a given graph into groups called clusters. Existing algorithms for this problem typically assume that the number of intra-cluster edges is large while the number of inter-cluster edges is small in absolute terms. These algorithms therefore fail on noisy graphs, and extracting implicit relationships tends to yield noisy graphs because the result depends on how the relation among vertices is defined. Instead of making such an assumption, we introduce a macroscopic structure (MS), a graph of clusters that roughly describes the structure of a given graph. This paper presents a graph clustering algorithm that, given a graph and the number of clusters, tries to find a set of clusters minimizing the distance between the MS induced from the calculated clusters and the ideal MS for the given number of clusters. In other words, it solves the clustering problem as an optimization problem. For the m-clustering problem, the ideal MS is defined as an m-vertex graph in which each vertex has only a self-loop. To evaluate the performance improvements thoroughly, we conducted experiments on artificial graphs with different amounts of noise. The results show that our method handles very noisy graphs correctly, whereas existing algorithms fail completely to cluster them. Furthermore, even for graphs with less noise, our algorithm performs well if the difference between the densities of intra-cluster edges and inter-cluster edges is sufficiently large. We also ran experiments on graphs transformed from vector data as a more practical case. The results show that our algorithm indeed works much better on noisy graphs than the existing ones.
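The objective described in the abstract can be illustrated with a small sketch. The abstract does not say how the MS is induced from a clustering or which distance is used, so everything concrete here is an assumption: the MS is represented as an m-by-m matrix of edge densities between clusters, the ideal MS (each vertex has only a self-loop) as the identity matrix, and the distance as the Frobenius norm of their difference. The function names `induced_ms` and `distance_to_ideal` are hypothetical, and the sketch only evaluates the objective for a given clustering; it does not perform the optimization.

```python
def induced_ms(edges, labels, m):
    """Return an m x m matrix of edge densities between clusters.

    Assumption (not from the paper): entry [a][b] is the number of edges
    between clusters a and b divided by the number of possible vertex pairs.
    """
    sizes = [0] * m
    for c in labels:
        sizes[c] += 1

    # Count edges per unordered pair of clusters.
    counts = [[0] * m for _ in range(m)]
    for u, v in edges:
        a, b = sorted((labels[u], labels[v]))
        counts[a][b] += 1

    ms = [[0.0] * m for _ in range(m)]
    for a in range(m):
        for b in range(a, m):
            if a == b:
                pairs = sizes[a] * (sizes[a] - 1) // 2
            else:
                pairs = sizes[a] * sizes[b]
            density = counts[a][b] / pairs if pairs else 0.0
            ms[a][b] = ms[b][a] = density
    return ms


def distance_to_ideal(ms):
    """Frobenius distance to the identity matrix (assumed ideal MS)."""
    m = len(ms)
    total = 0.0
    for a in range(m):
        for b in range(m):
            ideal = 1.0 if a == b else 0.0
            total += (ms[a][b] - ideal) ** 2
    return total ** 0.5


# Two triangles (0,1,2) and (3,4,5) joined by the single edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
good = [0, 0, 0, 1, 1, 1]  # one cluster per triangle
bad = [0, 1, 0, 1, 0, 1]   # clusters that cut across the triangles

good_dist = distance_to_ideal(induced_ms(edges, good, 2))
bad_dist = distance_to_ideal(induced_ms(edges, bad, 2))
print(good_dist, bad_dist)
```

Under these assumptions, the clustering that follows the two triangles yields an MS closer to the ideal than the clustering that cuts across them, which is the ordering a distance-minimizing algorithm would exploit. Because the score uses densities rather than absolute edge counts, it does not rely on the number of inter-cluster edges being small.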



Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Taniguchi, Y., Ikeda, D. (2011). Graph Clustering Based on Optimization of a Macroscopic Structure of Clusters. In: Elomaa, T., Hollmén, J., Mannila, H. (eds) Discovery Science. DS 2011. Lecture Notes in Computer Science, vol 6926. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24477-3_27

  • DOI: https://doi.org/10.1007/978-3-642-24477-3_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24476-6

  • Online ISBN: 978-3-642-24477-3

  • eBook Packages: Computer Science, Computer Science (R0)
