DOI: 10.1145/1281192.1281262
Article

A spectral clustering approach to optimally combining numerical vectors with a modular network

Published: 12 August 2007

ABSTRACT

We address the problem of clustering numerical vectors that are accompanied by a network. The problem setting is essentially equivalent to constrained clustering (Wagstaff and Cardie) and semi-supervised clustering (Basu et al.), but our focus is on optimally combining two heterogeneous data sources. One application is web pages, which can be represented as numerical vectors of their contents (e.g., term frequencies) and which are hyperlinked to each other, forming a network. Another typical application is genes, whose behavior can be measured numerically and for which a gene network is available from another data source.

We first define a new graph clustering measure, which we call normalized network modularity, by balancing the cluster sizes in the original modularity. We then propose a new clustering method that integrates the cost of clustering numerical vectors with the cost of maximizing the normalized network modularity into a single spectral relaxation problem. Our learning algorithm is based on spectral clustering, which turns the problem into an eigenvalue problem, and uses k-means for the final cluster assignments. A significant advantage of our method is that the weight parameter balancing the two costs can be optimized from the given data by choosing the value that minimizes the total cost. We evaluated the proposed method on a variety of datasets, including synthetic data and real-world data from molecular biology. The experimental results showed that our method clusters effectively using both numerical vectors and a network.
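For orientation, the original modularity of Newman and Girvan (reference 13) that the abstract builds on can be computed as in the sketch below. This is not the authors' code. The function names, the edge-list input format, and in particular the size-normalized variant (which simply divides each cluster's contribution by its node count) are assumptions made for illustration; the abstract does not state the exact definition of normalized network modularity.

```python
from collections import Counter, defaultdict


def cluster_terms(edges, labels):
    """Per-cluster modularity terms e_c/m - (d_c/(2m))^2.

    edges  : list of (u, v) pairs, each undirected edge listed once
    labels : dict mapping node -> cluster id
    """
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in the cluster
    degree = defaultdict(int)  # sum of node degrees in the cluster
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return {c: intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree}


def modularity(edges, labels):
    """Newman-Girvan modularity: the sum of the per-cluster terms."""
    return sum(cluster_terms(edges, labels).values())


def size_normalized_modularity(edges, labels):
    """ASSUMED illustrative variant: each term is divided by cluster size.

    This only mimics the idea of balancing cluster sizes; it is not
    taken from the paper.
    """
    sizes = Counter(labels.values())
    terms = cluster_terms(edges, labels)
    return sum(term / sizes[c] for c, term in terms.items())


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}

print(modularity(edges, split))   # splitting at the bridge scores higher
print(modularity(edges, merged))  # a single cluster scores zero
```

In this toy graph, splitting at the bridge gives a modularity of 5/14, while placing all nodes in one cluster gives 0, which matches the intuition that modularity rewards densely connected groups.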

References

  1. A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286: 509--512, 1999.
  2. S. Basu, M. Bilenko, and R. J. Mooney. A probabilistic framework for semi-supervised clustering. In KDD, pages 59--68, August 2004.
  3. I. S. Dhillon, Y. Guan, and B. Kulis. Kernel k-means, spectral clustering and normalized cuts. In KDD, pages 551--556, 2004.
  4. I. S. Dhillon and S. Sra. Modeling data using directional distributions. Technical Report TR-06-03, University of Texas, Dept. of Computer Sciences, 2003.
  5. R. Edgar, M. Domrachev, and A. E. Lash. Gene expression omnibus: NCBI gene expression and hybridization array data repository. NAR, 30(1): 207--210, 2002.
  6. R. Guimera and L. A. Nunes Amaral. Functional cartography of complex metabolic networks. Nature, 433(7028): 895--900, 2005.
  7. R. Guimera, M. Sales-Pardo, and L. A. N. Amaral. Modularity from fluctuations in random graphs and complex networks. Phys. Rev. E, 70: 025101, 2004.
  8. L. Hagen and A. B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE TCAD, 11: 1074--1085, 1992.
  9. T. R. Hughes et al. Functional discovery via a compendium of expression profiles. Cell, 102(1): 109--126, 2000.
  10. M. Kanehisa et al. From genomics to chemical genomics: new developments in KEGG. NAR, 34: D354--357, 2006.
  11. B. Kulis, S. Basu, I. Dhillon, and R. J. Mooney. Semi-supervised graph clustering: A kernel approach. In ICML, pages 457--464, 2005.
  12. K. V. Mardia and P. E. Jupp. Directional Statistics. John Wiley & Sons, second edition, 2000.
  13. M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69: 026113, 2004.
  14. E. Ravasz et al. Hierarchical organization of modularity in metabolic networks. Science, 297(5589): 1551--1555, 2002.
  15. J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE PAMI, 22(8): 888--905, 2000.
  16. M. Shiga, I. Takigawa, and H. Mamitsuka. Annotating gene function by combining expression data with a modular gene network. To appear in ISMB, 2007.
  17. C. Song, S. Havlin, and H. A. Makse. Self-similarity of complex networks. Nature, 433: 392--395, 2005.
  18. A. Strehl and J. Ghosh. Relationship-based clustering and visualization for high-dimensional data mining. INFORMS Journal on Computing, 15(2): 208--230, 2003.
  19. O. Troyanskaya et al. Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6): 520--525, 2001.
  20. K. Wagstaff and C. Cardie. Clustering with instance-level constraints. In ICML, pages 1103--1110, 2000.
  21. D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393: 440--442, 1998.
  22. S. White and P. Smyth. A spectral clustering approach to finding communities in graphs. In SDM, pages 76--84, 2005.
  23. L. F. Wu et al. Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters. Nat. Genet., 31(3): 255--265, 2002.
  24. S. Zhong and J. Ghosh. A unified framework for model-based clustering. JMLR, 4: 1001--1037, 2003.
  25. S. Zhong and J. Ghosh. Generative model-based document clustering: A comparative study. KAIS, 8(3): 374--384, 2005.
  26. X. Zhou, M. C. Kao, and W. H. Wong. Transitive functional annotation by shortest-path analysis of gene expression data. PNAS, 99(20): 12783--12788, 2002.

Published in

KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2007
1080 pages
ISBN: 9781595936097
DOI: 10.1145/1281192

Copyright © 2007 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

KDD '07 Paper Acceptance Rate: 111 of 573 submissions, 19%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
