Article

A spectral clustering approach to optimally combining numerical vectors with a modular network

Authors:
Motoki Shiga (Kyoto University), Ichigaku Takigawa (Kyoto University), Hiroshi Mamitsuka (Kyoto University)

KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2007, Pages 647–656, https://doi.org/10.1145/1281192.1281262

Published: 12 August 2007

ABSTRACT

We address the problem of clustering numerical vectors that are accompanied by a network. The problem setting is essentially equivalent to constrained clustering by Wagstaff and Cardie and semi-supervised clustering by Basu et al., but our focus is on optimally combining two heterogeneous data sources. One application is web pages, which can be vectorized by their contents (e.g., term frequencies) and which are hyperlinked to each other, forming a network. Another typical application is genes, whose behavior can be measured numerically and for which a gene network is available from another data source.

We first define a new graph clustering measure, which we call normalized network modularity, by balancing the cluster sizes in the original modularity. We then propose a new clustering method that integrates the cost of clustering numerical vectors with the cost of maximizing the normalized network modularity into a single spectral relaxation problem. Our learning algorithm is based on spectral clustering, which turns the problem into an eigenvalue problem, and uses k-means for the final cluster assignments. A significant advantage of our method is that the weight parameter balancing the two costs can be optimized from the given data by choosing the value that minimizes the total cost. We evaluated the proposed method on a variety of datasets, including synthetic data and real-world data from molecular biology. The experimental results showed that our method clusters effectively when given both numerical vectors and a network.
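The pipeline described in the abstract (combine a vector-based cost with a network-based cost, solve an eigenvalue problem, then run k-means) can be illustrated with a minimal sketch. This is not the authors' formulation: the abstract does not define normalized network modularity or the total cost, so the sketch substitutes the standard Newman-Girvan modularity matrix and a cosine-similarity matrix, blended by a weight `omega`. The function names, the scaling of the two terms, and the farthest-point k-means initialization are all assumptions made for illustration.

```python
import numpy as np

def modularity_matrix(A):
    """Standard Newman-Girvan modularity matrix B = A - d d^T / (2m).
    Assumption: this is NOT the paper's normalized network modularity."""
    d = A.sum(axis=1)
    return A - np.outer(d, d) / d.sum()

def kmeans(Z, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        dist = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    cost = d2[np.arange(len(Z)), labels].sum()
    return labels, cost

def combined_spectral_clustering(X, A, k, omega=0.5):
    """Cluster n items given vectors X (n x d, non-zero rows) and a symmetric
    adjacency matrix A (n x n). omega in [0, 1] weights the network term.
    Hypothetical blend; the paper's actual cost is not given in the abstract."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                      # cosine similarity between vectors
    B = modularity_matrix(A)
    M = omega * B / A.sum() + (1.0 - omega) * S / len(X)
    _, eigvecs = np.linalg.eigh(M)     # eigenvalues in ascending order
    Z = eigvecs[:, -k:]                # spectral relaxation: top-k eigenvectors
    Z = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    return kmeans(Z, k)                # final assignments via k-means
```

Calling `combined_spectral_clustering(X, A, k, omega)` returns the cluster labels together with the k-means cost in the spectral embedding. The abstract states that the weight is chosen by minimizing a total cost; how that cost is defined is not given here, so this sketch leaves `omega` as a plain input.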

References

A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286: 509--512, 1999.
S. Basu, M. Bilenko, and R. J. Mooney. A probabilistic framework for semi-supervised clustering. In KDD, pages 59--68, August 2004.
I. S. Dhillon, Y. Guan, and B. Kulis. Kernel k-means, spectral clustering and normalized cuts. In KDD, pages 551--556, 2004.
I. S. Dhillon and S. Sra. Modeling data using directional distributions. Technical Report TR-06-03, University of Texas, Dept. of Computer Sciences, 2003.
R. Edgar, M. Domrachev, and A. E. Lash. Gene expression omnibus: NCBI gene expression and hybridization array data repository. NAR, 30(1): 207--210, 2002.
R. Guimera and L. A. Nunes Amaral. Functional cartography of complex metabolic networks. Nature, 433(7028): 895--900, 2005.
R. Guimera, M. Sales-Pardo, and L. A. N. Amaral. Modularity from fluctuations in random graphs and complex networks. Phys. Rev. E, 70: 025101, 2004.
L. Hagen and A. B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE TCAD, 11: 1074--1085, 1992.
T. R. Hughes et al. Functional discovery via a compendium of expression profiles. Cell, 102(1): 109--126, 2000.
M. Kanehisa et al. From genomics to chemical genomics: new developments in KEGG. NAR, 34: D354--357, 2006.
B. Kulis, S. Basu, I. Dhillon, and R. J. Mooney. Semi-supervised graph clustering: A kernel approach. In ICML, pages 457--464, 2005.
K. V. Mardia and P. E. Jupp. Directional Statistics. John Wiley & Sons, second edition, 2000.
M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69: 026113, 2004.
E. Ravasz et al. Hierarchical organization of modularity in metabolic networks. Science, 297(5589): 1551--1555, 2002.
J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE PAMI, 22(8): 888--905, 2000.
M. Shiga, I. Takigawa, and H. Mamitsuka. Annotating gene function by combining expression data with a modular gene network. To appear in ISMB, 2007.
C. Song, S. Havlin, and H. A. Makse. Self-similarity of complex networks. Nature, 433: 392--395, 2005.
A. Strehl and J. Ghosh. Relationship-based clustering and visualization for high-dimensional data mining. INFORMS Journal on Computing, 15(2): 208--230, 2003.
O. Troyanskaya et al. Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6): 520--525, 2001.
K. Wagstaff and C. Cardie. Clustering with instance-level constraints. In ICML, pages 1103--1110, 2000.
D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393: 440--442, 1998.
S. White and P. Smyth. A spectral clustering approach to finding communities in graphs. In SDM, pages 76--84, 2005.
L. F. Wu et al. Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters. Nat. Genet., 31(3): 255--265, 2002.
S. Zhong and J. Ghosh. A unified framework for model-based clustering. JMLR, 4: 1001--1037, 2003.
S. Zhong and J. Ghosh. Generative model-based document clustering: A comparative study. KAIS, 8(3): 374--384, 2005.
X. Zhou, M. C. Kao, and W. H. Wong. Transitive functional annotation by shortest-path analysis of gene expression data. PNAS, 99(20): 12783--12788, 2002.

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
2. Information systems
  1. Information systems applications
    1. Data mining

Published in
KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2007
1080 pages
ISBN: 9781595936097
DOI: 10.1145/1281192
General Chair: Pavel Berkhin (Yahoo!, USA)
Program Chairs: Rich Caruana (Cornell University, USA), Xindong Wu (University of Vermont, USA)
Copyright © 2007 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
eigenvalue problem
heterogeneous data sources
k-means
network modularity
spectral clustering
Qualifiers
- Article
Conference

Acceptance Rates
KDD '07 Paper Acceptance Rate: 111 of 573 submissions, 19%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%