CURLER: finding and visualizing nonlinear correlation clusters

Authors:
Anthony K. H. Tung, National University of Singapore
Xin Xu, National University of Singapore
Beng Chin Ooi, National University of Singapore
SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 2005, pages 467–478
https://doi.org/10.1145/1066157.1066211

Published: 14 June 2005

ABSTRACT

While much work has been done on finding linear correlations among subsets of features in high-dimensional data, the detection of nonlinear correlations has been left largely untouched. In this paper, we present an algorithm for finding and visualizing nonlinear correlation clusters in subspaces of high-dimensional databases. Unlike linear correlation clusters, each of which has a single orientation, nonlinear correlation clusters have varying orientations, so finding them requires merging clusters of possibly very different orientations. Combined with the fact that spatial proximity must be judged on a subset of features that is not known in advance, deciding which clusters should be merged during the clustering process becomes a challenge. To address this problem, we propose a novel concept called the co-sharing level, which captures both spatial proximity and cluster orientation when judging the similarity between clusters. Based on this concept, we develop an algorithm that not only detects nonlinear correlation clusters but also provides a way to visualize them. Experiments on both synthetic and real-life datasets demonstrate the effectiveness of our method.
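The abstract introduces the co-sharing level without defining it. The sketch below is a minimal illustration under an assumed formalization, not the paper's definitive method: each point is assumed to carry soft memberships P(M_k | x) in small Gaussian microclusters (for example, from an EM fit), so that each component's full covariance encodes a local orientation, and the co-sharing level of two microclusters is taken as the sum over all points of the product of their membership probabilities. Under this assumption two microclusters score highly only when many points belong substantially to both, which requires them to be spatially close and their orientations to overlap. The function names `gaussian_responsibilities`, `co_sharing_matrix`, and `most_co_shared_pair` are hypothetical.

```python
import numpy as np


def gaussian_responsibilities(X, means, covs, weights):
    """Soft memberships P(M_k | x_n) under a Gaussian mixture (hypothetical helper).

    X: (n, d) points; means: (k, d); covs: (k, d, d) full covariances, whose
    principal axes stand in for each microcluster's orientation;
    weights: (k,) mixing proportions summing to 1.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    k = len(weights)
    log_p = np.empty((n, k))
    for j in range(k):
        diff = X - np.asarray(means[j], dtype=float)
        inv = np.linalg.inv(covs[j])
        _, logdet = np.linalg.slogdet(covs[j])
        # Squared Mahalanobis distance of every point to component j.
        maha = np.einsum("ni,ij,nj->n", diff, inv, diff)
        log_p[:, j] = np.log(weights[j]) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
    # Subtract the row maximum before exponentiating for numerical stability.
    log_p -= log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def co_sharing_matrix(resp):
    """Assumed co-sharing: C[i, j] = sum_n P(M_i | x_n) * P(M_j | x_n).

    resp has shape (n_points, n_clusters); the diagonal is zeroed because a
    cluster is never a merge candidate for itself.
    """
    resp = np.asarray(resp, dtype=float)
    if resp.ndim != 2:
        raise ValueError("resp must have shape (n_points, n_clusters)")
    c = resp.T @ resp
    np.fill_diagonal(c, 0.0)
    return c


def most_co_shared_pair(resp):
    """Return ((i, j), level) for the pair i < j with the highest co-sharing level."""
    c = co_sharing_matrix(resp)
    i, j = np.unravel_index(np.argmax(c), c.shape)
    return (int(min(i, j)), int(max(i, j))), float(c[i, j])


if __name__ == "__main__":
    # Toy memberships: points 0-1 are shared by clusters 0 and 1,
    # points 2-3 belong mostly to cluster 2.
    resp = [[0.5, 0.5, 0.0],
            [0.4, 0.6, 0.0],
            [0.0, 0.1, 0.9],
            [0.0, 0.0, 1.0]]
    print(most_co_shared_pair(resp))  # pair (0, 1), level approximately 0.49
```

In an agglomerative setting, the pair returned by `most_co_shared_pair` would be the natural candidate to merge next; how memberships are updated after a merge is left open in this sketch.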


Index Terms
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
2. Information systems
  1. Information systems applications
    1. Data mining


Published in
SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data
June 2005
990 pages
ISBN: 1595930604
DOI: 10.1145/1066157
Conference Chair: Fatma Ozcan, IBM Almaden Research Center
Copyright © 2005 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates
Overall acceptance rate: 785 of 4,003 submissions, 20%

Article Metrics
- Total citations: 50
- Total downloads: 943
- Downloads (last 12 months): 12
- Downloads (last 6 weeks): 2