DOI: 10.1145/1066157.1066211
Article

CURLER: finding and visualizing nonlinear correlation clusters

Published: 14 June 2005

ABSTRACT

While much work has been done on finding linear correlations among subsets of features in high-dimensional data, the detection of nonlinear correlations has remained largely unexplored. In this paper, we present an algorithm for finding and visualizing nonlinear correlation clusters in subspaces of high-dimensional databases. Unlike linear correlation detection, in which each cluster has a single orientation, finding nonlinear correlation clusters of varying orientations requires merging clusters whose orientations may differ greatly. Because spatial proximity must also be judged on a subset of features that is not known in advance, deciding which clusters should be merged during the clustering process becomes a challenge. To address this problem, we propose a novel concept called the co-sharing level, which captures both spatial proximity and cluster orientation when judging the similarity between clusters. Based on this concept, we develop an algorithm that not only detects nonlinear correlation clusters but also provides a way to visualize them. Experiments on both synthetic and real-life datasets demonstrate the effectiveness of our method.
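The abstract does not give a formal definition of the co-sharing level. As a hedged illustration only, the sketch below assumes that (a) small "microclusters" are modeled as 2-D Gaussians with full covariance, so each one has an orientation; (b) each point's soft membership Pr(M_i | x) comes from an EM-style E-step; and (c) the co-sharing level of two microclusters is the sum over all points of the product of their membership probabilities. The function names `gaussian_pdf_2d`, `memberships`, and `co_sharing`, and the toy data, are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Cov2 = Tuple[Tuple[float, float], Tuple[float, float]]


def gaussian_pdf_2d(x: Point, mean: Point, cov: Cov2) -> float:
    """Density of a 2-D Gaussian; a full covariance gives the microcluster an orientation."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0 or a <= 0:
        raise ValueError("covariance must be positive definite")
    # Inverse of the 2x2 covariance matrix.
    i00, i01, i10, i11 = d / det, -b / det, -c / det, a / det
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance (dx, dy) S^-1 (dx, dy)^T.
    m2 = dx * (i00 * dx + i01 * dy) + dy * (i10 * dx + i11 * dy)
    return math.exp(-0.5 * m2) / (2.0 * math.pi * math.sqrt(det))


def memberships(points: Sequence[Point], means: Sequence[Point],
                covs: Sequence[Cov2], weights: Sequence[float]) -> List[List[float]]:
    """E-step of EM: posterior Pr(M_i | x) of every microcluster for each point."""
    rows = []
    for x in points:
        joint = [w * gaussian_pdf_2d(x, mu, s) for w, mu, s in zip(weights, means, covs)]
        total = sum(joint)
        if total == 0.0:
            raise ValueError(f"point {x} has zero density under every microcluster")
        rows.append([p / total for p in joint])
    return rows


def co_sharing(P: Sequence[Sequence[float]], i: int, j: int) -> float:
    """Assumed co-sharing level: sum over points of Pr(M_i | x) * Pr(M_j | x)."""
    return sum(row[i] * row[j] for row in P)


# Toy data (hypothetical): points along the line y = 0, plus a few on x = 2.
points = [(float(x), 0.0) for x in range(-2, 7)] + [(2.0, y) for y in (1.0, 2.0, 3.0)]
# M0 and M1 are elongated along the x-axis; M2 is elongated along the y-axis.
means = [(0.0, 0.0), (4.0, 0.0), (2.0, 2.0)]
covs = [((4.0, 0.0), (0.0, 0.1)), ((4.0, 0.0), (0.0, 0.1)), ((0.1, 0.0), (0.0, 4.0))]
weights = [1 / 3, 1 / 3, 1 / 3]

P = memberships(points, means, covs, weights)
print(round(co_sharing(P, 0, 1), 3), round(co_sharing(P, 0, 2), 3))
```

In this toy setup the center of M2 is closer to M0 (about 2.83 units) than the center of M1 is (4 units), yet M0 and M1, which share an orientation along the same line, have a considerably higher co-sharing level. Under this assumed measure they would be merged first, which is the behavior the abstract attributes to combining proximity with orientation.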

      Published in
        SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data
        June 2005
        990 pages
        ISBN:1595930604
        DOI:10.1145/1066157
        Conference Chair: Fatma Ozcan

        Copyright © 2005 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 785 of 4,003 submissions, 20%
