Research article · DOI: 10.1145/1458082.1458209

REDUS: finding reducible subspaces in high dimensional data

Published: 26 October 2008

ABSTRACT

Finding latent patterns in high dimensional data is an important research problem with numerous applications. The best-known approaches to high dimensional data analysis are feature selection and dimensionality reduction. Widely used in practice, these methods aim to capture global patterns and are typically applied in the full feature space. In many emerging applications, however, scientists are interested in local latent patterns held by feature subspaces, which may be invisible under any global transformation.

In this paper, we investigate the problem of finding strong linear and nonlinear correlations hidden in feature subspaces of high dimensional data. We formalize this problem as identifying reducible subspaces in the full dimensional space. Intuitively, a reducible subspace is a feature subspace whose intrinsic dimensionality is smaller than its number of features. We present an effective algorithm, REDUS, for finding reducible subspaces. The algorithm has two key components: finding the overall reducible subspace, and uncovering the individual reducible subspaces within it. A broad experimental evaluation demonstrates the effectiveness of our algorithm.
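To make the notion of a reducible subspace concrete, the following toy sketch (not the REDUS algorithm itself, which the paper builds on a correlation-based formulation) constructs three features in which the third is a hidden linear function of the first two, then estimates the subspace's intrinsic dimensionality by counting the principal components needed to explain nearly all of the variance. The threshold and PCA-based estimator are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Three observed features, but only two degrees of freedom:
# x3 is a hidden linear combination of x1 and x2.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = 2.0 * x1 - x2
data = np.column_stack([x1, x2, x3])

def intrinsic_dim(X, var_threshold=0.999):
    """Estimate intrinsic dimensionality as the number of principal
    components needed to explain var_threshold of the total variance."""
    Xc = X - X.mean(axis=0)
    # Eigenvalues of the covariance matrix, sorted descending.
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
    ratios = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(ratios, var_threshold) + 1)

print(intrinsic_dim(data))  # 2: intrinsic dimensionality < 3 features
```

Here the three-feature subspace is reducible: its intrinsic dimensionality (2) is smaller than its number of features (3). The paper's contribution is detecting such subspaces, including nonlinear ones, hidden inside a much larger full-dimensional space, where a single global projection would not reveal them.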


Published in

CIKM '08: Proceedings of the 17th ACM Conference on Information and Knowledge Management
October 2008, 1562 pages
ISBN: 9781595939913
DOI: 10.1145/1458082
Copyright © 2008 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 1,861 of 8,427 submissions, 22%
