ABSTRACT
Finding latent patterns in high dimensional data is an important research problem with numerous applications. The best-known approaches to high dimensional data analysis are feature selection and dimensionality reduction. Although widely used, these methods aim to capture global patterns and are typically applied in the full feature space. In many emerging applications, however, scientists are interested in the local latent patterns held by feature subspaces, which may be invisible under any global transformation.
In this paper, we investigate the problem of finding strong linear and nonlinear correlations hidden in the feature subspaces of high dimensional data. We formalize this problem as identifying reducible subspaces in the full dimensional space. Intuitively, a reducible subspace is a feature subspace whose intrinsic dimensionality is smaller than its number of features. We present an effective algorithm, REDUS, for finding reducible subspaces. Its two key components are finding the overall reducible subspace and uncovering the individual reducible subspaces within it. A broad experimental evaluation demonstrates the effectiveness of our algorithm.
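To make the notion of a reducible subspace concrete, here is a minimal sketch that estimates the intrinsic dimensionality of a candidate feature subspace with a standard correlation-dimension (fractal) estimator and flags the subspace as reducible when the estimate falls clearly below the number of features. The quantile grid, the `is_reducible` threshold, and the function names are illustrative assumptions for this sketch, not the actual REDUS procedure.

```python
import numpy as np

def correlation_dimension(X, quantiles=(0.05, 0.10, 0.15, 0.20, 0.25)):
    """Estimate the intrinsic dimensionality of the points (rows) of X as
    the correlation dimension: the slope of log C(r) versus log r, where
    C(r) is the fraction of point pairs within distance r."""
    # All pairwise Euclidean distances (upper triangle, no self-pairs).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    pair_dists = dists[np.triu_indices(len(X), k=1)]
    # Choose radii as quantiles of the pair distances so that
    # C(r_q) = q by construction, then fit the log-log slope.
    radii = np.quantile(pair_dists, quantiles)
    return np.polyfit(np.log(radii), np.log(quantiles), 1)[0]

def is_reducible(X):
    """Illustrative criterion (not the test REDUS itself uses): call the
    subspace spanned by X's columns reducible when its estimated intrinsic
    dimensionality is at least one full dimension below the feature count."""
    return correlation_dimension(X) <= X.shape[1] - 1
```

For instance, three features tracing a helix form a reducible subspace (a one-dimensional curve embedded in three features), whereas three independent noise features do not:

```python
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 4.0 * np.pi, 400)
helix = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])  # intrinsic dim ~ 1
noise = rng.normal(size=(400, 3))                         # intrinsic dim ~ 3
print(is_reducible(helix))  # expected: True
print(is_reducible(noise))  # expected: False
```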