A Monte Carlo algorithm for fast projective clustering

ABSTRACT
We propose a mathematical formulation of the notion of an optimal projective cluster, derived from natural requirements on the density of points in subspaces. This formulation lets us develop a Monte Carlo algorithm that computes projective clusters iteratively, and we prove that the computed clusters are good with high probability. We implemented a modified version of the algorithm that uses heuristics to speed up computation. Our extensive experiments show that our method is significantly more accurate than previous approaches. As an application, we use our techniques to build a classifier for detecting rotated human faces in cluttered images.
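The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a Monte Carlo search for one projective cluster can be organized. It assumes that a cluster is an axis-parallel box of half-width `w` around a randomly chosen anchor point, restricted to a subset of dimensions, and that candidates are scored by the hypothetical quality measure |C|·(1/β)^|D|, which rewards both many points (|C|) and many relevant dimensions (|D|). Every name and parameter here (`w`, `alpha`, `beta`, `r`, `outer`, `inner`) is an illustrative assumption, not the paper's notation.

```python
import random


def monte_carlo_projective_cluster(points, w, alpha, beta, r, outer, inner, seed=0):
    """Sketch of a randomized search for a single projective cluster.

    Assumptions (hypothetical, for illustration only):
      points: list of equal-length tuples of floats
      w:      half-width of the cluster box along each selected dimension
      alpha:  minimum fraction of all points a cluster must contain
      beta:   size vs. dimensionality trade-off in the quality score
      r:      size of the random sample used to pick dimensions (r <= len(points))
    Returns (quality, members, dims) for the best candidate, or None.
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    best = None
    for _ in range(outer):
        anchor = rng.choice(points)  # random anchor point
        for _ in range(inner):
            sample = rng.sample(points, r)
            # Keep dimensions on which every sampled point lies within w of the anchor.
            dims = [i for i in range(d)
                    if all(abs(q[i] - anchor[i]) <= w for q in sample)]
            if not dims:
                continue  # skip the trivial full-space "cluster"
            # Collect all points inside the box around the anchor on those dimensions.
            members = [q for q in points
                       if all(abs(q[i] - anchor[i]) <= w for i in dims)]
            if len(members) < alpha * n:
                continue  # too small to count as a cluster
            # Hypothetical quality score: larger clusters in more dimensions win.
            quality = len(members) * (1.0 / beta) ** len(dims)
            if best is None or quality > best[0]:
                best = (quality, members, dims)
    return best
```

With β < 1/2, adding one dimension more than doubles the score, so the search prefers a slightly smaller cluster that is tight in more dimensions. Calling the function repeatedly on the points not yet assigned would give one simple way to compute several clusters iteratively.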