Article

A Monte Carlo algorithm for fast projective clustering

Authors:
Cecilia M. Procopiuc, AT&T Research Laboratory, Florham Park, NJ
Michael Jones, Mitsubishi Electric Research Laboratory, Cambridge, MA
Pankaj K. Agarwal, Duke University, Durham, NC
T. M. Murali, Boston University, Boston, MA

SIGMOD '02: Proceedings of the 2002 ACM SIGMOD international conference on Management of data, June 2002, Pages 418–427
https://doi.org/10.1145/564691.564739

Published: 03 June 2002

ABSTRACT

We propose a mathematical formulation for the notion of optimal projective cluster, starting from natural requirements on the density of points in subspaces. This allows us to develop a Monte Carlo algorithm for iteratively computing projective clusters. We prove that the computed clusters are good with high probability. We implemented a modified version of the algorithm, using heuristics to speed up computation. Our extensive experiments show that our method is significantly more accurate than previous approaches. In particular, we use our techniques to build a classifier for detecting rotated human faces in cluttered images.
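
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what one Monte Carlo step for finding a single axis-parallel projective cluster might look like. Everything in it is an assumption made for illustration: the function name, the parameters `w` (box half-width), `alpha` (minimum cluster fraction), `beta` (size versus dimensionality trade-off), the trial counts, and the scoring rule (cluster size times `(1/beta)` raised to the number of bounded dimensions). It is not the authors' implementation and omits their speed-up heuristics.

```python
import random


def monte_carlo_projective_cluster(points, w, alpha=0.1, beta=0.25,
                                   outer_trials=20, inner_trials=50,
                                   sample_size=3, seed=0):
    """Return (member_indices, dims) for one axis-parallel projective cluster.

    Illustrative sketch: all names, defaults, and the scoring rule
    are assumptions, not taken from the paper's text.
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    best_score, best = 0.0, ([], [])
    for _ in range(outer_trials):
        # Guess a point that lies inside a dense cluster.
        p = points[rng.randrange(n)]
        for _ in range(inner_trials):
            # A small random sample votes on which dimensions are bounded:
            # keep dimension j only if every sampled point is within w of p.
            sample = [points[rng.randrange(n)] for _ in range(sample_size)]
            dims = [j for j in range(d)
                    if all(abs(q[j] - p[j]) <= w for q in sample)]
            if not dims:
                continue
            # The cluster is every point inside the box of half-width w
            # around p, restricted to the chosen dimensions.
            members = [i for i, q in enumerate(points)
                       if all(abs(q[j] - p[j]) <= w for j in dims)]
            if len(members) < alpha * n:
                continue  # too sparse to count as a cluster
            # Reward both many points and many bounded dimensions.
            score = len(members) * (1.0 / beta) ** len(dims)
            if score > best_score:
                best_score, best = score, (members, dims)
    return best
```

To compute several clusters iteratively under the same assumptions, one would call the function, remove the returned members from `points`, and repeat until no candidate passes the `alpha` threshold.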


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
  2. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        1. Cluster analysis
2. Mathematics of computing
  1. Probability and statistics
    1. Probabilistic algorithms
    2. Probabilistic reasoning algorithms
      1. Markov-chain Monte Carlo methods
      2. Sequential Monte Carlo methods

Published in
SIGMOD '02: Proceedings of the 2002 ACM SIGMOD international conference on Management of data
June 2002
654 pages
ISBN: 1581134975
DOI: 10.1145/564691
Conference Chair: Bongki Moon, University of Wisconsin - Madison
General Chair: David DeWitt, University of Wisconsin - Madison
Program Chair: Michael Franklin, University of California, Berkeley
Copyright © 2002 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
SIGMOD '02 Paper Acceptance Rate: 42 of 240 submissions, 18%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%