SubCOID: an attempt to explore cluster-outlier iterative detection approach to multi-dimensional data analysis in subspace

Author:

Yong Shi

ACMSE '08: Proceedings of the 46th annual ACM Southeast Conference

Pages 132 - 135

https://doi.org/10.1145/1593105.1593139

Published: 28 March 2008

Abstract

Many data mining algorithms focus on clustering, and many others are designed for outlier detection. We observe that, in many situations, clusters and outliers are concepts whose meanings are inseparable from each other, especially for data sets with noise. Clusters and outliers should therefore be treated as concepts of equal importance in data analysis. In our previous work [22] we proposed a cluster-outlier iterative detection algorithm in the full data space. However, in high-dimensional spaces, not all dimensions may be relevant to a given cluster or outlier. In this paper we extend that work to subspaces, aiming to detect clusters and outliers in noisy data from another perspective. Each cluster is associated with its own subset of dimensions, as is each outlier. The partition, the subsets of dimensions, and the qualities of clusters are detected and adjusted according to the intra-relationship within clusters and the inter-relationship between clusters and outliers, and vice versa. This process is performed iteratively until a certain termination condition is reached. The algorithm can be applied in many fields, such as pattern recognition, data clustering, and signal processing.
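The abstract does not give the algorithm's details, so the following is only a rough illustration of the general idea it describes, not the SubCOID procedure itself. It is a minimal sketch that assumes a simple variance-based rule for choosing each cluster's dimensions and a fixed distance threshold for flagging outliers. The function names, the parameters `n_dims` and `outlier_threshold`, and the toy data are all hypothetical.

```python
from statistics import mean, pvariance


def subspace_distance(point, center, dims):
    """Mean squared difference, measured only over the dimensions in `dims`."""
    return sum((point[d] - center[d]) ** 2 for d in dims) / len(dims)


def detect(points, initial_centers, n_dims, outlier_threshold, max_iter=20):
    """Alternate between labelling points and re-selecting per-cluster dimensions.

    Returns (labels, subspaces); label -1 marks a point treated as an outlier.
    This is an illustrative stand-in, not the method from the paper.
    """
    n_features = len(points[0])
    centers = [list(c) for c in initial_centers]
    # Every cluster starts in the full space; subspaces are refined below.
    subspaces = [list(range(n_features)) for _ in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = []
        for p in points:
            dists = [subspace_distance(p, c, dims)
                     for c, dims in zip(centers, subspaces)]
            best = min(range(len(centers)), key=dists.__getitem__)
            # Too far even from the nearest cluster (in its subspace): outlier.
            new_labels.append(best if dists[best] <= outlier_threshold else -1)
        if new_labels == labels:  # termination: assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if len(members) < 2:
                continue  # too few members to estimate per-dimension spread
            columns = list(zip(*members))
            centers[k] = [mean(col) for col in columns]
            spread = [pvariance(col) for col in columns]
            # Keep the n_dims dimensions along which the cluster is tightest.
            tightest = sorted(range(n_features), key=spread.__getitem__)[:n_dims]
            subspaces[k] = sorted(tightest)
    return labels, subspaces


# Toy data: the first group is tight in dimensions 0 and 1, the second in 1 and 2.
points = [
    (1.0, 1.0, 0.0), (1.1, 0.9, 3.0), (0.9, 1.1, -3.0), (1.0, 1.0, 6.0),
    (0.0, 8.0, 8.0), (3.0, 8.1, 7.9), (-3.0, 7.9, 8.1), (6.0, 8.0, 8.0),
    (4.5, 4.5, 4.5),
]
labels, subspaces = detect(points, [(1.0, 1.0, 0.0), (0.0, 8.0, 8.0)],
                           n_dims=2, outlier_threshold=4.0)
print(labels)
print(subspaces)
```

On this toy data, the fourth and eighth points lie far from their groups when distance is measured in the full space, but rejoin them once each cluster is measured only in its own two tightest dimensions. The last point stays flagged as an outlier. This shows why the choice of subspace and the cluster/outlier partition have to be adjusted together.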

References

[1] C. C. Aggarwal, C. Procopiuc, J. Wolf, P. Yu, and J. Park. Fast algorithms for projected clustering. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 61–72, Philadelphia, PA, 1999.

[2] C. C. Aggarwal and P. S. Yu. Outlier detection for high dimensional data. In SIGMOD Conference, 2001.

[3] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 94–105, Seattle, WA, 1998.

[4] M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS: Ordering points to identify the clustering structure. In Proceedings of the ACM SIGMOD Conference on Management of Data (SIGMOD'99), pages 49–60, Philadelphia, PA, 1999.

[5] M. Breunig, H.-P. Kriegel, R. Ng, and J. Sander. LOF: Identifying density-based local outliers. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 93–104, Dallas, Texas, May 16–18, 2000.

[6] C.-F. Chen and J.-M. Lee. The validity measurement of fuzzy C-means classifier for remotely sensed images. In Proceedings of ACRS 2001: 22nd Asian Conference on Remote Sensing, 2001.

[7] D. Yu and A. Zhang. ClusterTree: Integration of cluster representation and nearest neighbor search for large datasets with high dimensionality. IEEE Transactions on Knowledge and Data Engineering (TKDE), 14(3), May/June 2003.

[8] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, 1996.

[9] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI Press, 1996.

[10] S. Guha, R. Rastogi, and K. Shim. CURE: An efficient clustering algorithm for large databases. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 73–84, Seattle, WA, 1998.

[11] S. Guha, R. Rastogi, and K. Shim. ROCK: A robust clustering algorithm for categorical attributes. In Proceedings of the IEEE Conference on Data Engineering, 1999.

[12] A. Hinneburg and D. A. Keim. An efficient approach to clustering in large multimedia databases with noise. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pages 58–65, New York, August 1998.

[13] J. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume I: Statistics, 1967.

[14] G. Karypis, E.-H. S. Han, and V. Kumar. Chameleon: Hierarchical clustering using dynamic modeling. Computer, 32(8):68–75, 1999.

[15] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990.

[16] E. Knorr and R. Ng. Algorithms for mining distance-based outliers in large datasets. In Proceedings of the 24th VLDB Conference, pages 392–403, New York, August 1998.

[17] M. Halkidi and M. Vazirgiannis. A data set oriented approach for clustering algorithm selection. In PKDD, 2001.

[18] R. T. Ng and J. Han. Efficient and effective clustering methods for spatial data mining. In Proceedings of the 20th VLDB Conference, pages 144–155, Santiago, Chile, 1994.

[19] S. Ramaswamy, R. Rastogi, and K. Shim. Efficient algorithms for mining outliers from large data sets. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 427–438, Dallas, Texas, May 16–18, 2000.

[20] T. Seidl and H.-P. Kriegel. Optimal multi-step k-nearest neighbor search. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 154–164, Seattle, WA, 1998.

[21] G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution clustering approach for very large spatial databases. In Proceedings of the 24th International Conference on Very Large Data Bases, 1998.

[22] Y. Shi and A. Zhang. Towards exploring interactive relationship between clusters and outliers in multi-dimensional data analysis. In International Conference on Data Engineering (ICDE), 2005.

[23] W. Wang, J. Yang, and R. Muntz. STING: A statistical information grid approach to spatial data mining. In Proceedings of the 23rd VLDB Conference, pages 186–195, Athens, Greece, 1997.

[24] D. Yu, G. Sheikholeslami, and A. Zhang. FindOut: Finding outliers in very large datasets. The Knowledge and Information Systems (KAIS), (4), October 2000.

[25] T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An efficient data clustering method for very large databases. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 103–114, Montreal, Canada, 1996.

Cited By

Y. Shi and L. Zhang (2011). COID: A cluster–outlier iterative detection approach to multi-dimensional data analysis. Knowledge and Information Systems 28:3 (709–733). DOI: 10.1007/s10115-010-0323-y. Online publication date: 1-Sep-2011.
https://dl.acm.org/doi/10.1007/s10115-010-0323-y
Y. Shi, H. Cunningham, P. Ruth, and N. Kraft (2010). Towards improving subspace data analysis. Proceedings of the 48th annual ACM Southeast Conference (1–4). DOI: 10.1145/1900008.1900093. Online publication date: 15-Apr-2010.
https://dl.acm.org/doi/10.1145/1900008.1900093

Index Terms

SubCOID: an attempt to explore cluster-outlier iterative detection approach to multi-dimensional data analysis in subspace
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published In

ACMSE '08: Proceedings of the 46th annual ACM Southeast Conference

March 2008

548 pages

ISBN:9781605581057

DOI:10.1145/1593105

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ACM SE08: ACM Southeast Regional Conference

March 28 - 29, 2008

Auburn, Alabama

Acceptance Rates

Overall Acceptance Rate 502 of 1,023 submissions, 49%

Bibliometrics

Total Citations: 2
Total Downloads: 136
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 16 Feb 2025
