A Creditable Subspace Labeling Method Based on D-S Evidence Theory

Zong, Yu; Zhang, Xian-Chao; Jiang, He; Li, Ming-Chu

doi:10.1007/978-3-540-68125-0_82

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5012)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Owing to the inherent sparsity, noise, and nearly zero difference characteristics of high-dimensional data sets, traditional clustering methods fail to detect meaningful clusters in them. Subspace clustering attempts to find the true distribution inherent in subsets of the original attributes. However, which subspace contains the true clustering result is usually uncertain. From this point of view, subspace clustering can be regarded as a problem of reasoning under uncertainty. In this paper, we first develop a criterion for evaluating creditable subspaces, that is, subspaces that contain meaningful clustering results, and then propose a creditable subspace labeling method (CSL) based on D-S evidence theory. The creditable subspaces of the original data space can be found by executing the CSL algorithm iteratively. Once the creditable subspaces have been obtained, the true clustering results can be found by running a traditional clustering algorithm on each creditable subspace. Experiments show that CSL can detect the actual creditable subspaces in terms of the original attributes. This yields a novel approach in which traditional clustering algorithms are used to handle high-dimensional data sets.
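For readers unfamiliar with D-S evidence theory, the following is a minimal sketch of Dempster's rule of combination, the standard operation for fusing two bodies of evidence. It is not the CSL algorithm, whose details are not given in this abstract. The function name `combine`, the labels `s1` and `s2` (standing in for two candidate subspaces), and all mass values are hypothetical and chosen only for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset (a subset of the
    frame of discernment) to its basic probability mass; the masses of
    each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            # Agreeing evidence: credit the product to the intersection.
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            # Contradictory evidence: accumulate the conflict mass K.
            conflict += mass_b * mass_c
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Normalize by 1 - K so the combined masses sum to 1 again.
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


# Hypothetical evidence about which of two candidate subspaces is creditable.
theta = frozenset({"s1", "s2"})  # full frame: "cannot tell"
m1 = {frozenset({"s1"}): 0.6, theta: 0.4}
m2 = {frozenset({"s1"}): 0.5, frozenset({"s2"}): 0.3, theta: 0.2}
fused = combine(m1, m2)
```

In this toy example both sources lean toward `s1`, so the fused mass on `{s1}` ends up higher than either source assigned on its own, while the mass left on the full frame shrinks.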

Supported by the National Science Foundation of China under Grant No. 90412007, the National Science Foundation of China under Grant No. 60503003, the Science Research Project of the Anhui Education Office under Grant No. KJ2008B133, and the Important Science Research Project of the Anhui Education Office under Grant No. KJ2007A072.

Author information

Authors and Affiliations

School of Software, Dalian University of Technology, Dalian, 116621, China
Yu Zong, Xian-Chao Zhang, He Jiang & Ming-Chu Li

Editor information

Takashi Washio, Einoshin Suzuki, Kai Ming Ting, Akihiro Inokuchi

About this paper

Cite this paper

Zong, Y., Zhang, XC., Jiang, H., Li, MC. (2008). A Creditable Subspace Labeling Method Based on D-S Evidence Theory. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_82

DOI: https://doi.org/10.1007/978-3-540-68125-0_82
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0
eBook Packages: Computer Science, Computer Science (R0)
