Abstract
Both uncertain data and high-dimensional data pose huge challenges to traditional clustering algorithms. Clustering high-dimensional uncertain data is even more challenging, and few algorithms address it. In this paper, based on the classical FINDIT subspace clustering algorithm for high-dimensional data, we propose UFINDIT, a constraint-based semi-supervised subspace clustering algorithm for high-dimensional uncertain data. We extend both the distance functions and the dimension voting rules of FINDIT to deal with high-dimensional uncertain data. Since the soundness criterion of FINDIT fails for uncertain data, we introduce constraints to solve the problem. We also use the constraints to improve FINDIT by eliminating the effect of parameters on the process of merging medoids. Furthermore, we propose methods such as sampling to make the algorithm more efficient. Experimental results on synthetic and real data sets show that the proposed UFINDIT algorithm outperforms the existing subspace clustering algorithm for uncertain data.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, X., Gao, L., Yu, H. (2016). Constraint Based Subspace Clustering for High Dimensional Uncertain Data. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_22
DOI: https://doi.org/10.1007/978-3-319-31750-2_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31749-6
Online ISBN: 978-3-319-31750-2
eBook Packages: Computer Science (R0)