
Constraint Based Subspace Clustering for High Dimensional Uncertain Data

Conference paper in Advances in Knowledge Discovery and Data Mining (PAKDD 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9652)


Abstract

Both uncertain data and high-dimensional data pose significant challenges to traditional clustering algorithms. Clustering data that is both high-dimensional and uncertain is harder still, and few algorithms address it. In this paper, building on FINDIT, a classical subspace clustering algorithm for high-dimensional data, we propose UFINDIT, a constraint-based semi-supervised subspace clustering algorithm for high-dimensional uncertain data. We extend both the distance functions and the dimension voting rules of FINDIT to handle high-dimensional uncertain data. Since FINDIT's soundness criteria fail for uncertain data, we introduce constraints to solve this problem. We also use the constraints to improve FINDIT by eliminating the effect of parameter settings on the process of merging medoids. Furthermore, we propose techniques such as sampling to make the algorithm more efficient. Experimental results on synthetic and real data sets show that UFINDIT outperforms the existing subspace clustering algorithm for uncertain data.
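The abstract names three ingredients: a distance function extended to uncertain objects, dimension voting, and constraints used when merging medoids, plus sampling for efficiency. The sketch below illustrates these ideas under stated assumptions; it is not UFINDIT itself. Assumed and hypothetical: each uncertain object is represented by a list of sampled instances; the extended distance is taken to be the expected per-dimension absolute difference; the voting rule with `epsilon` and `min_votes` is a simplified stand-in for FINDIT's; and constraints are modeled as cannot-link pairs checked before two clusters merge. All function names are illustrative.

```python
import random
from typing import Iterable, List, Sequence, Set, Tuple

# Assumption: an uncertain object is a list of sampled instances (possible
# values), each a tuple of floats with the same number of dimensions.
Instance = Tuple[float, ...]
UncertainObject = List[Instance]


def expected_dim_distance(a: UncertainObject, b: UncertainObject, dim: int) -> float:
    """Estimate E[|A_dim - B_dim|] by averaging over every pair of instances."""
    total = 0.0
    for x in a:
        for y in b:
            total += abs(x[dim] - y[dim])
    return total / (len(a) * len(b))


def vote_dimensions(
    medoid: UncertainObject,
    neighbours: Sequence[UncertainObject],
    n_dims: int,
    epsilon: float,
    min_votes: int,
) -> List[int]:
    """Hypothetical simplified voting: each neighbour votes for dimension d if
    its expected distance to the medoid on d is <= epsilon; dimensions that
    collect at least min_votes votes are selected."""
    votes = [0] * n_dims
    for nb in neighbours:
        for d in range(n_dims):
            if expected_dim_distance(medoid, nb, d) <= epsilon:
                votes[d] += 1
    return [d for d in range(n_dims) if votes[d] >= min_votes]


def violates_cannot_link(
    cluster_a: Set[int], cluster_b: Set[int], cannot_link: Iterable[Tuple[int, int]]
) -> bool:
    """True if merging two clusters (sets of object ids) would place a
    cannot-link pair in the same cluster."""
    return any(
        (i in cluster_a and j in cluster_b) or (j in cluster_a and i in cluster_b)
        for i, j in cannot_link
    )


def sample_indices(n_objects: int, k: int, seed: int = 0) -> List[int]:
    """Draw k distinct object indices uniformly at random, e.g. to choose
    candidate medoids from a sample rather than from the full data set."""
    rng = random.Random(seed)
    return rng.sample(range(n_objects), k)
```

For example, with `medoid = [(0.0, 0.0), (2.0, 0.0)]`, `nb1 = [(1.0, 10.0)]` and `nb2 = [(1.0, 0.5), (1.0, -0.5)]`, both neighbours lie within expected distance 1.0 of the medoid on dimension 0, but only `nb2` does on dimension 1, so `vote_dimensions(medoid, [nb1, nb2], 2, epsilon=1.0, min_votes=2)` keeps only dimension 0. In this sketch the pairwise averaging costs O(|a|·|b|) per dimension, which is one reason sampling helps.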



Author information

Correspondence to Xianchao Zhang.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhang, X., Gao, L., Yu, H. (2016). Constraint Based Subspace Clustering for High Dimensional Uncertain Data. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_22


  • DOI: https://doi.org/10.1007/978-3-319-31750-2_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31749-6

  • Online ISBN: 978-3-319-31750-2

  • eBook Packages: Computer Science, Computer Science (R0)
