Improved Spectral Clustering Algorithm Based on Similarity Measure

  • Conference paper
Advanced Data Mining and Applications (ADMA 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8933)

Abstract

To address the sensitivity of the traditional spectral clustering algorithm to the Gaussian kernel parameter σ, this paper proposes using a similarity measure based on data density when constructing the similarity matrix, inspired by the density-sensitive similarity measure. The measure increases the distance between pairs of data points that lie in high-density areas located in different spaces, and it reduces the similarity degree among pairs of data in the same density region, so as to capture the spatial distribution characteristics of complex data. On this basis, we design two similarity measures, neither of which introduces the Gaussian kernel parameter σ. The main difference between them is that the first uses a shortest path, while the second does not. The second method shows better overall performance as a similarity measure, and experiments show that it improves the stability of the entire algorithm. In the final stage of spectral clustering, k-means (or another traditional clustering algorithm) is applied to the selected eigenvectors, but k-means is sensitive to the initial cluster centers. We therefore also design a simple and effective method for optimizing the initial cluster centers, which improves k-means, and apply the improved method to the proposed spectral clustering algorithm. Experimental results on UCI datasets show that the improved k-means algorithm further stabilizes the clustering.
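For orientation, the conventional pipeline that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard baseline only (Gaussian-kernel similarity with an explicit σ, normalized spectral embedding, then k-means), not the authors' method: the two σ-free similarity measures and their initial-center optimization are not specified in the abstract. All function names are hypothetical, and the farthest-first initialization is a generic deterministic stand-in used to show where an initialization rule plugs in.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Conventional similarity: W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-similarity
    return W

def spectral_embedding(W, k):
    """Top-k eigenvectors of D^{-1/2} W D^{-1/2}, with rows scaled to unit length."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    U = vecs[:, -k:]             # eigenvectors of the k largest eigenvalues
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)

def farthest_first_centers(Y, k):
    """Deterministic initial centers (assumed stand-in, not the paper's rule)."""
    idx = [0]
    dist = np.linalg.norm(Y - Y[0], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(dist))  # point farthest from all chosen centers
        idx.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(Y - Y[nxt], axis=1))
    return Y[idx].copy()

def kmeans(Y, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    for _ in range(n_iter):
        d = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        new = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering(X, k, sigma):
    W = gaussian_affinity(X, sigma)  # the step that depends on sigma
    Y = spectral_embedding(W, k)
    return kmeans(Y, farthest_first_centers(Y, k))

# Toy data: two well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
labels = spectral_clustering(X, k=2, sigma=1.0)
```

In this sketch σ enters only through `gaussian_affinity`, so a density-based measure of the kind the abstract describes would replace that one function, and a different initialization rule would replace `farthest_first_centers`.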

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Yan, J., Cheng, D., Zong, M., Deng, Z. (2014). Improved Spectral Clustering Algorithm Based on Similarity Measure. In: Luo, X., Yu, J.X., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2014. Lecture Notes in Computer Science, vol 8933. Springer, Cham. https://doi.org/10.1007/978-3-319-14717-8_50

  • DOI: https://doi.org/10.1007/978-3-319-14717-8_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14716-1

  • Online ISBN: 978-3-319-14717-8

  • eBook Packages: Computer Science, Computer Science (R0)
