Abstract
Clustering analysis is an important method in data mining. To recognize clusters with arbitrary shapes as well as clusters with different densities, we propose a new clustering approach: minimum spanning tree clustering based on density filtering. In the density filtering step, the method masks low-density points, which reduces interference from noise and makes the gaps between clusters clearer. It then uses the relative values of adjacent distances to detect abrupt changes in density and transitions between clusters, and divides the data set accordingly. The algorithm is tested on multiple synthetic and real-world data sets. The results show that it detects clusters of arbitrary shape, is insensitive to density imbalance between clusters, and performs well across these data sets.
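The pipeline described in the abstract (density filtering, then a minimum spanning tree, then cutting on relative rather than absolute distances) can be illustrated with a small sketch. This is not the authors' implementation. The following are hypothetical choices made only for illustration: density is taken as the inverse mean distance to the k nearest neighbors; filtering keeps the densest `keep_ratio` fraction of points; an MST edge is cut when it is more than `cut_ratio` times the mean length of the MST edges adjacent to it; and masked points are assigned to the cluster of their nearest kept point. All function and parameter names are likewise hypothetical.

```python
import math
from collections import defaultdict


def knn_density(points, k):
    """Hypothetical density: inverse of the mean distance to the k nearest neighbors."""
    k = min(k, len(points) - 1)
    dens = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dens.append(1.0 / (sum(dists[:k]) / k + 1e-12))  # epsilon guards duplicate points
    return dens


def prim_mst(points):
    """O(n^2) Prim's algorithm on the complete Euclidean graph; returns (i, j, length) edges."""
    n = len(points)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_density_filter_clustering(points, k=3, keep_ratio=0.8, cut_ratio=2.0):
    """Sketch only: density filtering -> MST -> cut edges that are long relative to their neighbors."""
    n = len(points)
    dens = knn_density(points, k)
    # Density filtering (assumed rule): mask the lowest-density points.
    order = sorted(range(n), key=lambda i: dens[i], reverse=True)
    kept = sorted(order[:max(2, int(round(keep_ratio * n)))])
    sub = [points[i] for i in kept]
    edges = prim_mst(sub)
    incident = defaultdict(list)  # node -> indices of incident MST edges
    for idx, (a, b, _) in enumerate(edges):
        incident[a].append(idx)
        incident[b].append(idx)
    # Relative cut (assumed rule): compare each edge with the edges adjacent to it.
    keep_edges = []
    for idx, (a, b, w) in enumerate(edges):
        neigh = [edges[e][2] for e in incident[a] + incident[b] if e != idx]
        if neigh and w > cut_ratio * (sum(neigh) / len(neigh)):
            continue  # abrupt jump in distance: treat as a gap between clusters
        keep_edges.append((a, b))
    # Connected components of the pruned tree via union-find.
    root = list(range(len(sub)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for a, b in keep_edges:
        root[find(a)] = find(b)
    ids = {}
    sub_labels = [ids.setdefault(find(i), len(ids)) for i in range(len(sub))]
    # Masked points (assumed rule) join the cluster of their nearest kept point.
    return [sub_labels[min(range(len(sub)), key=lambda j: math.dist(p, sub[j]))]
            for p in points]
```

For example, two well-separated five-point blobs, `[(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]` and the same blob shifted by (10, 10), come out as two clusters. Cutting on the ratio to adjacent edges, rather than on a global length threshold, is what lets a rule of this kind separate clusters whose internal densities differ.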
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China under Grant No. 61702183.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, K., Xie, X., Sun, J., Cao, W. (2019). Minimum Spanning Tree Clustering Based on Density Filtering. In: Jin, H., Lin, X., Cheng, X., Shi, X., Xiao, N., Huang, Y. (eds) Big Data. BigData 2019. Communications in Computer and Information Science, vol 1120. Springer, Singapore. https://doi.org/10.1007/978-981-15-1899-7_15
DOI: https://doi.org/10.1007/978-981-15-1899-7_15
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1898-0
Online ISBN: 978-981-15-1899-7
eBook Packages: Computer Science, Computer Science (R0)