Abstract
Hierarchical clustering based on semantic feature thresholds is widely applied in data mining. However, during the region merging phase this method uses systematic sampling to select the edge points of each region, which leads to under-merging and high computational complexity when regions are merged according to the Euclidean distance criterion. To solve this problem, an improved hierarchical clustering method is proposed that adopts the weighted Euclidean distance and optimizes the region merging step. Based on the orientation of the centers of the two region clusters and the geometric relations among their edge points, two novel edge point selection strategies are established. They reduce the number of edge points involved in the computation and ensure that the selected points have the minimum Euclidean distance among all edge point combinations. Experimental results show that the proposed method achieves high clustering accuracy and a lower average clustering time.
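As a rough illustration of the idea summarized above, the following Python sketch computes a weighted Euclidean distance and compares an exhaustive search over all edge point pairs with a search restricted to points that face the other region. The abstract does not specify the two selection strategies, so `facing_points` is a hypothetical stand-in based only on the direction between the region centers; the function names, sample regions, and weights are likewise assumptions made for this example. A simple filter like this one is not guaranteed to preserve the minimum for arbitrary region shapes.

```python
import math
from itertools import product


def weighted_euclidean(p, q, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (p_i - q_i)^2)."""
    return math.sqrt(sum(wi * (pi - qi) ** 2 for pi, qi, wi in zip(p, q, w)))


def centroid(points):
    """Arithmetic mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def facing_points(points, own_center, other_center):
    """Hypothetical filter (not the paper's strategy): keep points whose
    offset from their own center has a non-negative dot product with the
    direction pointing toward the other region's center."""
    direction = tuple(o - c for c, o in zip(own_center, other_center))
    kept = [
        p for p in points
        if sum((pi - ci) * di for pi, ci, di in zip(p, own_center, direction)) >= 0
    ]
    return kept or list(points)  # fall back to all points if none qualify


def min_edge_distance(edges_a, edges_b, w):
    """Smallest weighted distance over all pairs of edge points."""
    return min(weighted_euclidean(p, q, w) for p, q in product(edges_a, edges_b))


# Two illustrative square regions, described by their corner points.
region_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
region_b = [(3.0, 0.0), (4.0, 0.0), (3.0, 1.0), (4.0, 1.0)]
weights = (1.0, 1.0)  # assumed equal feature weights

center_a, center_b = centroid(region_a), centroid(region_b)
cand_a = facing_points(region_a, center_a, center_b)
cand_b = facing_points(region_b, center_b, center_a)

full = min_edge_distance(region_a, region_b, weights)  # 16 pairs examined
pruned = min_edge_distance(cand_a, cand_b, weights)    # 4 pairs examined
print(full, pruned)
```

For these two regions the filtered search examines 4 pairs instead of 16 and returns the same minimum distance as the exhaustive search, which is the kind of saving the abstract attributes to reducing the number of edge points.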
Cite this article
yu-feng, Y. Semantic feature hierarchical clustering algorithm based on improved regional merging strategy. Cluster Comput 22 (Suppl 1), 1495–1503 (2019). https://doi.org/10.1007/s10586-018-1941-5
DOI: https://doi.org/10.1007/s10586-018-1941-5