Abstract
Most current similarity measures rely on geometric distance and do not consider the distribution of the data. To compensate for this shortcoming, this paper uses the Kullback-Leibler (KL) distance as the similarity measure for uncertain data and proposes the DOUD_C algorithm for the discrete domain and the COUD_C algorithm for the continuous domain. To cluster high-dimensional data efficiently, the paper adopts a grid data structure and proposes the BROUD_C algorithm. Using the adjacency of grid cells, the algorithm extends clusters step by step, so it can find clusters of arbitrary shape and filter out a large number of isolated points, which effectively solves the uncertain data clustering problem in obstacle space. Experimental results show that, compared with the OBS_UK_means algorithm with VPA and SDA pruning and with the FOPTICS algorithm, BROUD_C achieves better clustering performance and requires less CPU execution time in obstacle space.
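To illustrate why a distribution-based measure can separate objects that geometric distance cannot, the following minimal sketch computes the standard KL divergence between two discrete probability distributions. It is not the DOUD_C algorithm itself: the abstract does not give the paper's exact formulation, so the function names `kl_divergence` and `symmetric_kl`, the `eps` guard against zero probabilities, and the symmetrized variant are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Standard KL divergence D(p || q) over a shared discrete support.

    `eps` is an illustrative guard (an assumption, not from the paper)
    that avoids division by zero when q assigns zero probability.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:  # terms with p_i = 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total


def symmetric_kl(p, q, eps=1e-12):
    """Average of both directions, since KL divergence is not symmetric."""
    return 0.5 * (kl_divergence(p, q, eps) + kl_divergence(q, p, eps))


# Two uncertain objects with the same sample locations but different
# probability mass: a purely geometric measure on the support would not
# tell them apart, while the KL divergence does.
a = [0.1, 0.2, 0.7]
b = [0.7, 0.2, 0.1]

print(kl_divergence(a, a))  # identical distributions
print(kl_divergence(a, b))  # differing distributions
print(symmetric_kl(a, b))
```

For continuous distributions the sum becomes an integral over densities, which in practice has to be estimated; how COUD_C does this is not stated in the abstract.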
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant No. 61370084 and by the Science and Technology Research Project of Heilongjiang Provincial Education Department (No. 1253lz004).
Cite this article
Wan, J., Cui, M., He, Y. et al. Uncertain Data Clustering Based on Probability Distribution in Obstacle Space. Wireless Pers Commun 111, 2191–2214 (2020). https://doi.org/10.1007/s11277-019-06980-0
DOI: https://doi.org/10.1007/s11277-019-06980-0