Abstract
Topology-based data analysis is an emerging field that has attracted many researchers and now plays an important role in both research and applications. This study focuses on its use in clustering algorithms for data mining. We address two challenges: reducing execution time and improving clustering quality, the latter including reduced dependence on input parameters, a notable limitation of current methods. We propose five strategies for optimizing clustering algorithms that use topological relationships. The strategies combine three ideas: expanding points fewer times, merging clusters, and increasing the radius value in jumps determined by the index in the nearest-neighbor distance array. Together, they aim to improve both algorithmic efficiency and the quality of the clustering results, and thereby contribute to cluster analysis within data mining.
Data Availability
Data will be made available on reasonable request.
Notes
http://download.geofabrik.de/asia/vietnam.html
http://cs.uef.fi/mopsi/routes/network/
https://viblo.asia/p/hierarchical-clustering-phan-cum-du-lieu-maGK7q2elj2
https://download.geofabrik.de/asia/vietnam.html
https://hub.arcgis.com/search
http://insideairbnb.com
http://cs.uef.fi/mopsi/routes/network/
Acknowledgements
This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2021.08.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
This paper contains no studies with human participants or animals performed by any authors.
Consent to participate
Informed consent was obtained from all individual participants included in the study.
Consent to publish
The authors give consent to the journal Applied Intelligence to publish their paper.
About this article
Cite this article
Nguyen, T.T.D., Nguyen, L.T.T., Bui, QT. et al. Efficient strategies for spatial data clustering using topological relations. Appl Intell 55, 203 (2025). https://doi.org/10.1007/s10489-024-05927-8
DOI: https://doi.org/10.1007/s10489-024-05927-8