Efficient strategies for spatial data clustering using topological relations

Applied Intelligence

Abstract

Topology-based data analysis is a promising, fast-growing field that has attracted many researchers and now plays an important role in both research and applications. This study focuses on its use in clustering algorithms for data mining. We address two challenges: reducing execution time and improving clustering quality, the latter including a reduced dependency on input parameters, which is a notable limitation of current methods. We propose five strategies for optimizing clustering algorithms that use topological relationships. The strategies combine three ideas: expanding points fewer times, merging clusters, and using a jump to increase the radius value according to the index in the nearest-neighbor distance array. Together they aim to improve both algorithmic efficiency and the quality of the clustering results, and thereby to contribute to cluster analysis in data mining.
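The abstract gives no algorithmic detail, so the following is only an illustrative sketch of how two of the named ideas (merging clusters, and jumping through a sorted nearest-neighbor distance array to pick the next radius) might fit together. It is not the authors' method. The function names, the `jump` step, and the `target_clusters` stopping rule are all hypothetical choices made for this example, and the brute-force distance computations stand in for whatever topological relations the paper actually uses.

```python
import math


def nearest_neighbor_distances(points):
    """Return the sorted distances from each point to its nearest neighbor."""
    dists = []
    for i, p in enumerate(points):
        dists.append(min(math.dist(p, q) for j, q in enumerate(points) if j != i))
    return sorted(dists)


def cluster_at_radius(points, radius):
    """Merge points lying within `radius` of each other (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)  # merge the two clusters
    return [find(i) for i in range(len(points))]


def cluster_with_jumps(points, target_clusters, jump=2):
    """Grow the radius by jumping through the sorted nearest-neighbor
    distance array until at most `target_clusters` clusters remain.

    Assumption: both `jump` and `target_clusters` are hypothetical
    parameters for this sketch, not taken from the paper.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    nn = nearest_neighbor_distances(points)
    labels, radius = list(range(len(points))), nn[0]
    for idx in range(0, len(nn), jump):  # skip `jump` indices per step
        radius = nn[idx]
        labels = cluster_at_radius(points, radius)
        if len(set(labels)) <= target_clusters:
            break
    return labels, radius


points = [(0, 0), (0, 1), (0, 3), (10, 0), (10, 1)]
labels, radius = cluster_with_jumps(points, target_clusters=2, jump=2)
```

Jumping by index rather than increasing the radius by a fixed amount means each candidate radius is a distance that actually occurs in the data, so fewer radius values need to be tried. Whether this matches the paper's strategy in detail cannot be confirmed from the abstract alone.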

Data Availability

Data will be made available on reasonable request

Notes

  1. http://download.geofabrik.de/asia/vietnam.html

  2. http://cs.uef.fi/mopsi/routes/network/

  3. https://viblo.asia/p/hierarchical-clustering-phan-cum-du-lieu-maGK7q2elj2

  4. https://download.geofabrik.de/asia/vietnam.html

  5. https://hub.arcgis.com/search

  6. http://insideairbnb.com

  7. http://cs.uef.fi/mopsi/routes/network/

Acknowledgements

This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2021.08.

Author information

Corresponding authors

Correspondence to Loan T. T. Nguyen or Bay Vo.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

This paper contains no studies with human participants or animals performed by any authors.

Consent to Participate

Informed consent was obtained from all individual participants included in the study.

Consent to Publish

The authors consent to the publication of this paper in Applied Intelligence.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Nguyen, T.T.D., Nguyen, L.T.T., Bui, QT. et al. Efficient strategies for spatial data clustering using topological relations. Appl Intell 55, 203 (2025). https://doi.org/10.1007/s10489-024-05927-8

  • DOI: https://doi.org/10.1007/s10489-024-05927-8
