ABSTRACT
Density-based clustering methods are frequently used to define spatial clusters and outliers (noise) in location-only data. Many algorithms for this problem have emerged over the past few decades; they differ mainly in how they represent spatial density numerically. Conventional density-based clustering methods do not address the problem of defining alternate spatial cluster maps at statistically significant spatial scales. This problem differs from conventional clustering in that the goal is to produce a separate spatial cluster map for every statistically significant spatial scale. Knowing the distinct spatial scales pertinent to clustering is important for understanding the scales underlying the data. In addition, alternate clusters at different spatial scales can inform decisions that must be made at different spatial granularities. In this paper, we introduce a statistical test that uses the Kullback-Leibler (KL) divergence loss between different spatial density profiles to identify all statistically significant spatial scales at which clustering occurs. The proposed method produces different clustering maps, each reflecting a scale at which spatial clusters occur. We define the divergence on a 1-D representation of cluster density, the reachability profile, to cluster spatial units at varying spatial scales. We illustrate the use of multiple spatial clusterings at different scales by comparing the proposed method with HDBSCAN, the state-of-the-art method for defining a single map of multiscale clusters. We conclude by applying the proposed method to two problems, one from human geography (area-of-interest delineation) and one from physical geography (wildfire cluster modeling).
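To make the idea of a divergence between reachability profiles concrete, the following is a minimal sketch, not the paper's statistical test. It assumes that each reachability profile is available as a plain list of reachability distances (for example, from an OPTICS-style ordering), that the two profiles are compared by binning them into histograms over a shared range, and that a small smoothing constant avoids empty bins. The function names `to_distribution` and `kl_divergence`, the bin count, and the sample values are all hypothetical.

```python
import math


def to_distribution(profile, bins, lo, hi, eps=1e-9):
    """Bin a 1-D reachability profile into a smoothed probability vector.

    Assumption: equal-width bins on [lo, hi] with hi > lo; values outside
    the range are clamped into the first or last bin.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for r in profile:
        idx = int((r - lo) / width)
        idx = max(0, min(idx, bins - 1))
        counts[idx] += 1
    total = len(profile) + eps * bins
    # Additive smoothing keeps every bin strictly positive.
    return [(c + eps) / total for c in counts]


def kl_divergence(p, q):
    """Discrete KL divergence D(p || q) for strictly positive p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical reachability distances at two candidate spatial scales.
profile_fine = [0.10, 0.12, 0.11, 0.90, 0.13, 0.10, 0.85, 0.12]
profile_coarse = [0.40, 0.45, 0.50, 0.55, 0.42, 0.48, 0.52, 0.47]

p = to_distribution(profile_fine, bins=5, lo=0.0, hi=1.0)
q = to_distribution(profile_coarse, bins=5, lo=0.0, hi=1.0)
divergence = kl_divergence(p, q)
```

A large `divergence` would suggest that the two profiles describe different density structure. Deciding whether a given value is statistically significant is the role of the test introduced in the paper and is not reproduced here.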
Index Terms
- Density-based cluster detection at multiple spatial scales via Kullback-Leibler divergence of reachability profiles