Abstract
Finding the regions where people appear plays a key role in many fields like user behavior analysis, urban planning, etc. Therefore, how to partition the world, especially the urban areas where people are crowd and active, into regions is very crucial. In this paper, we propose a novel method called Restricted Spatial-Temporal DBSCAN (RST-DBSCAN). The key idea is to partition busy urban areas based on spatial-temporal information. Arbitrary and separated shapes of regions in urban areas would be then obtained. Besides, we would further get busier region earlier by RST-DBSCAN. Experimental results show that our approach yields significant improvements over existing methods on a real-world dataset extracted from Gowalla, a location-based social network.
You have full access to this open access chapter, Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
With the rapid development of internet, cyberspace has become an important field for entertainment, consumption, etc. If we associate the cyberspace with geospatial, i.e. linking the online with offline, a mass of user behaviors could be mined, which would be of great help for recommendation system, friendship prediction, urban planning, etc. [1]. The first step for mapping cyberspace to geospatial is to find the regions where users locate. Therefore, how to divide the world, especially urban areas, becomes an important and necessary intermediate link. Traditionally, three methods are widely used for area partition, i.e. address method, rigid method and cluster method. However, these methods all have limitations, which would be introduced in detail as follows.
Address method refers to partitioning areas with physical address, like Starbucks, KFC, etc. In fact, as addresses often reveal user behaviors, like shopping, having dinner, entertainment, etc., this method is commonly used to study either user behavior prediction or product recommendation [2,3,4]. Although achieving high precision, address method has limitation in the scale of regions. And it’s also too difficult for the method to find interactions between two users. For instance, the sparsity of user interactions in Foursquare and Yelp are greater than 99.99% [5]. As a result, it is inappropriate to do researches related to user interactions using this method.
Grid method refers to partitioning areas with gird. The metric for partitioning areas is usually kilometer or degree of latitude and longitude. For instance, Backstrom et al. divided the US into 0.01 \(\times \) 0.01 degree regions (about 0.4 square miles) [6], while Cho et al. discretized the world into 25 \(\times \) 25 km cells [7]. This partitioning method is simple and practicable, and obtained regions could cover an ideal range to find interactions between users. Therefore, it is often used to study user social relationships [8,9,10]. However, its disadvantages are also obvious. The method could not partition areas based on practical significance or independent meaning, and always divides a function unit, e.g. a shop or a store, into two or more separate girds, missing the interaction information of users. Besides, it is hard to set a proper size for girds, since little size may miss interactions, while big size may cover too many user traces and make the extraction of interactions too hard to fulfill.
Cluster method refers to partitioning areas by clustering locations in the form of GPS, and density-based clustering (DBSCAN) is a representative method. Zhou et al. designed a new approach based on DBSCAN, with which arbitrary shapes could be obtained, and areas could be partitioned according to practical significance or independent meaning as a result [11]. Besides, reasonable parameter value is easy to set for DBSCAN when partitioning areas. By this way, DBSCAN outperforms other cluster methods evidently, and is widely used later when dealing with personal locations [12, 13]. However, as DBSCAN works based on density, a cluster would “spread” infinitely when density meets the requirement. As a result, intensive locations in busy urban areas would be clustered into a same group with DBSCAN. Figure 1 shows the hot map of user check-ins in the busy area of Austin, US with the dataset extracted from Gowalla, a location-based social network. Intensive check-ins and locations as shown in the figure, make it difficult to partition the area with DBSCAN. In fact, this area is taken as a region when clustered with 0.001 degree. Therefore, DBSCAN doesn’t work well when dealing with a mass of locations, especially with those in busy urban areas.
In this paper, we propose a method for partitioning busy urban areas with spatial-temporal information in LBSN (Location-Based Social Network). On one hand, arbitrary shapes would be obtained by designing method with density-based clustering, overcoming the stiff of grid method. On the other hand, a boundary is designed ingeniously for limiting the scale of clusters, overcoming the infinite “spread” of DBSCAN. Our approach is validated using real user data and show a good performance.
Our contributions are concluded as follows:
-
We propose a new method, RST-DBSCAN, for partitioning busy urban areas based on spatial-temporal information, with which arbitrary and separated shapes of regions would be obtained reasonably;
-
We propose a new time mapping method to connect temporal information with spatial information. The method makes it possible to partition areas with 3-dimensional spatial-temporal information;
-
We further propose to cluster from the locations which own greater density. In this way, we can get busier regions earlier, making the approach more reasonably;
-
We visually and quantitatively evaluate our approach with a location-based social network, Gowalla. Results show that the performance of RST-DBSCAN outperforms that of competitors.
The rest of this paper is organized as follows. Section 2 introduces the design of RST-DBSCAN. Section 3 describes experiments. Finally, Sect. 4 gives the conclusion.
2 Design of the Model
In this section, we introduce our method in the following steps. Definitions for designing RST-DBSCAN would be introduced at first. Then we introduce the ideas for designing RST-DBSCAN in the view of space and time respectively. Finally, we describe the method with schematic and pseudocode.
2.1 Definitions
DBSCAN is a well-known cluster method and works by clustering points which satisfy the density conditions into a group. Therefore, two important concepts, \(\varepsilon \), the radius of a circle, and MinPts, the minimum number of neighbor points within that circle, are needed [14]. In this section, points are employed to denote locations in 2-dimensional spatial space. Then the circle with radius \(\varepsilon \) (named neighbor region here) means the scale covered by a location, and MinPts means the minimum number of location’s neighbors. Besides, we make definitions for designing RST-DBSCAN as follows.
Definition 1: Candidate core location
If the neighbor region of a location, p, covers at least MinPts locations, p is a candidate core location.
Definition 2: Neighbor location
The neighbors of p, denoted by \(N_\varepsilon (p)\), is defined by
Here, S is the set of all locations, and q is any location in the set.
Definition 3: Core location
The location which starts a clustering process is taken as core location.
Definition 4: Location directly density reachable
For a core location p and a location q, we say that q is directly density reachable from p if q is the neighbor location of p.
Definition 5: Location density reachable
For a location p and a location q, we say q is density reachable from p if there is a chain of locations \(q_1,q_2,...,q_n\), such that \(q_1=q, q_n=p\), and \(q_{i+1}\) is directly density reachable from \(q_i\). Here \(1\le i \le n\).
Definition 6: Core-related location
Assume location n is density reachable from core location p and satisfies the following condition
Then location n is a core-related location for p. Besides, the circle region with radius \( \lambda \cdot \varepsilon \) is called as core-related region of p.
What needs to note is that once a candidate core location is covered by a core location, it becomes a core-related location.
2.2 Area Partition with Spatial Information
For the sake of understanding, we introduce how RST-DBSCAN works with spatial information at first in this section, and then the application of time information would be introduced in the next section.
The schematic of RST-DBSCAN with \( \lambda =2\) is shown in Fig. 2. Here black points denote locations, circles of dotted lines denote the neighbor regions, and the circles of solid line denote core-related regions. RST-DBSCAN decides the core locations based on the number of neighbor locations. As the density around location o is the greatest, we take it as the first core location. Location s, r and t are all density reachable from o, so these locations belong to the same group when clustering with DBSCAN. However, as location t beyond the range of o’s core-related region, it would not belong to the group if clustered by RST-DBSCAN. With core-related regions as the restricted condition, we would obtain separated regions in busy urban areas.
2.3 Area Partition with Temporal Information
In Sect. 2.2, we partition busy urban area by restricted DBSCAN with spatial information in the form of GPS. With the help of temporal information, we could continuously process urban area further. Although intensive check-in is an important feature in busy urban area, the intensity varies a lot over time. Figure 3 shows the hot map of users’ check-ins around the Texas government at different time periods. As can be seen, most of check-ins appear at commercial districts in the daytime, while most of them appear at resident area at night. In more detail, Fig. 4 shows the situation of check-ins in a few blocks away. As can be seen, check-ins are active in shopping mall in the afternoon and at dusk, while more users appear in bars at early morning. Therefore, conclusions can be drawn that users check-ins also reflect their living habits, and area partitioned with temporal information would contain more information.
However, how to use temporal information for partitioning area is still a challenging issue. Previous studies often partition time with hour periods, which is not reasonable enough. For instance, a restaurant is active from 5:10 PM to 20:45 PM, and it is hard to partition this time period with traditional method. Therefore, we bring out a new time-mapping method to connect temporal information with spatial information. The formula of time-mapping is
Here \(\vartheta \) is time-mapping parameter. When \(\vartheta =1\), 1 hour corresponds to \(\varepsilon \) degree in the spatial scale. Then we could deal with temporal information with RST-DBSCAN, with which arbitrary and reasonable time period would be obtained.
2.4 Description of the Model
In this section, we would introduce the design of our approach. The pseudocode of RST-DBSCAN is shown in Algorithm 1. We take the check-in dataset D, the radius of a circle \(\varepsilon \) and the minimum number of neighbors MinPts as input, and obtain location clusters with the algorithm. Here D includes user coordinate of latitude and longitude and check-in time. Specific implementation details of RST-DBSCAN algorithm are as follows.
-
(1)
Initialization
Set the number of clusters as 0. Compute the mapping result of time and obtain 3-dimensional dataset with unprocessed locations, \(\varGamma \). The dataset of candidate core locations \({\varOmega }'\) and a queue \({\varOmega }\) are initialized as empty, respectively. Here we say a location has been processed if it has been clustered into to a group, unprocessed otherwise.
-
(2)
Obtain the set of candidate core locations
Traverse all the locations. Take location x for example. Calculate the number of x’s neighbors, \(\left| N_\varepsilon (x) \right| \), and then confirm whether x belongs to \({\varOmega }'\) or not according to the numerical value of \(\left| N_\varepsilon (x) \right| \) and MinPts.
-
(3)
Sort locations with their number of neighbors
Sort locations in \({\varOmega }'\) in descending order, according to their number of neighbors. Then a queue of candidate core locations \({\varOmega }\) is formed. Location with more neighbors in \({\varOmega }\) would be in more forward position.
-
(4)
Obtain target clusters
Extract the first candidate core location q in \({\varOmega }\). Drop q if it has been processed, otherwise set it as the core location. Find all locations which are unprocessed and density reachable from q, and remove those that beyond the scope of q’ core-related region. Then we would obtain a cluster with q as the center. Mark these locations as processed. Repeat the process above until \({\varOmega }\) is empty, and all locations will be clustered.
With above steps, a region where locations belong to the same cluster is formed at last. Based on practical significance and independent meaning, RST-DBSCAN could partition the area into arbitrary and separated shapes of regions.
3 Experiments
In this section, we carry on two experiments to verify the effectiveness of our approach. One is to partition a busy urban area with RST-DBSCAN. The other is to mine social links with user interactions in the regions obtained by RST-DBSCAN, as area partition plays an important role on social link mining.
3.1 Evaluation Index
We adopt three performance metrices, precision (P), recall (R), F1-measure (\(F_1\)) to estimate the model, which can be calculated as follows. Let TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives respectively.
As can be seen, F1-measure is the comprehensive value of precision and recall. Besides, links in social network are typically sparse, so F1-measure plays a more important role than accuracy when evaluating model. As a result, F1-measure is taken as a main performance index here.
3.2 Experiment 1
Dataset
Experiments are performed on a publicly available dataset, which is extracted from a location-based social network namely Gowalla. It collects user check-ins and social links from 2009.2 to 2010.10 [7]. As most check-ins appear in Austin, US, we choose the urban area of Austin, around the government of Texas, as the target. The detailed scope is \(30.262^{\circ }\) N -\(30.270 ^{\circ }\) N, \(97.730 ^{\circ }\) W-\(97.747^{\circ }\) W, and 5,173 users check in 72,131 times at 1,450 different locations.
Comparative Approach
DBSCAN is taken as the comparative approach in this part.
Experimental Setup
We set \(MinPts=1\) for both DBSCAN and RST-DBSCAN. \(\lambda \) is set to 3 for RST-DBSCAN. \(\varepsilon \) is set in the range of 0.005, 0.001, 0.0005.
Evaluation Results
When the value of \(\varepsilon \) varies, the results are shown in Table 1. As can be seen, for \(\varepsilon =0.005\) and 0.001, DBSCAN couldn’t partition this area, while RST-DBSCAN divides the area into 4 and 14 regions respectively. For \(\varepsilon =0.005\), the area could be partitioned by DBSCAN, while the number is still far less than that by RST-DBSCAN. In fact, as 0.001 degree is roughly equal to 0.1 km, smaller size even couldn’t cover a unit like a store or a restaurant. Therefore, 0.001 is nearly the minimum size for rationality, and our approach outperforms DBSCAN substantially.
More visually, Fig. 5 shows 1,450 different locations in this area, represented by red points. When \(\varepsilon =0.001\), these locations belong to a same group clustered by DBSCAN. When clustered with RST-DBSCAN, the area is divided into 14 regions and results are shown in Fig. 6. Here we mark the locations with 14 different colors, and locations with the same color belong to the same group. We draw borders of these groups manually for viewing convenience.
3.3 Experiment 2
Dataset
The dataset extracted from Gowalla is also employed in this experiment. As inadequate information would make adverse effects on the experiment, we select users whose check-ins and friends are all at the top 2% ( i.e. the time of check-ins is above 186, and the number of friends is above 52). There are 761 users which are matched with the conditions and there exists 8,828 links among them.
Comparative Approaches
Address method, Grid method and DBSCAN are taken as comparative approaches in this part.
We obtain regions with these methods at the first step, and compute the interactions of users in these regions to mine unknown social links. Scellatopropose et al. have proved that social links could be mined by user interactions in spatial space and designed several features for mapping relations [15]. We choose CR as the feature in this experiment, which is represented as the number of common regions two persons both check in. We calculate CR and assume there exists social links among users whose CR are at the k top. Recall, precision and F1-measure are taken as the metrics to evaluate our approach.
Experimental Setup
We set \(MinPts=1\), \(\varepsilon =0.001\) and \(\lambda =3\) for RST-DBSCAN. The MinPts and \(\varepsilon \) of DBSCAN is the same for a fair comparison. We set gird method with 0.01\(\times \)0.01 degree, as this is the most effective value and is often used when studying relationships between user interactions in spatial and social link mining.
Evaluation Results
Results are shown in Figs. 7 and 8. As can be seen, recall@k of RST-DBSCAN outperforms the other methods, and the advantage of RST-DBSCAN is more obvious when k is smaller. At the same time, precision@k of RST-DBSCAN also outperform that of Grid and DBSCAN. An interesting phenomenon is that precision@k of Address is the greatest when k is small enough. In fact, it is normal as address is much more precise than other methods, and two persons meet at smaller regions frequently means it is more possible for them to be friends. However, as Address method commonly covers small regions, a mass of interactions between two persons are easily missed, leading a rapid drop of Precision@k when k increases. Therefore, compared to others, Address is not a stable method when mining social links.
Specially, we compute the F1-measure when k = 1, and the values of Address, Grid, DBSCAN, RST-DBSCAN are 40.14%, 44.72%, 42.71%, 46.95% respectively, and the F1-measure of RST-DBSCAN outperforms others’ by 6.81%, 2.24%, 4.24%. As RST-DBSCAN could partition busy urban areas more reasonably, especially for commercial areas, better results would be obtained when mining social links.
4 Conclusion
In this paper, we present a new area partition method, called RST-DBSCAN, for partitioning busy urban areas with spatial-temporal information in LBSN. Our approach is able to divide the areas into arbitrary and separated regions based on practical significance or independent meaning reasonably. Comprehensive experiments are conducted on a real-world dataset, and results show that our approach performs better than competitors.
References
Zheng, Y.: Computing with Spatial Trajectories. Springer, Heidelberg (2011). https://doi.org/10.1007/978-1-4614-1629-6
Liu, Y., Wei, W., Sun, A., Miao, C.: Exploiting geographical neighborhood characteristics for location recommendation, pp. 739–748 (2014)
Lian, D., Ge, Y., Zhang, F., Yuan, N.J.: Content-aware collaborative filtering for location recommendation based on human mobility data, pp. 261–270, (2015)
Liu, Q., Wu, S., Wang, L., Tan, T.: Predicting the next location: a recurrent model with spatial and temporal contexts. In: Thirtieth AAAI Conference on Artificial Intelligence, pp. 194–200 (2016)
Hu, B., Ester, M.: Social topic modeling for point-of-interest recommendation in location-based social networks. In: IEEE International Conference on Data Mining, pp. 845–850 (2014)
Backstrom, L., Sun, E., Marlow, C.: Find me if you can: improving geographical prediction with social and spatial proximity. In: WWW, pp. 61–70 (2010)
Cho, E., Myers, S.A., Leskovec, J.: Friendship and mobility: user movement in location-based social networks. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, pp. 1082–1090, August 2011
Pham, H., Shahabi, C., Liu, Y.: EBM: an entropy-based model to infer social strength from spatiotemporal data. In: ACM Sigmod International Conference on Management of Data (2013)
Zhang, K., Yun, X., Zhang, X.Y., Zhu, X., Chao, L., Wang, S.: Weighted hierarchical geographic information description model for social relation estimation. Neurocomputing 216, 554–560 (2016)
Zhang, X.Y., Zhang, K., Yun, X., Wang, S., Yuan, Q.: Location-based correlation estimation in social network via collaborative learning. In: IEEE INFOCOM 2016 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) (2016)
Zhou, C., Frankowski, D., Ludford, P., Shekhar, S., Terveen, L.: Discovering personal gazetteers: an interactive clustering approach. In: ACM International Workshop on Geographic Information Systems, pp. 266–273 (2004)
Zhou, C., Bhatnagar, N., Shekhar, S., Terveen, L.: Mining personally important places from GPS tracks. In: IEEE International Conference on Data Engineering Workshop, pp. 517–526 (2007)
Killijian, M.O.: Next place prediction using mobility Markov chains. In: The Workshop on Measurement, Privacy, and Mobility, p. 3 (2012)
Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. China Machine Press, Beijing (2006)
Scellato, S., Noulas, A., Mascolo, C.: Exploiting place features in link prediction on location-based social networks. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2011)
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ai, Z., Zhang, K., Wang, S., Li, C., Zhang, Xy., Li, S. (2019). A Novel Partition Method for Busy Urban Area Based on Spatial-Temporal Information. In: Rodrigues, J., et al. Computational Science – ICCS 2019. ICCS 2019. Lecture Notes in Computer Science(), vol 11537. Springer, Cham. https://doi.org/10.1007/978-3-030-22741-8_17
Download citation
DOI: https://doi.org/10.1007/978-3-030-22741-8_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22740-1
Online ISBN: 978-3-030-22741-8
eBook Packages: Computer ScienceComputer Science (R0)