Reverse k-nearest neighbor search in the presence of obstacles☆
Introduction
Given a multi-dimensional data set P and a query point q, a reverse nearest neighbor (RNN) query retrieves all the points in P that have q as their nearest neighbor (NN). Due to its wide application base such as decision support [15], profile-based marketing [15], [26], and resource allocation [15], [35], RNN is one of the most popular variants of NN queries [7], [12], [14], [17], [20]. Formally, RNN(q) = {p ∈ P | q ∈ NN(p)}, in which RNN(q) represents the set of reverse nearest neighbors to q and NN(p) denotes the NN of a point p ∈ P. Consider an example in Fig. 1a, where the data set P consists of three data points (i.e., p1, p2, p3) in a two-dimensional (2D) space. Each point pi (1 ≤ i ≤ 3) is associated with a vicinity circle/arc cir(pi, r) centered at pi and having r = dist(pi, NN(pi)) as its radius, i.e., the vicinity circle/arc cir(pi, r) covers the NN of pi. Here, dist() refers to a specified distance metric. As shown in Fig. 1a, for the RNN query issued at a point q that uses the Euclidean distance as the distance metric, its result set RNN(q) = {p1}, since q is only inside p1's vicinity arc cir(p1, dist(p1, p2)).
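The definition RNN(q) = {p ∈ P | q ∈ NN(p)} can be illustrated with a minimal brute-force sketch. It assumes 2D points stored as tuples and the Euclidean distance, lets NN(p) range over the other points of P together with q, and counts ties in favor of q. The function names `nearest_neighbors` and `rnn` are hypothetical and are not taken from the paper, whose own algorithms are index-based rather than exhaustive.

```python
import math


def nearest_neighbors(p, others):
    """Return the set of points in `others` at minimum Euclidean distance from p."""
    best = min(math.dist(p, o) for o in others)
    return {o for o in others if math.isclose(math.dist(p, o), best)}


def rnn(q, P):
    """Brute-force RNN(q): every p in P whose nearest neighbor is q.

    Assumption: NN(p) is taken over (P minus {p}) plus q, and ties count.
    """
    result = []
    for p in P:
        candidates = [o for o in P if o != p] + [q]
        if q in nearest_neighbors(p, candidates):
            result.append(p)
    return result


if __name__ == "__main__":
    P = [(0, 0), (4, 0), (5, 0)]
    print(rnn((1, 0), P))
```

This quadratic scan is only meant to make the definition concrete; it is not a substitute for the pruning-based algorithms discussed later.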
RNN search has been well studied, and many efficient algorithms have been proposed to support the RNN query and its variants. A short review of some representative algorithms is presented in Section 2.1. Existing algorithms employ either the Euclidean distance in a Euclidean space or the network distance in a road network to measure the proximity between objects. To the best of our knowledge, none of those algorithms takes into account the existence of obstacles (e.g., buildings and blindage). However, obstacles are ubiquitous in the real world, and their existence may change the distance between objects and hence affect the final query result. Recently, the impact of obstacles on various research problems has attracted much attention from academia [10], [11], [13], [19], [23], [32], [34], [39], and much work has been conducted that takes the influence of obstacles into consideration. For example, spatial clustering in the presence of obstacles (e.g., COD_CLARANS [29], DBRS_O [30], DBCLuC [38], etc.) is a new research direction in the data mining community, formed by considering the impact of obstacles on spatial clustering.
In this paper, we study the impact of obstacles on RNN retrieval in a Euclidean space, and introduce a new type of RNN query, namely, obstructed reverse nearest neighbor (ORNN) search. Given a data set P, an obstacle set O, and a query point q in a 2D space, an ORNN query finds all the points in P that take q as their NN according to the obstructed distance, i.e., the length of the shortest path that connects two points without crossing any obstacle. An example is depicted in Fig. 1b, where P = {p1, p2, p3} and O = {o1, o2}. To simplify the discussion in this paper, we assume that obstacles are rectangular, although they could be of any other shape as well.
Let ||pi, q|| be the obstructed distance from a point pi to q, and ONN(pi) be the obstructed nearest neighbor (ONN) of pi, i.e., the point that has the smallest obstructed distance to pi among all other points. We associate each point pi ∈ P with (i) an arc arc(pi, ||pi, ONN(pi)||) centered at pi and with radius ||pi, ONN(pi)||, and (ii) its obstructed path to q. For instance, the arc arc(p3, ||p3, p2||) centered at p3 and having ||p3, p2|| as its radius indicates that p2 is the ONN of p3, and the straight line from p3 to q denotes the obstructed path between them, which does not cross any obstacle. We observe that ||p3, q|| < ||p3, ONN(p3)|| with ONN(p3) = p2, and ||p2, q|| < ||p2, ONN(p2)|| with ONN(p2) = p3. Thus, q is the ONN of both p2 and p3, i.e., q's ORNN set is ORNN(q) = {p2, p3}. Note that p1 is the RNN of q in a Euclidean space (see Fig. 1a), but it is not the ORNN of q in an obstructed space because obstacle o1 blocks the path between them.
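To make the obstructed distance concrete, the following sketch computes it for axis-aligned rectangular obstacles by running Dijkstra's algorithm over a visibility graph built from the two endpoints and all obstacle corners, and then evaluates ORNN(q) by exhaustive comparison. This is an illustrative baseline under stated assumptions, not the paper's ODC or ORNN algorithm: the rectangle encoding `(xmin, ymin, xmax, ymax)`, the function names, and the conventions that paths may touch obstacle boundaries and that ties count in favor of q are all assumptions of this sketch.

```python
import heapq
import math


def crosses(a, b, rect, eps=1e-9):
    """True if segment a-b passes through the open interior of rect (Liang-Barsky clipping)."""
    xmin, ymin, xmax, ymax = rect
    dx, dy = b[0] - a[0], b[1] - a[1]
    t0, t1 = 0.0, 1.0
    for d, c in ((-dx, a[0] - xmin), (dx, xmax - a[0]),
                 (-dy, a[1] - ymin), (dy, ymax - a[1])):
        if d == 0:
            if c < 0:
                return False          # parallel to this slab and outside it
        elif d < 0:
            t0 = max(t0, c / d)       # entering parameter
        else:
            t1 = min(t1, c / d)       # leaving parameter
    if t0 > t1:
        return False
    tm = (t0 + t1) / 2                # midpoint of the clipped piece
    mx, my = a[0] + tm * dx, a[1] + tm * dy
    return xmin + eps < mx < xmax - eps and ymin + eps < my < ymax - eps


def obstructed_distance(s, t, obstacles):
    """Shortest obstacle-avoiding path length from s to t (assumes s, t lie outside all obstacles)."""
    nodes = [s, t]
    for x0, y0, x1, y1 in obstacles:
        nodes += [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]
    best = {0: 0.0}
    heap = [(0.0, 0)]
    done = set()
    while heap:
        d, i = heapq.heappop(heap)
        if i in done:
            continue
        done.add(i)
        if i == 1:                    # reached the target
            return d
        for j in range(len(nodes)):
            if j in done or any(crosses(nodes[i], nodes[j], r) for r in obstacles):
                continue
            nd = d + math.dist(nodes[i], nodes[j])
            if nd < best.get(j, math.inf):
                best[j] = nd
                heapq.heappush(heap, (nd, j))
    return math.inf


def ornn(q, P, obstacles):
    """Brute-force ORNN(q): points whose obstructed distance to q is no larger than to any other point of P."""
    result = []
    for p in P:
        dq = obstructed_distance(p, q, obstacles)
        others = [obstructed_distance(p, o, obstacles) for o in P if o != p]
        if dq < math.inf and all(dq <= d for d in others):
            result.append(p)
    return result
```

With `P = [(0, 0), (0, 3.5), (4, 2)]`, `q = (3, 0)`, and a single obstacle `(1, -1, 2, 1)`, the point `(0, 0)` has q as its Euclidean nearest neighbor but not as its obstructed nearest neighbor, mirroring the role of p1 in Fig. 1.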
We focus on ORNN search because it is not only a challenging problem from the research point of view, but also very useful in many applications. As an example, suppose KFC plans to open a new restaurant and wants to distribute coupons to its potential customers for promotion. Assume that there are some buildings and parks (i.e., obstacles) around the new restaurant, and that customers who have the new restaurant as their obstructed nearest restaurant are more likely to visit. Consequently, to ensure the effectiveness of the promotion, KFC needs to identify the persons who take the new restaurant as their obstructed nearest restaurant, and distribute coupons to them. In addition, owing to the ubiquity of obstacles, the ORNN query is important, as a stand-alone tool or a stepping stone, in location-based services, geographic information systems, and complex spatial data analysis/mining involving obstacles.
In addition to the ORNN query, we also study several interesting variations, i.e., (1) obstructed reverse k-nearest neighbor (ORkNN) search, which retrieves all the points in the data set P that take a given query point q as one of their obstructed k-nearest neighbors (OkNN); (2) ORkNN retrieval with an obstructed distance threshold δ (δ-ORkNN), which finds the ORkNN points whose obstructed distances to q are bounded by a pre-defined threshold δ; and (3) constrained ORkNN (CORkNN) search, which returns the ORkNN points in a specified restricted area (defined by spatial region constraints).
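The three variants can be expressed as filters on a single brute-force reverse k-NN routine. The sketch below is a hypothetical illustration, not the paper's method: the name `rknn`, the `dist`, `delta`, and `region` parameters, and the convention that p qualifies when fewer than k other points are strictly closer to p than q are assumptions. The distance function is pluggable, so an obstructed distance routine could be passed as `dist`; the demo uses the Euclidean distance only to stay self-contained.

```python
import math


def rknn(q, P, k, dist=math.dist, delta=math.inf, region=None):
    """Brute-force reverse k-NN with optional distance threshold and region constraint.

    p is reported if fewer than k other points of P are strictly closer to p than q.
    delta  : keep only points with dist(p, q) <= delta (the delta-bounded variant).
    region : optional predicate; keep only points for which region(p) is true
             (the constrained variant).
    """
    result = []
    for p in P:
        dq = dist(p, q)
        if dq > delta:
            continue
        if region is not None and not region(p):
            continue
        closer = sum(1 for o in P if o != p and dist(p, o) < dq)
        if closer < k:
            result.append(p)
    return result


if __name__ == "__main__":
    P = [(0, 0), (4, 0), (5, 0)]
    q = (1, 0)
    print(rknn(q, P, k=1))
    print(rknn(q, P, k=2))
    print(rknn(q, P, k=2, delta=3.5))
    print(rknn(q, P, k=2, region=lambda p: p[0] >= 2))
```

Keeping the threshold and region checks ahead of the neighbor count means disqualified points are discarded before any further distance evaluations, which matters when `dist` is an expensive obstructed distance.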
In this paper, we present an efficient solution to the ORNN query, which follows a filter-refinement framework and does not require any pre-processing. Moreover, we extend the ORNN query algorithm to efficiently handle ORkNN, δ-ORkNN, and CORkNN queries. In brief, the key contributions of the paper are summarized as follows:
- We formalize ORNN search, a new addition to the family of spatial queries in the presence of obstacles.
- We introduce a new concept of boundary region to facilitate the pruning of unqualified data points and node entries.
- We develop efficient algorithms to answer exact or approximate ORNN retrieval.
- We extend ORNN query techniques to handle several variations of ORNN queries, i.e., ORkNN search, δ-ORkNN search, and CORkNN search.
- We conduct extensive experiments with both real and synthetic data sets to verify the effectiveness of the presented pruning heuristics and the performance of the proposed algorithms.
Note that this paper extends our preliminary work [9] in several substantial ways. First, we investigate three new ORNN query variants, i.e., ORkNN, δ-ORkNN, and CORkNN queries. Second, we conduct a more comprehensive performance evaluation that incorporates the new classes of queries. Third, we present a more complete review of the related work and more illustrative examples, to make the paper self-contained.
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 formulates the ORNN query. Section 4 discusses pruning heuristics. Section 5 presents the ODC and BRF algorithms. Section 6 elaborates on the algorithms for processing ORNN search. Section 7 extends the ORNN query solution to tackle several ORNN query variants. Extensive experimental results and our findings are reported in Section 8. Finally, Section 9 concludes the paper with some directions for future work.
Section snippets
Related work
In this section, we overview the existing work related to ORNN retrieval, including RNN search, spatial queries with obstacles, and main-memory obstacle path problems.
Problem formulation
In this section, we formally define the ORNN query, the focus of this paper. Table 1 summarizes the notations used frequently throughout the paper.
Definition 3.1 (Visibility [11]). Given two points p, p′ in a data set P and an obstacle set O, p and p′ are visible to each other iff there is no obstacle o in O such that the line segment formed by p and p′, denoted as [p, p′], crosses o.
Definition 3.2 (Obstacle-free Path [10]). Given two points p, p′ in a data set P and an obstacle set O, a path P(p, p′) = {v0, v1, v2,
Pruning heuristics
Before presenting the pruning heuristics for ORNN retrieval, we introduce some concepts that are used in the development of effective pruning strategies.
Definition 4.1 (Point Angle). Given a point p and a query point q, let q be the origin. The amount of rotation (in anti-clockwise direction) about q required to bring the x-axis into correspondence with the line segment [q, p] is defined as p's point angle w.r.t. q, denoted by θp (∈ [0, 2π)).
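As a small worked illustration of Definition 4.1, the point angle can be computed with `atan2` and normalized into [0, 2π). The function name `point_angle` and the tuple representation of points are assumptions of this sketch.

```python
import math


def point_angle(p, q):
    """Anti-clockwise angle in [0, 2*pi) from the x-axis to segment [q, p], with q as the origin."""
    theta = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
    # Guard against floating-point rounding pushing a tiny negative angle up to exactly 2*pi.
    return 0.0 if theta >= 2 * math.pi else theta


if __name__ == "__main__":
    print(point_angle((1, 1), (0, 0)))    # a point in the first quadrant
    print(point_angle((0, -1), (0, 0)))   # a point directly below q
```

Because `atan2` returns values in (-π, π], the modulo step is what maps points below the x-axis onto the upper half of the [0, 2π) range required by the definition.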
Definition 4.2 (Boundary Vertex, Boundary Vertex Set). Given a point p, a
ODC and BRF Algorithms
In order to enable the boundary region based pruning, there are two issues we have to address, i.e., (i) obstructed distance computation and (ii) boundary region formation. In what follows, we present corresponding solutions.
ORNN query processing
In this section, we explain how to process ORNN search efficiently. Algorithm 4 shows the pseudo-code of the ORNN Search Algorithm (ORNN). It follows a filter-refinement framework, assuming that the data set P and the obstacle set O are indexed by two different R-trees. Specifically, the filtering step prunes unqualified data points and node MBRs using the currently identified boundary regions, and obtains a candidate set Sc which is a superset of the final query result set; the subsequent
Extensions
In this section, we extend our techniques to tackle several interesting ORNN query variants, i.e., ORkNN, δ-ORkNN, and CORkNN queries.
Experimental evaluation
In this section, we experimentally evaluate the effectiveness of the developed pruning heuristics and the performance of the proposed algorithms for ORNN search and its variants, using both real and synthetic datasets. All the algorithms were implemented in C++, and all experiments were conducted on an Intel Core 2 Duo 2.93 GHz PC with 3GB RAM.
Conclusions
This paper, for the first time, identifies and solves a new type of RNN query, namely, obstructed reverse nearest neighbor (ORNN) search, which considers the impact of obstacles on the distances between objects. The ORNN query is not only interesting from a research point of view, but also useful in many decision support applications involving spatial data and physical obstacles. We carry out a systematic study of ORNN retrieval. We carefully formalize the problem, develop effective pruning
Acknowledgments
We would like to thank Jun Zhang for providing us with the source code from [39]. This work was supported in part by the 973 Program no. 2015CB352502, NSFC Grants no. 61522208, 61379033 and 61472348, and the Fundamental Research Funds for the Central Universities under Grant no. 2015XZZX005-07.
References (39)
- et al. A concurrency control algorithm for nearest neighbor query. Inf. Sci. (1999)
- et al. Efficient mutual nearest neighbor query processing for moving object trajectories. Inf. Sci. (2010)
- et al. Processing generalized k-nearest neighbor queries on a wireless broadcast stream. Inf. Sci. (2012)
- et al. Reverse nearest neighbor aggregates over data streams
- et al. Moving range k nearest neighbor queries with quality guarantee over uncertain moving objects. Inf. Sci. (2015)
- et al. Network Voronoi diagram on uncertain objects for nearest neighbor queries. Inf. Sci. (2015)
- et al. Reverse kNN search in arbitrary dimensionality
- et al. Efficient reverse k-nearest neighbor search in arbitrary metric spaces
- et al. The R*-tree: an efficient and robust access method for points and rectangles
- et al. Nearest and reverse nearest neighbor queries for moving objects. VLDB J. (2006)
- Computational Geometry: Algorithms and Applications
- Probabilistic reverse nearest neighbor queries on uncertain data. IEEE Trans. Knowl. Data Eng.
- Lazy updates: an efficient technique to continuously monitoring reverse kNN
- A note on two problems in connexion with graphs. Numer. Math.
- On efficient obstructed reverse nearest neighbor query processing
- Continuous nearest-neighbor search in the presence of obstacles. ACM Trans. Database Syst.
- Visible reverse k-nearest neighbor query processing in spatial databases. IEEE Trans. Knowl. Data Eng.
- Continuous visible nearest neighbor query processing in spatial databases. VLDB J.
- Influence sets based on reverse nearest neighbor queries
☆ This paper is an extended version of the conference paper titled “On Efficient Obstructed Reverse Nearest Neighbor Query Processing”, which was published in the Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2011), November 1–4, 2011, Chicago, IL, USA. Specifically, the paper extends the conference paper by including (i) three additional variants of ORNN queries, i.e., ORkNN search (Section 7.1), δ-ORkNN retrieval (Section 7.2), and CORkNN search (Section 7.2); (ii) an enhanced experimental evaluation that incorporates the new classes of queries (Section 8); and (iii) more complete and informative related work (Section 2), more pseudo-code, more illustrative examples, and more analyses. More details concerning this paper's extension are also pointed out explicitly in Section 1 of the paper.