Elsevier

Information Sciences

Volume 330, 10 February 2016, Pages 274-292

Reverse k-nearest neighbor search in the presence of obstacles

https://doi.org/10.1016/j.ins.2015.10.022

Abstract

In this paper, we study a new form of reverse nearest neighbor (RNN) queries, i.e., obstructed reverse nearest neighbor (ORNN) search. It considers the impact of obstacles on the distance between objects, which is ignored by existing work on RNN retrieval. Given a data set P, an obstacle set O, and a query point q in a two-dimensional space, an ORNN query finds all the points/objects in P that have q as their nearest neighbor according to the obstructed distance metric, i.e., the length of the shortest path between two points that does not cross any obstacle. We formalize ORNN search, develop effective pruning heuristics (by introducing a novel concept of boundary region), and propose efficient algorithms for ORNN query processing, assuming that both P and O are indexed by traditional data-partitioning indexes (e.g., R-trees). In addition, we introduce several interesting variations of ORNN queries, namely, obstructed reverse k-nearest neighbor (ORkNN) search, ORkNN search with maximum obstructed distance δ (δ-ORkNN), and constrained ORkNN (CORkNN) search. They can be tackled by extending the ORNN query techniques, which demonstrates the flexibility of the proposed ORNN query algorithm. Extensive experimental evaluation using both real and synthetic data sets verifies the effectiveness of the pruning heuristics and the performance of the algorithms.

Introduction

Given a multi-dimensional data set P and a query point q, a reverse nearest neighbor (RNN) query retrieves all the points in P that have q as their nearest neighbor (NN). Due to its wide application base, such as decision support [15], profile-based marketing [15], [26], and resource allocation [15], [35], RNN is one of the most popular variants of NN queries [7], [12], [14], [17], [20]. Formally, RNN(q) = {p ∈ P | q ∈ NN(p)}, in which RNN(q) represents the set of reverse nearest neighbors of q and NN(p) denotes the NN of a point p ∈ P. Consider the example in Fig. 1a, where the data set P consists of three data points (i.e., p1, p2, p3) in a two-dimensional (2D) space. Each point pi (1 ≤ i ≤ 3) is associated with a vicinity circle/arc cir(pi, r) centered at pi and having r = dist(pi, NN(pi)) as its radius, i.e., the vicinity circle/arc cir(pi, r) covers the NN of pi. Here, dist() refers to a specified distance metric. As shown in Fig. 1a, for the RNN query issued at a point q that uses the Euclidean distance as the distance metric, the result set is RNN(q) = {p1}, since q lies inside only p1's vicinity arc cir(p1, dist(p1, p2)).
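The definition of RNN(q) can be illustrated with a brute-force check. The following Python sketch is only an illustration of the definition under the Euclidean metric, not one of the algorithms reviewed in this paper; the function name rnn and the tie-breaking rule (a tie counts in favor of q) are assumptions made for the example.

```python
import math

def rnn(q, points):
    """Brute-force RNN(q) under the Euclidean distance.

    A point p is reported when q is at least as close to p as every
    other point of the data set. Counting ties in favor of q is an
    assumption of this sketch, not something the source specifies.
    """
    result = []
    for p in points:
        d_q = math.dist(p, q)
        # Radius of p's vicinity circle: distance to its NN within the data set.
        d_nn = min((math.dist(p, x) for x in points if x != p), default=math.inf)
        if d_q <= d_nn:  # q lies inside (or on) the vicinity circle of p
            result.append(p)
    return result

if __name__ == "__main__":
    data = [(0, 0), (4, 0), (5, 0)]
    print(rnn((-1, 0), data))
```

The check is quadratic in the number of points, which is why index-based pruning matters for large data sets.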

RNN search has been well studied, and many efficient algorithms have been proposed to support the RNN query and its variants. A short review of some representative algorithms is presented in Section 2.1. Existing algorithms employ either the Euclidean distance in a Euclidean space or the network distance in a road network to measure the proximity between objects. To the best of our knowledge, none of those algorithms takes into account the existence of obstacles (e.g., buildings and blindage). However, obstacles are ubiquitous in the real world, and their existence may change the distance between objects and hence affect the final query result. Recently, the impact of obstacles on various research problems has attracted much attention from academia [10], [11], [13], [19], [23], [32], [34], [39], and much work has been conducted that takes the influence of obstacles into consideration. For example, spatial clustering in the presence of obstacles (e.g., COD_CLARANS [29], DBRS_O [30], DBCLuC [38], etc.) is a new research direction for the data mining community, formed by considering the impact of obstacles on spatial clustering.

In this paper, we study the impact of obstacles on RNN retrieval in a Euclidean space, and introduce a new type of RNN query, namely, obstructed reverse nearest neighbor (ORNN) search. Given a data set P, an obstacle set O, and a query point q in a 2D space, an ORNN query finds all the points in P that take q as their NN according to the obstructed distance, i.e., the length of the shortest path that connects two points without crossing any obstacle. An example is depicted in Fig. 1b, where P = {p1, p2, p3} and O = {o1, o2}. To simplify the discussion in this paper, we assume that obstacles are rectangular, although they could be of any other shape as well.

Let ||pi, q|| be the obstructed distance from a point pi to q, and ONN(pi) be the obstructed nearest neighbor (ONN) of pi, i.e., the point that has the smallest obstructed distance to pi compared with other points. We associate each point pi ∈ P with (i) an arc arc(pi, ||pi, ONN(pi)||) centered at pi and with radius ||pi, ONN(pi)||, and (ii) its obstructed path to q. For instance, the arc arc(p3, ||p3, p2||) centered at p3 and having ||p3, p2|| as the radius indicates that p2 is the ONN of p3, and the straight line from p3 to q denotes the obstructed path between them, which does not cross any obstacle. It is observed that ||p3, q|| < ||p3, ONN(p3) = p2|| and ||p2, q|| < ||p2, ONN(p2) = p3||, and thus, q is the ONN of both p2 and p3, i.e., q's ORNN set is ORNN(q) = {p2, p3}. Note that p1 is the RNN of q in a Euclidean space (see Fig. 1a), but it is not the ORNN of q in an obstructed space because it is blocked by obstacle o1.
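To make the obstructed distance concrete, the following Python sketch computes it naively for axis-aligned rectangular obstacles by running Dijkstra's algorithm over a visibility graph built from the two end points and the obstacle corners. It is a main-memory baseline written for illustration and should not be read as the ODC algorithm of this paper. The names crosses, visible, and obstructed_distance, the (xmin, ymin, xmax, ymax) rectangle encoding, and the convention that a path may touch an obstacle boundary without crossing it are all assumptions of the sketch.

```python
import heapq
import math

def crosses(a, b, rect, eps=1e-12):
    """True if segment ab passes through the open interior of
    rect = (xmin, ymin, xmax, ymax). Touching the boundary is allowed."""
    t0, t1 = 0.0, 1.0  # parameter range of ab still inside the slabs seen so far
    for axis in (0, 1):
        lo, hi = rect[axis], rect[axis + 2]
        d = b[axis] - a[axis]
        if d == 0:
            # Segment parallel to this slab: it must lie strictly inside it.
            if a[axis] <= lo or a[axis] >= hi:
                return False
        else:
            ta, tb = (lo - a[axis]) / d, (hi - a[axis]) / d
            if ta > tb:
                ta, tb = tb, ta
            t0, t1 = max(t0, ta), min(t1, tb)
    return t1 - t0 > eps

def visible(a, b, obstacles):
    """Two points are visible to each other if no obstacle is crossed."""
    return not any(crosses(a, b, r) for r in obstacles)

def obstructed_distance(s, t, obstacles):
    """Length of the shortest obstacle-avoiding path from s to t.

    Assumes s and t lie outside every obstacle interior. Visibility
    edges are tested lazily while Dijkstra's algorithm expands nodes.
    """
    nodes = [s, t]
    for (x0, y0, x1, y1) in obstacles:
        nodes += [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]
    best = {0: 0.0}
    heap = [(0.0, 0)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == 1:  # reached t
            return d
        for v in range(len(nodes)):
            if v in done or not visible(nodes[u], nodes[v], obstacles):
                continue
            nd = d + math.dist(nodes[u], nodes[v])
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf  # t is unreachable

if __name__ == "__main__":
    wall = [(1, -1, 2, 1)]
    print(obstructed_distance((0, 0), (3, 0), wall))  # detour around the wall
    print(obstructed_distance((0, 0), (3, 0), []))    # plain Euclidean distance
```

With a single obstacle between the two points, the path bends at two obstacle corners, so the obstructed distance exceeds the Euclidean one. This is the effect that removes p1 from the result in the example above.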

We focus on ORNN search because it is not only a challenging problem from the research point of view, but also very useful in many applications. As an example, suppose KFC plans to open a new restaurant and wants to distribute coupons to its potential customers for promotion. Assume that there are some buildings and parks (i.e., obstacles) around the new restaurant, and that customers who have the new restaurant as their obstructed nearest restaurant are more likely to visit. Consequently, in order to ensure the effectiveness of the promotion, KFC needs to identify the persons that take the new restaurant as their obstructed nearest restaurant, and distribute coupons to them. In addition, due to the ubiquity of obstacles, the ORNN query is important, as a stand-alone tool or a stepping stone, in location-based services, geographic information systems, and complex spatial data analysis/mining involving obstacles.

In addition to the ORNN query, we also study several interesting variations, i.e., (1) obstructed reverse k-nearest neighbor (ORkNN) search, which retrieves all the points in the data set P that take a given query point q as one of their obstructed k-nearest neighbors (OkNN); (2) ORkNN retrieval with an obstructed distance threshold δ (δ-ORkNN), which finds the ORkNN points whose obstructed distances to q are bounded by a pre-defined threshold δ; and (3) constrained ORkNN (CORkNN) search, which returns the ORkNN points in a specified restricted area (defined by the spatial region constraints).
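The three variants differ only in the filter applied to each candidate point, which the brute-force Python sketch below tries to make visible. It is an illustration of the definitions, not of the query algorithms in this paper. The function name orknn, its parameters, the rectangular encoding of the constrained region, and the choice to apply the region only to the reported points are assumptions; dist stands for any function returning the obstructed distance between two points, and the usage line substitutes the Euclidean distance purely as a stand-in.

```python
import math

def orknn(q, points, k, dist, delta=math.inf, region=None):
    """Brute-force reverse k-NN under a pluggable distance function.

    delta  : optional upper bound on dist(p, q) (the delta-ORkNN filter).
    region : optional rectangle (xmin, ymin, xmax, ymax); only points inside
             it are reported (the CORkNN filter, as assumed in this sketch).
    """
    result = []
    for p in points:
        d_q = dist(p, q)
        if d_q > delta:
            continue  # violates the distance threshold
        if region is not None:
            x0, y0, x1, y1 = region
            if not (x0 <= p[0] <= x1 and y0 <= p[1] <= y1):
                continue  # outside the constrained area
        # q is among the k nearest neighbors of p when fewer than k
        # other data points are strictly closer to p than q is.
        closer = sum(1 for x in points if x != p and dist(p, x) < d_q)
        if closer < k:
            result.append(p)
    return result

if __name__ == "__main__":
    data = [(0, 0), (4, 0), (5, 0)]
    # Euclidean distance used only as a stand-in for the obstructed distance.
    print(orknn((-1, 0), data, k=3, dist=math.dist, delta=5))
```

Setting k = 1 and leaving delta and region at their defaults reduces the sketch to the plain ORNN definition.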

In this paper, we present an efficient solution to tackle the ORNN query, which follows a filter-refinement framework and does not require any pre-processing. Moreover, we extend the ORNN query algorithm to efficiently handle ORkNN, δ-ORkNN, and CORkNN queries. In brief, the key contributions of the paper are summarized as follows:

  • We formalize ORNN search, a new addition to the family of spatial queries in the presence of obstacles.

  • We introduce a new concept of boundary region to facilitate the pruning of unqualified data points and node entries.

  • We develop efficient algorithms to answer ORNN queries exactly or approximately.

  • We extend ORNN query techniques to handle several variations of ORNN queries, i.e., ORkNN search, δ-ORkNN search, and CORkNN search.

  • We conduct extensive experiments with both real and synthetic data sets to verify the effectiveness of the presented pruning heuristics and the performance of the proposed algorithms.

Note that this paper extends our preliminary work [9] in several substantial ways. First, we investigate three new ORNN query variants, i.e., ORkNN, δ-ORkNN, and CORkNN queries. Second, we conduct a more comprehensive performance evaluation which incorporates the new classes of queries. Third, we present a more complete review of the related work and more illustrative examples, to make the paper self-contained.

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 formulates the ORNN query. Section 4 discusses pruning heuristics. Section 5 presents the ODC and BRF algorithms for obstructed distance computation and boundary region formation. Section 6 elaborates algorithms for processing ORNN search. Section 7 extends the ORNN query solution to tackle several ORNN query variants. Extensive experimental results and our findings are reported in Section 8. Finally, Section 9 concludes the paper with some directions for future work.

Section snippets

Related work

In this section, we overview the existing work related to ORNN retrieval, including RNN search, spatial queries with obstacles, and main-memory obstacle path problems.

Problem formulation

In this section, we formally define the ORNN query, the focus of this paper. Table 1 summarizes the notations used frequently throughout the paper.

Definition 3.1

(Visibility [11]). Given two points p, p′ in a data set P and an obstacle set O, p and p′ are visible to each other iff there is no obstacle o in O such that the line segment formed by p and p′, denoted as [p, p′], crosses o.

Definition 3.2

(Obstacle-free Path [10]). Given two points p, p′ in a data set P and an obstacle set O, a path P(p, p′) = {v0, v1, v2,

Pruning heuristics

Before presenting the pruning heuristics for ORNN retrieval, we introduce some concepts that are used in the development of effective pruning strategies.

Definition 4.1

(Point Angle). Given a point p and a query point q, let q be the origin. The amount of rotation (in anti-clockwise direction) about q required to bring the x-axis into correspondence with the line segment [q, p] is defined as p's point angle w.r.t. q, denoted by θp (∈ [0, 2π)).
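As a small worked illustration of Definition 4.1, the point angle can be computed with the two-argument arctangent and then mapped into [0, 2π). The function name point_angle and the tuple representation of points are assumptions of this Python sketch.

```python
import math

def point_angle(p, q):
    """Anti-clockwise angle, in [0, 2*pi), from the positive x-axis
    (with q taken as the origin) to the line segment [q, p]."""
    # atan2 returns a value in (-pi, pi]; the modulo shifts negative
    # angles into the upper half of [0, 2*pi).
    return math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)

if __name__ == "__main__":
    q = (1, 1)
    for p in [(2, 1), (2, 2), (1, 2), (1, 0)]:
        print(p, point_angle(p, q))
```

A point directly below q therefore receives an angle of 3π/2 rather than the negative value that atan2 alone would return.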

Definition 4.2

(Boundary Vertex, Boundary Vertex Set). Given a point p, a

ODC and BRF Algorithms

In order to enable the boundary region based pruning, there are two issues we have to address, i.e., (i) obstructed distance computation and (ii) boundary region formation. In what follows, we present corresponding solutions.

ORNN query processing

In this section, we explain how to process ORNN search efficiently. Algorithm 4 shows the pseudo-code of the ORNN Search Algorithm (ORNN). It follows a filter-refinement framework, assuming that the data set P and the obstacle set O are indexed by two different R-trees. Specifically, the filtering step prunes unqualified data points and node MBRs using the currently identified boundary regions, and obtains a candidate set Sc which is a superset of the final query result set; the subsequent

Extensions

In this section, we extend our techniques to tackle several interesting ORNN query variants, i.e., ORkNN, δ-ORkNN, and CORkNN queries.

Experimental evaluation

In this section, we experimentally evaluate the effectiveness of the developed pruning heuristics and the performance of the proposed algorithms for ORNN search and its variants, using both real and synthetic datasets. All the algorithms were implemented in C++, and all experiments were conducted on an Intel Core 2 Duo 2.93 GHz PC with 3GB RAM.

Conclusions

This paper, for the first time, identifies and solves a new type of RNN queries, namely, obstructed reverse nearest neighbor (ORNN) search, which considers the impact of obstacles on the distances between objects. The ORNN query is not only interesting from a research point of view, but also useful in many decision support applications involving spatial data and physical obstacles. We carry out a systematic study of ORNN retrieval. We carefully formalize the problem, develop effective pruning

Acknowledgments

We would like to thank Jun Zhang for providing us with the source code of [39]. This work was supported in part by the 973 Program no. 2015CB352502, NSFC Grants no. 61522208, 61379033 and 61472348, and the Fundamental Research Funds for the Central Universities under Grant no. 2015XZZX005-07.

References (39)

  • M.D. Berg et al., Computational Geometry: Algorithms and Applications (2000)
  • M.A. Cheema et al., Probabilistic reverse nearest neighbor queries on uncertain data, IEEE Trans. Knowl. Data Eng. (2010)
  • M.A. Cheema et al., Lazy updates: an efficient technique to continuously monitoring reverse kNN
  • E. Dijkstra, A note on two problems in connexion with graphs, Numer. Math. (1959)
  • Y. Gao et al., On efficient obstructed reverse nearest neighbor query processing
  • Y. Gao et al., Continuous nearest-neighbor search in the presence of obstacles, ACM Trans. Database Syst. (2011)
  • Y. Gao et al., Visible reverse k-nearest neighbor query processing in spatial databases, IEEE Trans. Knowl. Data Eng. (2009)
  • Y. Gao et al., Continuous visible nearest neighbor query processing in spatial databases, VLDB J. (2011)
  • F. Korn et al., Influence sets based on reverse nearest neighbor queries

    This paper is an extended version of the conference paper titled “On Efficient Obstructed Reverse Nearest Neighbor Query Processing”, which was published in the Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2011), November 1–4, 2011, Chicago, IL, USA. Specifically, the paper extends the conference paper by including (i) three additional interesting variants of ORNN queries, i.e., ORkNN search (Section 7.1), δ-ORkNN retrieval (Section 7.2), and CORkNN search (Section 7.2); (ii) an enhanced experimental evaluation that incorporates the new classes of queries (Section 8); and (iii) more complete and informative related work (Section 2), more pseudo-code, more illustrative examples, and more analyses. More details concerning this paper's extension are also pointed out explicitly in Section 1 of the paper.
