Time-aware spatial keyword cover query

https://doi.org/10.1016/j.datak.2019.05.004

Abstract

The existing spatial keyword cover query considers only the textual and positional relevance of geospatial objects and ignores their temporal information. In this paper, we define a new query, the time-aware spatial keyword cover query (TSKCQ), which takes into account the textual relevance, positional relevance, and temporal information of the objects. TSKCQ uses a new cost function that evaluates the user’s satisfaction in time and space, provided that the textual constraints specified by the user are satisfied. TSKCQ returns the object set with the best cost function value. We propose a TR-tree for indexing the temporal and spatial information of objects. Based on this index, we propose an exact algorithm with effective pruning strategies to answer TSKCQ. Finally, experiments demonstrate the efficiency of the proposed algorithm.

Introduction

With the rapid development of wireless networks and geographical positioning technologies, location-based services are widely used. In recent years, massive amounts of geo-textual objects have become available, each of which possesses both a geographical location and a textual description. Spatial keyword queries have been studied extensively [1]. Spatial keyword cover queries are also beginning to receive significant attention from the spatial database research community and industry [2], [3], [4]. A spatial keyword cover query finds a group of objects that covers all the query keywords while minimizing the inter-object distance. Because of their practical value and the diversity of users’ needs, several variants of spatial keyword cover queries have been studied. In [5], the best keyword cover search was proposed. The authors observed the availability and importance of keyword ratings in object evaluation, so when looking for a set of objects covering all query keywords, the search considers two factors: the inter-object distance and the keyword rating of the objects. In [6], a query for finding the minimum spatial keyword cover was proposed. When searching for a set of objects covering all query keywords, this query takes into account the inter-object distance and the number of objects.

We observe that geo-tagged objects are not valid at all times (an attraction, for example, is only useful during its opening hours), so temporal information plays an important role in spatial keyword cover queries. To our knowledge, temporal information has never been considered in spatial keyword cover queries. This paper proposes a new form of spatial keyword cover query, the time-aware spatial keyword cover query (TSKCQ), which considers three factors: positional relevance, textual relevance, and the valid time of objects. Because it considers the objects’ temporal information, the solution of TSKCQ can be very different from that of the spatial keyword cover query. An example is illustrated in Fig. 1. Visitors may plan their trips according to the opening times of attractions. Suppose a tourist would like to visit a “museum” from 7:00 to 8:00, a “cafeteria” from 10:00 to 12:00, and a “zoo” from 14:00 to 17:00. The spatial keyword cover query returns {o1, o2, o3}, since it considers only the distance between the objects. TSKCQ returns {o4, o2, o3}, because it considers both the distance between the objects and their valid time.

In this paper, we define a new cost function and create an efficient index structure, the Time-aware R-tree (TR-tree), which indexes the objects of the spatial dataset. We then propose an exact algorithm, the Principal-keyword based Query Algorithm (PQA), to answer TSKCQ. PQA first selects the principal query keyword from the query keywords: the query keyword associated with the fewest objects is the principal query keyword, and the objects associated with it are the principal objects. For each principal object, the local optimal solution (LOS) is computed. The LOS with the highest cost function value is the final result. Using the LOS and the current best solution, we propose effective pruning strategies that greatly improve the efficiency of the query. As a baseline, we obtain the Best keyword Cover Algorithm (BCA) by modifying the exact algorithm in [5] to answer TSKCQ. PQA has a shorter query time than BCA.
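The selection rule described above (the query keyword with the fewest associated objects becomes the principal keyword) can be illustrated with a minimal Python sketch. The function name, the (object_id, keyword) input format, and the alphabetical tie-break are assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def select_principal_keyword(objects, query_keywords):
    """Pick the query keyword with the fewest objects (hypothetical helper).

    objects: iterable of (object_id, keyword) pairs, one keyword per object.
    query_keywords: collection of keywords the result must cover.
    Returns (principal_keyword, list_of_principal_object_ids).
    """
    by_keyword = defaultdict(list)
    for object_id, keyword in objects:
        if keyword in query_keywords:
            by_keyword[keyword].append(object_id)

    # A keyword with no objects makes the query unanswerable.
    missing = [k for k in query_keywords if k not in by_keyword]
    if missing:
        raise ValueError(f"no object covers: {missing}")

    # Fewest objects first; ties broken alphabetically (an assumption,
    # added only to make the result deterministic).
    principal = min(query_keywords, key=lambda k: (len(by_keyword[k]), k))
    return principal, by_keyword[principal]
```

Choosing the rarest keyword keeps the number of principal objects, and therefore the number of local optimal solutions to compute, as small as possible.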

To summarize, the main contributions of this paper are:

(1) We formally define the problem of TSKCQ and propose a new cost function for it.

(2) We design an index structure, called TR-tree, and propose effective pruning strategies and an exact algorithm to tackle TSKCQ.

(3) We conduct extensive experiments using the data sets to demonstrate the efficiency of our algorithm.

The rest of this paper is organized as follows. Section 2 discusses related work. The problem is defined in Section 3. Section 4 introduces the TR-tree index structure. Section 5 elaborates the exact algorithm for answering TSKCQ. Then, Section 6 reports the experimental results. Finally, we conclude our paper in Section 7.

Spatial keyword query with a query location

Early spatial keyword query with a query location mainly retrieves individual objects that are related to the query keywords and are close to the query location [7], [8], [9], [10], [11]. Due to the diversification of actual application requirements, variants of spatial keyword query have gradually emerged. Li et al. [12] proposed the direction-based spatial keyword search, which takes as arguments a spatial point, a direction, and a set of keywords. It finds k nearest neighbors of the query,

Problem definition

In a spatial dataset, each object may be associated with one or multiple keywords. We convert the object with multiple keywords into multiple objects in the same location, so that for each object there is only one keyword. An object o is in the form (id, l, k, t), where id is the unique identifier of o, l indicates the location of the object o in a two-dimensional geographical space, k represents the keyword of o, and t is the valid time of o in the form of (st, et), with st and et being the
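As an illustration of the object model described here, the following Python sketch represents an object as (id, l, k, t) and splits a multi-keyword object into several single-keyword objects at the same location. The class name, the id-suffix scheme, and the use of plain numbers for the valid time are assumptions of this sketch, not details given in the text.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class GeoObject:
    id: str                  # unique identifier
    l: Tuple[float, float]   # location in two-dimensional space
    k: str                   # the single keyword of this object
    t: Tuple[float, float]   # valid time as (st, et)


def split_by_keyword(raw_id: str,
                     location: Tuple[float, float],
                     keywords: Sequence[str],
                     valid_time: Tuple[float, float]) -> List[GeoObject]:
    """Turn one multi-keyword object into one GeoObject per keyword.

    The "#<index>" suffix is a hypothetical way to keep ids unique.
    """
    return [
        GeoObject(f"{raw_id}#{i}", location, keyword, valid_time)
        for i, keyword in enumerate(keywords)
    ]
```

After this conversion every object carries exactly one keyword, so covering a query keyword reduces to choosing one object for it.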

TR-tree index structure

The R-tree [25], one of the most popular spatial index structures, is used to index all the spatial objects. Because the R-tree is widely used and easy to extend, we augment it with one additional dimension that indexes the valid time of objects in order to process TSKCQ. The resulting three-dimensional R-tree is called the Time-aware R-tree (TR-tree for short). We now detail the non-leaf nodes and leaf nodes of the TR-tree:

Non-leaf nodes of the TR-tree contain entries of the form N(ptrs, mbr, UT), where
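One plausible reading of "an R-tree with an additional time dimension" is a bounding box over (x, y, time). The Python sketch below shows such a box, how a parent box encloses its children, and an overlap test. It is not the paper's node layout: the Box3D class and both helper functions are hypothetical, and the exact role of the UT field is not reproduced here.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Point3 = Tuple[float, float, float]  # (x, y, time)


@dataclass(frozen=True)
class Box3D:
    lo: Point3  # lower corner
    hi: Point3  # upper corner

    def intersects(self, other: "Box3D") -> bool:
        # Two boxes overlap only if their extents overlap on every axis.
        return all(
            a_lo <= b_hi and b_lo <= a_hi
            for a_lo, a_hi, b_lo, b_hi in zip(self.lo, self.hi, other.lo, other.hi)
        )


def leaf_box(x: float, y: float, st: float, et: float) -> Box3D:
    """Box of a point object: degenerate in space, an interval in time."""
    return Box3D((x, y, st), (x, y, et))


def enclose(boxes: Iterable[Box3D]) -> Box3D:
    """Smallest box that contains all child boxes (a parent entry)."""
    boxes = list(boxes)
    lo = tuple(min(b.lo[i] for b in boxes) for i in range(3))
    hi = tuple(max(b.hi[i] for b in boxes) for i in range(3))
    return Box3D(lo, hi)
```

A parent box overlapping a query region does not guarantee that any child does, which is why a search still descends and tests the children individually.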

Pruning strategies

Given a TSKCQ Q = (K, T, UDist), we use the principal query keyword to find the LOS for each principal object, and the TSKCQ returns the LOS with the largest Φ K as the result.
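The overall search pattern (one LOS per principal object, then the best LOS overall) can be sketched without any pruning as follows. This is a brute-force illustration only, not PQA itself: the function names are hypothetical, and the score argument stands in for the paper's cost function, whose definition is not given in this excerpt. The toy score used here is simply the negated largest pairwise distance.

```python
from itertools import combinations, product
from math import dist


def neg_diameter(candidate):
    """Toy score (NOT the paper's cost function): negated largest
    pairwise distance between objects given as (name, x, y)."""
    return -max(
        (dist(a[1:], b[1:]) for a, b in combinations(candidate, 2)),
        default=0.0,
    )


def best_cover(objects_by_keyword, principal_keyword, score):
    """Exhaustively compute one LOS per principal object, keep the best.

    objects_by_keyword: dict mapping each query keyword to its objects.
    score: callable on a tuple of objects; larger is better.
    """
    others = [k for k in objects_by_keyword if k != principal_keyword]
    best_set, best_score = None, float("-inf")
    for p in objects_by_keyword[principal_keyword]:
        # Local optimal solution for principal object p.
        los, los_score = None, float("-inf")
        for combo in product(*(objects_by_keyword[k] for k in others)):
            candidate = (p, *combo)
            s = score(candidate)
            if s > los_score:
                los, los_score = candidate, s
        # Keep the LOS with the largest score seen so far.
        if los_score > best_score:
            best_set, best_score = los, los_score
    return best_set, best_score
```

A pruning strategy would skip principal objects, or partial combinations, whose best achievable score cannot exceed the current best score; the exhaustive loop above corresponds to running with no such pruning.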

Definition 5 Principal Query Keyword

Given a TSKCQ Q = (K, T, UDist), a principal query keyword kp is selected from K according to a certain scheme, and the remaining query keywords in K are referred to as non-principal query keywords. Objects with kp are called principal objects, and objects with non-principal query keywords are called non-principal

Experiments

In this section, we evaluate three algorithms, namely PQA, PQA algorithm without pruning strategy (no-PQA), and Best keyword Cover Algorithm (BCA).

PQA: The query algorithm proposed in this paper.

no-PQA: The algorithm obtained by removing the pruning strategies from PQA.

BCA: The algorithm is obtained by modifying the algorithms in [5] to tackle TSKCQ. The specific modifications are as follows: (1) Rating of objects is replaced by valid time of objects; (2) KRR*-tree is

Conclusions

We present a new type of query in this paper, namely, the time-aware spatial keyword cover query (TSKCQ). Compared with spatial keyword cover queries, TSKCQ considers not only the textual and positional information of geospatial objects but also the valid time of the objects, which better matches users’ time-constrained needs. In TSKCQ, we define a new cost function, create the TR-tree index structure, and propose effective pruning strategies. We also propose an exact

Acknowledgment

This work was supported by the Key Research and Development Program of Hebei Province, China (No. 18270307D).

Declaration of competing interest

None.

Zijun Chen received the BS degree from the Northeast Heavy Machinery Institute, China, the MS degree from Yanshan University, and the Ph.D. degree from Fudan University in 2002, all in computer science. Since 1995, he has been with the School of Information Science and Engineering, Yanshan University, Qinhuangdao, China, where he is currently a professor. His current research interests include moving object databases and spatio-temporal databases.

References (28)

  • Cary, A. et al. Efficient and scalable method for processing top-k spatial Boolean queries.

  • Li, Z. et al. IR-Tree: An efficient index for geographic document search. IEEE Trans. Knowl. Data Eng. (2010).

  • Rocha-Junior, J.B. et al. Efficient processing of top-k spatial keyword queries.

  • Tao, Y. et al. Fast nearest neighbor search with keywords. IEEE Trans. Knowl. Data Eng. (2014).

Tingting Zhao received the BS degree in computer science and technology from North China University of Science and Technology, China, in 2016. She is currently working toward the MS degree in the School of Information Science and Engineering, Yanshan University, China. Her research interests include spatio-temporal databases.

Wenyuan Liu received the BS and MS degrees from the Northeast Heavy Machinery Institute, China, and the PhD degree from the Harbin Institute of Technology in 2000, all in computer science. Since 1996, he has been with the School of Information Science and Engineering, Yanshan University, Qinhuangdao, China, where he is currently a professor. He is also a special Invited Researcher with the IoT of TNlist, Tsinghua University. His research interests include wireless sensor networks and mobile networks.