Abstract
In this paper, we propose a new type of query that augments spatial keyword search with an additional boolean expression constraint. The query is issued against a corpus of structured or semi-structured spatial entities and is useful in applications such as mobile search and targeted location-aware advertising. We devise three indexing and filtering strategies. First, we build on the hybrid \(\hbox {IR}^2\)-tree and propose a novel hashing scheme for efficient pruning. Second, we propose an inverted-index-based solution, named BE-Inv, that is more cache-conscious and has strong pruning power for boolean expression matching. Our third method, named SKB-Inv, adopts a novel two-level partitioning scheme that organizes the spatial entities into inverted lists and facilitates pruning in the spatial, textual, and boolean expression dimensions. In addition, we propose an adaptive query processing strategy that takes into account the selectivity of query keywords and predicates for early termination. We conduct experiments on two real datasets, with 3.5 million Foursquare venues and 50 million Twitter geo-profiles. The results show that the inverted-index-based methods outperform the hybrid \(\hbox {IR}^2\)-tree and that SKB-Inv achieves the best performance.
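To make the query type concrete, the following is a minimal brute-force sketch of what such an augmented query could compute. It is not one of the three indexing methods from the paper. It assumes (these are illustrative assumptions, not the paper's definitions) that an entity carries planar coordinates, a keyword set, and a dictionary of attributes; that the boolean expression is a conjunction of `(attribute, operator, value)` predicates; and that the answer is the `k` nearest entities that contain all query keywords and satisfy every predicate. The names `Entity`, `satisfies`, and `augmented_search` are hypothetical.

```python
import math
import operator
from dataclasses import dataclass, field

# Hypothetical predicate operators; the paper's predicate language may differ.
OPS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass
class Entity:
    name: str
    loc: tuple                      # (x, y) planar coordinates (assumption)
    keywords: set                   # textual description as a keyword set
    attrs: dict = field(default_factory=dict)  # structured attributes


def satisfies(entity, predicates):
    """Return True if the entity satisfies every (attr, op, value) predicate."""
    for attr, op, value in predicates:
        if attr not in entity.attrs:
            return False
        if not OPS[op](entity.attrs[attr], value):
            return False
    return True


def augmented_search(entities, q_loc, q_keywords, predicates, k):
    """Brute-force baseline: filter on keywords and predicates, then rank by distance."""
    matches = [
        e for e in entities
        if q_keywords <= e.keywords and satisfies(e, predicates)
    ]
    matches.sort(key=lambda e: math.dist(e.loc, q_loc))
    return matches[:k]


venues = [
    Entity("Cafe A", (0.0, 1.0), {"coffee", "wifi"}, {"rating": 4.5, "price": 2}),
    Entity("Cafe B", (0.0, 0.5), {"coffee"}, {"rating": 3.0, "price": 1}),
    Entity("Cafe C", (3.0, 4.0), {"coffee", "wifi"}, {"rating": 4.8, "price": 3}),
]

result = augmented_search(venues, (0.0, 0.0), {"coffee"}, [("rating", ">=", 4.0)], k=2)
print([e.name for e in result])  # ['Cafe A', 'Cafe C']
```

This baseline scans every entity, so its cost grows linearly with the corpus. Pruning in the spatial, textual, and boolean expression dimensions, as the indexing methods above aim to do, is what avoids that full scan.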
















Notes
The location information can be naturally derived from the GPS of smartphones.
With the prevalence of large-memory machines, there is a trend of designing memory-resident indexes for efficient information retrieval. For instance, Twitter’s EarlyBird system [2] is designed to be in-memory to support real-time keyword search. Our paper follows the trend and assumes that the index can be accommodated in memory.
References
Ahmed, J., Siyal, M.Y., Najam, S., Najam, Z.: Challenges and Issues in Modern Computer Architectures. Springer, Singapore, pp. 23–29 (2017). https://doi.org/10.1007/978-981-10-3120-5_3
Asadi, N., Lin, J.: Fast candidate generation for real-time tweet search with bloom filter chains. ACM Trans. Inf. Syst., p. 13 (2013)
Beckmann, N., Kriegel, H.P., Schneider, R., Seeger, B.: The R*-tree: an efficient and robust access method for points and rectangles. In: Garcia-Molina, H., Jagadish, H.V. (eds.) SIGMOD, pp. 322–331. ACM Press (1990)
Chakrabarti, K., Chaudhuri, S., Cheng, T., Xin, D.: A framework for robust discovery of entity synonyms. In: KDD, pp. 1384–1392 (2012). https://doi.org/10.1145/2339530.2339743
Chen, L., Cong, G., Jensen, C.S., Wu, D.: Spatial keyword query processing: an experimental evaluation. PVLDB 6(3), 217–228 (2013)
Chen, Y.Y., Suel, T., Markowetz, A.: Efficient query processing in geographic web search engines. In: SIGMOD, pp. 277–288 (2006)
Cheng, T., Lauw, H.W., Paparizos, S.: Fuzzy matching of web queries to structured data. In: ICDE, pp. 713–716 (2010). https://doi.org/10.1109/ICDE.2010.5447817
Cong, G., Jensen, C.S., Wu, D.: Efficient retrieval of the top-k most relevant spatial web objects. PVLDB 2(1), 337–348 (2009)
Ding, B., König, A.C.: Fast set intersection in memory. PVLDB 4(4), 255–266 (2011)
Fabret, F., Jacobsen, H.A., Llirbat, F., Pereira, J., Ross, K.A., Shasha, D.: Filtering algorithms and implementation for very fast publish/subscribe. In: SIGMOD, pp. 115–126 (2001)
Faloutsos, C., Christodoulakis, S.: Signature files: an access method for documents and its analytical performance evaluation. ACM Trans. Inf. Syst. 2(4), 267–288 (1984)
Felipe, I.D., Hristidis, V., Rishe, N.: Keyword search on spatial databases. In: ICDE, pp. 656–665 (2008)
Finkel, R.A., Bentley, J.L.: Quad trees: a data structure for retrieval on composite keys. Acta Inform. 4, 1–9 (1974)
Fontoura, M., Josifovski, V., Kumar, R., Olston, C., Tomkins, A., Vassilvitskii, S.: Relaxation in text search using taxonomies. PVLDB 1(1), 672–683 (2008)
Gaede, V., Günther, O.: Multidimensional access methods. ACM Comput. Surv. 30(2), 170–231 (1998)
Gargantini, I.: An effective way to represent quadtrees. Commun. ACM 25(12), 905–910 (1982)
Hariharan, R., Hore, B., Li, C., Mehrotra, S.: Processing spatial-keyword (sk) queries in geographic information retrieval (gir) systems. In: SSDBM, p. 16 (2007)
Kolahdouzan, M.R., Shahabi, C.: Voronoi-based \(k\) nearest neighbor search for spatial network databases. In: VLDB, pp. 840–851 (2004)
Lee, T., Park, J., Lee, S., Hwang, S., Elnikety, S., He, Y.: Processing and optimizing main memory spatial-keyword queries. PVLDB 9(3), 132–143 (2015)
Nievergelt, J., Hinterberger, H., Sevcik, K.C.: The grid file: an adaptable, symmetric multikey file structure. ACM Trans. Database Syst. 9(1), 38–71 (1984)
Papadias, D., Kalnis, P., Zhang, J., Tao, Y.: Efficient OLAP operations in spatial data warehouses. In: SSTD, pp. 443–459 (2001)
Parameswaran, A.G., Kaushik, R., Arasu, A.: Efficient parsing-based search over structured data. In: CIKM, pp. 49–58 (2013). https://doi.org/10.1145/2505515.2505764
Rocha-Junior, J.B., Nørvåg, K.: Top-k spatial keyword queries on road networks. In: EDBT, pp. 168–179 (2012)
Rocha-Junior, J.B., Gkorgkas, O., Jonassen, S., Nørvåg, K.: Efficient processing of top-k spatial keyword queries. In: SSTD, pp. 205–222 (2011)
Roussopoulos, N., Kelley, S., Vincent, F.: Nearest neighbor queries. In: Carey, M.J., Schneider, D.A. (eds.) SIGMOD, pp. 71–79. ACM Press, New York (1995)
Sadoghi, M., Jacobsen, H.A.: BE-tree: an index structure to efficiently match boolean expressions over high-dimensional discrete space. In: SIGMOD, pp. 637–648 (2011)
Sharifzadeh, M., Shahabi, C.: VoR-tree: R-trees with Voronoi diagrams for efficient processing of spatial nearest neighbor queries. PVLDB 3(1), 1231–1242 (2010)
Wang, Y., Zhang, D., Liu, Q., Shen, F., Lee, L.H.: Towards enhancing the last-mile delivery: an effective crowd-tasking model with scalable solutions. Transp. Res. Part E: Logist. Transp. Rev. 93, 279–293 (2016)
Whang, S., Brower, C., Shanmugasundaram, J., Vassilvitskii, S., Vee, E., Yerneni, R., Garcia-Molina, H.: Indexing boolean expressions. PVLDB 2(1), 37–48 (2009)
Wu, D., Yiu, M.L., Cong, G., Jensen, C.S.: Joint top-k spatial keyword query processing. TKDE 24(10), 1889–1903 (2012)
Xin, D., He, Y., Ganti, V.: Keyword++: a framework to improve keyword search over entity databases. PVLDB 3(1), 711–722 (2010)
Yan, T.W., Garcia-Molina, H.: Index structures for selective dissemination of information under the boolean model. ACM Trans. Database Syst. 19(2), 332–364 (1994)
Yu, C., Ooi, B.C., Tan, K., Jagadish, H.V.: Indexing the distance: an efficient method to KNN processing. In: VLDB, pp. 421–430 (2001)
Zhang, C., Zhang, Y., Zhang, W., Lin, X.: Inverted linear quadtree: efficient top \(k\) spatial keyword search. In: ICDE, pp. 901–912 (2013a)
Zhang, D., Chee, Y.M., Mondal, A., Tung, A.K.H., Kitsuregawa, M.: Keyword search in spatial databases: towards searching by document. In: ICDE, pp. 688–699 (2009)
Zhang, D., Ooi, B.C., Tung, A.K.H.: Locating mapped resources in web 2.0. In: ICDE, pp. 521–532 (2010)
Zhang, D., Tan, K.L., Tung, A.K.H.: Scalable top-k spatial keyword search. In: EDBT, pp. 359–370 (2013b)
Zhang, D., Chan, C., Tan, K.: An efficient publish/subscribe index for ecommerce databases. PVLDB 7(8), 613–624 (2014a)
Zhang, D., Chan, C., Tan, K.: Processing spatial keyword query as a top-k aggregation query. In: SIGIR, pp. 355–364 (2014b). https://doi.org/10.1145/2600428.2609562
Zhang, P., Cheng, R., Mamoulis, N., Renz, M., Züfle, A., Tang, Y., Emrich, T.: Voronoi-based nearest neighbor search for multi-dimensional uncertain databases. In: ICDE, pp. 158–169 (2013c)
Zhong, R., Fan, J., Li, G., Tan, K.L., Zhou, L.: Location-aware instant search. In: CIKM, pp. 385–394 (2012)
Zhong, R., Li, G., Tan, K., Zhou, L.: G-tree: an efficient index for KNN search on road networks. In: CIKM, pp. 39–48 (2013)
Zhou, Y., Xie, X., Wang, C., Gong, Y., Ma, W.Y.: Hybrid index structures for location-based web search. In: CIKM, pp. 155–162 (2005)
Cite this article
Zhang, D., Li, Y., Cao, X. et al. Augmented keyword search on spatial entity databases. The VLDB Journal 27, 225–244 (2018). https://doi.org/10.1007/s00778-018-0497-6
DOI: https://doi.org/10.1007/s00778-018-0497-6