Augmented keyword search on spatial entity databases

  • Regular Paper
The VLDB Journal

Abstract

In this paper, we propose a new type of query that augments spatial keyword search with an additional boolean expression constraint. The query is issued against a corpus of structured or semi-structured spatial entities and is useful in applications such as mobile search and targeted location-aware advertising. We devise three indexing and filtering strategies. First, we utilize the hybrid \(\hbox{IR}^2\)-tree and propose a novel hashing scheme for efficient pruning. Second, we propose an inverted index-based solution, named BE-Inv, that is more cache-conscious and exhibits strong pruning power for boolean expression matching. Our third method, named SKB-Inv, adopts a novel two-level partitioning scheme to organize the spatial entities into inverted lists and to facilitate pruning in the spatial, textual, and boolean expression dimensions. In addition, we propose an adaptive query processing strategy that takes into account the selectivity of query keywords and predicates for early termination. We conduct our experiments using two real datasets with 3.5 million Foursquare venues and 50 million Twitter geo-profiles. The results show that the methods based on inverted indexes are superior to the hybrid \(\hbox{IR}^2\)-tree, and that SKB-Inv achieves the best performance.
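
To make the query type concrete, the following is a minimal sketch, not the BE-Inv or SKB-Inv index described above. It assumes (the abstract does not state this) that a query consists of a location, a set of keywords that must all match, a conjunction of attribute predicates standing in for the boolean expression, and a result size k, and that matches are ranked by Euclidean distance. The names `Entity`, `build_inverted_index`, and `augmented_search` are hypothetical, and intersecting the shortest keyword list first is a standard heuristic rather than the paper's adaptive strategy.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A spatial entity: a location, keywords, and structured attributes."""
    eid: int
    x: float
    y: float
    keywords: set
    attrs: dict = field(default_factory=dict)


def build_inverted_index(entities):
    """Map each keyword to the set of entity ids that contain it."""
    index = defaultdict(set)
    for e in entities:
        for kw in e.keywords:
            index[kw].add(e.eid)
    return index


def augmented_search(entities, index, loc, keywords, predicates, k):
    """Return ids of the k nearest entities that match every keyword
    and satisfy every predicate (assumed conjunctive semantics)."""
    by_id = {e.eid: e for e in entities}
    if keywords:
        # Intersect keyword lists, shortest (most selective) first.
        postings = sorted((index.get(kw, set()) for kw in keywords), key=len)
        candidates = set(postings[0])
        for p in postings[1:]:
            candidates &= p
    else:
        candidates = set(by_id)
    # Keep only candidates whose attributes pass all predicate tests.
    matches = [
        by_id[eid] for eid in candidates
        if all(a in by_id[eid].attrs and test(by_id[eid].attrs[a])
               for a, test in predicates.items())
    ]
    # Rank by distance to the query location; ties broken by id.
    matches.sort(key=lambda e: (math.hypot(e.x - loc[0], e.y - loc[1]), e.eid))
    return [e.eid for e in matches[:k]]


# Toy data, invented for illustration only.
entities = [
    Entity(1, 0.0, 0.0, {"coffee", "wifi"}, {"price": 2, "open": True}),
    Entity(2, 1.0, 1.0, {"coffee"}, {"price": 1, "open": True}),
    Entity(3, 0.5, 0.5, {"coffee", "wifi"}, {"price": 4, "open": True}),
    Entity(4, 3.0, 0.0, {"coffee", "wifi"}, {"price": 1, "open": True}),
]
index = build_inverted_index(entities)
result = augmented_search(
    entities, index, (0.4, 0.4), ["coffee", "wifi"],
    {"price": lambda p: p <= 2, "open": lambda o: o}, k=2,
)
print(result)  # [1, 4]
```

Entity 3 is the closest keyword match but fails the price predicate, which is the behavior the boolean expression constraint adds on top of plain spatial keyword search. A real system would avoid materializing all matches before ranking; this sketch only illustrates the assumed semantics.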

Notes

  1. https://www.statista.com/chart/1520/number-of-monthly-active-twitter-users/

  2. http://www.statista.com/statistics/277958/number-of-mobile-active-facebook-users-worldwide/

  3. The location information can be naturally derived from the GPS of smartphones.

  4. https://www.yelp.com/

  5. https://www.gotinder.com/

  6. http://appcrawlr.com/app/search?q=sort+by+distance&device=ios

  7. With the prevalence of large-memory machines, there is a trend of designing memory-resident indexes for efficient information retrieval. For instance, Twitter’s EarlyBird system [2] is designed to be in-memory to support real-time keyword search. Our paper follows the trend and assumes that the index can be accommodated in memory.

  8. https://local.google.com/

  9. https://local.yahoo.com/

  10. https://developer.foursquare.com/

  11. http://www.gregsadetsky.com/aol-data/

References

  1. Ahmed, J., Siyal, M.Y., Najam, S., Najam, Z.: Challenges and Issues in Modern Computer Architectures. Springer, Singapore, pp. 23–29 (2017). https://doi.org/10.1007/978-981-10-3120-5_3

  2. Asadi, N., Lin, J.: Fast candidate generation for real-time tweet search with bloom filter chains. ACM Trans. Inf. Syst., p. 13 (2013)

  3. Beckmann, N., Kriegel, H.P., Schneider, R., Seeger, B.: The R*-tree: an efficient and robust access method for points and rectangles. In: Garcia-Molina, H., Jagadish, H.V. (eds.) SIGMOD, pp. 322–331. ACM Press (1990)

  4. Chakrabarti, K., Chaudhuri, S., Cheng, T., Xin, D.: A framework for robust discovery of entity synonyms. In: KDD, pp. 1384–1392 (2012). https://doi.org/10.1145/2339530.2339743

  5. Chen, L., Cong, G., Jensen, C.S., Wu, D.: Spatial keyword query processing: an experimental evaluation. PVLDB 6(3), 217–228 (2013)

  6. Chen, Y.Y., Suel, T., Markowetz, A.: Efficient query processing in geographic web search engines. In: SIGMOD, pp. 277–288 (2006)

  7. Cheng, T., Lauw, H.W., Paparizos, S.: Fuzzy matching of web queries to structured data. In: ICDE, pp. 713–716 (2010). https://doi.org/10.1109/ICDE.2010.5447817

  8. Cong, G., Jensen, C.S., Wu, D.: Efficient retrieval of the top-k most relevant spatial web objects. PVLDB 2(1), 337–348 (2009)

  9. Ding, B., König, A.C.: Fast set intersection in memory. PVLDB 4(4), 255–266 (2011)

  10. Fabret, F., Jacobsen, H.A., Llirbat, F., Pereira, J., Ross, K.A., Shasha, D.: Filtering algorithms and implementation for very fast publish/subscribe. In: SIGMOD, pp. 115–126 (2001)

  11. Faloutsos, C., Christodoulakis, S.: Signature files: an access method for documents and its analytical performance evaluation. ACM Trans. Inf. Syst. 2(4), 267–288 (1984)

  12. Felipe, I.D., Hristidis, V., Rishe, N.: Keyword search on spatial databases. In: ICDE, pp. 656–665 (2008)

  13. Finkel, R.A., Bentley, J.L.: Quad trees: a data structure for retrieval on composite keys. Acta Inform. 4, 1–9 (1974)

  14. Fontoura, M., Josifovski, V., Kumar, R., Olston, C., Tomkins, A., Vassilvitskii, S.: Relaxation in text search using taxonomies. PVLDB 1(1), 672–683 (2008)

  15. Gaede, V., Günther, O.: Multidimensional access methods. ACM Comput. Surv. 30(2), 170–231 (1998)

  16. Gargantini, I.: An effective way to represent quadtrees. Commun. ACM 25(12), 905–910 (1982)

  17. Hariharan, R., Hore, B., Li, C., Mehrotra, S.: Processing spatial-keyword (SK) queries in geographic information retrieval (GIR) systems. In: SSDBM, p. 16 (2007)

  18. Kolahdouzan, M.R., Shahabi, C.: Voronoi-based \(k\) nearest neighbor search for spatial network databases. In: VLDB, pp. 840–851 (2004)

  19. Lee, T., Park, J., Lee, S., Hwang, S., Elnikety, S., He, Y.: Processing and optimizing main memory spatial-keyword queries. PVLDB 9(3), 132–143 (2015)

  20. Nievergelt, J., Hinterberger, H., Sevcik, K.C.: The grid file: an adaptable, symmetric multikey file structure. ACM Trans. Database Syst. 9(1), 38–71 (1984)

  21. Papadias, D., Kalnis, P., Zhang, J., Tao, Y.: Efficient OLAP operations in spatial data warehouses. In: SSTD, pp. 443–459 (2001)

  22. Parameswaran, A.G., Kaushik, R., Arasu, A.: Efficient parsing-based search over structured data. In: CIKM, pp. 49–58 (2013). https://doi.org/10.1145/2505515.2505764

  23. Rocha-Junior, J.B., Nørvåg, K.: Top-k spatial keyword queries on road networks. In: EDBT, pp. 168–179 (2012)

  24. Rocha-Junior, J.B., Gkorgkas, O., Jonassen, S., Nørvåg, K.: Efficient processing of top-k spatial keyword queries. In: SSTD, pp. 205–222 (2011)

  25. Roussopoulos, N., Kelley, S., Vincent, F.: Nearest neighbor queries. In: Carey, M.J., Schneider, D.A. (eds.) SIGMOD, pp. 71–79. ACM Press, New York (1995)

  26. Sadoghi, M., Jacobsen, H.A.: BE-tree: an index structure to efficiently match boolean expressions over high-dimensional discrete space. In: SIGMOD, pp. 637–648 (2011)

  27. Sharifzadeh, M., Shahabi, C.: VoR-tree: R-trees with Voronoi diagrams for efficient processing of spatial nearest neighbor queries. PVLDB 3(1), 1231–1242 (2010)

  28. Wang, Y., Zhang, D., Liu, Q., Shen, F., Lee, L.H.: Towards enhancing the last-mile delivery: an effective crowd-tasking model with scalable solutions. Transp. Res. Part E: Logist. Transp. Rev. 93, 279–293 (2016)

  29. Whang, S., Brower, C., Shanmugasundaram, J., Vassilvitskii, S., Vee, E., Yerneni, R., Garcia-Molina, H.: Indexing boolean expressions. PVLDB 2(1), 37–48 (2009)

  30. Wu, D., Yiu, M.L., Cong, G., Jensen, C.S.: Joint top-k spatial keyword query processing. TKDE 24(10), 1889–1903 (2012)

  31. Xin, D., He, Y., Ganti, V.: Keyword++: a framework to improve keyword search over entity databases. PVLDB 3(1), 711–722 (2010)

  32. Yan, T.W., Garcia-Molina, H.: Index structures for selective dissemination of information under the boolean model. ACM Trans. Database Syst. 19(2), 332–364 (1994)

  33. Yu, C., Ooi, B.C., Tan, K., Jagadish, H.V.: Indexing the distance: an efficient method to KNN processing. In: VLDB, pp. 421–430 (2001)

  34. Zhang, C., Zhang, Y., Zhang, W., Lin, X.: Inverted linear quadtree: efficient top \(k\) spatial keyword search. In: ICDE, pp. 901–912 (2013a)

  35. Zhang, D., Chee, Y.M., Mondal, A., Tung, A.K.H., Kitsuregawa, M.: Keyword search in spatial databases: towards searching by document. In: ICDE, pp. 688–699 (2009)

  36. Zhang, D., Ooi, B.C., Tung, A.K.H.: Locating mapped resources in Web 2.0. In: ICDE, pp. 521–532 (2010)

  37. Zhang, D., Tan, K.L., Tung, A.K.H.: Scalable top-k spatial keyword search. In: EDBT, pp. 359–370 (2013b)

  38. Zhang, D., Chan, C., Tan, K.: An efficient publish/subscribe index for ecommerce databases. PVLDB 7(8), 613–624 (2014a)

  39. Zhang, D., Chan, C., Tan, K.: Processing spatial keyword query as a top-k aggregation query. In: SIGIR, pp. 355–364 (2014b). https://doi.org/10.1145/2600428.2609562

  40. Zhang, P., Cheng, R., Mamoulis, N., Renz, M., Züfle, A., Tang, Y., Emrich, T.: Voronoi-based nearest neighbor search for multi-dimensional uncertain databases. In: ICDE, pp. 158–169 (2013c)

  41. Zhong, R., Fan, J., Li, G., Tan, K-L., Zhou, L.: Location-aware instant search. In: CIKM, pp. 385–394 (2012)

  42. Zhong, R., Li, G., Tan, K., Zhou, L.: G-tree: an efficient index for KNN search on road networks. In: CIKM, pp. 39–48 (2013)

  43. Zhou, Y., Xie, X., Wang, C., Gong, Y., Ma, W.Y.: Hybrid index structures for location-based web search. In: CIKM, pp. 155–162 (2005)

Author information

Corresponding author

Correspondence to Heng Tao Shen.

About this article

Cite this article

Zhang, D., Li, Y., Cao, X. et al. Augmented keyword search on spatial entity databases. The VLDB Journal 27, 225–244 (2018). https://doi.org/10.1007/s00778-018-0497-6

  • DOI: https://doi.org/10.1007/s00778-018-0497-6
