
RASIM: a rank-aware separate index method for answering top-k spatial keyword queries

World Wide Web

Abstract

A top-k spatial keyword query returns the k objects with the highest (or lowest) scores with respect to both spatial proximity and text relevancy. Approaches for answering top-k spatial keyword queries can be classified into two categories: the separate index approach and the hybrid index approach. The separate index approach maintains the spatial index and the text index independently and can accommodate new data types. However, it is difficult for this approach to support top-k pruning and merging efficiently at the same time, since doing so requires two different orders for clustering the objects: one based on scores for top-k pruning and another based on object IDs for efficient merging. In this paper, we propose a new separate index method called the Rank-Aware Separate Index Method (RASIM) for top-k spatial keyword queries. RASIM supports both top-k pruning and efficient merging at the same time by clustering each separate index in two different orders through a partitioning technique. Specifically, RASIM partitions the set of objects in each index into rank-aware (RA) groups that contain objects with similar scores, and applies the first order to these groups according to their scores and the second order to the objects within each group according to their object IDs. Based on the RA groups, we propose two query processing algorithms: (i) the External Threshold Algorithm (External TA), which supports top-k pruning in units of RA groups, and (ii) the Generalized External TA, which enhances the performance of External TA by exploiting special properties of the RA groups. RASIM is the first work that supports top-k pruning based on the separate index approach. Naturally, it keeps the advantages of the separate index approach. In addition, in terms of storage and query processing time, RASIM is more efficient than the IR-tree method, which is the prevailing method for supporting top-k pruning to date and is based on the hybrid index approach. Experimental results show that, compared with the IR-tree method, RASIM reduces the index size by up to 1.85 times and improves query performance by up to 3.22 times.
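The two-order clustering and group-level pruning described in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation: the function names `build_ra_groups` and `top_k_group_ta`, the fixed `group_size`, the weighted-sum scoring with `alpha`, and the use of dictionary lookups in place of an ID-ordered merge of postings are all assumptions made for illustration. It also assumes non-negative scores where higher is better.

```python
from heapq import nlargest


def build_ra_groups(scores, group_size):
    """Partition {object_id: score} into rank-aware (RA) groups.

    Groups are ordered by score (descending); the objects inside each
    group are ordered by object ID. Fixed-size groups are an assumption.
    """
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    groups = []
    for i in range(0, len(ranked), group_size):
        chunk = ranked[i:i + group_size]
        groups.append({
            "max_score": chunk[0][1],   # upper bound for this group
            "postings": sorted(chunk),  # (object_id, score), by object ID
        })
    return groups


def top_k_group_ta(spatial, text, k, alpha=0.5, group_size=2):
    """Threshold-style top-k over two separate indexes, one group at a time.

    Hypothetical scoring: alpha * spatial + (1 - alpha) * text, with a
    missing score treated as 0. Returns (results, groups_read_per_index).
    """
    s_groups = build_ra_groups(spatial, group_size)
    t_groups = build_ra_groups(text, group_size)

    def combined(oid):
        return alpha * spatial.get(oid, 0.0) + (1 - alpha) * text.get(oid, 0.0)

    def rank_key(item):
        return (item[1], -item[0])  # higher score first, then smaller ID

    seen = {}
    rounds = max(len(s_groups), len(t_groups))
    for i in range(rounds):
        # Read the next RA group from each index.
        for groups in (s_groups, t_groups):
            if i < len(groups):
                for oid, _ in groups[i]["postings"]:
                    if oid not in seen:
                        seen[oid] = combined(oid)
        # Best score any unseen object could still reach.
        next_s = s_groups[i + 1]["max_score"] if i + 1 < len(s_groups) else 0.0
        next_t = t_groups[i + 1]["max_score"] if i + 1 < len(t_groups) else 0.0
        threshold = alpha * next_s + (1 - alpha) * next_t
        top = nlargest(k, seen.items(), key=rank_key)
        if len(top) == k and top[-1][1] >= threshold:
            return top, i + 1  # remaining groups are pruned
    return nlargest(k, seen.items(), key=rank_key), rounds


spatial_scores = {1: 0.9, 2: 0.8, 3: 0.3, 4: 0.1}
text_scores = {1: 0.4, 2: 0.9, 3: 0.8, 4: 0.1}
results, groups_read = top_k_group_ta(spatial_scores, text_scores, k=2)
print([oid for oid, _ in results], groups_read)
```

In this toy data set the query stops after reading one group from each index, so object 4 is never touched. In a disk-based setting the ID order inside each group would let the two groups be merged sequentially, which the dictionary lookups here only stand in for.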

Corresponding author

Correspondence to Kyu-Young Whang.

About this article

Cite this article

Kwon, HY., Whang, KY., Song, IY. et al. RASIM: a rank-aware separate index method for answering top-k spatial keyword queries. World Wide Web 16, 111–139 (2013). https://doi.org/10.1007/s11280-012-0159-3

  • DOI: https://doi.org/10.1007/s11280-012-0159-3
