Abstract
A top-k spatial keyword query returns k objects having the highest (or lowest) scores with regard to spatial proximity as well as text relevancy. Approaches for answering top-k spatial keyword queries can be classified into two categories: the separate index approach and the hybrid index approach. The separate index approach maintains the spatial index and the text index independently and can accommodate new data types. However, it is difficult to support top-k pruning and merging efficiently at the same time since it requires two different orders for clustering the objects: the first based on scores for top-k pruning and the second based on object IDs for efficient merging. In this paper, we propose a new separate index method called Rank-Aware Separate Index Method (RASIM) for top-k spatial keyword queries. RASIM supports both top-k pruning and efficient merging at the same time by clustering each separate index in two different orders through the partitioning technique. Specifically, RASIM partitions the set of objects in each index into rank-aware (RA) groups that contain the objects with similar scores and applies the first order to these groups according to their scores and the second order to the objects within each group according to their object IDs. Based on the RA groups, we propose two query processing algorithms: (i) External Threshold Algorithm (External TA) that supports top-k pruning in the unit of RA groups and (ii) Generalized External TA that enhances the performance of External TA by exploiting special properties of the RA groups. RASIM is the first research work that supports top-k pruning based on the separate index approach. Naturally, it keeps the advantages of the separate index approach. In addition, in terms of storage and query processing time, RASIM is more efficient than the IR-tree method, which is the prevailing method to support top-k pruning to date and is based on the hybrid index approach. 
Experimental results show that, compared with the IR-tree method, RASIM reduces the index size by up to a factor of 1.85 and improves query performance by up to a factor of 3.22.
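The idea of clustering each index in two orders can be illustrated with a small sketch. The code below is not the External TA or Generalized External TA of the paper. It is a simplified, in-memory threshold-style loop written under several assumptions: scores are non-negative with higher being better, the combined score is a weighted sum with a hypothetical weight `alpha`, groups have a fixed hypothetical size `group_size`, and the missing score of a seen object is fetched by a dictionary lookup rather than by merging on object IDs. All function and parameter names are illustrative.

```python
import heapq
from typing import Dict, List, Tuple

Group = List[Tuple[int, float]]  # (object_id, score) pairs, sorted by object_id


def build_ra_groups(scores: Dict[int, float], group_size: int) -> List[Group]:
    """Partition objects into groups of similar score (assumed fixed size).

    First order: groups are arranged by descending score.
    Second order: objects inside each group are arranged by object ID.
    """
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [sorted(ranked[i:i + group_size])
            for i in range(0, len(ranked), group_size)]


def group_max(groups: List[Group], i: int) -> float:
    """Upper bound on any score in group i; 0.0 once the list is exhausted."""
    return max(s for _, s in groups[i]) if i < len(groups) else 0.0


def top_k(spatial: Dict[int, float], text: Dict[int, float], k: int,
          alpha: float = 0.5, group_size: int = 2):
    """Return (top-k list of (object_id, score), number of groups read)."""
    sg = build_ra_groups(spatial, group_size)
    tg = build_ra_groups(text, group_size)

    def combine(oid: int) -> float:
        # Assumption: a dictionary lookup stands in for random access.
        return alpha * spatial.get(oid, 0.0) + (1 - alpha) * text.get(oid, 0.0)

    seen: Dict[int, float] = {}
    best: List[Tuple[int, float]] = []
    groups_read = 0
    for i in range(max(len(sg), len(tg))):
        # Read the i-th group of each index, one whole group at a time.
        for groups in (sg, tg):
            if i < len(groups):
                groups_read += 1
                for oid, _ in groups[i]:
                    seen[oid] = combine(oid)
        # No unseen object can score above this bound, because every
        # unread group holds scores no larger than the next group's maximum.
        threshold = (alpha * group_max(sg, i + 1)
                     + (1 - alpha) * group_max(tg, i + 1))
        best = heapq.nlargest(k, seen.items(), key=lambda kv: (kv[1], -kv[0]))
        if len(best) >= k and best[-1][1] >= threshold:
            break  # top-k pruning: the remaining groups are never read
    return best, groups_read
```

With six objects and groups of two, a top-2 query in this sketch can stop after reading only the first group of each index, which is the effect that ordering groups by score is meant to achieve. The object-ID order inside each group is what would allow a merge-based join of the two indexes in place of the dictionary lookup used here.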
Cite this article
Kwon, HY., Whang, KY., Song, IY. et al. RASIM: a rank-aware separate index method for answering top-k spatial keyword queries. World Wide Web 16, 111–139 (2013). https://doi.org/10.1007/s11280-012-0159-3
DOI: https://doi.org/10.1007/s11280-012-0159-3