Abstract
Geographic objects associated with descriptive texts are becoming prevalent, motivating spatial-keyword queries that consider both the locations and the textual descriptions of objects. Specifically, the relevance of an object to a query is measured by a spatial-textual similarity that combines spatial proximity and textual similarity. In this article, we introduce the Reverse Spatial-Keyword k-Nearest Neighbor (RSKkNN) query, which finds those objects that have the query as one of their k nearest spatial-textual neighbors. RSKkNN queries have numerous applications in online maps and GIS decision support systems.
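To make the query definition concrete, the following is a minimal brute-force sketch in Python. It is not the article's algorithm. The weighted-sum similarity, the `alpha` and `max_dist` parameters, the Jaccard measure on keyword sets, and all names are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class SpatialTextObject:
    """Hypothetical object type: a point location plus a set of keywords."""
    name: str
    x: float
    y: float
    keywords: frozenset


def similarity(a, b, alpha=0.5, max_dist=10.0):
    """Assumed spatial-textual similarity in [0, 1].

    Weighted sum of normalized spatial proximity and Jaccard similarity
    of the keyword sets; alpha balances the two parts.
    """
    dist = min(hypot(a.x - b.x, a.y - b.y), max_dist)
    spatial = 1.0 - dist / max_dist
    union = a.keywords | b.keywords
    textual = len(a.keywords & b.keywords) / len(union) if union else 0.0
    return alpha * spatial + (1.0 - alpha) * textual


def rskknn_bruteforce(query, objects, k, alpha=0.5, max_dist=10.0):
    """Return the objects that would rank `query` among their k most similar.

    An object o qualifies when fewer than k other objects are strictly
    more similar to o than the query is. Runs in O(n^2) similarity calls.
    """
    result = []
    for o in objects:
        sim_to_query = similarity(o, query, alpha, max_dist)
        closer = sum(
            1
            for p in objects
            if p is not o and similarity(o, p, alpha, max_dist) > sim_to_query
        )
        if closer < k:
            result.append(o)
    return result
```

The quadratic cost of this baseline is what an index with pruning bounds is meant to avoid.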
To answer RSKkNN queries efficiently, we propose a hybrid index tree, called the IUR-tree (Intersection-Union R-tree), that effectively combines location proximity with textual similarity. We then design a branch-and-bound search algorithm based on the IUR-tree. To accelerate query processing, we improve the IUR-tree by leveraging the distribution of textual descriptions, leading to two variants called the Clustered IUR-tree (CIUR-tree) and the combined clustered IUR-tree (C2IUR-tree), for each of which we develop optimized algorithms. We also provide a theoretical cost model to analyze the efficiency of our algorithms. Our empirical studies show that the proposed algorithms are efficient and scalable.
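The name "Intersection-Union" suggests that a tree node summarizes the texts of the objects beneath it by their intersection and their union. The Python sketch below illustrates how such a summary can bound textual similarity for a whole group of objects, which is the kind of bound a branch-and-bound search can prune with. It assumes plain keyword sets and Jaccard similarity; the function names are hypothetical, and the actual node layout and bounds in the article may differ.

```python
def summarize(keyword_sets):
    """Return (intersection, union) of a non-empty list of keyword sets."""
    sets = [frozenset(s) for s in keyword_sets]
    intersection = frozenset.intersection(*sets)
    union = frozenset().union(*sets)
    return intersection, union


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_bounds(intersection, union, query):
    """Bound jaccard(o, query) for every o with intersection <= o <= union.

    Upper bound: largest possible overlap (o = union) over the smallest
    possible combined set (o = intersection). Lower bound: the reverse.
    """
    query = frozenset(query)
    upper_den = len(intersection | query)
    lower_den = len(union | query)
    upper = len(union & query) / upper_den if upper_den else 0.0
    lower = len(intersection & query) / lower_den if lower_den else 0.0
    return lower, upper
```

If the upper bound for a group falls below the similarity an object already has to k other objects, the whole group can be skipped without inspecting its members.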
Index Terms
- Efficient Algorithms and Cost Models for Reverse Spatial-Keyword k-Nearest Neighbor Search