Efficient Algorithms and Cost Models for Reverse Spatial-Keyword k-Nearest Neighbor Search

Authors:

Ying Lu
Renmin University of China, Los Angeles, CA

Jiaheng Lu
Renmin University of China, Beijing, China

Gao Cong
Nanyang Technological University, Singapore

Wei Wu
Institute for Infocomm Research, Singapore

Cyrus Shahabi
University of Southern California, Los Angeles, CA

ACM Transactions on Database Systems, Volume 39, Issue 2, Article No. 13, pp. 1–46. https://doi.org/10.1145/2576232

Published: 26 May 2014

Abstract

Geographic objects associated with descriptive texts are becoming prevalent, justifying the need for spatial-keyword queries that consider both locations and textual descriptions of the objects. Specifically, the relevance of an object to a query is measured by spatial-textual similarity that is based on both spatial proximity and textual similarity. In this article, we introduce the Reverse Spatial-Keyword k-Nearest Neighbor (RSKkNN) query, which finds those objects that have the query as one of their k-nearest spatial-textual objects. The RSKkNN queries have numerous applications in online maps and GIS decision support systems.
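The query definition above can be illustrated with a brute-force sketch. This is not the authors' algorithm: it assumes a simplified similarity (a weighted sum of normalized Euclidean proximity and Jaccard similarity over keyword sets), and the names `sim_st`, `rskknn_bruteforce`, `alpha`, and `max_dist` are hypothetical choices made only for this example.

```python
from math import hypot


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sim_st(o1, o2, alpha, max_dist):
    """Assumed spatial-textual similarity: weighted sum of proximity and Jaccard."""
    (x1, y1), kw1 = o1
    (x2, y2), kw2 = o2
    spatial = 1.0 - hypot(x1 - x2, y1 - y2) / max_dist
    return alpha * spatial + (1.0 - alpha) * jaccard(kw1, kw2)


def rskknn_bruteforce(objects, query, k, alpha=0.5, max_dist=10.0):
    """Return names of objects that have the query among their k most similar objects.

    An object qualifies when fewer than k other objects are strictly more
    similar to it than the query is.
    """
    result = []
    for name, obj in objects.items():
        sim_to_query = sim_st(obj, query, alpha, max_dist)
        closer = sum(
            1
            for other_name, other in objects.items()
            if other_name != name
            and sim_st(obj, other, alpha, max_dist) > sim_to_query
        )
        if closer < k:
            result.append(name)
    return sorted(result)


# Each object is ((x, y), keyword_set); the data is made up for illustration.
objects = {
    "a": ((0.0, 0.0), {"coffee"}),
    "b": ((1.0, 0.0), {"coffee"}),
    "c": ((9.0, 0.0), {"books"}),
}
query = ((0.5, 0.0), {"coffee"})
print(rskknn_bruteforce(objects, query, k=1))
```

This sketch compares every pair of objects, so its cost grows quadratically with the number of objects; that cost is the motivation for an index-based approach.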

To answer RSKkNN queries efficiently, we propose a hybrid index tree, called the IUR-tree (Intersection-Union R-tree), that effectively combines location proximity with textual similarity. We then design a branch-and-bound search algorithm based on the IUR-tree. To accelerate query processing, we improve the IUR-tree by leveraging the distribution of textual descriptions, leading to two variants of the IUR-tree, called the Clustered IUR-tree (CIUR-tree) and the combined clustered IUR-tree (C²IUR-tree), for each of which we develop optimized algorithms. We also provide a theoretical cost model to analyze the efficiency of our algorithms. Our empirical studies show that the proposed algorithms are efficient and scalable.
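The name "Intersection-Union R-tree" suggests that each tree node summarizes the texts beneath it with an intersection vector and a union vector. The sketch below shows one plausible reading, assuming documents are term-weight dictionaries, the intersection keeps the per-term minimum weight over terms shared by all documents, and the union keeps the per-term maximum. The abstract does not spell out these definitions, so the function names and the exact construction are assumptions.

```python
def intersection_vector(docs):
    """Per-term minimum weight over terms that appear in every document."""
    common = set(docs[0])
    for doc in docs[1:]:
        common &= set(doc)
    return {term: min(doc[term] for doc in docs) for term in common}


def union_vector(docs):
    """Per-term maximum weight over terms that appear in any document."""
    merged = {}
    for doc in docs:
        for term, weight in doc.items():
            merged[term] = max(merged.get(term, 0.0), weight)
    return merged


def is_bracketed(doc, low, high):
    """Check low[t] <= doc[t] <= high[t] for every term, treating absent terms as 0."""
    terms = set(doc) | set(low) | set(high)
    return all(
        low.get(t, 0.0) <= doc.get(t, 0.0) <= high.get(t, 0.0) for t in terms
    )


# Made-up term weights for two objects under the same hypothetical node.
docs = [
    {"coffee": 2.0, "wifi": 1.0},
    {"coffee": 1.0, "cake": 3.0},
]
low = intersection_vector(docs)
high = union_vector(docs)
print(low, high, all(is_bracketed(d, low, high) for d in docs))
```

Because every document under the node lies between the two vectors term by term, a search can in principle bound the textual similarity of any object in the subtree without reading the objects themselves, which is the kind of bound a branch-and-bound algorithm needs for pruning.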

Index Terms

- Information systems
  - Information systems applications

Published in

ACM Transactions on Database Systems Volume 39, Issue 2
May 2014
336 pages
ISSN: 0362-5915
EISSN: 1557-4644
DOI: 10.1145/2627748

Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 26 May 2014
- Accepted: 1 January 2014
- Revised: 1 July 2013
- Received: 1 June 2012
Author Tags
Reverse k-nearest neighbor queries
performance analysis
spatial-keyword query
Qualifiers
- research-article
- Research
- Refereed