research-article

Efficient Algorithms and Cost Models for Reverse Spatial-Keyword k-Nearest Neighbor Search

Published: 26 May 2014

Abstract

Geographic objects annotated with descriptive text are becoming prevalent, which motivates spatial-keyword queries that consider both the locations and the textual descriptions of objects. Specifically, the relevance of an object to a query is measured by a spatial-textual similarity that combines spatial proximity and textual similarity. In this article, we introduce the Reverse Spatial-Keyword k-Nearest Neighbor (RSKkNN) query, which finds those objects that have the query as one of their k-nearest spatial-textual objects. RSKkNN queries have numerous applications in online maps and GIS decision support systems.
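The query definition above can be made concrete with a brute-force sketch. This is an illustration only, not the algorithm proposed in the article: the similarity function (a weighted sum of normalized Euclidean proximity and Jaccard keyword overlap), the weight `alpha`, the normalizer `max_dist`, and all function names are assumptions introduced here.

```python
import math

def jaccard(a, b):
    """Jaccard overlap of two keyword collections (assumed textual measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity(o1, o2, alpha, max_dist):
    """Assumed spatial-textual similarity: weighted sum of proximity and overlap."""
    (loc1, kw1), (loc2, kw2) = o1, o2
    spatial = 1.0 - math.dist(loc1, loc2) / max_dist
    return alpha * spatial + (1.0 - alpha) * jaccard(kw1, kw2)

def rskknn_bruteforce(objects, query, k, alpha=0.5, max_dist=1.0):
    """Return indices of objects that have `query` among their k most similar objects.

    Each object is a (location, keywords) pair. The scan is quadratic in the
    number of objects, which is the cost an index structure aims to avoid.
    """
    result = []
    for i, obj in enumerate(objects):
        sim_q = similarity(obj, query, alpha, max_dist)
        # Count the other objects that are strictly more similar to obj than the query is.
        closer = sum(
            1
            for j, other in enumerate(objects)
            if j != i and similarity(obj, other, alpha, max_dist) > sim_q
        )
        if closer < k:
            result.append(i)
    return result

# Hypothetical data: two cafes near the query, two gyms far away.
objects = [
    ((0.0, 0.0), {"cafe"}),
    ((1.0, 0.0), {"cafe", "wifi"}),
    ((10.0, 0.0), {"gym"}),
    ((10.0, 1.0), {"gym", "pool"}),
]
query = ((0.5, 0.0), {"cafe"})
print(rskknn_bruteforce(objects, query, k=1, max_dist=20.0))
```

With `k=1`, only the two cafes report the query as their nearest spatial-textual neighbor; each gym is more similar to the other gym than to the query.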

To answer RSKkNN queries efficiently, we propose a hybrid index tree, called the IUR-tree (Intersection-Union R-tree), that effectively combines location proximity with textual similarity. We then design a branch-and-bound search algorithm based on the IUR-tree. To accelerate query processing, we improve the IUR-tree by leveraging the distribution of the textual descriptions, yielding two variants, the Clustered IUR-tree (CIUR-tree) and the Combined Clustered IUR-tree (C2IUR-tree), for each of which we develop optimized algorithms. We also provide a theoretical cost model to analyze the efficiency of our algorithms. Our empirical studies show that the proposed algorithms are efficient and scalable.
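The abstract does not spell out how intersection and union information supports branch-and-bound pruning. The following is a minimal sketch of one plausible reading, assuming plain keyword sets and Jaccard similarity rather than whatever weighted representation the article uses. The names `node_summary` and `jaccard_bounds` are hypothetical. The idea is that if every object's keyword set lies between the node's intersection set and union set, the textual similarity between a query and any object in the node can be bounded without visiting the objects.

```python
def node_summary(keyword_sets):
    """Summarize a non-empty group of keyword sets by their intersection and union."""
    sets = [set(s) for s in keyword_sets]
    if not sets:
        raise ValueError("a node must contain at least one keyword set")
    return set.intersection(*sets), set.union(*sets)

def jaccard_bounds(query_kw, inter, union):
    """Lower and upper bounds on Jaccard(query, o) for any o with inter <= o <= union.

    Upper bound: the largest possible overlap (with the union) divided by the
    smallest possible combined size (with the intersection). The lower bound
    swaps the two roles.
    """
    q = set(query_kw)
    lo_den = len(q | union)
    hi_den = len(q | inter)
    lower = len(q & inter) / lo_den if lo_den else 0.0
    upper = len(q & union) / hi_den if hi_den else 0.0
    return lower, upper

# Hypothetical node holding two objects.
inter, union = node_summary([{"cafe", "wifi"}, {"cafe", "bakery"}])
lower, upper = jaccard_bounds({"cafe", "wifi"}, inter, union)
print(lower, upper)
```

In a branch-and-bound search, a subtree could be discarded when its upper bound already rules it out, or accepted wholesale when its lower bound suffices. How the article combines such bounds with spatial bounds is not stated in the abstract.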


    • Published in

      ACM Transactions on Database Systems  Volume 39, Issue 2
      May 2014
      336 pages
      ISSN:0362-5915
      EISSN:1557-4644
      DOI:10.1145/2627748
      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 26 May 2014
      • Accepted: 1 January 2014
      • Revised: 1 July 2013
      • Received: 1 June 2012
      Qualifiers

      • research-article
      • Research
      • Refereed
