Abstract
There is significant commercial and research interest in location-based web search engines. Given a number of search keywords and one or more locations (geographical points) that a user is interested in, a location-based web search retrieves and ranks the most textually and spatially relevant web pages. In this type of search, both the spatial and textual information should be indexed. Currently, no efficient index structure exists that can handle both the spatial and textual aspects of data simultaneously and accurately. Existing approaches either index space and text separately or use hybrid index structures that perform poorly and return inaccurate results. Moreover, most of these approaches cannot accurately rank web pages based on a combination of space and text and are not easy to integrate into existing search engines. In this paper, we propose a new index structure called Spatial-Keyword Inverted File for Points (SKIF-P) to handle point-based indexing of web documents in an integrated and efficient manner. To seamlessly find and rank relevant documents, we develop a new distance measure called spatial tf-idf. We propose four variants of spatial-keyword relevance scores and two algorithms to perform top-k searches. As verified by experiments, our proposed techniques outperform existing index structures in terms of search performance and accuracy.
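As a rough illustration of the search problem described above, the following minimal sketch ranks documents by a weighted sum of a textual score and a spatial score and returns the top k. It is not the spatial tf-idf measure or either of the top-k algorithms proposed in the paper: the plain tf-idf sum, the distance-decay function, the weight `alpha`, the `scale` parameter, and all function names are assumptions made only for this example, and the scan is exhaustive rather than index-based.

```python
import heapq
import math
from collections import Counter


def textual_score(query_terms, doc_terms, doc_freq, n_docs):
    """Plain tf-idf sum over the query terms (an assumption, not spatial tf-idf)."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        if tf[term] and doc_freq.get(term):
            score += tf[term] * math.log(n_docs / doc_freq[term])
    return score


def spatial_score(query_point, doc_point, scale=1.0):
    """Hypothetical distance decay: 1 at distance 0, falling toward 0 far away."""
    dist = math.dist(query_point, doc_point)
    return 1.0 / (1.0 + dist / scale)


def top_k(query_terms, query_point, docs, k, alpha=0.5):
    """Return the ids of the k best documents by exhaustive scan.

    docs is a list of (doc_id, terms, (x, y)) tuples; alpha weights text
    against space. Documents with no textual match are skipped.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(t for _, terms, _ in docs for t in set(terms))
    scored = []
    for doc_id, terms, point in docs:
        text = textual_score(query_terms, terms, doc_freq, n_docs)
        if text == 0.0:
            continue
        combined = alpha * text + (1 - alpha) * spatial_score(query_point, point)
        scored.append((combined, doc_id))
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]
```

A real system would replace the exhaustive scan with an index lookup; the sketch only shows how a single combined score lets textual and spatial relevance be ranked together.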












Notes
We use the terms documents and objects interchangeably throughout this paper.
Note that although points are currently the representation of choice for locations on the web, they are not necessarily the most accurate one. For objects with spatial extents, the most accurate representation is probably a polygon. We focus on points in this paper because 1) the spatial features of most objects/documents on the web are currently represented as points, and 2) we have already shown how to handle the case where the spatial feature is a region in our previous work [1].
For simplicity, we assume that each document has only one location. Multiple locations can be easily handled by using the same methods multiple times—once for each focal point.
β = 0.1 and the number of clusters = 5.
Since the results for DATASET1 and DATASET2 were very similar, we only report the results of the larger dataset—DATASET2.
References
Khodaei A, Shahabi C, Li C (2010) Hybrid indexing and seamless ranking of spatial and textual features of web documents. In: DEXA, pp 450–466
Zhou Y, Xie X, Wang C, Gong Y, Ma W-Y (2005) Hybrid index structures for location-based web search. In: CIKM, pp 155–162
Hariharan R, Hore B, Li C, Mehrotra S (2007) Processing spatial-keyword (SK) queries in geographic information retrieval (GIR) systems. In: SSDBM, p 16
De Felipe I, Hristidis V, Rishe N (2008) Keyword search on spatial databases. In: ICDE
Zobel J, Moffat A (2006) Inverted files for text search engines. ACM Comput Surv 38(2):6
Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison-Wesley, Reading
Chen Y, Suel T, Markowetz A (2006) Efficient query processing in geographic web search engines. In: SIGMOD, pp 277–288
McCurley KS (2001) Geospatial mapping and navigation of the web. In: WWW, pp 221–229
Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Readings in information retrieval. Morgan Kaufmann Publishers Inc
Cong G, Jensen CS, Wu D (2009) Efficient retrieval of the top-k most relevant spatial web objects. Proc VLDB Endow 2(1):337–348
Vaid S, Jones CB, Joho H, Sanderson M (2005) Spatio-textual indexing for geographical search on the web. In: SSTD
Amitay E, HarEl N, Sivan R, Soffer A (2004) Web-a-where: geotagging web content. In: SIGIR, pp 273–280
Ding J, Gravano L, Shivakumar N (2000) Computing geographical scopes of web resources. In: VLDB, pp 545–556
Gao W, Lee HC, Miao Y (2006) Geographically focused collaborative crawling. In: WWW
Zobel J (1995) Adding compression to a full-text retrieval system. Softw Pract Exp 25(8):891–903
Haveliwala T (2002) Topic-sensitive PageRank. In: WWW
Manning C, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge
Jeong B-S, Omiecinski E (1995) Inverted file partitioning schemes in multiple disk systems. IEEE Trans Parallel Distrib Syst 6:2
Alsubaiee S, Behm A, Chen L (2010) Supporting location-based approximate-keyword queries. In: Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems (GIS ’10), pp 61–70
Wang Z, Du M, Le J (2009) gR*-tree: an index for querying approximate keywords in geographic information system. In: Information engineering and computer science
Zhang D, Chee YM, Mondal A, Tung AKH, Kitsuregawa M (2009) Keyword search in spatial databases: towards searching by document. In: ICDE, pp 688–699
Cormode R, Shkapenyuk V, Srivastava D, Xu B (2009) Forward decay: a practical time decay model for streaming systems. In: ICDE
Cohen E, Strauss MJ (2006) Maintaining time-decaying stream aggregates. J Algorithms 59:1
Cao X, Cong G, Jensen CS (2010) Retrieving top-k prestige-based relevant spatial web objects. In: Proc. VLDB endow., vol 3, pp 1–2
Tobler W (1970) A computer movie simulating urban growth in the Detroit region. Econ Geogr 46:234–240
Long X, Suel T (2005) Three-level caching for efficient query processing in large web search engines. In: WWW, pp 257–266
Acknowledgements
Ali Khodaei and Cyrus Shahabi’s research has been funded in part by NSF grants CNS-0831505 (CyberTrust) and IS-1115153, the USC Integrated Media Systems Center (IMSC), and unrestricted cash and equipment gifts from Google, Microsoft and Qualcomm. Chen Li is partially supported by the US NSF IIS 1030002 award and the National Natural Science Foundation of China (No. 61129002). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Cite this article
Khodaei, A., Shahabi, C. & Li, C. SKIF-P: a point-based indexing and ranking of web documents for spatial-keyword search. Geoinformatica 16, 563–596 (2012). https://doi.org/10.1007/s10707-011-0142-7
DOI: https://doi.org/10.1007/s10707-011-0142-7