
SKIF-P: a point-based indexing and ranking of web documents for spatial-keyword search

GeoInformatica

Abstract

There is significant commercial and research interest in location-based web search engines. Given a number of search keywords and one or more locations (geographical points) that a user is interested in, a location-based web search retrieves and ranks the web pages that are most relevant both textually and spatially. This type of search requires indexing both the spatial and the textual information. Currently, no efficient index structure exists that can handle both the spatial and textual aspects of data simultaneously and accurately. Existing approaches either index space and text separately or use hybrid index structures with poor performance and inaccurate results. Moreover, most of these approaches cannot accurately rank web pages based on a combination of space and text, and they are not easy to integrate into existing search engines. In this paper, we propose a new index structure called the Spatial-Keyword Inverted File for Points (SKIF-P) to handle point-based indexing of web documents in an integrated and efficient manner. To seamlessly find and rank relevant documents, we develop a new distance measure called spatial tf-idf. We propose four variants of spatial-keyword relevance scores and two algorithms to perform top-k searches. As verified by experiments, our proposed techniques outperform existing index structures in terms of search performance and accuracy.
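The abstract does not spell out how spatial tf-idf or the four relevance-score variants are computed. As a rough illustration of the general problem only (this is not SKIF-P and not spatial tf-idf), the Python sketch below builds a tiny keyword inverted file and ranks candidate documents by an assumed linear combination of a textual tf-idf score and a distance-decay proximity score. The corpus, the decay function `1 / (1 + d / scale)`, and the weight `alpha` are all hypothetical choices made for this example.

```python
import heapq
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (not from the paper): doc id -> (text, (x, y) point).
EXAMPLE_DOCS = {
    "d1": ("pizza restaurant downtown", (0.0, 0.0)),
    "d2": ("pizza delivery", (5.0, 5.0)),
    "d3": ("coffee shop", (0.1, 0.1)),
}


def build_inverted_file(docs):
    """Map each term to its postings: {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, (text, _point) in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def textual_score(terms, doc_id, index, n_docs):
    """Classic tf-idf: sum of tf * log(N / df) over the query terms."""
    score = 0.0
    for term in terms:
        postings = index.get(term, {})
        if doc_id in postings:
            score += postings[doc_id] * math.log(n_docs / len(postings))
    return score


def spatial_score(query_point, doc_point, scale=1.0):
    """Assumed distance decay in (0, 1]: equals 1.0 at distance 0."""
    return 1.0 / (1.0 + math.dist(query_point, doc_point) / scale)


def top_k(docs, query_terms, query_point, k=2, alpha=0.5):
    """Return the k doc ids with the highest alpha*text + (1-alpha)*space."""
    index = build_inverted_file(docs)
    terms = [t.lower() for t in query_terms]
    # Candidates: documents that contain at least one query term.
    candidates = {d for t in terms for d in index.get(t, {})}
    scored = [
        (alpha * textual_score(terms, d, index, len(docs))
         + (1 - alpha) * spatial_score(query_point, docs[d][1]), d)
        for d in candidates
    ]
    # heapq.nlargest keeps only the k best (score, doc_id) pairs.
    return [d for _score, d in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    print(top_k(EXAMPLE_DOCS, ["pizza"], (0.0, 0.0), k=2))  # ['d1', 'd2']
```

With the query `pizza` at (0, 0), `d1` outranks `d2` because both have the same textual score and `d1` is closer; `d3` is never scored because it contains no query term. The textual score here is unbounded while the spatial score lies in (0, 1], so a practical system would normalize the two before combining them.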




Notes

  1. We use the terms documents and objects interchangeably throughout this paper.

  2. http://www.twitter.com

  3. http://www.flickr.com

  4. http://www.youtube.com

  5. http://www.nytimes.com

  6. http://www.yelp.com

  7. http://www.foursquare.com

  8. We note that although representing locations as points is the current representation of choice on the web, it is not necessarily the most accurate one. For objects with spatial extents, the most accurate representation is probably a polygon. We focus on points in this paper because 1) currently, the spatial features of most objects/documents on the web are represented as points, and 2) we have already shown how to handle cases where the spatial feature is a region in our previous work [1].

  9. For simplicity, we assume that each document has only one location. Multiple locations can be easily handled by using the same methods multiple times—once for each focal point.

  10. http://developer.yahoo.com/geo/placemaker/

  11. http://www.flickr.com/services/api/

  12. β = 0.1 and the number of clusters = 5.

  13. Since the results for DATASET1 and DATASET2 were very similar, we only report the results of the larger dataset—DATASET2.

  14. https://www.mturk.com/

References

  1. Khodaei A, Shahabi C, Li C (2010) Hybrid indexing and seamless ranking of spatial and textual features of web documents. In: DEXA, pp 450–466

  2. Zhou Y, Xie X, Wang C, Gong Y, Ma W-Y (2005) Hybrid index structures for location-based web search. In: CIKM, pp 155–162

  3. Hariharan R, Hore B, Li C, Mehrotra S (2007) Processing spatial-keyword (SK) queries in geographic information retrieval (GIR) systems. In: SSDBM, p 16

  4. De Felipe I, Hristidis V, Rishe N (2008) Keyword search on spatial databases. In: ICDE

  5. Zobel J, Moffat A (2006) Inverted files for text search engines. ACM Comput Surv 38(2):6

  6. Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison-Wesley, Reading

  7. Chen Y, Suel T, Markowetz A (2006) Efficient query processing in geographic web search engines. In: SIGMOD, pp 277–288

  8. McCurley KS (2001) Geospatial mapping and navigation of the web. In: WWW, pp 221–229

  9. Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Readings in information retrieval. Morgan Kaufmann Publishers Inc

  10. Cong G, Jensen CS, Wu D (2009) Efficient retrieval of the top-k most relevant spatial web objects. Proc VLDB Endow 2(1):337–348

  11. Vaid S, Jones CB, Joho H, Sanderson M (2005) Spatio-textual indexing for geographical search on the web. In: SSTD

  12. Amitay E, Har'El N, Sivan R, Soffer A (2004) Web-a-where: geotagging web content. In: SIGIR, pp 273–280

  13. Ding J, Gravano L, Shivakumar N (2000) Computing geographical scopes of web resources. In: VLDB, pp 545–556

  14. Gao W, Lee HC, Miao Y (2006) Geographically focused collaborative crawling. In: WWW

  15. Zobel J (1995) Adding compression to a full-text retrieval system. Softw Pract Exp 25(8):891–903

  16. Haveliwala T (2002) Topic-sensitive PageRank. In: WWW

  17. Manning C, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge

  18. Jeong B-S, Omiecinski E (1995) Inverted file partitioning schemes in multiple disk systems. IEEE Trans Parallel Distrib Syst 6(2)

  19. Alsubaiee S, Behm A, Chen L (2010) Supporting location-based approximate-keyword queries. In: Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems (GIS ’10), pp 61–70

  20. Wang Z, Du M, Le J (2009) gR*-tree: an index for querying approximate keywords in geographic information system. In: Information engineering and computer science

  21. Zhang D, Chee YM, Mondal A, Tung AKH, Kitsuregawa M (2009) Keyword search in spatial databases: towards searching by document. In: ICDE, pp 688–699

  22. Cormode G, Shkapenyuk V, Srivastava D, Xu B (2009) Forward decay: a practical time decay model for streaming systems. In: ICDE

  23. Cohen E, Strauss MJ (2006) Maintaining time-decaying stream aggregates. J Algorithms 59(1)

  24. Cao X, Cong G, Jensen CS (2010) Retrieving top-k prestige-based relevant spatial web objects. Proc VLDB Endow 3(1–2)

  25. Tobler W (1970) A computer movie simulating urban growth in the Detroit region. Econ Geogr 46:234–240

  26. Long X, Suel T (2005) Three-level caching for efficient query processing in large web search engines. In: WWW, pp 257–266

Acknowledgements

Ali Khodaei and Cyrus Shahabi’s research has been funded in part by NSF grants CNS-0831505 (CyberTrust) and IS-1115153, the USC Integrated Media Systems Center (IMSC), and unrestricted cash and equipment gifts from Google, Microsoft and Qualcomm. Chen Li is partially supported by the US NSF IIS 1030002 award and the National Natural Science Foundation of China (No. 61129002). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Author information

Correspondence to Ali Khodaei.


About this article

Cite this article

Khodaei, A., Shahabi, C. & Li, C. SKIF-P: a point-based indexing and ranking of web documents for spatial-keyword search. Geoinformatica 16, 563–596 (2012). https://doi.org/10.1007/s10707-011-0142-7


  • DOI: https://doi.org/10.1007/s10707-011-0142-7
