
Efficient Combined Text and Spatial Search

  • Conference paper
Computational Science and Its Applications -- ICCSA 2015 (ICCSA 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9157)

Abstract

We present a search engine called TexSpaSearch that can search text documents with associated locations in space. We define three search queries, denoted Q1(t), Q2(t, r) and Q3(p, r), for finding documents containing text t intersecting a disc centered at position p with radius r. Testing was performed using the UNB Connell Memorial Herbarium database, whose records normally contain the location where plant specimens were collected along with associated textual data. The sample herbarium database of \(N = 40{,}791\) records with associated locations was indexed using a novel R*-tree and suffix tree data structure to achieve efficient search for the defined queries. Significant preprocessing was performed to transform the database into the index data structure used by TexSpaSearch. Testing was performed with 20 example Q1 text-only queries to compare TexSpaSearch to a Google Search Appliance, as well as a significant number of example Q2 and Q3 queries. TexSpaSearch search results are ranked by a modified Lucene scoring algorithm, combined with a spatial rank for Q2 search. A theoretical analysis shows that TexSpaSearch requires \(O(A^{2}\overline{|b|})\) average time for Q1 search, where A is the number of single words in the query string t, and \(\overline{|b|}\) is the average length of a subphrase in t. Q2 and Q3 queries require \(O(A^{2}\overline{|b|} + Z\log_{\mathcal{M}}\mathcal{D}_N + y)\) and \(O(\log_{\mathcal{M}}\mathcal{D}_N + y)\) time, respectively, where Z is the number of point records in the list \(\mathcal{P}\) of text search results, \(\mathcal{D}_N\) is the number of data objects indexed in the R*-tree for N records, \(\mathcal{M}\) is the maximum number of entries of an interior node in the R*-tree, and y is the number of R*-tree leaf nodes found in range in a Q3 query.
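
To make the three query types concrete, the following is a minimal brute-force sketch in Python. It is not the TexSpaSearch implementation: it scans every record linearly instead of using the R*-tree and suffix tree index, and it does no ranking. The `Record` type, the function names `q1`, `q2` and `q3`, and the toy records are all hypothetical. The reading of Q2 as "one disc search of radius r around each text hit" is an assumption inferred from the \(Z\log_{\mathcal{M}}\mathcal{D}_N\) term in the stated Q2 cost, not something the abstract spells out.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Record:
    """Hypothetical record: an id, free text, and a planar location."""
    rec_id: int
    text: str
    x: float
    y: float


def q1(records, t):
    """Q1(t): records whose text contains t (case-insensitive substring match)."""
    needle = t.lower()
    return [rec for rec in records if needle in rec.text.lower()]


def q3(records, p, r):
    """Q3(p, r): records located inside the disc centered at p with radius r."""
    px, py = p
    return [rec for rec in records if hypot(rec.x - px, rec.y - py) <= r]


def q2(records, t, r):
    """Q2(t, r), assumed reading: run Q1, then one disc search per text hit."""
    found = {}
    for hit in q1(records, t):
        for rec in q3(records, (hit.x, hit.y), r):
            found[rec.rec_id] = rec  # de-duplicate records seen from several hits
    return [found[key] for key in sorted(found)]


# Made-up toy records for illustration only (not herbarium data).
records = [
    Record(1, "Acer rubrum, red maple, river bank", 0.0, 0.0),
    Record(2, "Picea glauca, white spruce", 1.0, 0.0),
    Record(3, "Acer saccharum, sugar maple", 10.0, 10.0),
    Record(4, "Betula papyrifera, paper birch", 10.5, 10.0),
]

text_hits = [rec.rec_id for rec in q1(records, "maple")]
disc_hits = [rec.rec_id for rec in q3(records, (0.0, 0.0), 1.5)]
combined_hits = [rec.rec_id for rec in q2(records, "maple", 0.6)]
```

Each function here costs time linear in the number of records per scan. The index described in the abstract replaces the text scan with a suffix tree lookup and each disc scan with an R*-tree range search, which is where the logarithmic terms in the stated bounds come from.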



Author information

Corresponding author

Correspondence to Bradford G. Nickerson.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Han, A., Nickerson, B.G. (2015). Efficient Combined Text and Spatial Search. In: Gervasi, O., et al. Computational Science and Its Applications -- ICCSA 2015. ICCSA 2015. Lecture Notes in Computer Science, vol 9157. Springer, Cham. https://doi.org/10.1007/978-3-319-21470-2_52

  • DOI: https://doi.org/10.1007/978-3-319-21470-2_52

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21469-6

  • Online ISBN: 978-3-319-21470-2

  • eBook Packages: Computer Science, Computer Science (R0)
