Ontology-driven discovery of geospatial evidence in web pages

GeoInformatica

Abstract

When users need to find something on the Web that is related to a place, chances are that place names will be submitted to a search engine along with some other keywords. However, automatic recognition of geographic characteristics embedded in Web documents, which would allow for a better connection between documents and places, remains a difficult task. We propose an ontology-driven approach to facilitate the process of recognizing, extracting, and geocoding partial or complete references to places embedded in text. Our approach combines an extraction ontology with urban gazetteers and geocoding techniques. This ontology, called OnLocus, is used to guide the discovery of geospatial evidence in the contents of Web pages. We show that addresses and positioning expressions, along with fragments such as postal codes or telephone area codes, provide satisfactory support for local search applications, since they make it possible to approximate the physical location of services and activities named within Web pages. Our experiments show the feasibility of performing automated address extraction and geocoding to identify locations associated with Web pages. Combining location identifiers with basic addresses improved the precision of extractions and reduced the number of false positive results.
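The combination of basic addresses with location identifiers mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the OnLocus ontology or the authors' extractor: the regular expressions, the toy gazetteer, its coordinates, and the name `find_evidence` are all assumptions made up for this example. The sketch looks for a basic address, checks whether a postal code or a phone number with an area code also appears in the text, and looks the street up in the gazetteer.

```python
import re

# Toy gazetteer (hypothetical): street name -> (latitude, longitude).
# The coordinates are invented placeholders, not real locations.
GAZETTEER = {
    "main street": (10.0, 20.0),
    "oak avenue": (10.5, 20.5),
}

# Assumed pattern for a basic address: house number, one or more
# capitalized words, then a street type.
ADDRESS = re.compile(
    r"\b(?P<number>\d{1,4})\s+"
    r"(?P<street>(?:[A-Z][a-z]+\s+)+(?:Street|Avenue|Road))\b"
)

# Assumed patterns for location identifiers.
POSTAL_CODE = re.compile(r"\b\d{5}(?:-\d{3,4})?\b")
PHONE = re.compile(r"\(\d{2,3}\)\s*\d{3,4}-\d{4}")  # area code in parentheses


def find_evidence(text):
    """Return one dict per basic address found in the text.

    'confirmed' is True when the text also contains a postal code or a
    phone number, which is used here as supporting evidence that the
    match is a real address rather than a false positive.
    """
    has_identifier = bool(POSTAL_CODE.search(text) or PHONE.search(text))
    results = []
    for match in ADDRESS.finditer(text):
        street = " ".join(match.group("street").split()).lower()
        results.append({
            "number": int(match.group("number")),
            "street": street,
            "coords": GAZETTEER.get(street),  # None if not in the gazetteer
            "confirmed": has_identifier,
        })
    return results


if __name__ == "__main__":
    page = "Visit us at 123 Main Street, 30130-100. Call (31) 3555-1234."
    for evidence in find_evidence(page):
        print(evidence)
```

In this sketch an address that appears without any identifier is still returned, but with `confirmed` set to `False`, so a caller could rank or discard it. A real system would need locale-specific address grammars and a full urban gazetteer.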

Notes

  1. http://maps.google.com

  2. http://local.yahoo.com

  3. Regular expressions are constructs that specify a pattern used for matching character strings, usually employed in text processing [1, 19].

  4. http://middleware.alexandria.ucsb.edu/client/gaz/adl/index.jsp

  5. http://www.geonames.org/

  6. Previous work [15] has precisely characterized spatial relations using point-set and other mathematical concepts, and has named each resulting relation. While these names are traditional in the GIS community, they are not widely used in natural language, and their interpretation remains ambiguous for our purposes.

  7. http://protege.stanford.edu

  8. http://www.opencyc.com

  9. http://www.ontologyportal.org

  10. http://www.getty.edu/research/conducting_research/vocabularies/tgn/

References

  1. Aho AV (1990) Algorithms for finding patterns in strings. In: van Leeuwen J (ed) Handbook of theoretical computer science, Volume A: Algorithms and complexity. The MIT Press, pp 255–300

  2. Amitay E, Har’El N, Sivan R, Soffer A (2004) Web-a-Where: Geotagging Web Content. Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Sheffield, UK, pp 273–280

  3. Arampatzis A, van Kreveld M, Reinbacher I, Jones CB, Vaid S, Clough P, Joho H, Sanderson M (2006) Web-based delineation of imprecise regions. Comput Environ Urban Syst 30:436–459

  4. Borges KAV (2006) Use of an ontology of urban places for recognition and extraction of geospatial evidences on the web (in Portuguese). Belo Horizonte (MG), Brazil, Federal University of Minas Gerais

  5. Borges KAV, Davis CA Jr, Laender AHF (2001) OMT-G: an object-oriented data model for geographic applications. GeoInformatica 5(3):221–260

  6. Borges KAV, Laender AHF, Medeiros CB, Davis CA Jr (2007) Discovering geographic locations in web pages using urban addresses. Proceedings of the 4th ACM Workshop on Geographic Information Retrieval, Lisbon, Portugal, pp 31–36

  7. Borges KAV, Laender AHF, Medeiros CB, Silva AS, Davis CA Jr (2003) The web as a data source for spatial databases. Proc. of the V Brazilian Symp. on GeoInformatics, Campos do Jordão (SP), Brazil: CD-ROM

  8. Buneman P, Khanna S, Tan W-C (2000) Data provenance: some basic issues. FST TCS 2000: Foundations of software technology and theoretical computer science: 20th conference. New Delhi, India, p 87

  9. Casati R, Varzi AC (1996) The structure of spatial localization. Philos Stud 82:205–239

  10. Clementini E, Di Felice P, van Oosterom P (1993) A small set of formal topological relationships suitable for end-user interaction. 3rd Symposium on Spatial Database Systems, pp 277–295

  11. Davis CA Jr, Fonseca FT (2007) Assessing the certainty of locations produced by an address geocoding system. GeoInformatica 11(1):103–129

  12. Davis CA Jr, Fonseca FT, Borges KAV (2003) A flexible addressing system for approximate urban geocoding. V Brazilian Symposium on GeoInformatics (GeoInfo 2003), Campos do Jordão (SP):CD-ROM

  13. Delboni TM, Borges KAV, Laender AHF, Davis CA Jr (2007) Semantic expansion of geographic web queries based on natural language positioning expressions. Trans GIS 11(3):377–397

  14. Ding J, Gravano L, Shivakumar N (2000) Computing geographical scopes of web resources. Proceedings of the 26th International Conference on Very Large Databases, Cairo, Egypt: 545–556

  15. Egenhofer M, Franzosa R (1991) Point-set topological spatial relations. Int J Geogr Inf Syst 5(2):161–174

  16. Egenhofer MJ (2002) Toward the semantic geospatial web. Geographic Information Science 2002. McLean, Virginia, pp 1–4

  17. Embley DW (2004) Toward semantic understanding—an approach based on information extraction ontologies. Proceedings of the 15th Australasian Database Conference, Dunedin, New Zealand, pp 18–22

  18. Embley DW, Campbell DM, Jiang YS, Liddle SW, Lonsdale DW, Ng Y-K, Quass D, Smith RD (1999) Conceptual-model-based data extraction from multiple-record web pages. Data Knowl Eng 31(3):227–251

  19. Friedl J (2002) Mastering regular expressions. O’Reilly

  20. Fu G, Jones CB, Abdelmoty A (2005) Building a geographical ontology for intelligent spatial search on the web. Proc. of the IASTED Int’l Conf. on Databases and Applications, Innsbruck, Austria, pp 167–172

  21. Goodchild MF, Hill LL (2008) Introduction to digital gazetteer research. Int J Geogr Inf Sci 22(10):1039–1044

  22. Goyal RK (2000) Similarity assessment for cardinal directions between extended spatial objects. Orono, Maine, University of Maine, p 189

  23. Hill LL (2000) Core elements of digital gazetteers: placenames, categories, and footprints. 4th European Conference on Research and Advanced Technology for Digital Libraries, pp 280–290

  24. Himmelstein H (2005) Local search: the internet is the yellow pages. IEEE Comput 38(2):26–35

  25. Jones CB, Purves R, Ruas A, Sanderson M, Sester M, van Kreveld M, Weibel R (2002) Spatial information retrieval and geographic ontologies: an overview of the SPIRIT project. ACM SIGIR conference on Research and development in information retrieval, Tampere, Finland, pp 387–388

  26. Jones CB, Purves RS, Clough PD, Joho H (2008) Modelling vague places with knowledge from the web. Int J Geogr Inf Sci 22(10):1045–1065

  27. Laender AHF, Borges KAV, Carvalho JCP, Medeiros CB, Silva AS, Davis CA Jr (2005) Integrating web data and geographic knowledge into spatial databases. In: Manolopoulos Y, Papadopoulos A, Vassilakopoulos M (eds) Spatial databases: techniques, technologies and trends. Hershey, Pennsylvania, USA, Idea Group Publishing, pp 23–48

  28. Larson RR (1996) Geographic information retrieval and spatial browsing. In: Smith LC, Gluck M (eds) Geographic information systems and libraries: patrons, maps, and spatial information. Urbana, IL, University of Illinois, pp 81–123

  29. Manov D, Kiryakov A, Popov B, Bontcheva K, Maynard D, Cunningham H (2003) Experiments with knowledge for extraction. Proceedings of the Human Language Technology Conference Workshop on Analysis of Geographic, Edmonton, Canada, pp 1–9

  30. Martins B, Silva MJ, Freitas S, Afonso AP (2006) Handling locations in search engine queries. Proceedings of the 3rd ACM Workshop on Geographical Information Retrieval (GIR 2006), Seattle, Washington, USA

  31. McCurley KS (2001) Geospatial mapping and navigation on the web. Tenth International World Wide Web Conference (WWW10), Hong Kong, ACM, pp 221–229

  32. Miller C (2006) A beast in the field: the google maps mashup as GIS/2. Cartographica Int J Geogr Inf Vis 41(3):187–199

  33. Modesto M, Pereira Á Jr, Ziviani N, Castillo C, Baeza-Yates R (2005) A new portrait of the Brazilian Web (in Portuguese). Proceedings of the XXXII Seminar on Integrated Software and Hardware (SEMISH 2005), São Leopoldo (RS), Brazil, pp 2005–2016

  34. Rhind G (1999) Global sourcebook of address data management: a guide to address formats and data in 194 countries. Gower

  35. Rushton G, Armstrong MP, Gittler J, Greene BR, Pavlik CE, West MM, Zimmerman DL (2006) Geocoding in cancer research: a review. Am J Prev Med 30(2S):S16–S24

  36. Sanderson M, Kohler J (2004) Analyzing geographic queries. Proc. of the ACM SIGIR Workshop on Geographic Information Retrieval, Sheffield, UK, pp 1–2

  37. Schockaert S, De Cock M, Kerre EE (2008) Location approximation for local search services using natural language hints. Int J Geogr Inf Sci 22(3):315–336

  38. Scowen RS (1993) Extended BNF—a generic base standard. Proceedings of the 1993 Software Engineering Standards Symposium (SESS’93), Brighton, UK

  39. Sengar V, Joshi T, Joy J, Prakash S, Toyama K (2007) Robust location search from text queries. Proceedings of the 15th International Conference on Advances in Geographic Information Systems (ACM GIS 2007), Seattle, Washington, USA

  40. Silva MJ, Martins B, Chaves M, Cardoso N, Afonso AP (2006) Adding geographic scopes to web resources. Comput Environ Urban Syst 30:378–399

  41. Smith J, Smith D (1977) Database abstractions: aggregation and generalization. ACM Trans Database Syst 2(2):105–133

  42. Souza LA, Davis CA Jr, Borges KAV, Delboni TM, Laender AHF (2005) The role of gazetteers in geographic knowledge discovery on the web. 3rd Latin American Web Congress, Buenos Aires, Argentina, pp 157–165

  43. Spaccapietra S, Cullot N, Parent C, Vangenot C (2004) On spatial ontologies. VI Brazilian Symposium on GeoInformatics (GeoInfo 2004), Campos do Jordão (SP), Brazil:CD-ROM

  44. Sui DT (2008) The wikification of GIS and its consequences: or Angelina Jolie’s new tattoo and the future of GIS. Comput Environ Urban Syst 32(1):1–5

  45. Sun G, Chen J, Guo W, Ray Liu KJ (2005) Signal processing techniques in network-aided positioning: a survey of state-of-the-art positioning designs. IEEE Signal Process Mag 22(4):12–23

  46. Tsichritzis D, Klug AC (1978) The ANSI/X3/SPARC DBMS framework report of the study group on database management systems. Inf Syst 3(3):173–191

  47. U.S. Census Bureau (2003, March) 108th CD Census 2000 TIGER/Line Files Technical Documentation. Retrieved March 2009, from http://www.census.gov/geo/www/tiger/tgrcd108/tgr108cd.pdf

  48. Wang C, Xie X, Wang L, Lu Y, Ma W (2005) Detecting geographic locations from web resources. Proc. of the 2nd Int’l Workshop on Geographic Information Retrieval, Bremen, Germany, pp 17–24

  49. Zandbergen PA (2008) A comparison of address point, parcel and street geocoding techniques. Comput Environ Urban Syst 32:214–232

  50. Zong W, Wu D, Sun A, Lim E, Goh DHG (2005) On assigning place names to geographic related web pages. Proc. of the 5th ACM/IEEE-CS Joint Conf. on Digital Libraries, Denver, Colorado, USA, pp 354–362

Author information

Corresponding author

Correspondence to Clodoveu A. Davis Jr.

About this article

Cite this article

Borges, K.A.V., Davis, C.A., Laender, A.H.F. et al. Ontology-driven discovery of geospatial evidence in web pages. Geoinformatica 15, 609–631 (2011). https://doi.org/10.1007/s10707-010-0118-z

  • DOI: https://doi.org/10.1007/s10707-010-0118-z
