Publishing deep web geographic data


Abstract

This article introduces a design process, called W-RayS, to describe Deep Web geographic data and to publish the descriptions both on the Web of Data and on the Surface Web. The article also outlines a toolkit that supports the process and discusses an experiment in which the toolkit was used to publish data stored in a large map server. Briefly, to describe geographic data in vector format, the designer should first specify views over the underlying geographic database that capture the basic characteristics of the geographic objects and their topological relationships represented in the vector data. The same idea is applied to raster data, but using a gazetteer or any other geographic database that covers the same area as the raster data. Then, the designer should map the view definitions to an RDF schema, following the Linked Data principles. The descriptions of the geographic data are therefore formalized as sets of RDF triples synthesized from the conventional data. To publish geographic data descriptions on the Web of Data, the designer may either materialize the RDF triples and store them in a repository, or create a SPARQL endpoint to access the triples on demand. To publish geographic data descriptions on the Surface Web, W-RayS offers the designer tools to transform the RDF triples into natural language sentences, organized as static Web pages with embedded RDFa. The inclusion of RDFa preserves the structure of the data and allows more specific queries, processed by engines that analyze Web pages with RDFa.
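The pipeline summarized above (view row, then RDF triples, then a natural language sentence with embedded RDFa) can be illustrated with a minimal sketch. This is not the W-RayS toolkit: the namespace `http://example.org/geo/`, the view columns, the `overlaps` property, and the sample values are all hypothetical and chosen only for illustration.

```python
from html import escape

# Hypothetical namespace; the article's actual RDF schema is not reproduced here.
EX = "http://example.org/geo/"


def view_row_to_triples(row):
    """Map one row of a (hypothetical) view over the geographic database to triples."""
    biome = EX + "biome/" + row["biome_id"]
    state = EX + "state/" + row["state_id"]
    return [
        (biome, EX + "name", row["biome_name"]),  # literal-valued property
        (state, EX + "name", row["state_name"]),  # literal-valued property
        (biome, EX + "overlaps", state),          # topological relationship
    ]


def relation_to_rdfa(subject, predicate, obj, labels, verb):
    """Render one resource-to-resource triple as a sentence with RDFa attributes."""
    return (
        f'<p about="{escape(subject)}">{escape(labels[subject])} {escape(verb)} '
        f'<a rel="{escape(predicate)}" href="{escape(obj)}">{escape(labels[obj])}</a>.</p>'
    )


# Illustrative row, as if returned by a view joining two vector layers.
row = {"biome_id": "b1", "biome_name": "Biome A",
       "state_id": "s1", "state_name": "State B"}

triples = view_row_to_triples(row)
# Collect human-readable labels from the name triples.
labels = {s: o for s, p, o in triples if p == EX + "name"}
s, p, o = triples[2]
html = relation_to_rdfa(s, p, o, labels, "overlaps")
print(html)
```

The generated paragraph reads as an ordinary sentence for a conventional search engine, while the `about`, `rel`, and `href` attributes let an RDFa-aware processor recover the original triple.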

Notes

  1. http://www.w3.org/standards/semanticweb/data

  2. www.inf.puc-rio.br/~hpiccinini/

  3. A part-of-speech tagger marks (“tags”) each word in a text (within a corpus) with its part of speech, based on the word's definition and its relationship with adjacent and related words or phrases in a clause or paragraph. A part of speech is a linguistic category of words or lexical items, usually defined by the syntactic or morphological behavior of the lexical item in question. Common categories include nouns and verbs.

  4. http://mapas.ibge.gov.br

  5. The definition of the RDF schema is available at www.inf.puc-rio.br/~hpiccinini/wray/biome.owl

  6. The complete RDF schema is available at www.inf.puc-rio.br/~hpiccinini/wray/image.owl

  7. http://mapas.ibge.gov.br

Acknowledgments

This work was partly supported by IBGE, for H. Piccinini, and by CNPq under grants 301497/2006-0, 475717/2011-2, and CAPES/PROCAD NF 21/2009, for M.A. Casanova and A.L. Furtado.

Author information

Corresponding author

Correspondence to Marco A. Casanova.

About this article

Cite this article

Piccinini, H., Casanova, M.A., Leme, L.A.P.P. et al. Publishing deep web geographic data. Geoinformatica 18, 769–792 (2014). https://doi.org/10.1007/s10707-013-0201-3

  • DOI: https://doi.org/10.1007/s10707-013-0201-3
