Abstract
This article introduces a design process, called W-RayS, for describing Deep Web geographic data and publishing the descriptions on both the Web of Data and the Surface Web. It also outlines a toolkit that supports the process and discusses an experiment in which the toolkit was used to publish data stored in a large map server. Briefly, to describe geographic data in vector format, the designer first specifies views over the underlying geographic database that capture the basic characteristics of the geographic objects and the topological relationships represented in the vector data. The same idea applies to raster data, except that the views are defined over a gazetteer or any other geographic database that covers the same area as the raster data. The designer then maps the view definitions to an RDF schema, following the Linked Data principles. The descriptions of the geographic data are therefore formalized as sets of RDF triples synthesized from the conventional data. To publish the descriptions on the Web of Data, the designer may either materialize the RDF triples and store them in a repository or create a SPARQL endpoint to access the triples on demand. To publish the descriptions on the Surface Web, W-RayS offers the designer tools that transform the RDF triples into natural language sentences, organized as static Web pages with embedded RDFa. Including RDFa preserves the structure of the data and allows more specific queries to be processed by engines that analyze Web pages with RDFa.
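The Surface Web step can be illustrated with a minimal sketch. It is not the W-RayS toolkit itself; it only shows, under stated assumptions, how one object-valued RDF triple might be verbalized as an English sentence that carries the same triple as RDFa. The `http://example.org/geo/` namespace, the `overlaps` property, the labels, and the function names `triple_to_sentence` and `render_page` are all hypothetical.

```python
from html import escape

# Hypothetical namespace and vocabulary, chosen for illustration only.
EX = "http://example.org/geo/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def triple_to_sentence(subject, subject_label, predicate, phrase, obj, obj_label):
    """Render one object-valued triple as a sentence with embedded RDFa.

    `phrase` is the verb phrase that verbalizes the predicate, for
    example "overlaps" for a topological relationship.
    """
    s, p, o = (escape(x, quote=True) for x in (subject, predicate, obj))
    return (
        f'<p about="{s}">'
        # The subject label becomes a literal-valued rdfs:label triple.
        f'<span property="{RDFS_LABEL}">{escape(subject_label)}</span> '
        f"{escape(phrase)} "
        # rel + href restates the original triple with an IRI object.
        f'<a rel="{p}" href="{o}">{escape(obj_label)}</a>.'
        "</p>"
    )


def render_page(title, sentences):
    """Wrap the generated sentences in a static HTML page."""
    body = "\n".join(sentences)
    return (
        "<!DOCTYPE html>\n<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body>\n</html>"
    )


if __name__ == "__main__":
    sentence = triple_to_sentence(
        EX + "biome/amazon", "The Amazon biome",
        EX + "overlaps", "overlaps",
        EX + "state/para", "the State of Pará",
    )
    print(render_page("Amazon biome", [sentence]))
```

A reader sees an ordinary sentence, while an RDFa-aware engine can recover the subject, the property, and the object from the `about`, `rel`, and `href` attributes.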
Notes
A part-of-speech tagger marks (“tags”) each word in a text (within a corpus) with its corresponding part of speech, based on its definition and its relationship with adjacent and related words or phrases in a clause or paragraph. A part of speech is a linguistic category of words or lexical items, usually defined by the syntactic or morphological behavior of the lexical item in question. Common linguistic categories include nouns and verbs.
The definition of the RDF schema is available at www.inf.puc-rio.br/~hpiccinini/wray/biome.owl
The complete RDF schema is available at www.inf.puc-rio.br/~hpiccinini/wray/image.owl
Acknowledgments
This work was partly supported by IBGE, for H. Piccinini, and by CNPq under grants 301497/2006-0 and 475717/2011-2 and by CAPES/PROCAD under grant NF 21/2009, for M.A. Casanova and A.L. Furtado.
Cite this article
Piccinini, H., Casanova, M.A., Leme, L.A.P.P. et al. Publishing deep web geographic data. Geoinformatica 18, 769–792 (2014). https://doi.org/10.1007/s10707-013-0201-3
DOI: https://doi.org/10.1007/s10707-013-0201-3