ABSTRACT
Resolving semantic heterogeneity across distinct data sources remains an open problem in the GIS domain. Our approach, called GSim, semantically aligns tables from different GIS databases by first choosing attributes for comparison. We then examine the instances of those attributes and calculate a similarity value between them, called entropy-based distribution (EBD), by combining two separate methods. Our primary method discerns the geographic types from the instances of the compared attributes. If geographic type matching is not possible, we apply a generic schema matching method that employs normalized Google distance. We demonstrate the effectiveness of our approach against the traditional N-gram approach on multi-jurisdictional datasets.
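The abstract does not spell out how the fallback method is computed. As a minimal sketch, the function below implements normalized Google distance as defined by Cilibrasi and Vitányi, using search hit counts. The function name, parameter names, and the hit counts in the usage note are illustrative assumptions; how GSim obtains the counts or folds the distance into EBD is not stated here.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Normalized Google distance between two terms x and y.

    fx, fy -- number of pages containing x and y respectively (hypothetical inputs)
    fxy    -- number of pages containing both x and y
    n      -- total number of indexed pages (a normalizing constant)

    Returns 0.0 for terms that always co-occur; larger values mean the
    terms are less related. Returns infinity if the terms never co-occur.
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term needs a positive hit count")
    if n <= min(fx, fy):
        raise ValueError("n must exceed the smaller of the two hit counts")
    if fxy <= 0:
        return math.inf

    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator
```

For example, with assumed counts fx = 1000, fy = 100, fxy = 10 and n = 1,000,000, the distance comes out at approximately 0.5, while two terms that appear on exactly the same pages have distance 0.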
Index Terms
- Geographically-typed semantic schema matching