Abstract
With vast amounts of tabular data freely available under several Open-Data initiatives, semantic integration of such datasets is a pressing need. Multiple research efforts have addressed the problem of annotating tabular data. However, to the best of our knowledge, they do not adequately address the problem of semantic integration of tables. A given collection of tables can be semantically integrated along several perspectives or themes, which makes semantic integration a “divergent aggregation” problem. Most existing approaches have focused on interpreting a single table, or on rewriting tables to describe an overarching theme that is already provided. In this work, we address semantic integration at two levels: theme identification (identifying the dominant topics or perspectives through which the data can be characterized) and schematic characterization (identifying the classes, relationships and instances that best characterize the data within the theme). A theme need not be represented by a single column, and may span multiple columns or tables. We use the Linked Open Data (LOD) cloud to map the datasets to the ontologies that best suit them. Our work also identifies incoherent collections, in which the tables share no common topic. In such cases, we provide guidance based on the intersection of the tables' semantic footprints, enabling a judicious selection of datasets for semantic integration.
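The incoherence check mentioned in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each table's semantic footprint is represented as a set of class labels obtained by mapping its columns to the LOD cloud, and it uses plain set intersection and Jaccard overlap as stand-in measures; the paper's actual footprint construction and scoring are not described in this abstract. Under these assumptions, a collection with no class common to all tables is flagged as incoherent, and the pairwise overlaps suggest which subsets of tables are better candidates for integration. The function names and example tables are illustrative only.

```python
from itertools import combinations


def common_footprint(footprints):
    """Classes shared by every table's footprint (hypothetical representation)."""
    sets = list(footprints.values())
    if not sets:
        return set()
    shared = set(sets[0])
    for s in sets[1:]:
        shared &= s  # keep only classes present in every table seen so far
    return shared


def is_coherent(footprints):
    """Treat a collection as coherent if at least one class is shared by all tables."""
    return bool(common_footprint(footprints))


def pairwise_overlap(footprints):
    """Jaccard overlap for each pair of tables, highest overlap first."""
    scores = []
    for a, b in combinations(sorted(footprints), 2):
        fa, fb = footprints[a], footprints[b]
        union = fa | fb
        score = len(fa & fb) / len(union) if union else 0.0
        scores.append((a, b, score))
    # sorted() is stable, so ties keep their alphabetical pair order
    return sorted(scores, key=lambda t: t[2], reverse=True)


if __name__ == "__main__":
    # Illustrative footprints; the class labels are placeholders, not real mappings.
    footprints = {
        "schools.csv": {"School", "City", "District"},
        "hospitals.csv": {"Hospital", "City", "District"},
        "rainfall.csv": {"Rainfall", "Year"},
    }
    print(is_coherent(footprints))
    for a, b, s in pairwise_overlap(footprints):
        print(a, b, round(s, 2))
```

In this toy collection no class is shared by all three tables, so the collection is reported as incoherent, while the pair `hospitals.csv` and `schools.csv` overlaps on `City` and `District` and would be the natural subset to integrate.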
Acknowledgments
We would like to thank the members of the Web Science Lab for participating in the evaluation as human evaluators. The lab members who volunteered as human evaluators are all MS and PhD students pursuing research in the broad areas of Web Science, Semantic Web, Text Mining and Knowledge Representation.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Subramanian, A., Mathai, V.K., Manikanta, V., Joshi, J.V., Srinivasa, S. (2016). Semantic Integration of Open-Data Tables. In: Debruyne, C., et al. On the Move to Meaningful Internet Systems: OTM 2016 Conferences. OTM 2016. Lecture Notes in Computer Science, vol. 10033. Springer, Cham. https://doi.org/10.1007/978-3-319-48472-3_35
DOI: https://doi.org/10.1007/978-3-319-48472-3_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48471-6
Online ISBN: 978-3-319-48472-3
eBook Packages: Computer Science, Computer Science (R0)