Semantic Integration of Open-Data Tables

  • Conference paper
  • On the Move to Meaningful Internet Systems: OTM 2016 Conferences (OTM 2016)

Abstract

With vast amounts of tabular data freely available under several Open-Data initiatives, semantic integration of such datasets is a pressing need. Multiple research efforts have addressed the problem of annotating tabular data. However, to the best of our knowledge, they do not adequately address the problem of semantic integration of tables. A given collection of tables can be semantically integrated along several perspectives or themes. This makes semantic integration a “divergent aggregation” problem. Most existing approaches have focused on interpreting a single table, or on rewriting tables to describe an overarching theme that is already provided. In this work, we address semantic integration at two levels: Theme identification (identifying dominant topics or perspectives through which the data can be characterized) and Schematic characterization (identifying the classes, relationships and instances that best characterize the data within the theme). The theme need not be represented by a single column, and may span multiple columns or tables. We use the Linked Open Data (LOD) cloud to map the ontologies that best suit the datasets. Our work also identifies incoherent datasets, where a given collection may not have common topics. In such cases, we provide guidance on the intersection of the semantic footprints of the tables, so that datasets can be selected judiciously for semantic integration.
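The idea of intersecting semantic footprints can be illustrated with a minimal sketch. It is not the method of the paper: it assumes that each table has already been reduced to a set of LOD class labels (its "footprint"), and the table names, class labels, and the functions `common_themes` and `pairwise_overlap` are all hypothetical. An empty intersection across the whole collection is treated as a sign of incoherence, and pairwise Jaccard overlap is used as one possible form of guidance on which tables to keep together.

```python
# Hypothetical sketch: footprints are assumed to be precomputed sets of
# LOD class labels per table; this is not the algorithm from the paper.

def common_themes(footprints):
    """Return the class labels shared by every table in the collection."""
    sets = [set(labels) for labels in footprints.values()]
    return set.intersection(*sets) if sets else set()


def pairwise_overlap(footprints):
    """Return the Jaccard overlap of footprints for each pair of tables."""
    names = sorted(footprints)
    overlaps = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            fa, fb = set(footprints[a]), set(footprints[b])
            union = fa | fb
            overlaps[(a, b)] = len(fa & fb) / len(union) if union else 0.0
    return overlaps


# Illustrative data only (assumed table names and class labels).
footprints = {
    "schools.csv": {"dbo:School", "dbo:City", "dbo:State"},
    "hospitals.csv": {"dbo:Hospital", "dbo:City", "dbo:State"},
    "movies.csv": {"dbo:Film", "dbo:Person"},
}

if not common_themes(footprints):
    # No shared theme: report which pairs of tables overlap the most.
    for pair, score in sorted(pairwise_overlap(footprints).items(),
                              key=lambda item: -item[1]):
        print(pair, round(score, 2))
```

In this toy collection the three tables share no class, while the two tables with geographic columns overlap with each other, which suggests integrating that pair and setting the third table aside.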

Notes

  1. CSV: https://en.wikipedia.org/wiki/Comma-separated_values.

  2. The Linking Open Data cloud diagram: http://lod-cloud.net/.

  3. RDFConverter: http://www.w3.org/wiki/ConverterToRdf.

  4. Virtuoso Sponger: http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtSponger.

  5. Linked Data Design Issues: http://www.w3.org/DesignIssues/LinkedData.html.

  6. Open Refine: https://github.com/OpenRefine.

  7. https://data.gov.in/.

  8. https://support.spatialkey.com/spatialkey-sample-csv-data/.

  9. https://www.cse.iitb.ac.in/~sunita/wwt/#data.

  10. http://www.cs.princeton.edu/courses/archive/spr09/cos435/Notes/relevance_topost.pdf.

  11. http://wsl.iiitb.ac.in/index.php/2016/07/17/inferencing-in-the-large-semantic-integration-of-open-data-tables//.

Acknowledgments

We would like to thank the members of the Web Science Lab who took part in the evaluation as human evaluators. The lab members who volunteered are all MS and PhD students pursuing research in the broad areas of Web Science, Semantic Web, Text Mining and Knowledge Representation.

Author information

Corresponding author

Correspondence to Asha Subramanian.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Subramanian, A., Mathai, V.K., Manikanta, V., Joshi, J.V., Srinivasa, S. (2016). Semantic Integration of Open-Data Tables. In: Debruyne, C., et al. On the Move to Meaningful Internet Systems: OTM 2016 Conferences. OTM 2016. Lecture Notes in Computer Science, vol 10033. Springer, Cham. https://doi.org/10.1007/978-3-319-48472-3_35

  • DOI: https://doi.org/10.1007/978-3-319-48472-3_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48471-6

  • Online ISBN: 978-3-319-48472-3

  • eBook Packages: Computer Science, Computer Science (R0)
