ABSTRACT
Companies and institutions now recognize the potential of Linked Open Data (LOD) and are starting to publish their own data as LOD. However, publishing LOD remains a challenging task. One of the main reasons is the lack of user-friendly tooling that properly supports the whole LOD publishing process. The process typically consists of extracting the source data, transforming it to RDF, aligning it with commonly used vocabularies, linking it to other datasets, computing metadata, publishing the result on the web as a dump, loading it into a triplestore, and recording the dataset in a data catalog such as CKAN. In this paper we present LinkedPipes ETL, a tool for ETL-like LOD publishing that focuses mainly on supporting such publishing workflows in a user-friendly way. The tool also eases the consumption of existing LOD sources by addressing some of the practical issues associated with it. Finally, the tool itself uses Linked Data technologies to represent the ETL processes. We describe LinkedPipes ETL and its main distinguishing features in the context of the use cases in which the tool has already been deployed. These include a public administration institution, a municipality, a university, a software company, and an open data initiative.