FiLiPo: A Sample Driven Approach for Finding Linkage Points Between RDF Data and APIs

Zeimetz, Tobias; Schenkel, Ralf

doi:10.1007/978-3-030-82472-3_18

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12843)

Included in the following conference series:

European Conference on Advances in Databases and Information Systems

Abstract

Data integration is an important task in creating comprehensive RDF knowledge bases (KBs). Many data sources are used to extend a given dataset or to correct errors. Since several data providers make their data publicly available only via Web APIs, these APIs must also be included in the integration process. However, APIs often limit access frequency and respond slowly because of latency and other constraints. On the other hand, APIs always provide access to the latest data. So far, integrating APIs has been mainly a manual task because API responses are heterogeneous. To tackle this problem, this paper presents FiLiPo (Finding Linkage Points), a system that automatically finds connections (i.e., linkage points) between data provided by APIs and local knowledge bases. FiLiPo is an open source, sample-driven schema matching system that models API services as parameterized queries. Furthermore, the approach automatically finds valid input values for APIs (e.g., IDs) and determines valid alignments between KBs and APIs. Results on ten pairs of KBs and APIs show that FiLiPo performs well in terms of precision and recall and outperforms the current state-of-the-art system.
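
The idea of sample-driven matching described in the abstract can be illustrated with a minimal sketch. It is not FiLiPo's actual implementation: the function names (`find_linkage_points`, `flatten`, `mock_api`), the toy records, the use of `difflib.SequenceMatcher` as the similarity measure, and both thresholds are assumptions made for illustration only. The API is treated as a parameterized query that is probed with values of one KB property; every leaf of the response is then compared with the values of the same KB record.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in for a real metric)."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def flatten(obj, prefix=""):
    """Yield (path, value) pairs for every leaf of a nested JSON-like response."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        for item in obj:
            yield from flatten(item, f"{prefix}[*]")
    else:
        yield prefix, obj


def find_linkage_points(records, call_api, input_property,
                        sim_threshold=0.9, support=0.5):
    """Probe the API with sample values of `input_property` and return
    (KB property, response path) pairs that match in at least `support`
    of the answered requests."""
    matches = Counter()
    answered = 0
    for record in records:
        response = call_api(record[input_property])
        if not response:
            continue  # this value was not a valid input for the API
        answered += 1
        seen = set()
        for path, value in flatten(response):
            for prop, kb_value in record.items():
                if similarity(kb_value, value) >= sim_threshold:
                    seen.add((prop, path))
        matches.update(seen)
    if answered == 0:
        return {}
    return {pair: count / answered
            for pair, count in matches.items()
            if count / answered >= support}


# Hypothetical toy data: a two-record KB and an API that is keyed by DOI.
KB = [
    {"doi": "10.1000/a", "title": "Sample Driven Mapping", "year": "2020"},
    {"doi": "10.1000/b", "title": "Schema Matching Revisited", "year": "2011"},
]
API_DATA = {
    "10.1000/a": {"id": "10.1000/a",
                  "meta": {"name": "Sample driven mapping", "published": "2020"}},
    "10.1000/b": {"id": "10.1000/b",
                  "meta": {"name": "Schema matching revisited", "published": "2011"}},
}


def mock_api(value):
    """Stand-in for a Web API call; returns None for unknown inputs."""
    return API_DATA.get(value)


by_doi = find_linkage_points(KB, mock_api, "doi")
by_title = find_linkage_points(KB, mock_api, "title")
print(by_doi)
print(by_title)
```

In this toy setting, probing with `doi` yields candidate alignments, whereas probing with `title` yields none because the mock API never answers; this mirrors, in a very simplified way, how sampling can reveal which KB property is a valid API input.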

Notes

1. https://dblp.uni-trier.de/.
2. Code available at https://github.com/dbis-trier-university/FiLiPo.
3. Link to the extended paper version: https://arxiv.org/abs/2103.06253.
4. All used similarity methods are listed in our manual at https://github.com/dbis-trier-university/FiLiPo/blob/master/README.md.
5. Provided by dblp: https://basilika.uni-trier.de/nextcloud/s/A92AbECHzmHiJRF.
6. http://www.cs.toronto.edu/~oktie/linkedmdb/linkedmdb-18-05-2009-dump.nt.
7. Code and gold standard can be found at https://zenodo.org/record/4778531.

Author information

Authors and Affiliations

Trier University, 54286, Trier, Germany
Tobias Zeimetz & Ralf Schenkel

Corresponding author

Correspondence to Tobias Zeimetz.

Editor information

Editors and Affiliations

LIAS/ISAE-ENSMA, Futuroscope Chasseneuil Cedex, France
Ladjel Bellatreche
University of Tartu, Tartu, Estonia
Marlon Dumas
Aarhus University, Aarhus, Denmark
Panagiotis Karras
University of Tartu, Tartu, Estonia
Raimundas Matulevičius

About this paper

Cite this paper

Zeimetz, T., Schenkel, R. (2021). FiLiPo: A Sample Driven Approach for Finding Linkage Points Between RDF Data and APIs. In: Bellatreche, L., Dumas, M., Karras, P., Matulevičius, R. (eds) Advances in Databases and Information Systems. ADBIS 2021. Lecture Notes in Computer Science, vol 12843. Springer, Cham. https://doi.org/10.1007/978-3-030-82472-3_18

DOI: https://doi.org/10.1007/978-3-030-82472-3_18
Published: 16 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82471-6
Online ISBN: 978-3-030-82472-3
eBook Packages: Computer Science, Computer Science (R0)
