Towards a Multi-way Similarity Join Operator

Galkin, Mikhail; Vidal, Maria-Esther; Auer, Sören

doi:10.1007/978-3-319-67162-8_26

Part of the book series: Communications in Computer and Information Science (CCIS, volume 767)

Included in the following conference series:

European Conference on Advances in Databases and Information Systems

Abstract

Increasing volumes of data consumed and managed by enterprises demand effective and efficient data integration approaches. Additionally, the amount and variety of data sources impose further challenges for query engines. However, the majority of existing query engines rely on binary join-based query planners and execution methods whose complexity depends on the number of data sources involved. Moreover, traditional binary join operators cannot distinguish between similar and different tuples and treat every incoming tuple as an independent object. Thus, tuples that are represented differently but refer to the same real-world entity are still treated as unrelated objects. We propose MSimJoin, an approach towards a multi-way similarity join operator. MSimJoin accepts more than two inputs and uses Semantic Web technologies to identify duplicates among incoming tuples, i.e., tuples that correspond to similar entities. MSimJoin therefore reduces both the height of tree query plans and the number of duplicated results.
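
To make the idea of a multi-way similarity join concrete, the following is a minimal sketch and not the MSimJoin algorithm itself, which the abstract does not specify. It assumes a naive nested-loop evaluation over all inputs at once and uses token-based Jaccard similarity as a stand-in for the Semantic Web based similarity the abstract mentions. The names `jaccard`, `multiway_similarity_join`, `key`, and `threshold`, as well as the sample data, are hypothetical.

```python
from itertools import combinations, product


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (a stand-in similarity measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def multiway_similarity_join(inputs, key, threshold=0.5, sim=jaccard):
    """Join any number of inputs in a single operator.

    A combination of tuples (one per input) is joined when every pair
    in it is similar on `key`. The matched tuples are merged into one
    result, so the same entity is not reported once per source.
    """
    results = []
    for combo in product(*inputs):
        similar = all(
            sim(left[key], right[key]) >= threshold
            for left, right in combinations(combo, 2)
        )
        if similar:
            merged = {}
            for tup in combo:
                for attr, value in tup.items():
                    # Keep the first value seen for each attribute.
                    merged.setdefault(attr, value)
            results.append(merged)
    return results


# Hypothetical sample data: three sources describing overlapping entities.
source_a = [{"name": "Albert Einstein", "born": 1879}]
source_b = [
    {"name": "einstein albert", "field": "physics"},
    {"name": "Marie Curie", "field": "chemistry"},
]
source_c = [{"name": "Albert Einstein (physicist)", "award": "Nobel Prize"}]

joined = multiway_similarity_join([source_a, source_b, source_c], key="name")
print(joined)
```

A plan built from binary joins would need two join operators stacked on top of each other for these three sources; here a single operator consumes all three, which illustrates the reduction in plan height the abstract refers to. The nested loop is exponential in the number of inputs and is only meant to show the semantics, not an efficient evaluation strategy.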

Author information

Authors and Affiliations

Enterprise Information Systems (EIS), University of Bonn, Bonn, Germany
Mikhail Galkin & Sören Auer
Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Sankt Augustin, Germany
Mikhail Galkin, Maria-Esther Vidal & Sören Auer
ITMO University, Saint Petersburg, Russia
Mikhail Galkin

Corresponding author

Correspondence to Mikhail Galkin.

Editor information

Editors and Affiliations

Riga Technical University, Riga, Latvia
Mārīte Kirikova
Norwegian University of Science and Technology, Trondheim, Norway
Kjetil Nørvåg
University of Cyprus, Nicosia, Cyprus
George A. Papadopoulos
Free University of Bozen-Bolzano, Bozen-Bolzano, Italy
Johann Gamper
Institute of Computing Science, Poznan University of Technology, Poznan, Poland
Robert Wrembel
Université Lumière Lyon 2, Lyon, France
Jérôme Darmont
University of Bologna, Bologna, Italy
Stefano Rizzi

About this paper

Cite this paper

Galkin, M., Vidal, ME., Auer, S. (2017). Towards a Multi-way Similarity Join Operator. In: Kirikova, M., et al. New Trends in Databases and Information Systems. ADBIS 2017. Communications in Computer and Information Science, vol 767. Springer, Cham. https://doi.org/10.1007/978-3-319-67162-8_26

DOI: https://doi.org/10.1007/978-3-319-67162-8_26
Published: 09 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67161-1
Online ISBN: 978-3-319-67162-8
eBook Packages: Computer Science, Computer Science (R0)
