Abstract
Data integration technology combines data from heterogeneous sources, making such data convenient and fast to use in big data processing; it therefore plays an important role in many industries. Recently, a growing body of work has applied data integration to relational data with the goal of mining its underlying knowledge. Through embedding techniques, the features of data can be extracted and expressed as low-dimensional vectors. Some existing methods treat records, attributes, and cell values in relational data as distinct research objects and compute embedding representations for each, but they train the three types of objects uniformly, ignoring the differences among the types. In this paper, we transform relational data into a heterogeneous graph in which data at different levels are treated as different node types. During training, a calculation method tailored to the characteristics of each node type is adopted, yielding more accurate embedding representations. The embeddings are then applied to specific data integration tasks. Experimental results show that the embeddings trained by the proposed model generalize well and achieve satisfying results on both schema matching and entity resolution tasks.
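The heterogeneous graph the abstract describes can be illustrated with a minimal sketch: records, attributes, and cell values become three distinct node types, with records linked to their cell values and cell values linked to their attributes. This is an illustrative assumption about the construction, not the paper's actual implementation; all names (`build_hetero_graph`, the `rec::`/`attr::`/`val::` prefixes) are hypothetical.

```python
# Illustrative sketch: one relational table -> a heterogeneous graph with
# three node types (record, attribute, cell value). Shared cell values map
# to a single node, which is what lets embeddings connect related records.
from collections import defaultdict

def build_hetero_graph(rows, attributes):
    """rows: list of tuples aligned with `attributes`.
    Returns (node_types, adjacency): node_types maps each node id to
    "record", "attribute", or "value"; adjacency is undirected."""
    node_types = {}
    adj = defaultdict(set)

    def connect(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for attr in attributes:
        node_types[f"attr::{attr}"] = "attribute"

    for i, row in enumerate(rows):
        rec = f"rec::{i}"
        node_types[rec] = "record"
        for attr, cell in zip(attributes, row):
            val = f"val::{cell}"
            node_types[val] = "value"       # identical values share one node
            connect(rec, val)               # record <-> its cell values
            connect(val, f"attr::{attr}")   # cell value <-> its attribute

    return node_types, dict(adj)

rows = [("Alice", "Berlin"), ("Bob", "Berlin")]
attrs = ["name", "city"]
types, adj = build_hetero_graph(rows, attrs)
```

In this toy example the shared value node `val::Berlin` connects both record nodes, so a type-aware aggregation over the graph would let the two records influence each other's embeddings through their common city.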
Acknowledgment
This work is supported by the National Natural Science Foundation of China (62072084, 62072086), the National Defense Basic Scientific Research Program of China (JCKY2018205C012) and the Fundamental Research Funds for the Central Universities (N2116008).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Li, X., Wang, G., Shen, D., Nie, T., Kou, Y. (2021). Heterogeneous Embeddings for Relational Data Integration Tasks. In: Xing, C., Fu, X., Zhang, Y., Zhang, G., Borjigin, C. (eds) Web Information Systems and Applications. WISA 2021. Lecture Notes in Computer Science, vol. 12999. Springer, Cham. https://doi.org/10.1007/978-3-030-87571-8_59
DOI: https://doi.org/10.1007/978-3-030-87571-8_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87570-1
Online ISBN: 978-3-030-87571-8