
Heterogeneous Embeddings for Relational Data Integration Tasks

  • Conference paper
Web Information Systems and Applications (WISA 2021)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12999)


Abstract

Data integration technology combines data from different sources, making heterogeneous data convenient to use when processing big data; data integration therefore plays an important role in many industries. Recently, a growing body of work has been devoted to data integration for relational data, aiming to mine the underlying knowledge it contains. Through embedding techniques, the features of data can be extracted and expressed as low-dimensional vectors. Some existing methods take the records, attributes, and cell values in relational data as distinct research objects and compute embedding representations for each, but they train all three types of objects uniformly, ignoring the differences between the types. In this paper, we transform relational data into a heterogeneous graph in which the different levels of data are treated as different types of nodes. During training, a calculation method suited to each node type's own characteristics is adopted, so as to obtain more accurate embedding representations. The embeddings are then applied to specific data integration tasks. Experimental results show that the embeddings trained by the proposed model generalize well and achieve satisfying results on both schema matching and entity resolution tasks.
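To make the graph construction concrete, the following is a minimal sketch of how a relational table might be mapped to a heterogeneous graph whose nodes are records, attributes, and cell values. The node-naming scheme (`rec::`, `attr::`, `val::` prefixes) and the choice of edges are our own illustrative assumptions, not the paper's exact construction.

```python
# Sketch: map a relational table to a heterogeneous graph with three node
# types (record, attribute, value). Illustrative scheme only.

def table_to_hetero_graph(columns, rows):
    """Return (nodes, edges): nodes maps node id -> node type,
    edges is a set of undirected (node id, node id) pairs."""
    nodes, edges = {}, set()
    for col in columns:
        nodes[f"attr::{col}"] = "attribute"
    for i, row in enumerate(rows):
        rid = f"rec::{i}"
        nodes[rid] = "record"
        for col, val in zip(columns, row):
            vid = f"val::{val}"
            nodes[vid] = "value"
            # Each cell value links both to its record and to its attribute,
            # so that walks over the graph can move between all three node
            # types; shared values connect the records that contain them.
            edges.add((rid, vid))
            edges.add((f"attr::{col}", vid))
    return nodes, edges

columns = ["name", "city"]
rows = [("alice", "berlin"), ("bob", "berlin")]
nodes, edges = table_to_hetero_graph(columns, rows)
```

In this toy table the shared value node `val::berlin` connects both record nodes, which is the kind of structural signal type-aware embedding methods can exploit.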



Acknowledgment

This work is supported by the National Natural Science Foundation of China (62072084, 62072086), the National Defense Basic Scientific Research Program of China (JCKY2018205C012) and the Fundamental Research Funds for the Central Universities (N2116008).

Author information


Corresponding author

Correspondence to Derong Shen.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Li, X., Wang, G., Shen, D., Nie, T., Kou, Y. (2021). Heterogeneous Embeddings for Relational Data Integration Tasks. In: Xing, C., Fu, X., Zhang, Y., Zhang, G., Borjigin, C. (eds) Web Information Systems and Applications. WISA 2021. Lecture Notes in Computer Science, vol 12999. Springer, Cham. https://doi.org/10.1007/978-3-030-87571-8_59


  • DOI: https://doi.org/10.1007/978-3-030-87571-8_59


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87570-1

  • Online ISBN: 978-3-030-87571-8

