DOI: 10.1145/3340531.3412056

Learning from Textual Data in Database Systems

Published: 19 October 2020

Abstract

Relational database systems hold massive amounts of text that is valuable for many machine learning (ML) tasks. Since ML techniques depend on numerical input representations, pre-trained word embeddings are increasingly used to convert text values into meaningful numbers. However, a naïve one-to-one mapping of each word in a database to a word embedding vector fails to incorporate the rich context information given by the database schema. We therefore propose Retro, a novel relational retrofitting framework that learns numerical representations of text values in databases, capturing both the rich information encoded by pre-trained word embedding models and the context information provided by tabular and foreign-key relations in the database. We define relational retrofitting as an optimization problem, present an efficient algorithm for solving it, and investigate the influence of various hyperparameters. Furthermore, we develop both simple feed-forward and more complex graph convolutional neural network architectures that operate on these representations. Our evaluation shows that the proposed embeddings and models are ready to use for many ML tasks, such as text classification, imputation, and link prediction, and even outperform state-of-the-art techniques.
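
To make the optimization concrete, the following is a minimal sketch of a relational retrofitting objective, modeled on the classic retrofitting loss of Faruqui et al. (NAACL 2015) and extended with the relational context described above. The edge sets E_C and E_R, the weights alpha, beta, gamma, and the omission of any negative-sampling terms are illustrative assumptions; Retro's precise formulation is given in the authors' EDBT 2020 paper.

    % Hedged sketch, not Retro's exact loss: each retrofitted vector q_i stays
    % close to its pre-trained embedding \hat{q}_i while being pulled towards
    % text values in the same column (edges E_C) and towards values connected
    % by foreign keys (edges E_R).
    \Psi(Q) = \sum_{i=1}^{n} \Big[ \alpha_i \,\lVert q_i - \hat{q}_i \rVert^2
            + \sum_{(i,j) \in E_C} \beta_{ij} \,\lVert q_i - q_j \rVert^2
            + \sum_{(i,j) \in E_R} \gamma_{ij} \,\lVert q_i - q_j \rVert^2 \Big]

For fixed weights, an objective of this shape is a convex least-squares problem, which is what makes an efficient solver possible: in the style of Faruqui et al.'s update rule, each q_i can be updated iteratively as a weighted average of its pre-trained vector and its current relational neighbors.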

Supplementary Material

MP4 File (3340531.3412056.mp4)


Cited By

  • (2022) Unsupervised Matching of Data and Text. 2022 IEEE 38th International Conference on Data Engineering (ICDE), pages 1058-1070. DOI: 10.1109/ICDE53745.2022.00084. Online publication date: May 2022.
  • (2021) Pre-Trained Web Table Embeddings for Table Discovery. Fourth Workshop in Exploiting AI Techniques for Data Management, pages 24-31. DOI: 10.1145/3464509.3464892. Online publication date: 20 June 2021.


Published In

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN: 9781450368599
DOI: 10.1145/3340531


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. relational database
  2. retrofitting
  3. word embedding

Qualifiers

  • Research-article


Conference

CIKM '20

Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%

