GCF-RD: A Graph-based Contrastive Framework for Semi-Supervised Learning on Relational Databases

ABSTRACT
Relational databases are the primary storage model for structured data in most businesses and usually involve multiple tables connected by key-foreign-key relationships. In practice, data analysts often want to pose predictive classification queries over relational databases. To answer such queries, many existing approaches train classification models with supervised learning, which relies heavily on the availability of sufficient labeled data. In this paper, we propose a novel graph-based contrastive framework for semi-supervised learning on relational databases that achieves promising predictive classification performance with only a small amount of labeled data. Our framework utilizes contrastive learning to exploit additional supervision signals from massive unlabeled data. Specifically, we develop two contrastive graph views that are 1) well suited to modeling the complex relationships and correlations among structured data in a relational database, and 2) complementary to each other for learning robust representations of the structured data to be classified. We also leverage label information in contrastive learning to mitigate its negative effect, during knowledge transfer, on the supervised counterpart. We conduct extensive experiments on three real-world relational databases, and the results demonstrate that our framework achieves state-of-the-art predictive performance in limited-label settings compared with various supervised and semi-supervised learning approaches.
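To make the label-aware contrastive objective mentioned in the abstract concrete, the sketch below shows one common way label information enters a contrastive loss: embeddings from two graph views are compared with an InfoNCE-style objective in which same-label examples are treated as positives. This is a minimal NumPy illustration under our own assumptions (the function name `sup_contrastive_loss` and the temperature value are hypothetical), not the paper's exact formulation.

```python
import numpy as np

def sup_contrastive_loss(z, labels, tau=0.5):
    """Label-aware InfoNCE-style contrastive loss (illustrative sketch).

    z:      (n, d) embeddings, e.g. node representations pooled from the
            two contrastive graph views stacked together.
    labels: (n,) integer class labels; same-label pairs act as positives.
    tau:    softmax temperature (an assumed value, not from the paper).
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize rows
    sim = z @ z.T / tau                               # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-comparisons
    # log-softmax over each anchor's similarity row
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    n = len(labels)
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    # average negative log-probability over each anchor's positives
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) \
        / np.maximum(pos.sum(axis=1), 1)
    return per_anchor.mean()
```

Because positives are defined by labels rather than only by the two augmented views of the same example, a labeled example pulls together all embeddings of its class, which is one way contrastive pre-training can be made consistent with the supervised objective rather than working against it.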