GCF-RD: A Graph-based Contrastive Framework for Semi-Supervised Learning on Relational Databases

ABSTRACT
Relational databases are the primary storage model for structured data in most businesses and usually involve multiple tables connected by key-foreign-key relationships. In practice, data analysts often want to pose predictive classification queries over relational databases. To answer such queries, many existing approaches train classification models with supervised learning, which relies heavily on the availability of sufficient labeled data. In this paper, we propose a novel graph-based contrastive framework for semi-supervised learning on relational databases that achieves promising predictive classification performance with only a small amount of labeled data. Our framework utilizes contrastive learning to exploit additional supervision signals from massive unlabeled data. Specifically, we develop two contrastive graph views that are 1) well suited to modeling the complex relationships and correlations among structured data in a relational database, and 2) complementary to each other for learning robust representations of the structured data to be classified. We also leverage label information in contrastive learning to mitigate its negative effect, during knowledge transfer, on the supervised counterpart. We conduct extensive experiments on three real-world relational databases, and the results demonstrate that our framework achieves state-of-the-art predictive performance in limited-label settings compared with various supervised and semi-supervised learning approaches.
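To make the label-aware contrastive objective mentioned in the abstract concrete, the sketch below shows one common way label information enters a contrastive loss: embeddings from two graph views are compared with an InfoNCE-style objective in which same-label examples are treated as positives. This is a minimal NumPy illustration under our own assumptions (the function name `sup_contrastive_loss` and the temperature value are hypothetical), not the paper's exact formulation.

```python
import numpy as np

def sup_contrastive_loss(z, labels, tau=0.5):
    """Label-aware InfoNCE-style contrastive loss (illustrative sketch).

    z:      (n, d) embeddings, e.g. node representations pooled from the
            two contrastive graph views stacked together.
    labels: (n,) integer class labels; same-label pairs act as positives.
    tau:    softmax temperature (an assumed value, not from the paper).
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize rows
    sim = z @ z.T / tau                               # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-comparisons
    # log-softmax over each anchor's similarity row
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    n = len(labels)
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    # average negative log-probability over each anchor's positives
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) \
        / np.maximum(pos.sum(axis=1), 1)
    return per_anchor.mean()
```

Because positives are defined by labels rather than only by the two augmented views of the same example, a labeled example pulls together all embeddings of its class, which is one way contrastive pre-training can be made consistent with the supervised objective rather than working against it.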