DOI: 10.1145/3511808.3557331
research-article

GCF-RD: A Graph-based Contrastive Framework for Semi-Supervised Learning on Relational Databases

Published: 17 October 2022

ABSTRACT

Relational databases are the main storage model for structured data in most businesses and usually involve multiple tables connected by key-foreign-key relationships. In practice, data analysts often want to pose predictive classification queries over relational databases. To answer such queries, many existing approaches train classification models with supervised learning, which relies heavily on the availability of sufficient labeled data. In this paper, we propose a novel graph-based contrastive framework for semi-supervised learning on relational databases that achieves promising predictive classification performance with only a handful of labeled data. Our framework uses contrastive learning to exploit additional supervision signals from massive unlabeled data. Specifically, we develop two contrastive graph views that are 1) advantageous for modeling complex relationships and correlations among structured data in a relational database, and 2) complementary to each other for learning robust representations of the structured data to be classified. We also leverage label information in contrastive learning to mitigate its negative effect on knowledge transfer to the supervised counterpart. We conduct extensive experiments on three real-world relational databases, and the results demonstrate that our framework achieves state-of-the-art predictive performance in settings with limited labeled data, compared with various supervised and semi-supervised learning approaches.
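
As a rough illustration of contrasting two views of the same rows while also using the few available labels, the NumPy sketch below computes a generic InfoNCE-style loss. It is not the paper's loss, which the abstract does not specify. The function name `cross_view_contrastive_loss`, the convention that `-1` marks an unlabeled row, and the temperature value are all assumptions made for illustration.

```python
import numpy as np


def cross_view_contrastive_loss(z1, z2, labels=None, temperature=0.5):
    """Generic InfoNCE-style loss between two views (illustrative only).

    z1, z2 : (N, d) arrays, embeddings of the same N rows under two views.
    labels : optional length-N integer labels; -1 marks an unlabeled row
             (a convention assumed here, not taken from the paper).
    """
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)

    # L2-normalize so that dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # shape (N, N)

    # Positives: the same row seen in the other view.
    pos = np.eye(len(z1), dtype=bool)
    if labels is not None:
        # Labeled rows that share a class are also treated as positives.
        labels = np.asarray(labels)
        known = labels >= 0
        same_class = labels[:, None] == labels[None, :]
        pos |= same_class & known[:, None] & known[None, :]

    # Row-wise log-softmax, shifted by the row maximum for stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Average log-probability over each anchor's positives, then over anchors.
    per_anchor = (log_prob * pos).sum(axis=1) / pos.sum(axis=1)
    return float(-per_anchor.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(6, 8))
    view_b = view_a + 0.1 * rng.normal(size=(6, 8))  # a slightly perturbed view
    few_labels = [0, 1, -1, -1, -1, -1]              # only two labeled rows
    print(cross_view_contrastive_loss(view_a, view_b, labels=few_labels))
```

In a real pipeline the two embedding matrices would come from encoders over the two graph views, and this term would be combined with a supervised classification loss on the labeled rows. How the paper weights or combines those terms is not stated in the abstract.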

Supplemental Material

CIKM22-fp0343.mp4 (mp4, 165.2 MB)


      • Published in

        CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
        October 2022
        5274 pages
        ISBN:9781450392365
        DOI:10.1145/3511808
        General Chairs: Mohammad Al Hasan, Li Xiong

        Copyright © 2022 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article

        Acceptance Rates

        CIKM '22 Paper Acceptance Rate: 621 of 2,257 submissions, 28%
        Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
