Abstract
Data annotation does not scale, which creates the need to reduce dependence on labels. Self-supervision offers a solution by letting the data supervise their own training. However, it has received comparatively little attention for tabular data, the data that drive a large proportion of business and application domains. This work proposes the Statistical Self-Supervisor (SSS), a method for self-supervision on tabular data that defines a continuous perturbation as the pretext task: a neural network learns representations by predicting the level of additive isotropic Gaussian noise added to its inputs. The choice of pretext transformation is motivated by two intrinsic characteristics of neural networks: they fundamentally perform linear fits under the widely adopted assumption of Gaussian fitting error, and a data example preserves its locality on the data manifold under small random perturbations. The transformation condenses information in the learned representations, making them more useful for task-specific prediction, as evidenced by improved performance of the downstream classifier. To evaluate how performance persists under low-annotation settings, SSS is evaluated at different levels of label availability to the downstream classifier (1% to 100%) and benchmarked against self- and semi-supervised methods. At the most label-constrained, 1% setting, we report an improvement of at least 2.5% over the next-best semi-supervised method and of more than 1.5% over the self-supervised state of the art. Ablation studies also reveal that increasing label availability from 0% to 1% yields an increase of up to 50% on any of the five performance metrics, and of up to 15% thereafter, indicating diminishing returns from additional annotation.
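To make the pretext task concrete, the sketch below illustrates one way a noise-level-prediction objective of this kind could be implemented for a tabular encoder. The layer sizes, the range from which the noise level sigma is sampled, and the use of an MSE regression loss are illustrative assumptions; the abstract does not specify the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SSSPretextNet(nn.Module):
    """Encoder plus a head that predicts the noise level of a perturbed input.
    Architecture is an assumption for illustration, not the paper's design."""
    def __init__(self, n_features: int, repr_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, repr_dim), nn.ReLU(),
        )
        self.noise_head = nn.Linear(repr_dim, 1)  # regresses the scalar noise level

    def forward(self, x):
        z = self.encoder(x)                        # representation reused downstream
        return z, self.noise_head(z).squeeze(-1)

def pretext_step(model, optimizer, x, sigma_max=1.0):
    """One self-supervised step: corrupt x with additive isotropic Gaussian noise
    of a randomly drawn level sigma and train the network to predict that level."""
    sigma = torch.rand(x.size(0)) * sigma_max               # one noise level per example
    x_noisy = x + sigma.unsqueeze(1) * torch.randn_like(x)  # additive isotropic noise
    _, sigma_hat = model(x_noisy)
    loss = nn.functional.mse_loss(sigma_hat, sigma)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage on an unlabeled tabular batch:
# model = SSSPretextNet(n_features=30)
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# loss = pretext_step(model, opt, torch.randn(256, 30))
```

After pretext training, the encoder output z would feed a downstream classifier trained on whatever fraction of labels is available (the 1% to 100% settings evaluated in the paper).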