Abstract
Data annotation does not scale, which creates the need to reduce dependence on labels. Self-supervision offers a solution by letting the data supervise their own training. However, it has received comparatively little attention for tabular data, the data that drive a large proportion of business and application domains. This work proposes the Statistical Self-Supervisor (SSS), a method for self-supervision on tabular data that defines a continuous perturbation as the pretext task: a neural network learns representations by predicting the level of additive isotropic Gaussian noise added to its inputs. The choice of pretext transformation is motivated by two intrinsic characteristics of neural networks: they fundamentally perform linear fits under the widely adopted assumption of Gaussian fitting error, and a data example preserves its locality on the data manifold under small random perturbations. The transformation condenses information in the learned representations, making them more useful for task-specific prediction, as evidenced by improved performance of the downstream classifier. To evaluate how performance persists under low-annotation settings, SSS is evaluated at different levels of label availability to the downstream classifier (1% to 100%) and benchmarked against self- and semi-supervised methods. At the most label-constrained, 1% setting, we report an improvement of at least 2.5% over the next-best semi-supervised method and of more than 1.5% over the self-supervised state of the art. Ablation studies also reveal that increasing label availability from 0% to 1% yields an increase of up to 50% on any of the five performance metrics, and of up to 15% thereafter, indicating diminishing returns from additional annotation.
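To make the pretext task concrete, the sketch below illustrates one way a noise-level-prediction objective of this kind could be implemented for a tabular encoder. The layer sizes, the range from which the noise level sigma is sampled, and the use of an MSE regression loss are illustrative assumptions; the abstract does not specify the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SSSPretextNet(nn.Module):
    """Encoder plus a head that predicts the noise level of a perturbed input.
    Architecture is an assumption for illustration, not the paper's design."""
    def __init__(self, n_features: int, repr_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, repr_dim), nn.ReLU(),
        )
        self.noise_head = nn.Linear(repr_dim, 1)  # regresses the scalar noise level

    def forward(self, x):
        z = self.encoder(x)                        # representation reused downstream
        return z, self.noise_head(z).squeeze(-1)

def pretext_step(model, optimizer, x, sigma_max=1.0):
    """One self-supervised step: corrupt x with additive isotropic Gaussian noise
    of a randomly drawn level sigma and train the network to predict that level."""
    sigma = torch.rand(x.size(0)) * sigma_max               # one noise level per example
    x_noisy = x + sigma.unsqueeze(1) * torch.randn_like(x)  # additive isotropic noise
    _, sigma_hat = model(x_noisy)
    loss = nn.functional.mse_loss(sigma_hat, sigma)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage on an unlabeled tabular batch:
# model = SSSPretextNet(n_features=30)
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# loss = pretext_step(model, opt, torch.randn(256, 30))
```

After pretext training, the encoder output z would feed a downstream classifier trained on whatever fraction of labels is available (the 1% to 100% settings evaluated in the paper).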