Research Article

Self-supervision for Tabular Data by Learning to Predict Additive Homoskedastic Gaussian Noise as Pretext

Published: 15 June 2023

Abstract

The lack of scalability of data annotation translates into a need to decrease dependency on labels. Self-supervision offers a solution in which data provide their own supervisory signal. However, it has received comparatively little attention for tabular data, which drive a large proportion of business and application domains. This work proposes the Statistical Self-Supervisor (SSS), a method for self-supervision on tabular data that defines a continuous perturbation as the pretext task: a neural network learns representations by predicting the level of additive isotropic Gaussian noise added to its inputs. The choice of pretext transformation is motivated by two intrinsic characteristics of neural networks: they fundamentally perform linear fits under the widely adopted assumption of Gaussianity in the fitting error, and small random perturbations preserve the locality of a data example on the data manifold. The transform condenses information in the generated representations, making them better suited to further task-specific prediction, as evidenced by the performance improvement of the downstream classifier. To evaluate the persistence of performance under low-annotation settings, SSS is evaluated across different levels of label availability to the downstream classifier (1% to 100%) and benchmarked against self- and semi-supervised methods. At the most label-constrained, 1% setting, we report an increase of at least 2.5% over the next-best semi-supervised competing method, and of more than 1.5% over the self-supervised state of the art. Ablation studies further reveal that increasing label availability from 0% to 1% yields a maximum increase of up to 50% on any of the five performance metrics, and up to 15% thereafter, indicating diminishing returns from additional annotation.
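The pretext construction described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function name `make_pretext_batch` and the set of noise levels are hypothetical choices made for this example. Each row of a tabular batch is corrupted with additive homoskedastic Gaussian noise at a randomly drawn level, and that level becomes the pretext target a network would be trained to predict.

```python
import numpy as np

def make_pretext_batch(x, sigma_levels, rng):
    """Build (corrupted input, noise level) pairs for the pretext task.

    Each example is perturbed with additive isotropic Gaussian noise whose
    standard deviation is drawn from sigma_levels; the drawn level is the
    pretext label. Small perturbations keep the example near its original
    location on the data manifold.
    """
    n, d = x.shape
    # One noise level per example: homoskedastic within an example.
    levels = rng.choice(sigma_levels, size=n)
    # Isotropic: the same sigma applies to every feature of an example.
    noise = rng.standard_normal((n, d)) * levels[:, None]
    return x + noise, levels

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))  # a toy tabular batch of 4 examples
x_noisy, y_pretext = make_pretext_batch(x, sigma_levels=[0.1, 0.5, 1.0], rng=rng)
```

A representation network would then be trained to map `x_noisy` to `y_pretext` (as regression, or as classification over the discrete levels), after which its intermediate features serve the downstream classifier.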


Published in: ACM Transactions on Knowledge Discovery from Data, Volume 17, Issue 9 (November 2023), 373 pages.
ISSN: 1556-4681; EISSN: 1556-472X; DOI: 10.1145/3604532

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 9 December 2021
• Revised: 20 October 2022
• Accepted: 10 April 2023
• Online AM: 1 May 2023
• Published: 15 June 2023
