ABSTRACT
Data augmentation-based semi-supervised learning (SSL) methods have made great progress in computer vision and natural language processing. One of the most important reasons is that the semantic-structure invariance of these data allows augmentation procedures (e.g., rotating images or masking words) to thoroughly exploit the enormous amount of unlabeled data. However, tabular data has no obvious invariant structure, so similar data augmentation methods do not apply to it. To fill this gap, we present a simple yet efficient data augmentation method designed specifically for tabular data and apply it in an SSL algorithm: SDAT (Semi-supervised learning with Data Augmentation for Tabular data). We adopt a multi-task learning framework that consists of two components: a data augmentation procedure and a consistency training procedure. The data augmentation procedure perturbs samples in latent space, employing a variational auto-encoder (VAE) to generate reconstructed samples as augmented samples. The consistency training procedure constrains predictions to be invariant between the augmented samples and the corresponding original samples. By sharing a representation network (the encoder), we train the two components jointly to improve effectiveness and efficiency. Extensive experimental studies validate the effectiveness of the proposed method on tabular datasets.
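The abstract describes a shared encoder feeding both a VAE branch (whose reconstructions serve as augmented samples) and a classifier trained with a consistency constraint. The following is a minimal PyTorch sketch of that idea; the layer sizes, loss weights, and the MSE-based consistency term are illustrative assumptions, not the authors' exact architecture or objective.

```python
# Minimal sketch of the SDAT idea: a VAE and a classifier share one encoder;
# reconstructions act as augmented samples, and a consistency term ties
# predictions on augmented and original samples together.
# Hyperparameters and loss choices below are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoder(nn.Module):
    """Representation network shared by the VAE and the classifier."""
    def __init__(self, in_dim, hidden_dim=64, latent_dim=16):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)

class SDAT(nn.Module):
    def __init__(self, in_dim, n_classes, hidden_dim=64, latent_dim=16):
        super().__init__()
        self.encoder = SharedEncoder(in_dim, hidden_dim, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, in_dim))
        self.classifier = nn.Linear(latent_dim, n_classes)

    def reparameterize(self, mu, logvar):
        # Perturbation in latent space: sample z ~ N(mu, sigma^2).
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x):
        mu, logvar = self.encoder(x)
        z = self.reparameterize(mu, logvar)
        x_rec = self.decoder(z)        # reconstruction = augmented sample
        logits = self.classifier(mu)   # prediction on the original sample
        return x_rec, logits, mu, logvar

def sdat_loss(model, x_lab, y_lab, x_unlab, beta=1.0, lam=1.0):
    """Joint multi-task loss: supervised cross-entropy + VAE terms
    (reconstruction + KL) + consistency between original and augmented
    predictions. Weighting scheme is an assumption."""
    # Supervised branch on labeled samples.
    _, logits_lab, _, _ = model(x_lab)
    ce = F.cross_entropy(logits_lab, y_lab)

    # VAE branch on unlabeled samples.
    x_rec, logits_orig, mu, logvar = model(x_unlab)
    recon = F.mse_loss(x_rec, x_unlab)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

    # Consistency: predictions on reconstructed (augmented) samples should
    # match predictions on the corresponding original samples.
    mu_rec, _ = model.encoder(x_rec)
    logits_aug = model.classifier(mu_rec)
    consistency = F.mse_loss(F.softmax(logits_aug, dim=1),
                             F.softmax(logits_orig, dim=1).detach())

    return ce + recon + beta * kl + lam * consistency
```

In this sketch, a single optimizer step would sum all four terms so the shared encoder is trained jointly by the augmentation (VAE) and consistency objectives, which is the multi-task arrangement the abstract describes.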