DOI: 10.1145/3637528.3671893
Research Article

Can a Deep Learning Model be a Sure Bet for Tabular Prediction?

Published: 24 August 2024

Abstract

Data organized in tabular format is ubiquitous in real-world applications, and users often craft tables with biased feature definitions and flexibly set prediction targets of interest. Thus, the rapid development of a robust, effective, dataset-versatile, and user-friendly tabular prediction approach is highly desirable. While Gradient Boosting Decision Trees (GBDTs) and existing deep neural networks (DNNs) have been extensively utilized by professional users, they present several challenges for casual users, particularly: (i) the dilemma of model selection due to their different dataset preferences, and (ii) the need for heavy hyperparameter searching, without which their performance is often inadequate. In this paper, we ask: can we develop a deep learning model that serves as a "sure bet" solution for a wide range of tabular prediction tasks, while also being user-friendly to casual users? We identify three key drawbacks of deep tabular models: (P1) the lack of a rotational variance property (i.e., an undesirable rotational invariance), (P2) a large data demand, and (P3) over-smooth solutions. We propose ExcelFormer, which addresses these challenges through a semi-permeable attention module that effectively constrains the influence of less informative features to break the DNNs' rotational invariance property (for P1), data augmentation approaches tailored for tabular data (for P2), and an attentive feedforward network that boosts the model's fitting capability (for P3). These designs collectively make ExcelFormer a sure bet solution for diverse tabular datasets. Extensive and stratified experiments conducted on real-world datasets demonstrate that our model outperforms previous approaches across diverse tabular prediction tasks, and that the framework is friendly to casual users, offering ease of use without heavy hyperparameter tuning. The code is available at https://github.com/whatashot/excelformer.
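The abstract does not spell out how the semi-permeable attention module works internally. The sketch below is a minimal, hypothetical reading in PyTorch, assuming an asymmetric attention mask driven by a precomputed per-feature informativeness score (e.g., mutual information with the target): more informative features are blocked from attending to less informative ones, so noisy features cannot influence informative ones, while information still flows in the other direction. The function name, the scoring scheme, and the omitted linear projections are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def semi_permeable_attention(x, importance):
    """Hypothetical sketch of semi-permeable self-attention over feature tokens.

    x:          (batch, n_features, d) feature-token embeddings
    importance: (n_features,) precomputed informativeness score per feature
                (an assumption; e.g., mutual information with the target)
    """
    b, n, d = x.shape
    q, k, v = x, x, x  # linear projections omitted for brevity

    # Asymmetric mask: entry (i, j) is blocked when query feature i is MORE
    # informative than key feature j, so less informative features cannot
    # influence more informative ones; the reverse direction stays open.
    # The diagonal is never blocked, so every softmax row is well-defined.
    blocked = importance.unsqueeze(1) > importance.unsqueeze(0)  # (n, n)

    logits = q @ k.transpose(-2, -1) / d ** 0.5         # (batch, n, n)
    logits = logits.masked_fill(blocked, float("-inf"))
    return F.softmax(logits, dim=-1) @ v                # (batch, n, d)

# Toy usage: 32 rows, 8 features embedded into 16 dimensions.
x = torch.randn(32, 8, 16)
importance = torch.rand(8)
out = semi_permeable_attention(x, importance)  # (32, 8, 16)
```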

Supplemental Material

MP4 File - Can a Deep Learning Model be a Sure Bet for Tabular Prediction?
In summary, this paper shows that ExcelFormer excels across diverse dataset types without requiring hyperparameter tuning, while maintaining comparable model complexity. Datasets and code are publicly available.
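The abstract attributes the reduced data demand (P2) to data augmentation tailored for tabular data, and "mixup" appears among the author tags below. The following is a minimal, hypothetical sketch of a feature-level mixup variant: rather than interpolating every value as image mixup does, each feature column is taken wholesale from one of two rows (respecting the heterogeneous, often non-interpolatable nature of tabular features), and the labels are mixed by the realized column fraction. The function name and the Beta-sampled mixing ratio are illustrative assumptions, not necessarily the paper's exact augmentations.

```python
import torch

def feature_mixup(x, y, alpha=0.5):
    """Hypothetical column-wise mixup for tabular rows.

    x: (batch, n_features) numeric feature matrix
    y: (batch, n_classes) one-hot or soft labels
    """
    # Mixing ratio from a Beta prior, as in standard mixup.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))  # partner row for each row

    # Per-column coin flips: keep a column from the original row with
    # probability lam, otherwise take it from the partner row.
    keep = (torch.rand(x.size(1)) < lam).float()   # (n_features,)
    x_mix = keep * x + (1.0 - keep) * x[perm]

    # Mix labels by the fraction of columns actually kept.
    lam_eff = keep.mean()
    y_mix = lam_eff * y + (1.0 - lam_eff) * y[perm]
    return x_mix, y_mix

# Toy usage: 32 rows, 8 features, 3 classes.
x = torch.randn(32, 8)
y = torch.eye(3)[torch.randint(0, 3, (32,))]
x_mix, y_mix = feature_mixup(x, y)
```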



Published In

KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2024
6901 pages
ISBN:9798400704901
DOI:10.1145/3637528
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. mixup
  2. tabular data prediction


Conference

KDD '24

Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)

