DOI: 10.1145/3637528.3671893
Research Article

Can a Deep Learning Model be a Sure Bet for Tabular Prediction?

Published: 24 August 2024

Abstract

Data organized in tabular format is ubiquitous in real-world applications, and users often craft tables with biased feature definitions and flexibly set prediction targets of interest. Thus, the rapid development of a robust, effective, dataset-versatile, and user-friendly tabular prediction approach is highly desirable. While Gradient Boosting Decision Trees (GBDTs) and existing deep neural networks (DNNs) have been extensively utilized by professional users, they present several challenges for casual users, particularly: (i) the dilemma of model selection due to their different dataset preferences, and (ii) the need for heavy hyperparameter searching, without which their performance is often inadequate. In this paper, we ask: can we develop a deep learning model that serves as a "sure bet" solution for a wide range of tabular prediction tasks, while also being user-friendly to casual users? We identify three key drawbacks of deep tabular models: (P1) the lack of a rotational variance property (i.e., an undesirable rotational invariance), (P2) a large data demand, and (P3) over-smooth solutions. We propose ExcelFormer, which addresses these challenges through a semi-permeable attention module that effectively constrains the influence of less informative features to break the DNNs' rotational invariance property (for P1), data augmentation approaches tailored for tabular data (for P2), and an attentive feedforward network that boosts the model's fitting capability (for P3). These designs collectively make ExcelFormer a sure bet solution for diverse tabular datasets. Extensive and stratified experiments conducted on real-world datasets demonstrate that our model outperforms previous approaches across diverse tabular prediction tasks, and that the framework is friendly to casual users, offering ease of use without heavy hyperparameter tuning. The code is available at https://github.com/whatashot/excelformer.
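The abstract does not spell out how the semi-permeable attention module works internally. The sketch below is a minimal, hypothetical reading in PyTorch, assuming an asymmetric attention mask driven by a precomputed per-feature informativeness score (e.g., mutual information with the target): more informative features are blocked from attending to less informative ones, so noisy features cannot influence informative ones, while information still flows in the other direction. The function name, the scoring scheme, and the omitted linear projections are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def semi_permeable_attention(x, importance):
    """Hypothetical sketch of semi-permeable self-attention over feature tokens.

    x:          (batch, n_features, d) feature-token embeddings
    importance: (n_features,) precomputed informativeness score per feature
                (an assumption; e.g., mutual information with the target)
    """
    b, n, d = x.shape
    q, k, v = x, x, x  # linear projections omitted for brevity

    # Asymmetric mask: entry (i, j) is blocked when query feature i is MORE
    # informative than key feature j, so less informative features cannot
    # influence more informative ones; the reverse direction stays open.
    # The diagonal is never blocked, so every softmax row is well-defined.
    blocked = importance.unsqueeze(1) > importance.unsqueeze(0)  # (n, n)

    logits = q @ k.transpose(-2, -1) / d ** 0.5         # (batch, n, n)
    logits = logits.masked_fill(blocked, float("-inf"))
    return F.softmax(logits, dim=-1) @ v                # (batch, n, d)

# Toy usage: 32 rows, 8 features embedded into 16 dimensions.
x = torch.randn(32, 8, 16)
importance = torch.rand(8)
out = semi_permeable_attention(x, importance)  # (32, 8, 16)
```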

Supplemental Material

MP4 File - Can a Deep Learning Model be a Sure Bet for Tabular Prediction?
In summary, this paper shows that ExcelFormer excels across diverse dataset types without requiring hyperparameter tuning, while maintaining comparable model complexity. Datasets and code are publicly available.
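The abstract attributes the reduced data demand (P2) to data augmentation tailored for tabular data, and "mixup" appears among the author tags below. The following is a minimal, hypothetical sketch of a feature-level mixup variant: rather than interpolating every value as image mixup does, each feature column is taken wholesale from one of two rows (respecting the heterogeneous, often non-interpolatable nature of tabular features), and the labels are mixed by the realized column fraction. The function name and the Beta-sampled mixing ratio are illustrative assumptions, not necessarily the paper's exact augmentations.

```python
import torch

def feature_mixup(x, y, alpha=0.5):
    """Hypothetical column-wise mixup for tabular rows.

    x: (batch, n_features) numeric feature matrix
    y: (batch, n_classes) one-hot or soft labels
    """
    # Mixing ratio from a Beta prior, as in standard mixup.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))  # partner row for each row

    # Per-column coin flips: keep a column from the original row with
    # probability lam, otherwise take it from the partner row.
    keep = (torch.rand(x.size(1)) < lam).float()   # (n_features,)
    x_mix = keep * x + (1.0 - keep) * x[perm]

    # Mix labels by the fraction of columns actually kept.
    lam_eff = keep.mean()
    y_mix = lam_eff * y + (1.0 - lam_eff) * y[perm]
    return x_mix, y_mix

# Toy usage: 32 rows, 8 features, 3 classes.
x = torch.randn(32, 8)
y = torch.eye(3)[torch.randint(0, 3, (32,))]
x_mix, y_mix = feature_mixup(x, y)
```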



Published In

KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2024
6901 pages
ISBN:9798400704901
DOI:10.1145/3637528
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. mixup
  2. tabular data prediction


Conference

KDD '24

Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)

