Variable selection for categorical response: a comparative study

Original paper
Computational Statistics

Abstract

Variable selection is a well-studied problem in linear regression, but existing work mostly deals with continuous responses. In many applications, however, we come across data with categorical responses. In the classical (frequentist) approach there exist penalized regression methods (e.g. logistic Lasso) which can be used for variable selection when we have a categorical response and a large number of predictors. In this paper, we compare the performance of three alternative approaches for handling data with a single categorical response and multiple continuous (or count) predictors. In addition to the well-known logistic Lasso, we consider a model-based Bayesian approach and a model-free approach for variable selection. We consider a binary response and a response with three categories. Through extensive simulation studies we compare the performance of these three competing methods. We observe that the model-based methods can often accurately identify the important predictors, but sometimes fail to detect the unimportant ones. Also, the model-based approaches are computationally expensive, whereas the model-free approach is extremely fast. For misspecified models, the model-free method outperforms the model-based methods in prediction. However, when the predictors are correlated (moderately or substantially), the model-based methods perform better than the model-free method. We analyse the well-known Pima Indian Diabetes dataset to illustrate the effectiveness of the three competing methods under consideration.
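
As a rough illustration of how the logistic Lasso mentioned in the abstract can perform variable selection, the sketch below fits an L1-penalized logistic regression by proximal gradient descent in plain Python. It is not the authors' implementation. The synthetic data (one informative predictor out of five), the penalty values, the step size, and all function names are assumptions made for this example only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def logistic_lasso(X, y, lam, step=0.5, n_iter=300):
    """Proximal gradient descent for L1-penalized logistic regression.

    Minimizes the average negative log-likelihood plus lam * sum(|beta_j|).
    The intercept b0 is left unpenalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual: fitted probability minus observed label.
            r = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= step * g0 / n
        # Gradient step followed by soft-thresholding on each coefficient.
        beta = [soft_threshold(beta[j] - step * g[j] / n, step * lam)
                for j in range(p)]
    return b0, beta


# Hypothetical synthetic data: only predictor 0 drives the binary response.
rng = random.Random(0)
n, p = 200, 5
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [1 if rng.random() < sigmoid(3.0 * row[0]) else 0 for row in X]

# Moderate penalty: predictors with nonzero coefficients count as selected.
b0, beta = logistic_lasso(X, y, lam=0.05)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected predictors:", selected)

# Very large penalty: every coefficient is shrunk exactly to zero.
_, beta_big = logistic_lasso(X, y, lam=10.0, n_iter=20)
print("coefficients under a large penalty:", beta_big)
```

The soft-thresholding step is what sets coefficients exactly to zero, which is why the L1 penalty acts as a variable selector rather than only shrinking estimates. In practice the penalty level would be chosen by cross-validation or a similar criterion instead of being fixed by hand as it is here.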

Correspondence to Kiranmoy Das.

Cite this article

Sen, S., Kundu, D. & Das, K. Variable selection for categorical response: a comparative study. Comput Stat 38, 809–826 (2023). https://doi.org/10.1007/s00180-022-01260-1

  • DOI: https://doi.org/10.1007/s00180-022-01260-1
