Variable selection in discriminant analysis for mixed continuous-binary variables and several groups

  • Regular Article
Advances in Data Analysis and Classification

Abstract

We propose a method for variable selection in discriminant analysis with mixed continuous and binary variables. The method is based on a criterion that reduces the variable selection problem to that of estimating a suitable permutation and a dimensionality. Estimators of these parameters are then proposed, and the resulting variable selection method is shown to be consistent. A simulation study examines several properties of the proposed approach and compares it with an existing method, and an example on a real data set is provided.

Acknowledgements

We are very grateful to two anonymous referees for their helpful and constructive comments, which led to a much improved manuscript. Research by Alban Mbina Mbina was supported in part by the Agence Universitaire de la Francophonie (AUF).

Author information

Corresponding author

Correspondence to Guy Martial Nkiet.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 160 KB)

About this article

Cite this article

Mbina Mbina, A., Nkiet, G.M. & Eyi Obiang, F. Variable selection in discriminant analysis for mixed continuous-binary variables and several groups. Adv Data Anal Classif 13, 773–795 (2019). https://doi.org/10.1007/s11634-018-0343-0

  • DOI: https://doi.org/10.1007/s11634-018-0343-0
