Localized Logistic Regression for Categorical Influential Factors

  • Conference paper

Abstract

In localized logistic regression (cf. Loader, Local regression and likelihood, Springer, New York, 1999; Tutz and Binder, Statistics and Computing 15:155–166, 2005), a logistic regression model is fitted locally at each target point where a prediction is required. This is achieved by weighting the training observations in the log-likelihood according to their distances to the target observation. For interval-scaled influential factors, these weights usually depend on Euclidean distances. This paper aims to combine localized logistic regression with dissimilarity measures that are more suitable for categorical data.
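
The local fitting idea can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth, the small ridge penalty on the slopes, the plain gradient-ascent optimizer, and all function names are assumptions made for illustration. Predictors are centred at the target point, so the local prediction is the logistic transform of the fitted intercept.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def kernel_weights(X, target, bandwidth):
    """Gaussian kernel of the Euclidean distance to the target (assumed kernel)."""
    weights = []
    for x in X:
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, target)))
        weights.append(math.exp(-0.5 * (dist / bandwidth) ** 2))
    return weights


def fit_local_logistic(X, y, target, bandwidth=1.0, steps=2000, lr=0.5, ridge=1e-3):
    """Maximise the kernel-weighted log-likelihood by gradient ascent.

    The ridge term on the slopes is an assumption; it only keeps the fit
    finite when the locally weighted data are almost separable.
    """
    w = kernel_weights(X, target, bandwidth)
    total = sum(w)
    # Design rows: intercept plus predictors centred at the target point.
    rows = [[1.0] + [a - b for a, b in zip(x, target)] for x in X]
    beta = [0.0] * len(rows[0])
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, yi, wi in zip(rows, y, w):
            mu = sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j, v in enumerate(row):
                grad[j] += wi * (yi - mu) * v
        for j in range(len(beta)):
            penalty = ridge * beta[j] if j > 0 else 0.0
            beta[j] += lr * (grad[j] / total - penalty)
    return beta


def predict_local(X, y, target, bandwidth=1.0):
    """Estimated probability of class 1 at the target point."""
    beta = fit_local_logistic(X, y, target, bandwidth)
    return sigmoid(beta[0])


# Toy data (made up): class 1 only in the middle of the range, which a
# single global logistic model cannot capture.
X = [[float(v)] for v in range(9)]
y = [0, 0, 0, 1, 1, 1, 0, 0, 0]

p_mid = predict_local(X, y, target=[4.0])
p_edge = predict_local(X, y, target=[0.0])
print(p_mid, p_edge)
```

In this toy example a separate model is fitted for each of the two target points, so the estimated probability is high in the middle of the range and low at the edge.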

Categorical predictors are usually included in regression models by constructing design variables. In principle, distance measures can therefore be defined either on the original variables or on the design variables. In the first case, matching coefficients, e.g., the simple or flexible matching coefficients, can be applied. In the second case, Euclidean distances are also suitable, since design variables can be treated as interval-scaled.
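
The two options can be compared on a small made-up example. The sketch below is illustrative only: the genotype labels, the helper names, and the two coding schemes (one indicator per category, and reference coding that drops the first category) are assumptions, since the abstract does not state which design coding is used. The flexible matching coefficient is not shown.

```python
def simple_matching_dissimilarity(a, b):
    """Share of variables on which two observations take different categories."""
    return sum(u != v for u, v in zip(a, b)) / len(a)


def dummy_code(obs, levels, drop_first=False):
    """0/1 design variables; with drop_first the first level is the reference."""
    coded = []
    for value, variable_levels in zip(obs, levels):
        used = variable_levels[1:] if drop_first else variable_levels
        coded.extend(1.0 if value == level else 0.0 for level in used)
    return coded


def squared_euclidean(a, b):
    """Squared Euclidean distance (the Euclidean distance is its square root)."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


# Three hypothetical SNP variables with three genotypes each.
levels = [["AA", "Aa", "aa"]] * 3
x1 = ["AA", "Aa", "aa"]
x2 = ["AA", "aa", "aa"]  # differs from x1 in variable 2 (Aa vs. aa)
x3 = ["AA", "AA", "aa"]  # differs from x1 in variable 2 (Aa vs. AA)

sm_12 = simple_matching_dissimilarity(x1, x2)
sm_13 = simple_matching_dissimilarity(x1, x3)

full_12 = squared_euclidean(dummy_code(x1, levels), dummy_code(x2, levels))
full_13 = squared_euclidean(dummy_code(x1, levels), dummy_code(x3, levels))

ref_12 = squared_euclidean(dummy_code(x1, levels, True), dummy_code(x2, levels, True))
ref_13 = squared_euclidean(dummy_code(x1, levels, True), dummy_code(x3, levels, True))

print(sm_12, sm_13)      # simple matching on the original variables
print(full_12, full_13)  # one indicator per category
print(ref_12, ref_13)    # reference coding
```

With one indicator per category, the squared Euclidean distance is twice the number of mismatching variables, so it orders pairs in the same way as the simple matching coefficient. With reference coding, a mismatch that involves the reference category contributes less than a mismatch between two other categories, so the resulting distance depends on the choice of reference category.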

Localized logistic regression with the proposed dissimilarity measures is applied to a SNP data set from the GENICA breast cancer study (cf. Justenhoven et al., Cancer Epidemiology Biomarkers and Prevention 13:2059–2064, 2004) in order to identify combinations of SNP variables that can be used to discriminate between cases and controls. Localized logistic regression achieves one of the lowest error rates in combination with a maximal reduction of the number of predictors.

References

  • Anderberg, M. R. (1973). Cluster analysis for applications. New York: Academic Press.

  • Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.

  • Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1983). CART: Classification and regression trees. Belmont, CA: Wadsworth.

  • Cox, T. F., & Cox, M. A. A. (2001). Multidimensional scaling (2nd ed.). Boca Raton: Chapman & Hall/CRC.

  • Fahrmeir, L., & Tutz, G. (2001). Multivariate statistical modelling based on generalized linear models (2nd ed.). New York: Springer.

  • GENICA Network (n.d.). Brauch, H., Brüning, Th., Hamann, U., & Ko, Y. http://www.genica.de.

  • Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). New York: Wiley.

  • Ickstadt, K., Müller, T., & Schwender, H. (2006). Analyzing SNPs: Are there needles in the haystack? Chance, 19(3), 22–27.

  • Justenhoven, C., Hamann, U., Pesch, B., Harth, V., Rabstein, S., Baisch, C., et al. (2004). ERCC2 genotypes and a corresponding haplotype are linked with breast cancer risk in a German population. Cancer Epidemiology Biomarkers and Prevention, 13, 2059–2064.

  • Loader, C. (1999). Local regression and likelihood. Statistics and computing. New York: Springer.

  • Ruczinski, I., Kooperberg, C., & LeBlanc, M. L. (2003). Logic regression. Journal of Computational and Graphical Statistics, 12, 475–511.

  • Schwender, H., Rabstein, S., & Ickstadt, K. (2006). Do you speak genomish? Chance, 19(3), 4–11.

  • Tutz, G., & Binder, H. (2005). Localized classification. Statistics and Computing, 15, 155–166.

Author information

Corresponding author

Correspondence to Julia Schiffner.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schiffner, J., Szepannek, G., Monthé, T., Weihs, C. (2009). Localized Logistic Regression for Categorical Influential Factors. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_17
