Abstract
In localized logistic regression (cp. Loader, Local regression and likelihood, Springer, New York, 1999; Tutz and Binder, Statistics and Computing 15:155–166, 2005), a logistic regression model is fitted locally at each target point where a prediction is required. This is achieved by weighting the training observations in the log-likelihood according to their distances to the target observation. For interval-scaled influential factors, these weights usually depend on Euclidean distances. This paper aims to combine localized logistic regression with dissimilarity measures that are more suitable for categorical data.
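The local weighting idea can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the `bandwidth` parameter, the small `ridge` penalty (added only to keep the local fit numerically stable), and the function name `local_logistic_predict` are all assumptions made for this example.

```python
import numpy as np

def local_logistic_predict(X, y, x0, bandwidth=1.0, ridge=1e-3, n_iter=25):
    """Estimate P(y = 1 | x0) from a logistic model fitted locally at x0.

    Assumptions of this sketch (not taken from the paper): Gaussian kernel
    weights on Euclidean distances and a small ridge penalty for stability.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)

    # Kernel weights: training observations close to the target count more.
    dist = np.linalg.norm(X - x0, axis=1)
    w = np.exp(-0.5 * (dist / bandwidth) ** 2)

    # Design matrix with an intercept column.
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    penalty = ridge * np.eye(Z.shape[1])

    # Newton-Raphson steps on the weighted, ridge-penalized log-likelihood.
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (w * (y - p)) - ridge * beta
        hess = (Z * (w * p * (1.0 - p))[:, None]).T @ Z + penalty
        beta = beta + np.linalg.solve(hess, grad)

    # Predicted probability at the target point itself.
    z0 = np.concatenate([[1.0], x0])
    return 1.0 / (1.0 + np.exp(-z0 @ beta))

# Toy data (made up for illustration): one interval-scaled predictor.
X_demo = np.array([[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5]])
y_demo = np.array([0, 0, 1, 0, 1, 0, 1, 1])

p_low = local_logistic_predict(X_demo, y_demo, [0.0], bandwidth=1.5)
p_high = local_logistic_predict(X_demo, y_demo, [3.5], bandwidth=1.5)
print(p_low, p_high)
```

A separate model is fitted for every target point, so the coefficients differ between the two calls. To use a dissimilarity measure other than the Euclidean distance, only the line computing `dist` would need to change.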
Categorical predictors are usually included in regression models by constructing design variables. In principle, therefore, distance measures can be defined on either the original variables or the design variables. In the first case, matching coefficients such as the simple or flexible matching coefficient can be applied. In the second case, Euclidean distances are also suitable, since design variables can be treated as interval-scaled.
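The two options can be contrasted in a short sketch. The genotype coding 0/1/2, the choice of level 0 as reference category in the dummy coding, and all function names are assumptions for illustration. The flexible matching coefficient is not shown.

```python
import numpy as np

def simple_matching_dissimilarity(a, b):
    """One minus the simple matching coefficient: share of mismatching variables."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.mean(a != b))

def dummy_code(x, n_levels):
    """Reference coding: level 0 is the reference (assumed), levels 1..n_levels-1 get a 0/1 column each."""
    x = np.asarray(x)
    levels = np.arange(1, n_levels)
    return (x[:, None] == levels[None, :]).astype(float).ravel()

def design_euclidean_distance(a, b, n_levels):
    """Euclidean distance between the dummy-coded design vectors."""
    diff = dummy_code(a, n_levels) - dummy_code(b, n_levels)
    return float(np.linalg.norm(diff))

# Two hypothetical observations on four three-level categorical predictors.
obs_a = [0, 1, 2, 0]
obs_b = [0, 2, 2, 1]

d_match = simple_matching_dissimilarity(obs_a, obs_b)            # 2 of 4 differ: 0.5
d_design = design_euclidean_distance(obs_a, obs_b, n_levels=3)   # sqrt(2 + 1) = sqrt(3)
print(d_match, d_design)
```

Under this reference coding, a mismatch involving the reference category adds 1 to the squared distance, whereas a mismatch between two non-reference categories adds 2. The distance on design variables therefore depends on the coding scheme, while the simple matching dissimilarity does not.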
Localized logistic regression with the proposed dissimilarity measures is applied to a SNP data set from the GENICA breast cancer study (cp. Justenhoven et al., Cancer Epidemiology Biomarkers and Prevention 13:2059–2064, 2004) in order to identify combinations of SNP variables that can be used to discriminate between cases and controls. Localized logistic regression achieves one of the lowest error rates together with a maximal reduction in the number of predictors.
References
Anderberg, M. R. (1973). Cluster analysis for applications. New York: Academic Press.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1983). CART: Classification and regression trees. Belmont, CA: Wadsworth.
Cox, T. F., & Cox, M. A. A. (2001). Multidimensional scaling (2nd ed.). Boca Raton: Chapman & Hall/CRC.
Fahrmeir, L., & Tutz, G. (2001). Multivariate statistical modelling based on generalized linear models (2nd ed.). New York: Springer.
GENICA Network (n.d.). Brauch, H., Brüning, Th., Hamann, U., & Ko, Y. http://www.genica.de.
Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). New York: Wiley.
Ickstadt, K., Müller, T., & Schwender, H. (2006). Analyzing SNPs: Are there needles in the haystack? Chance, 19(3), 22–27.
Justenhoven, C., Hamann, U., Pesch, B., Harth, V., Rabstein, S., Baisch, C., et al. (2004). ERCC2 genotypes and a corresponding haplotype are linked with breast cancer risk in a German population. Cancer Epidemiology Biomarkers and Prevention, 13, 2059–2064.
Loader, C. (1999). Local regression and likelihood. Statistics and computing. New York: Springer.
Ruczinski, I., Kooperberg, C., & LeBlanc, M. L. (2003). Logic regression. Journal of Computational and Graphical Statistics, 12, 475–511.
Schwender, H., Rabstein, S., & Ickstadt, K. (2006). Do you speak genomish? Chance, 19(3), 4–11.
Tutz, G., & Binder, H. (2005). Localized classification. Statistics and Computing, 15, 155–166.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schiffner, J., Szepannek, G., Monthé, T., Weihs, C. (2009). Localized Logistic Regression for Categorical Influential Factors. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_17
DOI: https://doi.org/10.1007/978-3-642-01044-6_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01043-9
Online ISBN: 978-3-642-01044-6
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)