ABSTRACT
Contamination can degrade parameter estimation in binary logistic regression, and collinearity among the independent variables additionally gives rise to the problem of multicollinearity. In this work, we propose a novel correlation-driven adaptive lasso algorithm that enhances the robustness of logistic regression by incorporating a trimming step. Its efficacy stems from the synergistic use of correlation-driven trimming techniques, which jointly mitigate the impact of contaminated observations. The algorithm adaptively selects informative, highly correlated features and simultaneously detects outliers by maximizing a trimmed likelihood function. The proposed method is evaluated and compared with existing methods in a simulation study. Finally, an application to a real data set is given.
Index Terms
- A Correlation-Driven Adaptive Lasso for Robust Logistic Regression Model Using Trimming Step