Abstract
Quantile regression (QR) models offer an alternative to ordinary regression models for the response mean. Besides allowing a more appropriate characterization of the response distribution, they are less sensitive to outlying observations. Indeed, QR models allow modeling other characteristics of the response distribution, such as the lower and/or upper tails. However, in the presence of outlying observations, the estimates can still be affected. In this context, a robust parametric quantile regression model for bounded responses is developed, based on a new distribution, the Kumaraswamy Rectangular (KR) distribution. The KR model corresponds to a finite mixture structure similar to that of the Beta Rectangular distribution. As a result, the KR distribution has heavier tails than the Kumaraswamy distribution. Indeed, we show that the corresponding KR quantile regression model is more robust and flexible than the usual Kumaraswamy one. Bayesian inference, which includes parameter estimation, model fit assessment, model comparison, and influence analysis, is developed through a hybrid-based MCMC approach. Since the quantile of the KR distribution is not analytically tractable, we model the conditional quantile through a suitable data augmentation scheme. To link both quantiles in terms of a regression structure, a two-step estimation algorithm under a Bayesian approach is proposed to obtain a numerical approximation of the posterior distributions of the parameters of the regression structure for the KR quantile. This algorithm combines a Markov Chain Monte Carlo algorithm with the Ordinary Least Squares approach. Our proposal proved robust against outlying observations in the response while keeping the estimation process simple and adding little computational complexity.
We assessed the effectiveness of our estimation method in a simulation study, and two further studies showed the benefits of the proposed model in terms of robustness and flexibility. To illustrate the adequacy of our approach in the presence of outlying observations, we analyzed two data sets on socio-economic indicators from Brazil and compared our model with alternative models.
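The claim that the KR quantile has no closed form can be illustrated with a minimal sketch. It assumes, by analogy with the Beta Rectangular distribution, that the KR distribution is a two-component mixture of a Uniform(0, 1) and a Kumaraswamy(a, b) distribution with mixing weight theta. The parameter names a, b and theta, the function names, and the use of bisection are illustrative assumptions and may differ from the parameterization and numerical method used in the article.

```python
import math


def kuma_cdf(y, a, b):
    """CDF of the Kumaraswamy(a, b) distribution on (0, 1)."""
    return 1.0 - (1.0 - y ** a) ** b


def kuma_quantile(p, a, b):
    """Closed-form quantile function of the Kumaraswamy(a, b) distribution."""
    return (1.0 - (1.0 - p) ** (1.0 / b)) ** (1.0 / a)


def kr_cdf(y, a, b, theta):
    """Assumed KR CDF: Uniform(0, 1) with weight theta mixed with
    Kumaraswamy(a, b) with weight 1 - theta (hypothetical parameterization)."""
    return theta * y + (1.0 - theta) * kuma_cdf(y, a, b)


def kr_quantile(p, a, b, theta, tol=1e-12):
    """Solve kr_cdf(y) = p for y by bisection.

    The mixture CDF is continuous and increasing on [0, 1], so the root
    is unique, but it has no closed form when 0 < theta < 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kr_cdf(mid, a, b, theta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Median of the assumed KR distribution with 30% uniform contamination.
    q = kr_quantile(0.5, a=2.0, b=3.0, theta=0.3)
    print(q, kr_cdf(q, 2.0, 3.0, 0.3))
```

With theta = 0 the numerical quantile reduces to the closed-form Kumaraswamy quantile, and with theta = 1 it reduces to the uniform quantile p, which gives two simple sanity checks for the sketch.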
Acknowledgements
We would like to express our sincere gratitude to the referees and the Associate Editor for their diligent efforts, insightful comments, and valuable suggestions, which significantly enhanced the quality of this manuscript.
Funding
This study was partially financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Project Number 001. The authors are also thankful to the Conselho Nacional de Desenvolvimento Científico e Tecnológico, Grant Number 308058/2022-4, for a research scholarship granted to the second author, as well as to the Fundação de Amparo à Pesquisa do Estado de São Paulo, Grant Number 2020/16713-0, for providing additional financial support, also granted to the second author.
About this article
Cite this article
Castro, M., Azevedo, C. & Nobre, J. A robust quantile regression for bounded variables based on the Kumaraswamy Rectangular distribution. Stat Comput 34, 74 (2024). https://doi.org/10.1007/s11222-024-10381-0
DOI: https://doi.org/10.1007/s11222-024-10381-0