Abstract
Bivariate statistical regression is a tool for performing regression on a multivariate data set under the hypothesis that one of the independent variables is dominant. Statistical regression is effective when the available data are sufficient to capture the relevant statistical features of the phenomenon underlying the data. The present paper proposes a fast statistical regression method based on a neural system that matches its input–output statistic to the marginal statistic of the available data sets. A key point of the proposed implementation is that it relies on purely numerical-algebraic operations, which make the neural system computationally inexpensive to implement. Numerical experiments on real-world data sets give insight into the behavior of the devised neural-system-based statistical regression method and into its limitations.
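The idea of matching an input–output statistic to the marginal statistics of the data can be illustrated with a minimal sketch. It assumes, as an interpretation that the abstract itself does not spell out, that the model is a monotone map sending each quantile of the dominant variable to the same quantile of the response, stored as a look-up table and evaluated by linear interpolation. The function names `fit_statistic_matching` and `predict`, the table size, and the synthetic data are hypothetical choices made for illustration only.

```python
import numpy as np

def fit_statistic_matching(x, y, n_points=101):
    """Build a look-up table pairing quantiles of x with quantiles of y.

    Assumption: the regression model is monotone increasing, so the
    p-th quantile of x is mapped onto the p-th quantile of y.
    """
    probs = np.linspace(0.0, 1.0, n_points)
    x_table = np.quantile(x, probs)  # marginal statistic of the input
    y_table = np.quantile(y, probs)  # marginal statistic of the output
    return x_table, y_table

def predict(x_new, x_table, y_table):
    """Evaluate the look-up table by linear interpolation."""
    return np.interp(x_new, x_table, y_table)

# Synthetic example: a monotone relationship with small additive noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0, size=2000)
y = np.exp(x) + rng.normal(0.0, 0.05, size=x.size)

x_table, y_table = fit_statistic_matching(x, y)
x_query = np.array([0.5, 1.0, 1.5])
y_hat = predict(x_query, x_table, y_table)
```

Fitting here amounts to sorting and interpolation, with no iterative training, which is one way to read the emphasis on purely numerical-algebraic operations. The sketch handles a single input variable and a monotone increasing relationship only. It does not reproduce the treatment of the non-dominant variables.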
References
Beckers JM, Rixen M (2003) EOF calculations and data filling from incomplete oceanographic data sets. J Atmos Ocean Technol 20(12):1839–1856
Biagiotti J, Fiori S, Torre L, López-Manchado MA, Kenny JM (2004) Mechanical properties of polypropylene matrix composites reinforced with natural fibers: a statistical approach. Polym Compos 25(1):26–36
Cook NR (2006) Imputation strategies for blood pressure data nonignorably missing due to medication use. Clin Trials 3(5):411–420
Dargahi-Noubary GR, Razzaghi M (1994) Earthquake hazard assessment based on bivariate exponential distribution. Reliab Eng Syst Saf 44:135–166
Dupacová J, Hurt J, Štepán J (2002) Stochastic modeling in economics and finance. Kluwer (Applied Optimization Series), Dordrecht
Enders CK (2006) A primer on the use of modern missing-data methods in psychosomatic medicine research. Psychosom Med 68(3):427–436
Fiori S (2003) Non-symmetric PDF estimation by artificial neurons: application to statistical characterization of reinforced composites. IEEE Trans Neural Netw 14(4):959–962
Fiori S, Rossi R (2004) Statistical characterization of some electrical and mechanical phenomena by a neural probability density function estimation technique. Neural Netw World 2:153–176
Fiori S (2006) Neural systems with numerically-matched input-output statistic: variate generation. Neural Process Lett 23(2):143–170
Fiori S (2011) Statistical nonparametric bivariate isotonic regression by look-up-table-based neural networks. In: Lu B-L, Zhang L, Kwok J (eds) Proceedings of the 2011 international conference on neural information processing (ICONIP 2011, Shanghai, China, November 14–17, 2011), Part III, LNCS 7064. Springer, Heidelberg, pp 365–372
Frank A, Asuncion A (2010) UCI Machine learning repository [http://archive.ics.uci.edu/ml], University of California at Irvine, School of Information and Computer Science
Greve HR, Tuma NB, Strang D (2001) Estimation of diffusion processes from incomplete data (a simulation study). Sociol Methods Res 29(4):435–467
Härdle W (1992) Applied nonparametric regression. Cambridge University Press, Cambridge
Katch F, McArdle W (1977) Nutrition, weight control, and exercise. Houghton Mifflin Co., Boston
Little RJA, Rubin DB (1987) Statistical analysis with missing data. Wiley, New York
Luchinsky DG, Millonas MM, Smelyanskiy VN, Pershakova A, Stefanovska A, McClintock PVE (2005) Nonlinear statistical modeling and model discovery for cardiorespiratory data. Phys Rev E 72:021905
Nikoloulopoulos AK, Karlis D (2010) Regression in a copula model for bivariate count data. J Appl Stat 37(9):1555–1568
Peugh JL, Enders CK (2004) Missing data in educational research: a review of reporting practices and suggestions for improvement. Rev Educ Res 74(4):525–556
Rosenblum M, Cimponeriu L, Pikovsky A (2006) Coupled oscillators approach in analysis of bivariate data. In: Schelter B, Winterhalder M, Timmer J (eds) Handbook of time series analysis, Wiley, New York, pp 159–180
Salanti G (2003) The isotonic regression framework: estimating and testing under order restrictions. PhD Dissertation, Fakultät für Mathematik, Ludwig-Maximilians-Universität München
Schafer JL, Graham JW (2002) Missing data: our view of the state of the art. Psychol Methods 7(2):147–177
Schneider T (2001) Analysis of incomplete climate data: estimation of mean values and covariance matrices and imputation of missing values. J Clim 14:853–871
SOCR data BMI regression (2012) [http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_BMI_Regression], University of California at Los Angeles
Torgo L (2007) Regression datasets [http://www.liaad.up.pt/~ltorgo/Regression/DataSets.html], Artificial Intelligence and Computer Science Laboratory, University of Porto (Portugal)
Velikova MV (2006) Monotone models for prediction in data mining. PhD Dissertation, Dutch Graduate School for Information and Knowledge Systems and Graduate School of the Faculty of Economics and Business Administration of Tilburg University
Verde PE (2010) Meta-analysis of diagnostic test data: a bivariate Bayesian modeling approach. Stat Med 29:3088–3102
Yeh I-C (1998) Modeling of strength of high performance concrete using artificial neural networks. Cem Concr Res 28(12):1797–1808
Acknowledgments
The present paper is an extended version of the conference paper [10]. The author wishes to thank Andrew Leung for the invitation to submit the present extended version to the special issue of Neural Computing and Applications dedicated to the ICONIP’2011 conference.
Cite this article
Fiori, S. Fast statistical regression in presence of a dominant independent variable. Neural Comput & Applic 22, 1367–1378 (2013). https://doi.org/10.1007/s00521-012-0958-6
DOI: https://doi.org/10.1007/s00521-012-0958-6