A further analysis of robust regression modeling and data mining corrections testing in global stocks

  • S.I.: Data Mining and Decision Analytics
Annals of Operations Research

Abstract

In this analysis of the risk and return of stocks in global markets, we build a reasonably large number of stock selection models and create optimized portfolios to outperform a global benchmark. We apply robust regression techniques, least angle regression (LAR), and LASSO regression modeling to estimate stock selection models. Within a global stock universe, Markowitz-based optimization techniques are used in portfolio construction, and the Markowitz–Xu data mining corrections test is applied. We find that (1) robust regression applications are appropriate for modeling stock returns in global markets; (2) weighted latent root regression (WLRR), a robust regression technique, works as well as LAR and LASSO regressions in building effective stock selection models; (3) mean–variance techniques continue to produce portfolios capable of generating excess returns net of transaction costs; and (4) our models pass several data mining tests, indicating that the regression models produce statistically significant asset selection for global stocks. The recent sturdy-regression (S-regression) modeling technique may offer the greatest potential for further research on statistically based stock selection modeling.
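The abstract refers to Markowitz-based portfolio construction. As a minimal illustration only, and not the tracking error at risk optimization with commercial risk models that the authors actually use, the textbook mean–variance problem with a single budget constraint has a closed-form solution. The function name `mean_variance_weights`, the risk-aversion parameter, and the example inputs are hypothetical; long-only limits, turnover penalties, and transaction costs are deliberately omitted.

```python
import numpy as np


def mean_variance_weights(mu, sigma, risk_aversion):
    """Maximize w'mu - (risk_aversion / 2) * w'Sigma w subject to sum(w) == 1.

    Closed-form Lagrangian solution (illustrative assumptions): short positions
    allowed; no transaction costs, position limits, or tracking-error limits.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    ones = np.ones_like(mu)
    inv_mu = np.linalg.solve(sigma, mu)     # Sigma^{-1} mu
    inv_one = np.linalg.solve(sigma, ones)  # Sigma^{-1} 1
    # Lagrange multiplier chosen so that the weights sum to one
    eta = (ones @ inv_mu - risk_aversion) / (ones @ inv_one)
    return (inv_mu - eta * inv_one) / risk_aversion


# Hypothetical expected returns and a diagonal covariance matrix
mu = np.array([0.08, 0.10, 0.12])
sigma = np.diag([0.04, 0.09, 0.16])
w = mean_variance_weights(mu, sigma, risk_aversion=4.0)
print(w.round(4), w.sum())
```

With equal expected returns and a diagonal covariance matrix, the weights reduce to inverse-variance weights, which is a convenient sanity check on the algebra.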
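The Markowitz–Xu data mining corrections (DMC) test asks how much of the best model's apparent outperformance survives once one accounts for the many models that were tried. The sketch below illustrates only the shrinkage idea behind such a correction, under an assumed random-effects decomposition of model returns; it is not the exact Markowitz and Xu (1994) estimator, and it does not compute the DMC beta or t-statistic reported in the notes. The function name `dmc_shrinkage` and the simulated inputs are hypothetical.

```python
import numpy as np


def dmc_shrinkage(returns):
    """Shrink the best model's mean return toward the average of all models.

    returns: T x M array (row = period, column = one of M tested models).
    Assumes r[t, i] = y[t] + z[i] + e[t, i]: a common period effect y, a
    persistent model effect z, and i.i.d. noise e (an illustrative assumption).
    Returns (beta, adjusted_best_mean).
    """
    r = np.asarray(returns, dtype=float)
    t_len, n_models = r.shape
    model_means = r.mean(axis=0)
    grand_mean = r.mean()
    # Residuals after removing period effects and model effects
    resid = r - r.mean(axis=1, keepdims=True) - model_means + grand_mean
    noise_var = (resid ** 2).sum() / ((t_len - 1) * (n_models - 1))
    # Spread of model means = true differences + estimation noise / T
    between_var = model_means.var(ddof=1)
    signal_var = max(between_var - noise_var / t_len, 0.0)
    beta = signal_var / between_var if between_var > 0 else 0.0
    best = model_means.max()
    return beta, grand_mean + beta * (best - grand_mean)


# Hypothetical: 120 months of returns for 21 models with no true differences
rng = np.random.default_rng(1)
simulated = rng.normal(0.0, 0.02, size=(120, 21))
beta, adjusted_best = dmc_shrinkage(simulated)
print(round(beta, 3), round(adjusted_best, 5))
```

A beta near 1 means the differences in model means are mostly persistent; a beta near 0 means the best model's edge is largely attributable to chance, so its expected return is shrunk back toward the cross-model average.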

Notes

  1. Robust regression is useful in analyzing data plagued by a “large number” of ordinary least squares (OLS) regression residuals that fall outside the 95% confidence interval. The reader is referred to Bloch et al. (1993) for a detailed explanation and rationale for robust regression, which that article refers to as weighted latent root regression (WLRR). WLRR outperformed OLS in Sharpe Ratios and Geometric Means in the Bloch et al. analysis. Statisticians refer to the underlying robust regression model used in this study as the Beaton–Tukey bisquare weighting scheme; see Maronna et al. (2006, 2019) and Dhrymes (2017). Multicollinearity was a significant problem, given the presence of the book value, sales, earnings, and cash flow variables and their interdependencies. Farrar and Glauber (1967) and Leamer (1973, 1978) addressed multicollinearity issues prior to Gunst and Mason (1980). The WLRR technique eliminates latent roots with eigenvalues ≤ 0.30 and eigenvectors with a first element ≤ 0.10, and is similar to the variance inflation factor (VIF) regression technique of Lin et al. (2011).

  2. Guerard et al. (2013) estimated a global model, GLER, using Eq. (1) and the FactSet database for global securities during the January 1999–December 2011 period. In the world of business, one does not access academic databases annually, or even quarterly; most industry analysis uses the FactSet database and the Thomson Financial (I/B/E/S) earnings forecast database. Guerard et al. (2013) estimated a tracking error at risk (MVTaR) portfolio model for the 7500 largest securities, by market capitalization, in the Thomson Financial and FactSet databases, which covered some 46,550 firms in December 2011 and 64,455 stocks in December 2013. The Guerard et al. (2015) earnings forecasting study used both the APT and Axioma World-Wide statistical risk models. We use data only as it was known at the time (more precisely, our portfolios are tested out-of-sample). USER and GLER are the “Public Forms” of the McKinley Capital Management models for U.S. and global stocks, respectively. Excess returns were higher in non-U.S. than in U.S. markets; see Deng and Min (2013) and Guerard et al. (2018a, b).

  3. The CTEF and PM variable weights are large relative to the weights of the first eight factors. These are “growth” variables, such that both the Markowitz model and the GLER model plot in the growth boxes of the Zephyr style report. The first four factors of GLER are value factors. The weighting results are highly consistent with McKinley Capital Management being a global growth specialist. The CTEF and PM variables accounted for 40 percent of the weights in the GLER model.

  4. Haugen (2001) continued the treatment of the Graham and Dodd (1934) variables in his Modern Investment Theory. Haugen and Baker (1996, 2010) examined 12 of the most important factors in the U.S. equity markets and in Germany, France, Great Britain, and Japan. The book-to-price, earnings-to-price, sales-to-price, and cash flow-to-price variables were among the highest mean payoff variables in the respective countries. Haugen and Baker (2010) published a paper in the Guerard volume honoring Harry Markowitz that updated their models and, in the eyes of the primary author, completely demolished the case for efficient markets. Neither our analysis nor the work of Haugen reports evidence consistent with Fama and French (1992, 1995, 2008) on the dominance of the firm size and book-to-price variables in stock selection modeling.

  5. See Jacobs and Levy (1988) and Ziemba and Schwartz (1993) for different approaches to variable selection.

  6. Guerard et al. (2018a, b) tested 30-plus variables in six different U.S. and global stock universes.

  7. Efron et al. (2004) introduce least angle regression (LAR) by discussing automatic model-building algorithms, including forward selection, all subsets, and backward elimination. LAR is a variation of forward selection: it builds a regression model one covariate at a time, such that after K steps only K of the \( \hat{\beta}_{j} \)’s are non-zero. The K corresponding independent variables form the selected K-member subset model. In our application, K is selected by the best (lowest) Akaike Information Criterion (AIC) value, and K and the selected sub-model may differ in each period. See Tibshirani (1996) for a discussion of the LASSO. One of the better books on statistical modeling using LAR and LASSO is Hastie et al. (2016).

  8. The first set is a fundamental risk model, such as the Axioma World-Wide Equity Risk Factor Model (AX-WW2.1), which seeks to forecast medium-horizon risk, or risk 3–6 months ahead. The Axioma Fundamental Risk Model uses nine style factors: exchange rate sensitivity, growth (historical earnings and sales growth), leverage (debt-to-assets), liquidity (one-month trading volume divided by market capitalization), medium-term momentum (cumulative return over the past year, excluding the previous month), short-term momentum (last month’s return), size (natural logarithm of issuer market capitalization), value (book-to-price and earnings-to-price ratios), and volatility (three-month average of absolute returns divided by the cross-sectional standard deviation). The Axioma fundamentally based risk model evolved from the MSCI Barra risk model. The Barra model was developed in Rosenberg (1974) and Rosenberg and Marathe (1979), and is thoroughly discussed in Rudd and Clasing (1982), Grinold and Kahn (1999), Connor and Korajczyk (2010), Connor et al. (2010), and Menchero et al. (2010). Statistically based risk models were developed in the works of Ross (1976), Roll and Ross (1980), Dhrymes et al. (1984), and Guerard et al. (1997). The Axioma Statistical Risk Model, World-Wide Equity Risk Factor Model, AX-WW2.1, estimates 15 principal components to measure risk. See Guerard et al. (2015) for a comparison of the Axioma fundamental and statistically based risk models; Guerard et al. reported that the statistical model dominated the fundamental risk model, producing higher returns for a given level of risk.

  9. The authors have used APT, SunGard APT, and FIS APT since 1988 and Axioma since 2010.

  10. Recent research by Leamer (2016) proved highly insightful and relevant to our modeling of financial data. Leamer is critical of using the t-statistic as a model selection method (dropping variables with statistically non-significant estimated coefficients) because the t-value measures only estimation uncertainty. Leamer (2016) introduces S-values to measure the sturdiness of regression coefficients, taking model uncertainty into account, and uses the S-value as a model selection criterion. Leamer, a renowned Bayesian and data mining specialist (1973, 1978), addresses model uncertainty with Bayesian methodology. We applied the Leamer sturdy regression technique to our data and report an initial enhancement in stock selection modeling: the Leamer S-regression enhances portfolio returns, the Sharpe Ratio, and the Information Ratio relative to the GLER model estimated with WLRR. In an initial test of WLRR and S-regression during the 1/2003–10/2015 period, the WLRR model produced a Geometric Mean of 13.75%, a Sharpe Ratio of 0.738, an IR of 1.08, and a Specific Return of 4.23% (t-statistic of 3.70). The corresponding portfolio statistics for the Leamer sturdy-regression model were a Geometric Mean of 15.85%, a Sharpe Ratio of 0.748, an IR of 1.16, and a Specific Return of 6.27% (t-statistic of 4.54). The WLRR model produced a tracking error of 7.30%, whereas the Leamer sturdy-regression model produced a 9.53% tracking error. On the basis of the Geometric Mean, Sharpe Ratio, and Information Ratio, the Leamer sturdy-regression model is well worth further research.

    Markowitz and Guerard (2019) presented a paper, “The Existence and Persistence of Financial Anomalies,” at the Q-Group (Institute for Quantitative Research in Finance) meeting on October 29, 2019, in La Jolla. They report that additional research on the optimal influence function regression modeling of Maronna et al. (2019) enhances the Geometric Mean, Sharpe Ratio, and Information Ratio relative to WLRR.

  11. Had the authors restricted the DMC test to the GPRD variables, the DMC beta would have been 0.70, with a corresponding t-statistic of 3.01. Thus, we report little difference between the DMC results for the 21 and 36 models tested in the global universe.

  12. Zhou et al. (2006) discussed the role of alpha-investing, a form of information investing, in controlling the false discovery rate (FDR). The alpha-investing approach uses the p-values associated with the coefficient t-statistics, that is, the probability of obtaining an estimate at least as large in absolute value by chance if the true coefficient were zero.
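Note 1 names the Beaton–Tukey bisquare weighting scheme as the robust regression model underlying WLRR. A minimal sketch of bisquare regression fitted by iteratively reweighted least squares follows, assuming the conventional tuning constant c = 4.685 and a residual scale estimated as the median absolute residual divided by 0.6745. It starts from OLS for simplicity, whereas Maronna et al. recommend starting redescending estimators from a robust initial fit, and it omits the latent root step of WLRR; the function name `bisquare_irls` is hypothetical.

```python
import numpy as np


def bisquare_irls(X, y, c=4.685, max_iter=50, tol=1e-8):
    """Beaton-Tukey bisquare regression by iteratively reweighted least squares.

    X must include an intercept column. Residuals are scaled by the
    median absolute residual / 0.6745 (a normal-consistent scale estimate).
    Returns (coefficients, final observation weights).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting values
    w = np.ones_like(y)
    for _ in range(max_iter):
        resid = y - X @ beta
        scale = np.median(np.abs(resid)) / 0.6745
        if scale == 0:
            break  # at least half the points are fitted exactly
        u = resid / (c * scale)
        # Bisquare weight: (1 - u^2)^2 inside |u| < 1, zero outside
        w = np.where(np.abs(u) < 1, (1 - u ** 2) ** 2, 0.0)
        sw = np.sqrt(w)
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta, w


# Hypothetical data: a clean line y = 1 + 2x with one gross outlier
x = np.arange(20, dtype=float)
y = 1.0 + 2.0 * x
y[5] += 100.0
X = np.column_stack([np.ones_like(x), x])
beta, weights = bisquare_irls(X, y)
print(beta, weights[5])
```

Because the outlier's weight falls to zero, the estimates return to the clean intercept and slope, whereas OLS on the same data is pulled toward the outlier.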
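Note 1 states that the WLRR technique eliminates latent roots with eigenvalues ≤ 0.30 and eigenvectors with a first element ≤ 0.10. The sketch below shows one reading of that rule in unweighted latent root regression: eigen-decompose the correlation matrix of the response and predictors, and drop a latent vector when its root is ≤ 0.30 and the absolute value of its first (response) element is ≤ 0.10. Treating the two conditions jointly, using absolute values, and omitting the bisquare weights are assumptions made for illustration; the function name `latent_root_regression` is hypothetical.

```python
import numpy as np


def latent_root_regression(X, y, root_cut=0.30, vector_cut=0.10):
    """Latent root regression sketch. Returns (intercept, slopes) in original units.

    A latent vector of corr([y, X]) is treated as a non-predictive
    near-collinearity, and dropped, when its root <= root_cut AND
    |first (y) element| <= vector_cut (assumed reading of the rule).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    y_norm = np.linalg.norm(yc)
    x_norm = np.linalg.norm(Xc, axis=0)
    A = np.column_stack([yc / y_norm, Xc / x_norm])  # unit-length columns
    roots, vectors = np.linalg.eigh(A.T @ A)          # correlation matrix
    gamma0 = vectors[0, :]                            # y-element of each vector
    delta = vectors[1:, :]                            # X-elements of each vector
    keep = ~((roots <= root_cut) & (np.abs(gamma0) <= vector_cut))
    # Standardized coefficients built from the retained latent vectors only
    num = delta[:, keep] @ (gamma0[keep] / roots[keep])
    den = np.sum(gamma0[keep] ** 2 / roots[keep])
    b_std = -num / den
    slopes = b_std * y_norm / x_norm                  # back to original units
    return y.mean() - X.mean(axis=0) @ slopes, slopes


# Hypothetical data with a near-duplicate fourth predictor
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 0.3 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)
X_dup = np.column_stack([X, X[:, 0] + 1e-3 * rng.normal(size=50)])
print(latent_root_regression(X_dup, y))
```

If no vector is dropped, the estimator reproduces OLS exactly, which is a useful check; dropping a near-singular, non-predictive vector removes the multicollinearity that would otherwise inflate the coefficient variances.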
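Note 7 describes choosing the LAR/LASSO sub-model by the AIC. The sketch below substitutes a simpler algorithm: it fits the LASSO by cyclic coordinate descent over a grid of penalties rather than tracing the LAR path of Efron et al. (2004), and it picks the penalty with the lowest AIC, counting non-zero coefficients as degrees of freedom (a standard approximation for the LASSO). The function names, the penalty grid, and the Gaussian AIC form n·log(RSS/n) + 2k are illustrative assumptions.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """LASSO by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 for centered y
    and standardized columns of X (no intercept term).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of x_j with the partial residual that excludes x_j
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]
            new_bj = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            change = new_bj - b[j]
            if change != 0.0:
                resid -= X[:, j] * change
                b[j] = new_bj
                max_change = max(max_change, abs(change))
        if max_change < tol:
            break
    return b


def lasso_aic(X, y, lambdas):
    """Return (aic, lam, coefficients on the standardized scale) with lowest AIC."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    n = len(yc)
    best = None
    for lam in lambdas:
        b = lasso_cd(Xs, yc, lam)
        rss = np.sum((yc - Xs @ b) ** 2)
        aic = n * np.log(rss / n) + 2 * np.count_nonzero(b)  # df ~ non-zero count
        if best is None or aic < best[0]:
            best = (aic, lam, b)
    return best


# Hypothetical data: only predictors 0 and 3 matter
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0]) + 0.5 * rng.normal(size=200)
aic, lam, coef = lasso_aic(X, y, np.linspace(0.01, 1.0, 30))
print(lam, coef.round(3))
```

With the penalty set to zero, the coordinate-descent solution coincides with OLS on the standardized data; larger penalties shrink coefficients and set some exactly to zero, which is how the LASSO performs variable selection.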
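Note 8 describes statistical risk models that estimate principal components of returns (15 in the Axioma statistical model). A minimal principal-components factor covariance estimate is sketched below; it is not Axioma's or APT's methodology, and the number of factors, the function name `pca_risk_model`, and the simulated returns are assumptions for illustration.

```python
import numpy as np


def pca_risk_model(returns, n_factors):
    """Principal-components covariance estimate B F B' + D.

    returns: T x N array of periodic asset returns (rows = periods).
    """
    r = np.asarray(returns, dtype=float)
    centered = r - r.mean(axis=0)
    sample_cov = centered.T @ centered / (r.shape[0] - 1)
    eigval, eigvec = np.linalg.eigh(sample_cov)
    top = np.argsort(eigval)[::-1][:n_factors]  # largest k components
    loadings = eigvec[:, top]                    # N x k exposures (B)
    factor_cov = np.diag(eigval[top])            # k x k factor covariance (F)
    common = loadings @ factor_cov @ loadings.T
    # Specific (idiosyncratic) variances: what the k factors leave unexplained
    specific = np.clip(np.diag(sample_cov - common), 0.0, None)
    return common + np.diag(specific)


# Hypothetical: 60 months of returns for 8 stocks, 3 statistical factors
rng = np.random.default_rng(7)
returns = rng.normal(0.0, 0.05, size=(60, 8))
cov = pca_risk_model(returns, n_factors=3)
active = np.full(8, 1 / 8) - np.r_[np.full(4, 0.25), np.zeros(4)]  # portfolio minus benchmark
print(np.sqrt(active @ cov @ active))  # ex-ante tracking error per period
```

By construction, the estimate keeps each asset's sample variance and replaces the off-diagonal structure with the k-factor approximation; the quadratic form in the last line is the ex-ante tracking error of an active position.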
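Note 10 compares models by Geometric Mean, Sharpe Ratio, Information Ratio, and tracking error. The helper below computes annualized versions of these statistics from monthly simple returns under one common set of conventions (sample standard deviations, square-root-of-time scaling, arithmetic active returns). The paper's exact definitions may differ, so this sketch will not necessarily reproduce the reported figures; the function name `performance_summary` and the inputs are hypothetical.

```python
import numpy as np


def performance_summary(port, bench, rf, periods_per_year=12):
    """Annualized statistics from periodic (e.g. monthly) simple returns."""
    port, bench, rf = (np.asarray(a, dtype=float) for a in (port, bench, rf))
    n = port.size
    geometric_mean = np.prod(1.0 + port) ** (periods_per_year / n) - 1.0
    excess = port - rf  # return over the risk-free rate
    sharpe = excess.mean() / excess.std(ddof=1) * np.sqrt(periods_per_year)
    active = port - bench  # return over the benchmark
    tracking_error = active.std(ddof=1) * np.sqrt(periods_per_year)
    info_ratio = active.mean() * periods_per_year / tracking_error
    return {
        "geometric_mean": geometric_mean,
        "sharpe_ratio": sharpe,
        "tracking_error": tracking_error,
        "information_ratio": info_ratio,
    }


# Hypothetical four months of portfolio and benchmark returns
port = [0.01, 0.03, -0.01, 0.05]
bench = [0.0, 0.02, -0.02, 0.03]
stats = performance_summary(port, bench, rf=0.0)
print(stats)
```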
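Note 12 concerns controlling the false discovery rate when many coefficients or models are tested. Rather than alpha-investing, the sketch below shows the Benjamini and Hochberg (1995) step-up procedure from the reference list, which rejects the hypotheses with the k smallest p-values, where k is the largest i such that p_(i) ≤ i·q/m. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up rule at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m  # i * q / m for i = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()        # largest i with p_(i) <= i q / m
        reject[order[: k + 1]] = True         # reject all smaller p-values too
    return reject


# Hypothetical p-values from ten coefficient t-tests
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p_values, q=0.05))
```

Because it is a step-up rule, a hypothesis whose own p-value misses its threshold can still be rejected when a larger p-value further down the ordering meets its threshold.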

References

  • Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57, 289–300.

  • Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165–1188.

  • Blin, J. M., Bender, S., & Guerard, J. B., Jr. (1997). Earnings forecasts, revisions and momentum in the estimation of efficient market-neutral Japanese and U.S. portfolios. Research in Finance, 15, 93–114.

  • Bloch, M., Guerard, J. B., Jr., Markowitz, H. M., Todd, P., & Xu, G. (1993). A comparison of some aspects of the U.S. and Japanese equity markets. Japan & the World Economy, 5, 3–26.

  • Connor, G., & Korajczyk, R. A. (2010). Factor models in portfolio and asset pricing theory. In J. Guerard (Ed.), The handbook of portfolio construction: Contemporary applications of Markowitz techniques. New York: Springer.

  • Connor, G., Goldberg, L., & Korajczyk, R. A. (2010). Portfolio risk analysis. Princeton: Princeton University Press.

  • Deng, S., & Min, X. (2013). Applied optimization in global efficient portfolio construction using earnings forecasting. Journal of Investing, 22, 104–114.

  • Dhrymes, P. J. (2017). Introductory econometrics (2nd ed.). New York: Springer.

  • Dhrymes, P. J., Friend, I., & Gultekin, N. B. (1984). A critical re-examination of the empirical evidence on the arbitrage pricing theory. Journal of Finance, 39, 323–346.

  • Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32, 407–499.

  • Elton, E. J., Gruber, M. J., Brown, S. J., & Goetzmann, W. N. (2007). Modern portfolio theory and investment analysis (7th ed.). New York: Wiley.

  • Fama, E. F., & French, K. R. (1992). The cross-section of expected stock returns. Journal of Finance, 47, 427–465.

  • Fama, E. F., & French, K. R. (1995). Size and book-to-market factors in earnings and returns. Journal of Finance, 50, 131–155.

  • Fama, E. F., & French, K. R. (2008). Dissecting anomalies. Journal of Finance, 63, 1653–1678.

  • Farrar, D. E., & Glauber, R. (1967). Multicollinearity in regression analysis: The problem revisited. The Review of Economics and Statistics, 49, 92–107.

  • Graham, B., & Dodd, D. (1934). Security analysis: Principles and technique. New York: McGraw-Hill Book Company.

  • Grinold, R., & Kahn, R. (1999). Active portfolio management. New York: McGraw-Hill/Irwin.

  • Guerard, J. B., Jr., Gillam, R. A., Markowitz, H. M., Xu, G., Deng, S., & Wang, Z. (2018a). Data mining corrections testing in Chinese stocks. Interfaces, 48, 108–120.

  • Guerard, J. B., Jr., Gultekin, M., & Stone, B. K. (1997). The role of fundamental data and analysts’ earnings breadth, forecasts, and revisions in the creation of efficient portfolios. Research in Finance, 15, 69–92.

  • Guerard, J. B., Jr., Markowitz, H. M., & Xu, G. (2014). The role of effective corporate decisions in the creation of efficient portfolios. IBM Journal of Research and Development, 58(6), 1–11.

  • Guerard, J. B., Jr., Markowitz, H. M., & Xu, G. (2015). Earnings forecasting in a global stock selection model and efficient portfolio construction and management. International Journal of Forecasting, 31, 550–560.

  • Guerard, J. B., Jr., Markowitz, H. M., Xu, G., & Wang, E. (2018b). Global portfolio construction with emphasis on conflicting corporate strategies to maximize stockholder wealth. Annals of Operations Research, 267, 203–219.

  • Guerard, J. B., Jr., Rachev, S. T., & Shao, B. (2013). Efficient global portfolios: Big data and investment universes. IBM Journal of Research and Development, 57, 11.

  • Guerard, J. B., Jr., Xu, G., & Gultekin, M. N. (2012). Investing with momentum: The past, present, and future. Journal of Investing, 21, 68–80.

  • Guerard, J. B., Jr., Xu, G., & Wang, Z. (2019). Portfolio and investment analysis with SAS. Cary, NC: SAS Press.

  • Gunst, R. F., & Mason, R. L. (1980). Regression analysis and its application. New York: Marcel Dekker Inc.

  • Hansen, L. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50(4), 1029–1054.

  • Harvey, C. R. (2017). Presidential address: The scientific outlook in financial economics. Journal of Finance, 72, 1399–1440.

  • Harvey, C. R., & Liu, Y. (2014). Lucky factors. SSRN. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2528780.

  • Harvey, C. R., Liu, Y., & Zhu, H. (2016). …and the cross-section of expected returns. Review of Financial Studies, 29(1), 5–69.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2016). The elements of statistical learning: Data mining, inference, and prediction (2nd ed., 11th printing). New York: Springer.

  • Haugen, R. A. (2001). Modern investment theory (5th ed.). Upper Saddle River, NJ: Prentice Hall.

  • Haugen, R. A., & Baker, N. (1996). Commonality in the determinants of expected stock returns. Journal of Financial Economics, 41, 401–440.

  • Haugen, R. A., & Baker, N. (2010). Case closed. In J. B. Guerard (Ed.), The Handbook of portfolio construction: Contemporary applications of Markowitz techniques. New York: Springer.

  • Jacobs, B., & Levy, K. (1988). Disentangling equity return regularities: New insights and investment opportunities. Financial Analysts Journal, 44, 18–43.

  • Leamer, E. E. (1973). Multicollinearity: A Bayesian interpretation. Review of Economics and Statistics, 55, 371–380.

  • Leamer, E. E. (1978). Specification searches: Ad hoc inference with nonexperimental data. New York: Wiley.

  • Leamer, E. E. (2016). S-values: Conventional context-minimal measures of the sturdiness of regression coefficients. Journal of Econometrics, 193, 147–161.

  • Levy, H. (1999). Introduction to investments (2nd ed.). Cincinnati: South-Western College Publishing.

  • Levy, H. (2012). The capital asset pricing model in the 21st century. New York: Cambridge University Press.

  • Lin, D., Foster, D. P., & Ungar, L. H. (2011). VIF regression: A fast regression algorithm for large data. Journal of the American Statistical Association, 106, 232–247.

  • Lo, A. (2002). The statistics of Sharpe ratios. Financial Analysts Journal, 58, 36–52.

  • Lo, A., & MacKinlay, C. (1990). Data-snooping biases in tests of financial asset pricing models. The Review of Financial Studies, 3, 431–467.

  • Markowitz, H. M. (1952). Portfolio selection. Journal of Finance, 7, 77–91.

  • Markowitz, H. M. (1959). Portfolio selection: Efficient diversification of investments. Cowles Foundation Monograph No. 16. New York: Wiley.

  • Markowitz, H. M., & Guerard, J. B., Jr. (2019). The existence and persistence of financial anomalies. Paper presented at the Institute for Quantitative Research in Finance (Q-Group), La Jolla, October.

  • Markowitz, H. M., & Xu, G. (1994). Data mining corrections. Journal of Portfolio Management, 21, 60–69.

  • Maronna, R. A., Martin, R. D., & Yohai, V. J. (2006). Robust statistics: Theory and methods. New York: Wiley.

  • Maronna, R. A., Martin, R. D., Yohai, V. J., & Salibian-Barrera, M. (2019). Robust statistics: Theory and methods (2nd ed.). New York: Wiley.

  • Menchero, J., Morozov, A., & Shepard, P. (2010). Global Equity Modeling. In J. B. Guerard (Ed.), The handbook of portfolio construction: Contemporary applications of Markowitz techniques. New York: Springer.

  • Ramnath, S., Rock, S., & Shane, P. (2008). The financial analyst forecasting literature: A taxonomy with suggestions for further research. International Journal of Forecasting, 24, 34–75.

  • Roll, R., & Ross, S. A. (1980). An empirical investigation of the arbitrage pricing theory. Journal of Finance, 35, 1071–1103.

  • Rosenberg, B. (1974). Extra-market components of covariance in security returns. Journal of Financial and Quantitative Analysis, 9, 263–274.

  • Rosenberg, B., & Marathe, V. (1979). Tests of capital asset pricing hypotheses. In H. Levy (Ed.), Research in finance (Vol. 1). Greenwich, CT: JAI Press.

  • Ross, S. A. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13, 341–360.

  • Rudd, A., & Clasing, H. K. (1982). Modern portfolio theory: The principles of investment management. Homewood, IL: Dow-Jones Irwin.

  • Subramanian, S., Suzuki, S. D., Makedon, A., Hall, J., Pouey, M., & Wang, B. (2018). A PM’s guide to stock picking. Bank of America Merrill Lynch.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Series B, 58, 267–288.

  • White, H. (1984). Asymptotic theory for econometricians. New York: Academic Press.

  • Williams, J. B. (1938). The theory of investment value. Cambridge, MA: Harvard University Press.

  • Zhou, J., Foster, D. P., Stine, R. A., & Ungar, L. H. (2006). Streamwise feature selection. Journal of Machine Learning Research, 7, 1861–1885.

  • Ziemba, W. T., & Schwartz, S. (1993). Invest Japan. Chicago: Probus.

Author information

Correspondence to John B. Guerard Jr.

Ethics declarations

Conflict of interest

The views and opinions expressed in this paper are those of the authors and may not represent or reflect those of McKinley Capital Management, LLC. All information contained herein is believed to be acquired from reliable sources, but its accuracy cannot be guaranteed. This paper is for informational purposes only, was prepared for an academic, financially sophisticated, and institutional audience, and does not represent specific financial services or investment recommendations or advice.

About this article

Cite this article

Guerard, J.B., Xu, G. & Markowitz, H. A further analysis of robust regression modeling and data mining corrections testing in global stocks. Ann Oper Res 303, 175–195 (2021). https://doi.org/10.1007/s10479-020-03521-y

  • DOI: https://doi.org/10.1007/s10479-020-03521-y
