Comparative performance of machine learning-selected portfolios from dynamic CSI300 constituents: forward vs. backward adjusted stock prices

Applied Intelligence

Abstract

Most existing studies utilize backward-adjusted stock prices from data platforms to develop and backtest investment strategies using machine learning models. However, these prices are not point-in-time data and may introduce look-ahead bias, raising concerns about the reliability of model performance. To examine the impact of different price adjustment methods, we compare the predictive performance of various machine learning models and the backtesting results of portfolios constructed using these models with both forward-adjusted and backward-adjusted stock prices. Our study, conducted from 2012 to 2022, evaluates the real-world viability of investment strategies on the dynamic constituents of the CSI300 index. The empirical results reveal that while certain measures of machine learning models’ predictive performance may not be significantly affected by the stock price adjustment method, the backtesting performance under backward-adjusted stock prices is overestimated compared to that under forward-adjusted stock prices. This research provides evidence for the impact of historical stock price adjustments in developing machine learning models and presents a comprehensive framework for applying these techniques to the management of index constituent portfolios, thereby bridging the gap between predictive modeling and practical investment strategies.
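The point-in-time distinction drawn above can be illustrated with a minimal sketch. It assumes the common convention in which forward adjustment anchors the earliest price and scales later prices using only past corporate actions, while backward adjustment anchors the latest price and rescales earlier prices using every later action in the sample. The function names, the toy prices, and the 2-for-1 split are hypothetical; the authors' actual adjustment procedure and CSMAR fields are not shown in this preview.

```python
from itertools import accumulate
from operator import mul


def forward_adjust(prices, ratios):
    """Anchor the first price; scale each day by the product of ratios up to that day."""
    factors = accumulate(ratios, mul)  # uses only events dated on or before each day
    return [p * f for p, f in zip(prices, factors)]


def backward_adjust(prices, ratios):
    """Anchor the last price; scale earlier days down by every later event in the sample."""
    factors = list(accumulate(ratios, mul))
    total = factors[-1]  # depends on all events through the end of the sample
    return [p * f / total for p, f in zip(prices, factors)]


# Hypothetical data: a 2-for-1 split takes effect on day 2 (ratio 2.0).
raw_close = [10.0, 12.0, 6.0, 6.6]
split_ratio = [1.0, 1.0, 2.0, 1.0]

# Series as computed on day 1 (before the split is known) and again on day 3.
fwd_early = forward_adjust(raw_close[:2], split_ratio[:2])
fwd_late = forward_adjust(raw_close, split_ratio)
bwd_early = backward_adjust(raw_close[:2], split_ratio[:2])
bwd_late = backward_adjust(raw_close, split_ratio)

print("forward :", fwd_early, "->", fwd_late[:2])
print("backward:", bwd_early, "->", bwd_late[:2])
```

In this toy case the forward-adjusted values for days 0 and 1 are the same whether computed on day 1 or day 3, whereas the backward-adjusted day-0 value moves from 10.0 to 5.0 once the later split enters the sample. Under this convention the two series differ only by a constant factor within a fixed sample, so simple returns coincide while price levels do not.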

Data availability

The data that support this study were obtained from the China Stock Market & Accounting Research Database (CSMAR), available at https://data.csmar.com/. CSMAR is a commercial financial data provider, and its licensing terms restrict free sharing of the data. Details of the specific field names and database tables used in this research are available upon request for the purpose of reproducing and building upon the analysis.

Funding

No funding was received for conducting this study.

Author information

Contributions

Ligang Zhou: Conceptualization, Data analysis, Writing - original draft preparation. Xiaoguo Chen: Data collection, Visualization. Xiaolei Tang: Investigation, Writing - review and editing.

Corresponding author

Correspondence to Ligang Zhou.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Conflicts of interest

The authors have no financial interests that could have appeared to influence the work reported in this paper.

Ethics approval

Ethical approval and informed consent were not necessary for the use of these data in this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhou, L., Chen, X. & Tang, X. Comparative performance of machine learning-selected portfolios from dynamic CSI300 constituents: forward vs. backward adjusted stock prices. Appl Intell 55, 176 (2025). https://doi.org/10.1007/s10489-024-06107-4
