Abstract
Most existing studies use backward-adjusted stock prices from data platforms to develop and backtest investment strategies based on machine learning models. However, these prices are not point-in-time data and may introduce look-ahead bias, raising concerns about the reliability of reported model performance. To examine the impact of different price adjustment methods, we compare, under both forward-adjusted and backward-adjusted stock prices, the predictive performance of various machine learning models and the backtesting results of portfolios constructed using these models. Our study covers the period from 2012 to 2022 and evaluates the real-world viability of investment strategies on the dynamic constituents of the CSI300 index. The empirical results reveal that while certain measures of machine learning models’ predictive performance may not be significantly affected by the stock price adjustment method, the backtesting performance under backward-adjusted stock prices is overestimated relative to that under forward-adjusted stock prices. This research provides evidence for the impact of historical stock price adjustments in developing machine learning models and presents a comprehensive framework for applying these techniques to the management of index constituent portfolios, thereby bridging the gap between predictive modeling and practical investment strategies.
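To make the point-in-time issue concrete, the following is a minimal sketch on hypothetical toy data, not the authors' implementation. The abstract does not define the two adjustment methods, and vendor naming conventions vary, so the sketch assumes that "backward-adjusted" means history is rescaled backward from the most recent price (every past value changes when a later split occurs) and that "forward-adjusted" means prices are scaled forward from the first observation using only events that have already happened. The function names, the price series, and the 2-for-1 split are all illustrative assumptions.

```python
# Hypothetical toy data: raw closing prices with a 2-for-1 split effective on day 3.
raw_close = [10.0, 11.0, 12.0, 6.0, 6.6]
# split_factor[t] = new shares per old share on day t (1.0 means no event).
split_factor = [1.0, 1.0, 1.0, 2.0, 1.0]


def forward_adjust(prices, factors):
    """Assumed convention: anchor on the first day; day t uses only events up to t."""
    adjusted, cumulative = [], 1.0
    for price, factor in zip(prices, factors):
        cumulative *= factor
        adjusted.append(price * cumulative)
    return adjusted


def backward_adjust(prices, factors):
    """Assumed convention: anchor on the last day; day t is divided by all later factors."""
    adjusted, later_factors = [], 1.0
    for price, factor in zip(reversed(prices), reversed(factors)):
        adjusted.append(price / later_factors)
        later_factors *= factor
    return adjusted[::-1]


# Information set on day 2, i.e. before the split has happened.
as_of_day = 2
seen_then_fwd = forward_adjust(raw_close[:as_of_day + 1], split_factor[:as_of_day + 1])
seen_then_bwd = backward_adjust(raw_close[:as_of_day + 1], split_factor[:as_of_day + 1])

# The same series as retrieved later, after the split is known.
full_fwd = forward_adjust(raw_close, split_factor)
full_bwd = backward_adjust(raw_close, split_factor)

# Forward-adjusted history is not rewritten by the later split.
fwd_is_point_in_time = full_fwd[:as_of_day + 1] == seen_then_fwd
# Backward-adjusted history is rewritten, so it differs from what was observable.
bwd_is_point_in_time = full_bwd[:as_of_day + 1] == seen_then_bwd

print("forward :", full_fwd, "point-in-time:", fwd_is_point_in_time)
print("backward:", full_bwd, "point-in-time:", bwd_is_point_in_time)
```

Under these assumptions, the backward-adjusted levels for days 0 to 2 already embed the split that only occurs on day 3, which is one way look-ahead information can leak into level-based features. In this toy series the day-over-day price ratios are the same under both methods, and only the price levels differ.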
Data availability
The data that support this study were obtained from the China Stock Market & Accounting Research Database (CSMAR), available at https://data.csmar.com/. As a commercial and professional financial data provider, CSMAR restricts free sharing of its data under its licensing terms. However, details of the specific field names and database tables used in this research are available upon request for the purpose of reproducing and building upon the analysis.
Funding
No funding was received for conducting this study.
Author information
Contributions
Ligang Zhou: Conceptualization, Data analysis, Writing - original draft preparation. Xiaoguo Chen: Data collection, Visualization. Xiaolei Tang: Investigation, Writing - review and editing.
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Conflicts of Interest
The authors have no financial interests that could have appeared to influence the work reported in this paper.
Ethics approval
Ethical approval and informed consent were not necessary for the use of these data in this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhou, L., Chen, X. & Tang, X. Comparative performance of machine learning-selected portfolios from dynamic CSI300 constituents: forward vs. backward adjusted stock prices. Appl Intell 55, 176 (2025). https://doi.org/10.1007/s10489-024-06107-4
DOI: https://doi.org/10.1007/s10489-024-06107-4