Mining profitable alpha factors via convolution kernel learning


Abstract

An automatic alpha factor mining method is proposed in this paper to help expert traders find profitable alpha factors efficiently. Instead of searching for qualified alpha factors by directly enumerating all possible combinations, the mining task is formulated as an iterative convolution kernel learning problem, in which each kernel to be learned is associated with a unique alpha factor. To solve the learning problem more effectively, sparsity is introduced at the mutation step of the learning process to find simple, interpretable solutions efficiently and to reduce the risk of overfitting in real-world trading. A theorem is provided to prove that the designed learning process terminates automatically within a finite number of iterations, as all convolution kernel vectors converge to zero vectors. In addition, a score function based on win rate, expected return, and trade frequency is designed to evaluate the practical performance of the market entry signals generated by the alpha factors. The convolution kernels with high scores are recorded and exported as the mined alpha factors. Experimental results show that the proposed method achieves superior performance on both the China government bond dataset and the gold dataset.
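To make the pipeline described above concrete, the following minimal Python sketch shows how a candidate convolution kernel could be applied to a binary pattern matrix to generate market entry signals, and how those signals could be scored on win rate, expected return, and trade frequency. This is an illustrative approximation rather than the authors' implementation: the names pattern_matrix, signal_threshold, holding_days, and the score weights are assumptions introduced here, not values from the paper.

import numpy as np

def entry_signals(pattern_matrix, kernel, signal_threshold=0.5):
    # pattern_matrix: (T, 28) binary matrix of daily basic-pattern occurrences.
    # kernel: (window, 28) convolution kernel associated with one candidate alpha factor.
    window = kernel.shape[0]
    num_days = pattern_matrix.shape[0]
    signals = np.zeros(num_days, dtype=int)
    for t in range(window - 1, num_days):
        # Slide the kernel over the most recent `window` trading days.
        response = np.sum(pattern_matrix[t - window + 1:t + 1] * kernel)
        signals[t] = int(response > signal_threshold)
    return signals

def score(signals, returns, holding_days=5, weights=(0.4, 0.4, 0.2)):
    # returns: array of daily returns aligned with `signals`.
    # weights: placeholder trade-off between win rate, expected return, and trade frequency.
    entries = np.flatnonzero(signals[:len(signals) - holding_days])
    if entries.size == 0:
        return 0.0
    trade_returns = np.array([returns[t + 1:t + 1 + holding_days].sum() for t in entries])
    win_rate = float((trade_returns > 0).mean())
    expected_return = float(trade_returns.mean())
    trade_frequency = entries.size / len(signals)
    return weights[0] * win_rate + weights[1] * expected_return + weights[2] * trade_frequency

Under this sketch, a kernel whose entries have all mutated to zero produces no response above the threshold and therefore no trades, which mirrors the termination condition stated in the abstract.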


Availability of data and materials

The datasets generated and analyzed during the current study are available from the corresponding author upon reasonable request and can be downloaded at https://github.com/szy1900/autoAlpha/.


Author information


Corresponding author

Correspondence to Dan Zhao.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

A. List of abbreviations

The abbreviations used in this paper are summarized in Table 10.

Table 10 List of abbreviations

B. List of symbols

The symbols used in this paper are summarized in Table 11.

Table 11 List of symbols

C. Summary of 28 basic patterns

The names of all 28 basic patterns used as the columns of each input matrix are provided in Table 12. For a given trading day, the value of each basic pattern is either 1 or 0, indicating the occurrence or absence of that pattern, respectively.

Table 12 The names of the 28 basic patterns used in the input matrix
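Appendix C describes the input representation only in words, so the following minimal Python sketch shows one way such a daily binary matrix over the 28 basic patterns could be assembled. The helper build_pattern_matrix and the two example detectors are hypothetical illustrations; they are not the pattern definitions used in the paper.

import pandas as pd

def build_pattern_matrix(ohlc, detectors):
    # ohlc: DataFrame with 'open', 'high', 'low', 'close' columns indexed by trading day.
    # detectors: maps each basic-pattern name to a function returning a boolean Series
    # over the same index (True where the pattern occurs on that day).
    columns = {name: detect(ohlc).astype(int) for name, detect in detectors.items()}
    return pd.DataFrame(columns, index=ohlc.index)

# Two illustrative detectors (placeholders, not the paper's definitions):
detectors = {
    "bullish_close": lambda df: df["close"] > df["open"],
    "long_upper_shadow": lambda df: (df["high"] - df[["open", "close"]].max(axis=1)) > (df["close"] - df["open"]).abs(),
}

The resulting DataFrame has one row per trading day and one column per basic pattern, matching the 0/1 encoding described above.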

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shen, Z., Mao, X., Yang, X. et al. Mining profitable alpha factors via convolution kernel learning. Appl Intell 53, 28460–28478 (2023). https://doi.org/10.1007/s10489-023-05014-4
