Mining profitable alpha factors via convolution kernel learning


Abstract

An automatic alpha factor mining method is proposed in this paper to help expert traders find profitable alpha factors efficiently. Instead of searching for qualified alpha factors by directly enumerating all possible combinations, the mining task is formulated as an iterative convolution kernel learning problem, in which each kernel to be learned is associated with a unique alpha factor. To solve the learning problem more effectively, sparsity is introduced at the mutation step of the learning process to find simple, interpretable solutions efficiently and to reduce the risk of overfitting in real-world trading. A theorem is provided to prove that the designed learning process terminates automatically within a finite number of iterations, as all convolution kernel vectors converge to zero vectors. In addition, a score function based on win rate, expected return, and trade frequency is designed to evaluate the practical performance of the market entry signals generated by the alpha factors. The convolution kernels with high scores are recorded and exported as the mined alpha factors. Experimental results show that the proposed method achieves superior performance on both the China government bond dataset and the gold dataset.
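To make the pipeline described above concrete, the following minimal Python sketch shows how a candidate convolution kernel could be applied to a binary pattern matrix to generate market entry signals, and how those signals could be scored on win rate, expected return, and trade frequency. This is an illustrative approximation rather than the authors' implementation: the names pattern_matrix, signal_threshold, holding_days, and the score weights are assumptions introduced here, not values from the paper.

import numpy as np

def entry_signals(pattern_matrix, kernel, signal_threshold=0.5):
    # pattern_matrix: (T, 28) binary matrix of daily basic-pattern occurrences.
    # kernel: (window, 28) convolution kernel associated with one candidate alpha factor.
    window = kernel.shape[0]
    num_days = pattern_matrix.shape[0]
    signals = np.zeros(num_days, dtype=int)
    for t in range(window - 1, num_days):
        # Slide the kernel over the most recent `window` trading days.
        response = np.sum(pattern_matrix[t - window + 1:t + 1] * kernel)
        signals[t] = int(response > signal_threshold)
    return signals

def score(signals, returns, holding_days=5, weights=(0.4, 0.4, 0.2)):
    # returns: array of daily returns aligned with `signals`.
    # weights: placeholder trade-off between win rate, expected return, and trade frequency.
    entries = np.flatnonzero(signals[:len(signals) - holding_days])
    if entries.size == 0:
        return 0.0
    trade_returns = np.array([returns[t + 1:t + 1 + holding_days].sum() for t in entries])
    win_rate = float((trade_returns > 0).mean())
    expected_return = float(trade_returns.mean())
    trade_frequency = entries.size / len(signals)
    return weights[0] * win_rate + weights[1] * expected_return + weights[2] * trade_frequency

Under this sketch, a kernel whose entries have all mutated to zero produces no response above the threshold and therefore no trades, which mirrors the termination condition stated in the abstract.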


Availability of data and materials

The datasets generated and analyzed during the current study are available from the corresponding author upon reasonable request and can be downloaded at https://github.com/szy1900/autoAlpha/.


Author information


Corresponding author

Correspondence to Dan Zhao.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

A. List of abbreviations

The abbreviations used in this paper are summarized in Table 10.

Table 10 List of abbreviations

B. List of symbols

The symbols used in this paper are summarized in Table 11.

Table 11 List of symbols

C. Summary of 28 basic patterns

The names of all 28 basic patterns used as the columns of each input matrix are provided in Table 12. For a given trading day, the value of each basic pattern is either 1 or 0, indicating the occurrence or absence of that pattern, respectively.

Table 12 The names of the 28 basic patterns used in the input matrix
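Appendix C describes the input representation only in words, so the following minimal Python sketch shows one way such a daily binary matrix over the 28 basic patterns could be assembled. The helper build_pattern_matrix and the two example detectors are hypothetical illustrations; they are not the pattern definitions used in the paper.

import pandas as pd

def build_pattern_matrix(ohlc, detectors):
    # ohlc: DataFrame with 'open', 'high', 'low', 'close' columns indexed by trading day.
    # detectors: maps each basic-pattern name to a function returning a boolean Series
    # over the same index (True where the pattern occurs on that day).
    columns = {name: detect(ohlc).astype(int) for name, detect in detectors.items()}
    return pd.DataFrame(columns, index=ohlc.index)

# Two illustrative detectors (placeholders, not the paper's definitions):
detectors = {
    "bullish_close": lambda df: df["close"] > df["open"],
    "long_upper_shadow": lambda df: (df["high"] - df[["open", "close"]].max(axis=1)) > (df["close"] - df["open"]).abs(),
}

The resulting DataFrame has one row per trading day and one column per basic pattern, matching the 0/1 encoding described above.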

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shen, Z., Mao, X., Yang, X. et al. Mining profitable alpha factors via convolution kernel learning. Appl Intell 53, 28460–28478 (2023). https://doi.org/10.1007/s10489-023-05014-4
