Multiple-cause discovery combined with structure learning for high-dimensional discrete data and application to stock prediction

Chen, Weiqi; Hao, Zhifeng; Cai, Ruichu; Zhang, Xiangzhou; Hu, Yong; Liu, Mei

doi:10.1007/s00500-015-1764-8

Methodologies and Application
Published: 08 July 2015

Soft Computing, Volume 20, pages 4575–4588 (2016)

Weiqi Chen¹,
Zhifeng Hao²,
Ruichu Cai²,
Xiangzhou Zhang³,⁴,
Yong Hu³,⁴ &
Mei Liu⁴,⁵

Abstract

Causal discovery in observational data is crucial to a variety of scientific and business research. Although many causal discovery algorithms have been proposed in recent decades, none of them is sufficiently effective on high-dimensional discrete data. The main challenge is the complex interactions among a large number of variables, which lead to numerous spurious causal relations being found. In this work, we propose a novel multiple-cause discovery method combined with structure learning (McDSL) to eliminate these spurious causal relations. The method is carried out in two phases. In the first phase, conditional independence tests are used to distinguish direct causal candidates from indirect ones. In the second phase, the causal directions of the multi-cause structure are determined with a hybrid causal discovery method. Validation experiments on synthetic data showed that McDSL reliably discovers multi-cause structures and eliminates indirect causes. We then applied the algorithm to discover multiple causes of stock return based on 13 years of historical financial data from the Shanghai Stock Exchange in China, and built a stock prediction model. Experimental results showed that the causes discovered by McDSL revealed changes in the key risk factors of the stock market over the 13 years, indicating that investors should adjust their investment strategies over time. Moreover, the causes discovered by McDSL yield better stock-return prediction performance than the features selected by other common filter-based feature selection algorithms.
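The abstract does not say which conditional independence (CI) test McDSL uses in its first phase. As a minimal sketch, assuming a G-test (likelihood-ratio chi-square) on discrete data, the example below shows why a CI test can separate direct from indirect causes. In a hypothetical chain X → Y → W, the indirect cause X is dependent on W marginally but becomes independent of W once the direct cause Y is conditioned on. The helper names (`chi2_sf`, `g_test_ci`, `expand`) and the toy counts are illustrative assumptions, not part of the paper.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(m, y) = exp(-y) * sum_{i < m} y**i / i!, with m = df / 2
        term = math.exp(-y)
        total = term
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return total
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)) and step a -> a + 1
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) * math.exp(-y) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= y / (j + 0.5)
    return total


def g_test_ci(rows, x, y, z=()):
    """G-test of X independent of Y given Z on discrete rows.

    x and y are column indices; z is a tuple of column indices (empty means
    an unconditional test). Returns (G statistic, degrees of freedom, p-value).
    """
    def key_z(r):
        return tuple(r[k] for k in z)

    n_xyz = Counter((r[x], r[y], key_z(r)) for r in rows)
    n_xz = Counter((r[x], key_z(r)) for r in rows)
    n_yz = Counter((r[y], key_z(r)) for r in rows)
    n_z = Counter(key_z(r) for r in rows)
    g = 2.0 * sum(
        n * math.log(n * n_z[zv] / (n_xz[(xv, zv)] * n_yz[(yv, zv)]))
        for (xv, yv, zv), n in n_xyz.items()
    )
    levels_x = len({r[x] for r in rows})
    levels_y = len({r[y] for r in rows})
    df = max(1, (levels_x - 1) * (levels_y - 1) * len(n_z))
    return g, df, chi2_sf(g, df)


def expand(counts):
    """Turn {row_tuple: count} into a flat list of rows."""
    rows = []
    for row, k in counts.items():
        rows.extend([row] * k)
    return rows


# Hypothetical chain X -> Y -> W (columns 0, 1, 2). The counts are chosen so
# that X and W are exactly independent within each level of Y.
CHAIN_COUNTS = {
    (0, 0, 0): 81, (0, 0, 1): 9, (1, 0, 0): 9, (1, 0, 1): 1,
    (0, 1, 0): 1, (0, 1, 1): 9, (1, 1, 0): 9, (1, 1, 1): 81,
}

if __name__ == "__main__":
    rows = expand(CHAIN_COUNTS)
    X, Y, W = 0, 1, 2
    print(g_test_ci(rows, X, W))        # X and W look dependent
    print(g_test_ci(rows, X, W, (Y,)))  # given Y, the dependence vanishes
```

With these counts the marginal test rejects independence (G is about 88.7 with 1 degree of freedom), while the test conditioned on Y gives G = 0 and p = 1. A screening phase built on such tests would therefore keep Y as a direct candidate for W and drop X as indirect.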

Notes

If \(|S| = |S'| = 1\), the above definition reduces to the definition in Peters et al. (2011).
\(\#\) Factor indicates that the factor marked \(\#\) was inferred by McDSL as a cause of return in the training set.
‘NoFS’ indicates no feature selection. Best results are highlighted in bold. The value in parentheses indicates the performance difference from the corresponding result of our algorithm. ‘Average’ is the average value of the 6 algorithms over the 7 baseline models.
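The first note refers to the discrete additive noise model (ANM) of Peters et al. (2011). Under this model, Y follows an ANM from X if Y = f(X) + N with the noise N independent of X, and the causal direction is inferred from the side on which such a model fits. The general definition with sets S and S' is not reproduced on this page, so the sketch below covers only the single-variable case |S| = |S'| = 1. It rests on several simplifying assumptions: variables are integer-valued, and f(x) is taken as the most frequent value of Y given X = x. Peters et al. refine f iteratively, and that refinement is omitted here. A G-test stands in for the independence test. This is not McDSL's hybrid direction step, and the function names and toy counts are hypothetical.

```python
import math
from collections import Counter, defaultdict


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        term = math.exp(-y)
        total = term
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return total
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) * math.exp(-y) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= y / (j + 0.5)
    return total


def g_test(pairs):
    """p-value of an unconditional G-test of independence on (a, b) pairs."""
    n = len(pairs)
    n_ab = Counter(pairs)
    n_a = Counter(a for a, _ in pairs)
    n_b = Counter(b for _, b in pairs)
    g = 2.0 * sum(c * math.log(c * n / (n_a[a] * n_b[b])) for (a, b), c in n_ab.items())
    df = max(1, (len(n_a) - 1) * (len(n_b) - 1))
    return chi2_sf(g, df)


def anm_fit_pvalue(cause, effect):
    """Fit effect = f(cause) + noise and test residual independence from cause.

    f(c) is the most frequent effect value given cause == c (ties go to the
    smallest value); no iterative refinement of f is attempted.
    """
    by_cause = defaultdict(Counter)
    for c, e in zip(cause, effect):
        by_cause[c][e] += 1
    f = {c: min(cnt, key=lambda v: (-cnt[v], v)) for c, cnt in by_cause.items()}
    residuals = [e - f[c] for c, e in zip(cause, effect)]
    return g_test(list(zip(cause, residuals)))


def anm_direction(x, y):
    """Prefer the direction whose residuals look more independent of the cause."""
    p_xy = anm_fit_pvalue(x, y)
    p_yx = anm_fit_pvalue(y, x)
    if p_xy > p_yx:
        label = "X->Y"
    elif p_yx > p_xy:
        label = "Y->X"
    else:
        label = "undecided"
    return label, p_xy, p_yx


def expand(counts):
    """Turn {(x, y): count} into two aligned lists."""
    xs, ys = [], []
    for (xv, yv), k in counts.items():
        xs += [xv] * k
        ys += [yv] * k
    return xs, ys


# Hypothetical toy data: Y = X + N with N in {0, 1}, P(N = 1) = 1/4, N independent of X.
TOY = {(0, 0): 30, (0, 1): 10, (1, 1): 60, (1, 2): 20, (2, 2): 30, (2, 3): 10}

if __name__ == "__main__":
    x, y = expand(TOY)
    print(anm_direction(x, y))
```

On the toy data, the forward residuals are exactly independent of X (p = 1). The backward fit X = g(Y) + noise leaves residuals that depend on Y (G is about 29.7 with 3 degrees of freedom), so the sketch prefers X → Y.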

References

Agbabiaka TB, Savović J, Ernst E (2008) Methods for causality assessment of adverse drug reactions. Drug Saf 31(1):21–37
Aliferis CF, Statnikov A, Tsamardinos I, Mani S, Koutsoukos XD (2010) Local causal and Markov blanket induction for causal discovery and feature selection for classification part I: algorithms and empirical evaluation. J Mach Learn Res 11:171–234
Andreu L, Aldás J, Bigné JE, Mattila AS (2010) An analysis of e-business adoption and its impact on relational quality in travel agency-supplier relationships. Tour Manag 31(6):777–787
Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, Boca Raton
Cai R, Zhang Z, Hao Z (2011) BASSUM: a Bayesian semi-supervised method for classification feature selection. Pattern Recognit 44(4):811–820
Cai R, Zhang Z, Hao Z (2013a) Causal gene identification using combinatorial V-structure search. Neural Netw 43:63–71
Cai R, Zhang Z, Hao Z (2013b) SADA: a general framework to support robust causation discovery. In: Proceedings of the 30th international conference on machine learning, pp 208–216
Chang YC, Hsieh YL, Chen CC, Hsu WL (2015) A semantic frame-based intelligent agent for topic detection. Soft Comput. doi:10.1007/s00500-015-1695-4
De Morais SR, Aussem A (2010) A novel Markov boundary based feature subset selection algorithm. Neurocomputing 73(4):578–584
Esposito C, Ficco M, Palmieri F, Castiglione A (2015) Smart cloud storage service selection based on fuzzy logic, theory of evidence and game theory. IEEE Trans Comput. doi:10.1109/TC.2015.2389952
Fama EF, French KR (1992) The cross-section of expected stock returns. J Financ 47(2):427–465
Fernandez-Lozano C, Seoane JA, Gestal M, Gaunt TR, Dorado J, Campbell C (2015) Texture classification using feature selection and kernel-based techniques. Soft Comput. doi:10.1007/s00500-014-1573-5
Fu R, Qin B, Liu T (2015) Open-categorical text classification based on multi-LDA models. Soft Comput 19(1):29–38
Hoyer PO, Janzing D, Mooij JM, Peters J, Schölkopf B (2009) Nonlinear causal discovery with additive noise models. In: Advances in neural information processing systems, pp 689–696
Kano Y, Shimizu S (2003) Causal inference using nonnormality. In: Proceedings of the international symposium on science of modeling, the 30th anniversary of the information criterion, pp 261–270
Karahoca A, Tunga MA (2015) A polynomial based algorithm for detection of embolism. Soft Comput 19(1):167–177
Koller D, Sahami M (1996) Toward optimal feature selection. Proc int conf mach Learn 20(1113):284–292
Lee M-C (2009) Using support vector machine with a hybrid feature selection method to the stock trend prediction. Expert Syst Appl 36(8):10896–10904
Mooij J, Janzing D, Peters J, Schölkopf B (2009) Regression by dependence minimization and its application to causal inference in additive noise models. In: Proceedings of the 26th annual international conference on machine learning, pp 745–752. ACM
Pearl J (2000) Causality: models, reasoning and inference, vol 29. Cambridge Univ Press, Cambridge
Peters J, Janzing D, Gretton A, Schölkopf B (2009) Detecting the direction of causal time series. In: Proceedings of the 26th annual international conference on machine learning, pp 801–808. ACM
Peters J, Janzing D, Schölkopf B (2010) Identifying cause and effect on discrete data using additive noise models. In: International conference on artificial intelligence and statistics, pp 597–604
Peters J, Janzing D, Schölkopf B (2011) Causal inference on discrete data using additive noise models. IEEE Trans Pattern Anal Mach Intell 33(12):2436–2450
Sethi R (1996) Endogenous regime switching in speculative markets. Struct Change Econ Dyn 7(1):99–118
Shimizu S, Hoyer PO, Hyvärinen A, Kerminen A (2006) A linear non-Gaussian acyclic model for causal discovery. J Mach Learn Res 7:2003–2030
Sobel ME (1996) An introduction to causal inference. Sociol Methods Res 24(3):353–379
Spirtes P, Glymour CN, Scheines R (2000) Causation, prediction, and search, vol 81. MIT Press, Cambridge
Tibshirani R (1994) Regression shrinkage and selection via the lasso. J Royal Stat Soc 58(1):267–288
Tsai C-F, Hsiao Y-C (2010) Combining multiple feature selection methods for stock prediction: union, intersection, and multi-intersection approaches. Decis Support Syst 50(1):258–269
Tsai C-F, Lin Y-C, Yen DC, Chen Y-M (2011) Predicting stock returns by classifier ensembles. Appl Soft Comput 11(2):2452–2459
Tsamardinos I, Aliferis CF, Statnikov A (2003) Time and sample efficient discovery of Markov blankets and direct causal relations. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, pp 673–678. ACM
Zhang J, Spirtes P (2008) Detection of unfaithfulness and robust causal inference. Minds Mach 18(2):239–271
Zhang X, Hu Y, Xie K, Wang S, Ngai EWT, Liu M (2014) A causal feature selection algorithm for stock prediction modeling. Neurocomputing 142:48–59
Zhu Z, Ong Y-S, Dash M (2007) Markov blanket-embedded genetic algorithm for gene selection. Pattern Recognit 40(11):3236–3248
Zunino L, Zanin M, Tabak BM, Pérez DG, Rosso OA (2010) Complexity-entropy causality plane: a useful approach to quantify the stock market inefficiency. Phys A Stat Mech Appl 389(9):1891–1901
Zuo Y, Kita E (2012) Stock price forecast using Bayesian network. Expert Syst Appl 39(8):6729–6737

Acknowledgments

This research was partly supported by the National Natural Science Foundation of China (71271061, 70801020), Science and Technology Planning Project of Guangdong Province, China (2010B010600034, 2012B091100192), Guangdong Natural Science Foundation Research Team (S2013030015737), and Business Intelligence Key Team of Guangdong University of Foreign Studies (TD1202).

Author information

Authors and Affiliations

Faculty of Automation, Guangdong University of Technology, Guangzhou, China
Weiqi Chen
Department of Computer Science, Guangdong University of Technology, Guangzhou, China
Zhifeng Hao & Ruichu Cai
School of Business, Sun Yat-sen University, Guangzhou, China
Xiangzhou Zhang & Yong Hu
Big Data Decision Institute, Jinan University, Guangzhou, China
Xiangzhou Zhang, Yong Hu & Mei Liu
University of Kansas Medical Center, Kansas City, USA
Mei Liu

Corresponding author

Correspondence to Weiqi Chen.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Chen, W., Hao, Z., Cai, R. et al. Multiple-cause discovery combined with structure learning for high-dimensional discrete data and application to stock prediction. Soft Comput 20, 4575–4588 (2016). https://doi.org/10.1007/s00500-015-1764-8

Issue Date: November 2016