Abstract
In the era of big data, stock markets are closely connected with Internet big data from diverse sources. This paper makes the first attempt to compare the linkages between stock markets and different kinds of Internet big data collected from search engines, public media, and social media. To this end, a big data-based causality testing framework is proposed with three steps: data crawling, data mining, and causality testing. Taking the Shanghai Stock Exchange and the Shenzhen Stock Exchange as the target stock markets, and web search data, news, and microblogs as samples of Internet big data, three main findings are obtained. 1) There is strong bi-directional Granger causality, both linear and nonlinear, between stock markets and investors’ web search behaviors, due to similar trends and uncertain factors. 2) News sentiments from public media and stock markets Granger-cause each other linearly (bi-directional causality), whereas for microblog sentiments from social media the linear Granger causality is unidirectional, running from stock markets to microblog sentiments. 3) News sentiments explain changes in stock markets better than microblog sentiments do, owing to the greater authority of news sources. The results of this paper may provide valuable information for both stock market investors and modelers.
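As an illustration of the causality testing step, the following is a minimal sketch of a standard linear Granger causality F-test, assuming a plain autoregressive regression fitted by least squares. It is not the authors' implementation: the function names (`lagged_design`, `rss`, `granger_f_stat`), the lag order, and the synthetic series are all hypothetical, and the nonlinear test mentioned in the abstract is not covered here. The idea is to compare a restricted model (the target regressed on its own lags) with an unrestricted model (own lags plus lags of the candidate cause); a large F statistic suggests the extra lags carry predictive information.

```python
import numpy as np


def lagged_design(x, y, p, include_x):
    """Design matrix for rows t = p..T-1: intercept, y[t-1..t-p], optionally x[t-1..t-p]."""
    T = len(y)
    cols = [np.ones(T - p)]
    cols += [y[p - k:T - k] for k in range(1, p + 1)]
    if include_x:
        cols += [x[p - k:T - k] for k in range(1, p + 1)]
    return np.column_stack(cols)


def rss(design, target):
    """Residual sum of squares of an ordinary least squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f_stat(x, y, p=1):
    """F statistic for H0: the first p lags of x do not help predict y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[p:]
    rss_restricted = rss(lagged_design(x, y, p, include_x=False), target)
    rss_unrestricted = rss(lagged_design(x, y, p, include_x=True), target)
    n_obs = len(target)
    n_params = 2 * p + 1  # intercept + p lags of y + p lags of x
    return ((rss_restricted - rss_unrestricted) / p) / (
        rss_unrestricted / (n_obs - n_params)
    )


# Synthetic example (assumption: x drives y with a one-step delay, not vice versa).
rng = np.random.default_rng(0)
T = 500
x = rng.standard_normal(T)
noise = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + 0.5 * noise[t]

f_x_to_y = granger_f_stat(x, y, p=2)  # expected to be large
f_y_to_x = granger_f_stat(y, x, p=2)  # expected to be small
print(f"F(x -> y) = {f_x_to_y:.2f}, F(y -> x) = {f_y_to_x:.2f}")
```

Running the test in both directions, as above, is what distinguishes bi-directional from unidirectional causality. In practice the statistic would be compared against an F distribution with (p, n_obs - n_params) degrees of freedom, and the series would first be checked for stationarity.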
Additional information
This research was sponsored by the National Natural Science Foundation of China under Grant Nos. 71573244, 71532013, 71202115 and 71403260.
This paper was recommended for publication by Editor WANG Shouyang.
Cite this article
Dong, J., Dai, W. & Li, J. Exploring the Linear and Nonlinear Causality Between Internet Big Data and Stock Markets. J Syst Sci Complex 33, 783–798 (2020). https://doi.org/10.1007/s11424-020-8119-y