Exploring the Linear and Nonlinear Causality Between Internet Big Data and Stock Markets

Journal of Systems Science and Complexity

Abstract

In the era of big data, stock markets are closely connected with Internet big data from diverse sources. This paper makes the first attempt to compare the linkages between stock markets and Internet big data collected from search engines, public media, and social media. To this end, a big data-based causality testing framework is proposed with three steps: data crawling, data mining, and causality testing. Taking the Shanghai Stock Exchange and Shenzhen Stock Exchange as the target stock markets, and web search data, news, and microblogs as samples of Internet big data, the analysis yields three findings. 1) There is strong bi-directional Granger causality, both linear and nonlinear, between stock markets and investors’ web search behaviors, attributed to similar trends and uncertain factors. 2) News sentiments from public media show bi-directional linear Granger causality with stock markets, while microblog sentiments from social media show only unidirectional linear Granger causality, running from stock markets to microblog sentiments. 3) News sentiments explain changes in stock markets better than microblog sentiments, owing to the authority of news sources. These results may provide valuable information for both stock market investors and modelers.
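To make the causality-testing step concrete, the following is a minimal sketch of a linear Granger causality F-test, written with the Python standard library only. It is not the authors' implementation: the series are synthetic toy data, the names `ols_rss`, `granger_f`, `driver`, and `follower` are hypothetical, the lag order of 1 is an arbitrary choice, and the nonlinear test mentioned in the abstract is not covered. The test compares a restricted regression (the effect series on its own lags) with an unrestricted one (adding lags of the candidate cause) and returns the F statistic for the added lags.

```python
import random


def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit (normal equations)."""
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
         + [sum(row[i] * t for row, t in zip(rows, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, t in zip(rows, targets))


def granger_f(cause, effect, lags=1):
    """F statistic for H0: lags of `cause` do not help predict `effect`."""
    n = len(effect)
    targets = effect[lags:]
    restricted, unrestricted = [], []
    for t in range(lags, n):
        own = [effect[t - i] for i in range(1, lags + 1)]
        other = [cause[t - i] for i in range(1, lags + 1)]
        restricted.append([1.0] + own)            # constant + own lags
        unrestricted.append([1.0] + own + other)  # plus lags of the cause
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Synthetic toy data (an assumption for illustration, not the paper's data):
# `follower` depends on the previous value of `driver` plus small noise.
random.seed(0)
driver = [random.gauss(0, 1) for _ in range(300)]
follower = [0.0] + [0.8 * driver[t - 1] + 0.1 * random.gauss(0, 1)
                    for t in range(1, 300)]

f_forward = granger_f(driver, follower)   # driver -> follower
f_backward = granger_f(follower, driver)  # follower -> driver
print("F(driver -> follower):", round(f_forward, 2))
print("F(follower -> driver):", round(f_backward, 2))
```

Running the test in both directions, as above, is what distinguishes unidirectional from bi-directional causality. In practice the F statistic would be compared against an F distribution with `lags` and `dof` degrees of freedom to obtain a p-value, and the series would usually be checked for stationarity first.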

Author information

Corresponding author

Correspondence to Jingjing Li.

Additional information

This research was sponsored by the National Natural Science Foundation of China under Grant Nos. 71573244, 71532013, 71202115 and 71403260.

This paper was recommended for publication by Editor WANG Shouyang.

About this article

Cite this article

Dong, J., Dai, W. & Li, J. Exploring the Linear and Nonlinear Causality Between Internet Big Data and Stock Markets. J Syst Sci Complex 33, 783–798 (2020). https://doi.org/10.1007/s11424-020-8119-y

  • DOI: https://doi.org/10.1007/s11424-020-8119-y
