Abstract
Analyzing market performance via social media has attracted a great deal of attention in the finance and machine learning disciplines. However, the vast majority of research overlooks the strong influence that a crisis has on social media, which in turn alters the relationship between social media and the stock market. This article aims to address this gap by proposing a multistage dynamic analysis framework. In this framework, we use an authorship analysis technique and a topic modeling method to identify the stakeholder groups and topics related to a specific firm. We analyze the activities of stakeholder groups and topics in different periods of a crisis to evaluate the crisis’s influence on various social media parameters. We then construct a stock regression model for each stage of the crisis to analyze how changes in stakeholder groups and topics relate to stock behavior during the crisis. Finally, we discuss the main results, which show that a crisis affects social media discussion topics and that different stakeholder groups and topics have distinct effects on stock market predictions during each stage of a crisis.
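The stage-wise regression idea can be illustrated with a minimal sketch. The abstract does not specify the model or its variables, so everything here is an assumption for illustration only: a single-predictor ordinary least squares fit stands in for the stock regression model, the stage labels ("pre", "crisis") and the function names `ols_slope_intercept` and `fit_by_stage` are hypothetical, and the numbers are made-up toy values rather than results from the study.

```python
from collections import defaultdict


def ols_slope_intercept(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_by_stage(rows):
    """Group (stage, activity, stock_return) rows by stage and fit each stage separately.

    Returns a dict mapping stage label -> (slope, intercept).
    """
    grouped = defaultdict(lambda: ([], []))
    for stage, activity, stock_return in rows:
        grouped[stage][0].append(activity)
        grouped[stage][1].append(stock_return)
    return {stage: ols_slope_intercept(xs, ys) for stage, (xs, ys) in grouped.items()}


# Toy data (hypothetical): a social media activity measure and a stock return per day.
rows = [
    ("pre", 1.0, 0.1), ("pre", 2.0, 0.2), ("pre", 3.0, 0.3),
    ("crisis", 1.0, -0.5), ("crisis", 2.0, -1.0), ("crisis", 3.0, -1.5),
]
coefs = fit_by_stage(rows)
for stage, (slope, intercept) in coefs.items():
    print(f"{stage}: slope={slope:.3f}, intercept={intercept:.3f}")
```

Fitting each stage separately, rather than pooling all days into one regression, is what lets the coefficient on a given group or topic differ between stages, which is the kind of stage-dependent effect the abstract describes.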
Cite this article
Jiang, C., Liang, K., Chen, H. et al. Analyzing market performance via social media: a case study of a banking industry crisis. Sci. China Inf. Sci. 57, 1–18 (2014). https://doi.org/10.1007/s11432-013-4860-3