Abstract
The rapid growth of digital data generation has ushered in the era of big data, which is particularly valuable because approximately 70% of the data collected worldwide comes from social media. Thus, the investigation of online social network services is of paramount importance. In this paper, we use sentiment analysis, which detects attitudes and emotions toward social issues expressed in social media posts, to understand the actual economic situation. To this end, we propose two steps. In the first step, we train sentiment classifiers on several large social media datasets, considering three types of feature sets: feature vectors, sequence vectors, and a combination of dictionary-based features and sequence vectors. We then assess the performance of six classifiers: MaxEnt-L1, C4.5 decision tree, SVM-kernel, AdaBoost, Naïve Bayes and MaxEnt. In the second step, we collect datasets relevant to several economic words that the public uses to express its opinions explicitly. Finally, we use vector autoregression analysis to test our hypothesis. The results show a statistically significant relationship between public sentiment and economic performance. Specifically, “depression” and “unemployment” lead the KOSPI. The results also show that keywords extracted from the sentiment analysis, such as “price,” “year-end-tax” and “budget deficit,” have a causal effect on exchange rates.
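
To make the first step more concrete, the following is a minimal sketch of one of the six classifiers named above, Naïve Bayes, trained on simple bag-of-words features. It is an illustration only, not the authors' implementation: the whitespace tokenizer, the class name NaiveBayesSentiment, the add-one smoothing, and the toy English posts are all assumptions introduced here, and none of them come from the paper's datasets or feature sets.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


class NaiveBayesSentiment:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior for each class."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # class prior
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[tok] + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy, made-up training posts (not from the paper's data).
train_texts = [
    "prices keep rising and jobs are scarce",
    "unemployment is getting worse",
    "the economy is recovering nicely",
    "great news wages are rising and hiring is strong",
]
train_labels = ["negative", "negative", "positive", "positive"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("unemployment is worse"))
print(model.predict("hiring is strong"))
```

In a setting like the one described in the abstract, per-post predictions of this kind would presumably be aggregated over time into a sentiment series before the vector autoregression step; that aggregation is not shown here.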


Acknowledgements
This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2015S1A3A2046711).
Cite this article
Lee, H., Lee, N., Seo, H. et al. Developing a supervised learning-based social media business sentiment index. J Supercomput 76, 3882–3897 (2020). https://doi.org/10.1007/s11227-018-02737-x
DOI: https://doi.org/10.1007/s11227-018-02737-x