Abstract
Stock price volatility prediction is regarded as one of the most attractive and meaningful research problems in financial markets. Existing studies have pointed out that prediction accuracy and prediction speed are the two most important factors in stock prediction. In this paper, we focus on designing a methodology that improves prediction accuracy while also speeding up the prediction process, and we propose a new prediction model that combines mutual information-based sentimental analysis with an extreme learning machine (ELM) to enhance prediction performance. Our work makes two major contributions. (1) Because the words in news documents are not absolutely negative or positive, and financial news documents vary in length, we propose a new sentimental analysis methodology based on mutual information that improves the efficiency of feature selection; unlike traditional sentimental analysis algorithms, it also uses a new weighting scheme in the feature weighting process. (2) Because ELM is a fast learning model that has been applied successfully in many research fields, we propose a prediction model, named MISA-K-ELM, that combines mutual information-based sentimental analysis (MISA) with kernel-based ELM. This model has the benefits of both statistical sentimental analysis and ELM, and it balances the requirements of prediction accuracy and prediction speed. We conduct experiments on HKEx 2001 stock market datasets to validate the performance of the proposed MISA-K-ELM, which uses both historical market prices and market news as inputs. To test the effectiveness of MISA, we first compare the prediction accuracy of an ELM model using MISA with that of an ELM model using traditional sentimental analysis.
Then, we compare the proposed MISA-K-ELM with existing state-of-the-art learning algorithms, such as the Back-Propagation Neural Network (BP-NN) and the Support Vector Machine (SVM). Our experimental results show that (1) the MISA model yields higher prediction accuracy than traditional sentimental analysis models; (2) MISA-K-ELM and MISA-SVM achieve higher prediction accuracy than MISA-BP-NN and MISA-B-ELM; (3) both MISA-K-ELM and MISA-B-ELM achieve faster prediction speed than MISA-SVM and MISA-BP-NN in most cases; and (4) in most cases, MISA-K-ELM has higher prediction accuracy than the other three methods.
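The abstract names a kernel-based ELM but does not give its equations. As a rough illustration only, the sketch below uses the standard kernel ELM closed form, in which the output weights solve \((I/C + \Omega)\beta = T\) with \(\Omega_{ij} = K(x_i, x_j)\), and a new sample is scored by \([K(x, x_1), \ldots, K(x, x_N)]\beta\). The class name `KernelELM`, the helpers `rbf_kernel` and `solve`, the choice of an RBF kernel, the parameters `C` and `gamma`, and the toy "up"/"down" data are all assumptions made for this example and are not taken from the paper.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augment A with B so that each row operation is applied to both.
    M = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]


class KernelELM:
    """Minimal kernel ELM classifier (illustrative, not the authors' code)."""

    def __init__(self, C=10.0, gamma=1.0):
        self.C = C          # regularization constant
        self.gamma = gamma  # RBF kernel width

    def fit(self, X, y):
        self.X = [list(x) for x in X]
        self.classes = sorted(set(y))
        # One column per class: +1 for the sample's own class, -1 otherwise.
        T = [[1.0 if label == c else -1.0 for c in self.classes] for label in y]
        n = len(self.X)
        # A = I / C + Omega, where Omega is the training kernel matrix.
        A = [
            [
                rbf_kernel(self.X[i], self.X[j], self.gamma)
                + (1.0 / self.C if i == j else 0.0)
                for j in range(n)
            ]
            for i in range(n)
        ]
        self.beta = solve(A, T)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Kernel values between the new sample and every training sample.
            k = [rbf_kernel(x, xi, self.gamma) for xi in self.X]
            scores = [
                sum(k[i] * self.beta[i][c] for i in range(len(k)))
                for c in range(len(self.classes))
            ]
            preds.append(self.classes[scores.index(max(scores))])
        return preds


# Toy data invented for this example: two features per sample, two classes.
X_train = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
y_train = ["down", "down", "up", "up"]

model = KernelELM(C=10.0, gamma=1.0).fit(X_train, y_train)
predictions = model.predict([[0.05, 0.05], [0.95, 0.95]])
print(predictions)
```

Training reduces to a single linear solve with no iterative weight updates, which is consistent with the fast learning speed the abstract attributes to ELM. In a pipeline like the one described, the feature vectors would presumably come from the weighted news terms and historical prices rather than from hand-written toy points.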
Notes
The software can be downloaded from http://ictclas.org.
An implementation of the Levenberg–Marquardt algorithm is available in the MATLAB toolbox.
The notation \(\#(X)\) denotes the number of occurrences of object X.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grant Nos. 51277135, 50707021, 61103125, 61373038, and 61573157); the Doctoral Fund of Ministry of Education of China (Grant No. 20100141120046); the Natural Science Foundation of Hubei Province of China (Grant No. 2010CDB08504); the 111 Programme of Introducing Talents of Discipline to Universities (Grant No. B07037); the Wuhan University Academic Development Plan for Scholars after 1970s (“Research on Internet User Behavior”); the National High Resolution Earth Observation System (the Civil Part) Technology Projects of China; the Natural Science Foundation of Guangdong Province of China (Grant No. 2014A030313454); and the Special Funds for Projects of Basic Research and Operational Costs of the Central Universities.
Ethics declarations
Conflict of interest
Author Feng Wang declares that she has no conflict of interest. Author Yongquan Zhang declares that he has no conflict of interest. Author Qi Rao declares that he has no conflict of interest. Author Kangshun Li declares that he has no conflict of interest. Author Hao Zhang declares that he has no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Wang, F., Zhang, Y., Rao, Q. et al. Exploring mutual information-based sentimental analysis with kernel-based extreme learning machine for stock prediction. Soft Comput 21, 3193–3205 (2017). https://doi.org/10.1007/s00500-015-2003-z
DOI: https://doi.org/10.1007/s00500-015-2003-z