Abstract
Stock market prediction with data mining techniques is one of the most important issues to be investigated. In this paper, we present a system that predicts changes in stock trends by analyzing the influence of non-quantifiable information (news articles). In particular, we investigate the immediate impact of news articles on the time series, based on the Efficient Markets Hypothesis. Several data mining and text mining techniques are used in a novel way. A new statistically based piecewise segmentation algorithm is proposed to identify trends in the time series. The segmented trends are clustered into two categories, Rise and Drop, according to the slope of each trend and its coefficient of determination. We propose an algorithm, called guided clustering, that filters news articles with the help of the clusters obtained from the trends. We also propose a new differentiated weighting scheme that assigns higher weights to features that occur in the Rise (Drop) news-article cluster but do not occur in the opposite Drop (Rise) cluster.
For example, the same article may align to more than one type of trend.
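As an illustration of the two quantities the abstract uses to categorize a segmented trend, the following minimal sketch fits a least-squares line to one price segment and reports its slope and coefficient of determination. It is not the paper's method: the paper clusters segments, whereas this sketch substitutes a simple threshold, and the names `fit_line`, `label_trend`, and the cut-off `min_r_squared` are assumptions introduced here for illustration.

```python
from typing import Optional, Sequence, Tuple


def fit_line(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of values against indices 0..n-1.

    Returns (slope, r_squared), where r_squared is the coefficient
    of determination of the fitted line.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    syy = sum((y - mean_y) ** 2 for y in values)
    slope = sxy / sxx
    # A perfectly flat segment has no variance to explain; treat R^2 as 0.
    r_squared = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, r_squared


def label_trend(values: Sequence[float],
                min_r_squared: float = 0.7) -> Optional[str]:
    """Label a segment "Rise" or "Drop", or None if the fit is too weak.

    Assumption: a fixed R^2 threshold stands in for the clustering step
    described in the abstract; 0.7 is an arbitrary illustrative value.
    """
    slope, r_squared = fit_line(values)
    if r_squared < min_r_squared or slope == 0:
        return None
    return "Rise" if slope > 0 else "Drop"


if __name__ == "__main__":
    for segment in ([10.0, 10.5, 11.2, 11.8],
                    [5.0, 4.0, 3.0, 2.0],
                    [1.0, 3.0, 1.0, 3.0, 1.0]):
        print(segment, label_trend(segment))
```

A steadily increasing segment is labeled Rise and a steadily decreasing one Drop, while an oscillating segment with a poor linear fit is left unlabeled.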
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Fung, G.P.C., Yu, J.X., Lam, W. (2002). News Sensitive Stock Trend Prediction. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science, vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_48
DOI: https://doi.org/10.1007/3-540-47887-6_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43704-8
Online ISBN: 978-3-540-47887-4
eBook Packages: Springer Book Archive