Mining stock price using fuzzy rough set system

https://doi.org/10.1016/S0957-4174(02)00079-9

Abstract

In this study of mining stock price data, we attempt to predict the stronger rules of stock prices. To address this problem, we propose an effective method, a fuzzy rough set system, to predict a stock price at any given time. Our system has two agents: a visual display agent that helps stock dealers monitor the current price of a stock, and a mining agent that helps stock dealers decide when to buy or sell stocks. To demonstrate that our system is effective, we used it to predict the stronger rules of stock prices and achieved at least 93% accuracy after 180 trials.

Introduction

The most glamorous of all the financial markets is the stock market, which never fails to conjure up images of high rollers frantically buying and selling stock and making millions in return. Unfortunately, the reality of the market paints a less optimistic picture. The stock market is essentially a non-linear, non-parametric system that is, ipso facto, extremely hard to model with any reasonable accuracy. Although people have come up with many methods to try to do so, traditionally the best performers have been the speculators who use their considerable knowledge of the markets to predict the next trend. They are, though, only human and are very limited in their capacity to assimilate information and spot subtle trends in it, which may indicate an impending change in the value of a stock. Therefore, people have been trying for years to find a more efficient way of modeling the behavior of the financial markets, and in recent times interest has turned to the use of neural networks for this task, but with less than successful results. This motivates our study of mining stock prices from a given database using a fuzzy rough set method.

Database technology has been used successfully in many applications. Today, the grand challenge of using a database is to generate useful rules from raw data in a database for users to make decisions, and these rules may be hidden deeply in the raw data of the database. Traditionally, the method of turning data into knowledge relies on manual analysis and interpretation, and the manual analysis is becoming impractical in many domains as data volumes grow exponentially. Therefore, to solve the problem of knowledge extraction from a database, different data mining approaches and systems have been proposed, and readers can refer to excellent works (Bansal et al., 1998, Chen et al., 1996, Dhar et al., 2000, Fayyad et al., 1996, Giudici et al., 2001, Glymour et al., 1997, Hernández and Stolfo, 1998, Matheus et al., 1993, Provost and Kolluri, 1999, Westerdijk et al., 2001) on this subject.

The original rough set model as introduced by Pawlak (1991) is concerned with the analysis of deterministic data dependencies. The rough set theory has attracted an increasing amount of attention in computer communities since it has been successful in many applications (Lin and Cercone, 1997, Grzymala-Busse et al., 1995, Mrozek and Plonka, 1998, Ziarko, 1994, Ziarko, 1999). In its formalism it does not recognize the presence or absence of non-deterministic relationships, i.e. the ones that may lead to predictive rules with probabilities less than one. In some data sets, however, the available information is not sufficient to produce strong deterministic rules but it may be quite possible to identify strong non-deterministic rules with estimates of decision probabilities. To find strong rules, we define a matching degree. According to the matching degree we can find the strong rules easily (Chiang, Lin, & Shis, 1998).
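As a point of reference for the deterministic model mentioned above, the following is a minimal sketch of Pawlak's lower and upper approximations on a toy decision table. The attribute names (`volume`, `trend`, `next`) and all values are hypothetical illustrations, not data or code from this paper.

```python
from collections import defaultdict

def approximations(records, condition_attrs, decision_attr, target):
    """Return (lower, upper) approximations, as sets of record indices,
    of the set of records whose decision attribute equals `target`."""
    # Records with identical values on every condition attribute are
    # indiscernible, so they fall into the same equivalence class.
    classes = defaultdict(set)
    for idx, rec in enumerate(records):
        key = tuple(rec[a] for a in condition_attrs)
        classes[key].add(idx)

    concept = {i for i, rec in enumerate(records) if rec[decision_attr] == target}
    lower, upper = set(), set()
    for members in classes.values():
        if members <= concept:   # class lies entirely inside the concept
            lower |= members
        if members & concept:    # class overlaps the concept
            upper |= members
    return lower, upper

# Hypothetical toy table: names and values are illustrative only.
records = [
    {"volume": "high", "trend": "up",   "next": "rise"},
    {"volume": "high", "trend": "up",   "next": "rise"},
    {"volume": "low",  "trend": "up",   "next": "rise"},
    {"volume": "low",  "trend": "up",   "next": "fall"},
    {"volume": "low",  "trend": "down", "next": "fall"},
]
lower, upper = approximations(records, ["volume", "trend"], "next", "rise")
print(sorted(lower))  # [0, 1]
print(sorted(upper))  # [0, 1, 2, 3]
```

Records 2 and 3 share the same condition values but differ in outcome, so they lie in the upper approximation only. This is the kind of non-deterministic case in which a rule can hold only with a probability less than one.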

In this paper, the mining agent in our system combines the fuzzy linguistic approach with rough set theory. The fuzzy linguistic approach in a fuzzy relational database is used not only to handle imprecise queries but also to summarize data easily. Several researchers have proposed a series of excellent works about the fuzzy linguistic queries in a fuzzy relational database (Bosc et al., 1988, Bosc and Pivert, 1995, Chiang et al., 2000, Kacprzyk and Ziolkowski, 1986, Kacprzyk et al., 1989, Medina et al., 1994, Medina et al., 1995, Tahani, 1977, Zadeh, 1984). To find useful knowledge from data through summarization, Yager proposed an approach to the summarization of data based on the theory of fuzzy sets (Yager, 1991). Cubero's summarization is through fuzzy dependencies (Cubero, Medina, Pons, & Vila, 1999). Yager's approach is useful for both numeric and non-numeric data. It summarizes data with three values: a summarizer, a quantity in agreement, and a truth value. In Kacprzyk and Ziolkowski, 1986, Kacprzyk et al., 1989, Kacprzyk and Iwanski, 1992, a summarizer is also called a property, and a quantity in agreement is also called a quantifier.
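To make the three values of a Yager-style summary concrete, here is a minimal sketch that computes the truth value of a statement such as "most hourly changes rose noticeably" as the quantifier's membership of the average summarizer membership. The membership functions `mu_rose` and `mu_most`, their breakpoints, and the sample data are assumptions chosen for illustration and are not taken from the paper.

```python
def mu_rose(price_change_pct):
    """Hypothetical summarizer 'rose noticeably': 0 at or below 0%, 1 at or above 2%."""
    return min(1.0, max(0.0, price_change_pct / 2.0))

def mu_most(proportion):
    """Hypothetical quantity in agreement 'most': 0 below 0.3, 1 above 0.8."""
    return min(1.0, max(0.0, (proportion - 0.3) / 0.5))

def truth_of_summary(values, summarizer, quantifier):
    """Truth value of 'Q objects are S': quantifier applied to the
    mean membership of the objects in the summarizer."""
    proportion = sum(summarizer(v) for v in values) / len(values)
    return quantifier(proportion)

# Illustrative hourly percentage changes (made-up numbers).
changes = [2.5, 1.0, 3.0, 0.0, 2.0]
truth = truth_of_summary(changes, mu_rose, mu_most)
print(round(truth, 3))
```

With these assumed functions the mean summarizer membership is 0.7, so the summary is judged largely, but not fully, true.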

Similar to Yager's approach, our mining agent is also developed from the theory of fuzzy sets. However, unlike Yager's approach, when there is no unique property characterizing a group of objects, our method can find a disjunctive property that characterizes the group. For example, we may conclude ‘most northern Chinese like pasta or dumplings’ but not ‘most northern Chinese like pasta’. Therefore, the property we find is a disjunctive property, F=F1∨⋯∨Fm, rather than a single property, Fi, where the length of F is denoted as |F|. However, the disjunctive property F may be too general to characterize a group of objects because valuable information is lost; therefore, we have to restrict the length of F when it is too general.
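The idea of a length-bounded disjunctive property can be sketched as follows. This is a greedy illustration under stated assumptions, not the paper's algorithm: the disjunction is evaluated with the standard fuzzy maximum, "most" is approximated by a fixed coverage threshold, and the function name, parameters, and data are all hypothetical.

```python
def find_disjunctive_property(objects, properties, threshold=0.8, max_len=2):
    """Greedily build F = F1 v ... v Fm with |F| <= max_len.

    objects: list of dicts mapping property name -> membership in [0, 1].
    Returns (list of chosen properties or None, coverage reached)."""
    chosen = []
    current = [0.0] * len(objects)  # membership of each object in F so far
    coverage = 0.0
    while len(chosen) < max_len and coverage < threshold:
        best = None
        for name in properties:
            if name in chosen:
                continue
            # Fuzzy OR: an object satisfies F to the degree of its best disjunct.
            merged = [max(c, obj.get(name, 0.0)) for c, obj in zip(current, objects)]
            cov = sum(merged) / len(merged)
            if best is None or cov > best[1]:
                best = (name, cov, merged)
        if best is None or best[1] <= coverage:
            break  # no remaining property improves coverage
        name, coverage, current = best
        chosen.append(name)
    return (chosen if coverage >= threshold else None), coverage

# Made-up preferences for five people.
people = [
    {"pasta": 1.0, "rice": 1.0},
    {"pasta": 1.0},
    {"pasta": 1.0, "dumplings": 1.0},
    {"dumplings": 1.0},
    {"pasta": 0.0},
]
props = ["pasta", "dumplings", "rice"]
print(find_disjunctive_property(people, props))             # (['pasta', 'dumplings'], 0.8)
print(find_disjunctive_property(people, props, max_len=1))  # (None, 0.6)
```

Capping `max_len` reflects the restriction on |F|: a long enough disjunction eventually covers every object, but says little about the group.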

Since it is difficult to predict exactly what could be discovered from a database, and the mining process is interactive and iterative, it is necessary to include the human in the mining process (Chen et al., 1996, Fayyad et al., 1996). Therefore, our system provides a visual display agent to help users premine the database. With the results of the premining process, users can interact with the system so that the mining agent can automatically determine, from the predefined properties, the property that most objects in a group have. To demonstrate that our system works correctly, we consider the stock price analysis problem over 300 weekday trading hours in this paper.

The remainder of the paper is organized as follows. In Section 2, the theoretical basis of fuzzy logic and rough sets is presented. In Section 3, the visual display agent and the mining agent of our system are presented. Section 4 presents the experimental results. Conclusions and future research are given in Section 5.

The problem with predicting stock prices is that the volume of data is so large that it hampers the ability to use the information (Fayyad et al., 1996, Widom, 1995). Analyzing stock price data over several years may involve just a few thousand records, but these must be selected from millions. A stockbroker who serves tens of thousands of customers each year may generate up to 170GB of stock data at any given time. Multiyear trend analysis of the stock price thus still presents a problem due to the vast amount of data involved. It is therefore important to devise efficient methods to analyze and predict stock prices. For this reason, we constructed a data mart (Demarest, 1994), a relational database, to clean and reduce the size of the stock data so that only the useful data is downloaded and reformatted into the data mart (Liu & Setiono, 1996). The advantage of reformatted data is that it is more easily understood and used by users, as shown in Table 1. Since our database stores the reformatted data, the minimum time interval supported is 5 min.

The other problem is to apply knowledge discovery (KD) techniques to identify strong predictive rules from stock and economic data, which are a true indication of what happened during a certain time period in the stock market. By strong rules, we mean rules reflecting highly repetitive patterns occurring in data. They are not necessarily precise or deterministic; they may be associated with fractional probabilities of the predicted outcomes. They should, however, be correct or almost correct, reflecting real relationships occurring in the economic system. The initial research results on discovery and analysis of strong rules have been reported by Piatetsky-Shapiro (1989). The strength of such rules can be measured in terms of data objects (data records) satisfying rule conditions. Strong rules are potentially interesting data patterns and are likely to be generally true regularities. Ideally, if the collected data is a representative and random subset of all feasible combinations of market indicators, then it can be proven using probability theory that the stronger the rule, the higher the likelihood that the rule represents a true fact, or a close approximation of a true fact, about the domain of interest. In this paper, we define a matching degree and a measure of matching strengths. According to the matching degree, we can discover the strong rules.
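The notion of rule strength measured by the records that satisfy a rule's conditions can be illustrated with a minimal sketch. It counts supporting records and estimates the fractional probability of the predicted outcome; it does not reproduce the matching degree defined in this paper, and the field names and records are hypothetical.

```python
def rule_strength(records, conditions, outcome):
    """Return (support, probability) for the rule 'conditions -> outcome'.

    support: number of records satisfying all rule conditions.
    probability: fraction of those records that also show the outcome."""
    matching = [r for r in records
                if all(r.get(k) == v for k, v in conditions.items())]
    support = len(matching)
    if support == 0:
        return 0, 0.0
    hits = sum(1 for r in matching
               if all(r.get(k) == v for k, v in outcome.items()))
    return support, hits / support

# Made-up hourly history; 'prev', 'volume', and 'next' are illustrative fields.
history = [
    {"prev": "up",   "volume": "high", "next": "up"},
    {"prev": "up",   "volume": "high", "next": "up"},
    {"prev": "up",   "volume": "high", "next": "down"},
    {"prev": "up",   "volume": "high", "next": "up"},
    {"prev": "down", "volume": "low",  "next": "down"},
]
print(rule_strength(history, {"prev": "up", "volume": "high"}, {"next": "up"}))  # (4, 0.75)
```

A rule with high support and a probability near one is a candidate strong rule in the sense described above, even though it is not deterministic.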

Another problem with predicting stock prices is that price differences can vary greatly over 2 h. Such gradual changes resist crisp classification, much as it is often difficult to classify a patient as fully sick (Kacprzyk & Iwanski, 1992). Therefore, crisp data mining approaches may not be appropriate for these situations. To solve this problem, we incorporate fuzzification and roughness techniques into our system to predict the stock price for the next hour. Our system enables the user not only to know the stock price in any given hour but also to follow stock price trends.
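A minimal sketch of fuzzification, assuming triangular membership functions over the hourly percentage change, shows how one crisp value can belong partly to several linguistic terms. The terms (`fall`, `steady`, `rise`) and their breakpoints are assumptions for illustration; the paper's actual fuzzy sets are not given in this excerpt.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify_change(pct):
    """Map an hourly percentage change to memberships in three assumed terms."""
    return {
        # Shoulders: changes beyond +/-2% count fully as 'rise' / 'fall'.
        "fall": 1.0 if pct <= -2.0 else triangular(pct, -4.0, -2.0, 0.0),
        "steady": triangular(pct, -2.0, 0.0, 2.0),
        "rise": 1.0 if pct >= 2.0 else triangular(pct, 0.0, 2.0, 4.0),
    }

print(fuzzify_change(1.0))   # {'fall': 0.0, 'steady': 0.5, 'rise': 0.5}
print(fuzzify_change(-3.0))  # {'fall': 1.0, 'steady': 0.0, 'rise': 0.0}
```

A 1% change is half "steady" and half "rise" here, which a crisp threshold would force into a single class.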

Investors have been trying to find a way to predict stock prices accurately, but have had less than successful results (Haefke & Helmenstein, 2000). Kuo, Chen, and Hwang (2001) showed that the numerous studies addressing stock price prediction have generally employed time series analysis techniques (Kendall & Ord, 1990) and multiple regression models. Recently, artificial intelligence techniques such as artificial neural networks (ANNs) and genetic algorithms (GAs) have been applied to this area; however, the above-mentioned concern still exists (Baba and Kozaki, 1992, Mahfoud and Mani, 1996). Kim and Han (2000) noted that ANNs have some limitations in learning the patterns because stock price data has tremendous noise and complex dimensionality. Moreover, the sheer quantity of stock data sometimes interferes with the learning of patterns.

Kuo et al. (2001) point out that numerous factors, such as macroeconomic and political events, can have a major influence on stock prices. Timing a trade means determining the best moment to buy or sell given the constant fluctuation of stock prices. However, humans have a difficult time doing this because of the complexity of the stock market, as shown in Lee and Jo (1999).

Preliminaries

In this section, we briefly introduce some preliminary results and definitions that are useful for later discussion.

A fuzzy rough set system

Our system is written in Visual Basic on an IBM PC and includes two major modules: a visual display agent and a mining agent. In the proposed system, users can check the current price of a stock through the visual display agent and buy or sell the stock according to the mining results provided by the mining agent. Since the mining process is application-oriented, different applications may need different data mining approaches. In this paper, we introduce only the fuzzy rough set method

Experimental results

In this paper, all the data was collected from a stockbroker's mainframe in the Taiwan stock market, recorded every 5 min. The trading hours in the Taiwan stock market are from 9:00 a.m. to 1:30 p.m. on weekdays, without a lunch break. To simplify data processing, the data was grouped into hours, assuming the trading time is from 9:00 a.m. to 2:00 p.m. on weekdays.
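The hourly grouping described above can be sketched as follows. The sketch assumes each 5-min record is a `(timestamp, price)` pair and that an hour is summarized by its open, high, low, and close; the paper does not state which hourly summary it stores, so that choice and the sample prices are illustrative.

```python
from datetime import datetime

def group_by_hour(ticks):
    """Group time-ordered (datetime, price) records into hourly bars.

    Returns a dict mapping the start of each hour to its
    open/high/low/close prices (an assumed summary format)."""
    bars = {}
    for ts, price in ticks:
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        bar = bars.get(hour_start)
        if bar is None:
            bars[hour_start] = {"open": price, "high": price,
                                "low": price, "close": price}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # last record seen in this hour
    return bars

# Made-up 5-min records for one trading day.
ticks = [
    (datetime(2001, 1, 2, 9, 0), 50.0),
    (datetime(2001, 1, 2, 9, 5), 50.5),
    (datetime(2001, 1, 2, 9, 55), 49.5),
    (datetime(2001, 1, 2, 10, 0), 51.0),
    (datetime(2001, 1, 2, 13, 25), 52.0),
]
bars = group_by_hour(ticks)
for hour_start, bar in bars.items():
    print(hour_start.strftime("%H:%M"), bar)
```

Records between 1:00 p.m. and 1:30 p.m. land in the 1:00 p.m. bar, which corresponds to treating the last trading hour as if it ran to 2:00 p.m.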

To demonstrate how effectively our system works, we used data from January 2001 to December 2001 as training examples, and data from January,

Conclusion and future research

In the proposed system, the visual display agent can assist users in premining the historical data or checking the current situation of the stock price. The mining agent can be used to promptly predict the rank of a specific stock price. Even without the visual display agent, the system can handle fuzzy series from the beginning.

Currently, most parameters are fixed at default values and users cannot insert rules into the system; therefore, we plan to develop a more flexible

Acknowledgements

This work was supported by the National Science Council of the Republic of China under Grant No. 91-2416-4-255-001.

References (52)

  • V. Tahani (1977). A conceptual framework for fuzzy query processing: a step toward very intelligent database systems. Information Processing and Management.
  • R.R. Yager (1984). General multiple-objective decision functions and linguistically quantified statements. International Journal of Man–Machine Studies.
  • L.A. Zadeh (1965). Fuzzy sets. Information and Control.
  • L.A. Zadeh (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems.
  • L.A. Zadeh (1999). Fuzzy logic and the calculi of fuzzy rules, fuzzy graphs, and fuzzy probabilities. Computers and Mathematics with Applications.
  • N. Baba, M. Kozaki (1992). An intelligent forecasting system of stock price using neural networks. Proceedings of...
  • K. Bansal et al. (1998). Brief application description. Neural networks based forecasting techniques for inventory control applications. Data Mining and Knowledge Discovery.
  • P. Bosc et al. (1995). SQLf: a relational database language for fuzzy querying. IEEE Transactions on Fuzzy Systems.
  • B.P. Buckles et al. (1983). Information-theoretical characterization of fuzzy relational databases. IEEE Transactions on Systems, Man, and Cybernetics.
  • Y. Cai et al. Attribute-oriented induction in relational databases.
  • M.-S. Chen et al. (1996). Data mining: an overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering.
  • D.A. Chiang et al. (1998). Matching strengths of answers in fuzzy relational database. IEEE Transactions on Systems, Man, and Cybernetics.
  • M. Demarest (1994). Building the data mart. DBMS Magazine.
  • V. Dhar et al. (2000). Discovering interesting patterns for investment decision making with GLOWER: genetic learner overlaid with entropy reduction. Data Mining and Knowledge Discovery.
  • D. Dubois et al. (1990). Rough sets and fuzzy rough sets. International Journal of General Systems.
  • U. Fayyad et al. (1996). The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM.