Mining stock price using fuzzy rough set system

doi:10.1016/S0957-4174(02)00079-9

Expert Systems with Applications

Volume 24, Issue 1, January 2003, Pages 13-23

https://doi.org/10.1016/S0957-4174(02)00079-9

Abstract

In this study of mining stock price data, we attempt to predict the stronger rules of stock prices. To address this problem, we propose a fuzzy rough set system that predicts a stock price at any given time. Our system has two agents: a visual display agent that helps stock dealers monitor the current price of a stock, and a mining agent that helps stock dealers decide when to buy or sell stocks. To demonstrate that our system is effective, we used it to predict the stronger rules of stock prices and achieved at least 93% accuracy after 180 trials.

Introduction

The most glamorous of all the financial markets is the stock market, which never fails to conjure up images of high rollers frantically buying and selling stock and making millions in return. Unfortunately, the reality of the market paints a less optimistic picture. The stock market is essentially a non-linear, non-parametric system that is, ipso facto, extremely hard to model with any reasonable accuracy. Although people have come up with many methods to try to do so, traditionally the best performers have been the speculators who use their considerable knowledge of the markets to predict the next trend. They are, though, only human and are very limited in their capacity to assimilate information and spot subtle trends in it, which may be the indicators of an impending change in the value of market stock. Therefore, people have been trying for years to find a more efficient way of modeling the behavior of the financial markets. In recent times interest has turned to the use of neural networks for this task, but with less than successful results. This motivates our study of mining stock prices from a given database using a fuzzy rough set method.

Database technology has been used successfully in many applications. Today, the grand challenge of using a database is to generate useful rules from its raw data so that users can make decisions, and these rules may be hidden deep in the raw data. Traditionally, turning data into knowledge relies on manual analysis and interpretation, which is becoming impractical in many domains as data volumes grow exponentially. Therefore, to solve the problem of knowledge extraction from a database, different data mining approaches and systems have been proposed, and readers can refer to excellent works (Bansal et al., 1998, Chen et al., 1996, Dhar et al., 2000, Fayyad et al., 1996, Giudici et al., 2001, Glymour et al., 1997, Hernández and Stolfo, 1998, Matheus et al., 1993, Provost and Kolluri, 1999, Westerdijk et al., 2001) on this subject.

The original rough set model as introduced by Pawlak (1991) is concerned with the analysis of deterministic data dependencies. Rough set theory has attracted an increasing amount of attention in the computing community since it has been successful in many applications (Lin and Cercone, 1997, Grzymala-Busse et al., 1995, Mrozek and Plonka, 1998, Ziarko, 1994, Ziarko, 1999). In its formalism, however, the original model does not recognize the presence or absence of non-deterministic relationships, i.e. the ones that may lead to predictive rules with probabilities less than one. In some data sets, the available information is not sufficient to produce strong deterministic rules, but it may be quite possible to identify strong non-deterministic rules with estimates of decision probabilities. To find strong rules, we define a matching degree. According to the matching degree, we can find the strong rules easily (Chiang, Lin, & Shis, 1998).
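The distinction between deterministic and non-deterministic rules can be illustrated with the standard lower and upper approximations of rough set theory. The following is a minimal sketch, not the authors' implementation; the toy records, the attribute names, and the function `approximations` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def approximations(objects, condition, target):
    """Return the (lower, upper) approximations of `target`.

    `condition` maps an object to its condition-attribute value; objects
    with the same value are indiscernible and form one equivalence class.
    """
    classes = defaultdict(set)
    for obj in objects:
        classes[condition(obj)].add(obj)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the target set
            lower |= block
        if block & target:    # class overlaps the target set
            upper |= block
    return lower, upper

# Hypothetical toy data: hour id -> (trading volume level, price rose next hour?)
hours = {
    1: ("high", True),
    2: ("high", True),
    3: ("low", False),
    4: ("mid", True),
    5: ("mid", False),
}
rose = {h for h, (_, up) in hours.items() if up}
lower, upper = approximations(hours.keys(), lambda h: hours[h][0], rose)
print(sorted(lower))  # [1, 2]
print(sorted(upper))  # [1, 2, 4, 5]
```

In this toy example the lower approximation supports a deterministic rule ("high volume implies a rise"), while the boundary region (hours 4 and 5) supports only a non-deterministic rule with an estimated probability of one half.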

In this paper, the mining agent in our system combines the fuzzy linguistic approach with rough set theory. The fuzzy linguistic approach in a fuzzy relational database is used not only to handle imprecise queries, but also to summarize data easily. Several researchers have proposed a series of excellent works on fuzzy linguistic queries in a fuzzy relational database (Bosc et al., 1988, Bosc and Pivert, 1995, Chiang et al., 2000, Kacprzyk and Ziolkowski, 1986, Kacprzyk et al., 1989, Medina et al., 1994, Medina et al., 1995, Tahani, 1977, Zadeh, 1984). To find useful knowledge from data by summarization, Yager proposed an approach to the summarization of data based on the theory of fuzzy sets (Yager, 1991). Cubero's summarization works through fuzzy dependencies (Cubero, Medina, Pons, & Vila, 1999). Yager's approach is useful for both numeric and non-numeric data. It summarizes data with three values: a summarizer, a quantity in agreement, and a truth value. In Kacprzyk and Ziolkowski (1986), Kacprzyk et al. (1989), and Kacprzyk and Iwanski (1992), a summarizer is also called a property, and a quantity in agreement is also called a quantifier.
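As a rough illustration of a linguistic summary with a summarizer, a quantity in agreement, and a truth value, the sketch below evaluates the statement "most hourly price changes are large rises". It is only a sketch under stated assumptions: the membership function `summarizer_large_rise`, the piecewise-linear shape of `quantifier_most`, and the sample data are all hypothetical and are not taken from the paper.

```python
def summarizer_large_rise(change_pct):
    """Membership of a price change (in %) in the fuzzy set 'large rise'.

    Assumed shape: 0 at or below 0%, rising linearly to 1 at 2% and above.
    """
    return min(1.0, max(0.0, change_pct / 2.0))

def quantifier_most(proportion):
    """Assumed piecewise-linear 'most': 0 up to 0.3, 1 from 0.8 upward."""
    return min(1.0, max(0.0, (proportion - 0.3) / 0.5))

def truth_of_summary(values, summarizer, quantifier):
    """Truth value of 'Q objects are S' as Q(mean membership in S)."""
    proportion = sum(summarizer(v) for v in values) / len(values)
    return quantifier(proportion)

# Hypothetical hourly price changes in percent.
changes = [2.5, 1.0, 3.0, 2.0, 0.0]
truth = truth_of_summary(changes, summarizer_large_rise, quantifier_most)
print(round(truth, 3))  # approximately 0.8
```

Here the mean membership is 0.7, which the assumed quantifier maps to a truth value of about 0.8 for the summary.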

Similar to Yager's approach, our mining agent is also developed from the theory of fuzzy sets. However, unlike Yager's approach, when there is no unique property that characterizes a group of objects, our method can find a disjunctive property that does. For example, we may conclude ‘most northern Chinese like pasta or dumplings’ but not ‘most northern Chinese like pasta’. Therefore, the property we find is a disjunctive property, F=F₁∨⋯∨F_m, rather than a single property, F_i, where the length of F is denoted as |F|. However, the disjunctive property F may be too general to characterize a group of objects, because valuable information is lost; therefore, we have to restrict the length of F when it is too general.
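One way to picture the search for a length-restricted disjunctive property is a greedy selection, sketched below. This is an illustrative assumption, not the procedure used in the paper: the disjunction is evaluated with the fuzzy maximum, "most" is approximated by a fixed coverage threshold, and the function name, parameters, and sample memberships are hypothetical.

```python
def disjunctive_property(objects, properties, threshold=0.8, max_len=2):
    """Greedily build F = F1 v ... v Fm with |F| <= max_len.

    `objects` is a list of dicts mapping property name -> membership degree.
    Returns (list_of_properties, coverage), or (None, coverage) if no
    disjunction within the length limit reaches the threshold.
    """
    def coverage(props):
        # Membership in a disjunction is the maximum over its members.
        return sum(max(obj[p] for p in props) for obj in objects) / len(objects)

    chosen = []
    while len(chosen) < max_len:
        remaining = [p for p in properties if p not in chosen]
        if not remaining:
            break
        # Add the property that raises the coverage the most.
        best = max(remaining, key=lambda p: coverage(chosen + [p]))
        chosen.append(best)
        if coverage(chosen) >= threshold:
            return chosen, coverage(chosen)
    return None, (coverage(chosen) if chosen else 0.0)

# Hypothetical membership degrees for five people.
people = [
    {"pasta": 0.9, "dumplings": 0.2, "rice": 0.1},
    {"pasta": 0.1, "dumplings": 0.8, "rice": 0.3},
    {"pasta": 0.7, "dumplings": 0.9, "rice": 0.0},
    {"pasta": 0.2, "dumplings": 1.0, "rice": 0.2},
    {"pasta": 1.0, "dumplings": 0.1, "rice": 0.4},
]
found, cov = disjunctive_property(people, ["pasta", "dumplings", "rice"])
print(found, round(cov, 2))
```

With these sample values no single property reaches the threshold, but the two-term disjunction of dumplings and pasta does; setting `max_len=1` makes the search fail, which mirrors the length restriction on F.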

Since it is difficult to predict exactly what could be discovered from a database, and the mining process is interactive and iterative, it is necessary to include the human in the mining process (Chen et al., 1996, Fayyad et al., 1996). Therefore, our system provides a visual display agent to help users premine the database. With the results of the premining process, users can interact with the system so that the mining agent can automatically determine, from the predefined properties, the property that most objects in a group have. To demonstrate that our system works correctly, we consider the stock price analysis problem over 300 weekday trading hours in this paper.

The remainder of the paper is organized as follows. In Section 2, the theoretical basis of fuzzy logic and rough sets is presented. In Section 3, the visual display agent and the mining agent of our system are presented. Section 4 reports the experimental results. Conclusions and future research are given in Section 5.

One problem with predicting stock prices is that the volume of data is so large that it impairs the ability to use the information (Fayyad et al., 1996, Widom, 1995). Analyzing stock price data over several years may involve just a few thousand records, but these must be selected from millions. A stockbroker who serves tens of thousands of customers each year may generate up to 170 GB of stock data at any given time. Multiyear trend analysis of the stock price thus still presents a problem due to the vast amount of data involved. It is therefore important to devise efficient methods to analyze and predict stock prices. For this reason, we constructed a data mart (Demarest, 1994), a relational database, to clean and reduce the size of the stock data so that only the useful data is downloaded and reformatted into the data mart (Liu & Setiono, 1996). The advantage of reformatted data is that it is more easily understood and used by users, as shown in Table 1. Because our database stores the reformatted data, the minimum time interval it supports is 5 min.

The other problem is to apply knowledge discovery (KD) techniques to identify strong predictive rules from stock and economic data, rules that truly indicate what happened during a certain time period in the stock market. By strong rules, we mean rules reflecting highly repetitive patterns occurring in data. They are not necessarily precise or deterministic; they may be associated with fractional probabilities of the predicted outcomes. They should, however, be correct or almost correct, reflecting real relationships occurring in the economic system. The initial research results on discovery and analysis of strong rules have been reported by Piatetsky-Shapiro (1989). The strength of such rules can be measured in terms of the data objects (data records) satisfying the rule conditions. Strong rules are potentially interesting data patterns and are likely to be generally true regularities. Ideally, if the collected data is a representative and random subset of all feasible combinations of market indicators, then it can be proven using probability theory that the stronger the rule, the higher the likelihood that the rule represents a true fact, or a close approximation of a true fact, about the domain of interest. In this paper, we define a matching degree and a measure of matching strengths. According to the matching degree, we can discover the strong rules.
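The idea of measuring rule strength by the records that satisfy the rule conditions can be sketched as follows. This is not the matching degree defined in the paper, which is not given in this excerpt; it is a plain count-and-proportion measure, and the field names and records are hypothetical.

```python
def rule_strength(records, condition, outcome):
    """Return (support, probability) for the rule 'condition -> outcome'.

    support: number of records satisfying the rule condition.
    probability: fraction of those records that also show the outcome.
    """
    matched = [r for r in records if condition(r)]
    if not matched:
        return 0, 0.0
    hits = sum(1 for r in matched if outcome(r))
    return len(matched), hits / len(matched)

# Hypothetical hourly records with made-up attributes.
records = [
    {"volume": "high", "trend": "up", "next": "up"},
    {"volume": "high", "trend": "up", "next": "up"},
    {"volume": "high", "trend": "up", "next": "down"},
    {"volume": "low", "trend": "up", "next": "down"},
    {"volume": "high", "trend": "up", "next": "up"},
]
support, probability = rule_strength(
    records,
    condition=lambda r: r["volume"] == "high" and r["trend"] == "up",
    outcome=lambda r: r["next"] == "up",
)
print(support, probability)  # 4 0.75
```

A rule that matches many records with a probability close to one would count as strong in the sense described above, even though it is not deterministic.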

Another problem with predicting stock prices is that price differences can vary greatly over 2 h, so a price movement rarely falls cleanly into a single crisp category. The situation resembles medical diagnosis, where it is often difficult to classify patients as fully sick (Kacprzyk & Iwanski, 1992). Therefore, crisp data mining approaches may not be appropriate for these situations. To solve this problem, we employ fuzzification and roughness techniques in our system to predict the stock price for the next hour. Using our system enables the user not only to know the stock price in any given hour but also to follow stock price trends.
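Fuzzification of a price difference can be sketched with simple membership functions, as below. The linguistic terms, the 2% breakpoints, and the function names are assumptions made for illustration; the paper's actual fuzzy sets are not given in this excerpt.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify_change(change_pct):
    """Map a price change (in %) to degrees of 'fall', 'steady', and 'rise'."""
    rise = min(1.0, max(0.0, change_pct / 2.0))    # full rise at +2% or more
    fall = min(1.0, max(0.0, -change_pct / 2.0))   # full fall at -2% or less
    steady = triangular(change_pct, -2.0, 0.0, 2.0)
    return {"fall": fall, "steady": steady, "rise": rise}

print(fuzzify_change(0.5))  # {'fall': 0.0, 'steady': 0.75, 'rise': 0.25}
```

A change of 0.5% is thus treated as mostly steady and slightly rising rather than being forced into one crisp class.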

Investors have been trying to find a way to predict stock prices accurately, but have had less than successful results (Haefke & Helmenstein, 2000). Kuo, Chen, and Hwang (2001) showed that the numerous studies addressing stock price prediction have generally employed time series analysis techniques (Kendall & Ord, 1990) and multiple regression models. Recently, artificial intelligence techniques such as artificial neural networks (ANNs) and genetic algorithms (GAs) have been applied to this area; however, the above-mentioned concern still exists (Baba and Kozaki, 1992, Mahfoud and Mani, 1996). Kim and Han (2000) noted that ANNs had some limitations in learning the patterns because stock price data has tremendous noise and complex dimensionality. Moreover, the sheer quantity of stock data sometimes interferes with the learning of patterns.

Kuo et al. (2001) point out that numerous factors, such as macroeconomic and political events, can have a major influence on stock prices. Timing the buying and selling of stock means determining the best moments to trade given the constant fluctuation of stock prices. However, humans have a difficult time doing this because of the complexity of the stock market, as shown in Lee and Jo (1999).

Section snippets

Preliminaries

In this section, we briefly introduce some preliminary results and definitions that are useful for later discussion.

A fuzzy rough set system

Our system is written in Visual Basic on an IBM PC and includes two major modules: the visual display agent and the mining agent. In the proposed system, users can check the current price of a stock through the visual display agent and buy or sell the stock according to the mining results provided by the mining agent. Since the mining process is an application-oriented process, different applications may need different data mining approaches. In this paper, we introduce only the fuzzy rough set method

Experimental results

In this paper, all the data was collected from a stockbroker's mainframe in the Taiwan stock market and was recorded every 5 min. The trading hours in the Taiwan stock market are from 9:00 a.m. to 1:30 p.m. on weekdays, without a lunch break. To simplify data processing, the data was grouped into hours, assuming the trading time is from 9:00 a.m. to 2:00 p.m. on weekdays.
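The hourly grouping of 5-minute records can be sketched as follows. The record layout, the choice of keeping the first and last price of each hour, and the function name `group_by_hour` are assumptions for illustration; the excerpt does not state how the records within an hour were aggregated.

```python
from datetime import datetime

def group_by_hour(ticks):
    """Group time-ordered (timestamp, price) records into hourly buckets.

    Only weekday records between 9:00 and 14:00 are kept. Each bucket is
    keyed by (date, hour) and holds the first and last price seen in it.
    """
    buckets = {}
    for ts, price in ticks:
        if ts.weekday() >= 5 or not (9 <= ts.hour < 14):
            continue  # skip weekends and records outside the assumed window
        key = (ts.date(), ts.hour)
        if key not in buckets:
            buckets[key] = [price, price]
        else:
            buckets[key][1] = price
    return {key: tuple(pair) for key, pair in buckets.items()}

# Hypothetical 5-minute records; 2001-01-02 is a Tuesday, 2001-01-06 a Saturday.
ticks = [
    (datetime(2001, 1, 2, 9, 0), 100.0),
    (datetime(2001, 1, 2, 9, 5), 101.0),
    (datetime(2001, 1, 2, 9, 55), 102.0),
    (datetime(2001, 1, 2, 10, 0), 103.0),
    (datetime(2001, 1, 2, 13, 25), 99.0),
    (datetime(2001, 1, 6, 9, 0), 98.0),
]
hourly = group_by_hour(ticks)
for (day, hour), (first, last) in sorted(hourly.items()):
    print(day, hour, first, last)
```

Under the 9:00 a.m. to 2:00 p.m. assumption each trading day yields at most five hourly buckets, with the last one only partly filled because trading actually ends at 1:30 p.m.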

To demonstrate how effectively our system works, we used data from January, 2001 to December, 2001 as training examples, and data from January,

Conclusion and future research

In the proposed system, the visual display agent can assist users in premining the historical data or checking the current situation of the stock price. The mining agent can be used to predict the ranks of a specific stock price promptly. Even without the visual display agent, the system can handle fuzzy series from the beginning.

Currently, most parameters are defaulted and users cannot insert rules into the system; therefore, we plan to develop a more flexible

Acknowledgements

This work was supported by the National Science Council of the Republic of China under Grant No. 91-2416-4-255-001.

References (52)

P. Bosc et al. Fuzzy query with SQL: extensions and implementation aspects. Fuzzy Sets and Systems (1988)

D.A. Chiang et al. Mining time series data by a fuzzy linguistic summary system. Fuzzy Sets and Systems (2000)

J.C. Cubero et al. Data summarization in relational databases through fuzzy dependencies. Information Sciences (1999)

J. Kacprzyk et al. FQUERY III⁺: a human-consistent databases querying system based on fuzzy logic with fuzzy linguistic quantifiers. Information Systems (1989)

K.J. Kim et al. Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index. Expert Systems with Applications (2000)

R.J. Kuo et al. An intelligent stock trading decision support system through integration of genetic algorithm based fuzzy neural network and artificial neural network. Fuzzy Sets and Systems (2001)

K.H. Lee et al. Expert system for predicting stock market timing using a candlestick chart. Expert Systems with Applications (1999)

H. Liu et al. Dimensionality reduction via discretization. Knowledge-Based Systems (1996)

J.M. Medina et al. Toward the implementation of a generalized fuzzy relational database model. Fuzzy Sets and Systems (1995)

J.M. Medina et al. GEFRED: a generalized model of fuzzy relational databases. Information Sciences (1994)

V. Tahani. A conceptual framework for fuzzy query processing: a step toward very intelligent database systems. Information Processing and Management (1977)

R.R. Yager. General multiple-objective decision functions and linguistically quantified statements. International Journal of Man–Machine Studies (1984)

L.A. Zadeh. Fuzzy sets. Information and Control (1965)

L.A. Zadeh. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems (1978)

L.A. Zadeh. Fuzzy logic and the calculi of fuzzy rules, fuzzy graphs, and fuzzy probabilities. Computers and Mathematics with Applications (1999)

Baba, N., & Kozaki, M. (1992). An intelligent forecasting system of stock price using neural networks. Proceedings of...

K. Bansal et al. Brief application description. Neural networks based forecasting techniques for inventory control applications. Data Mining and Knowledge Discovery (1998)

P. Bosc et al. SQLf: a relational database language for fuzzy querying. IEEE Transactions on Fuzzy Systems (1995)

B.P. Buckles et al. Information-theoretical characterization of fuzzy relational databases. IEEE Transactions on Systems, Man, and Cybernetics (1983)

Y. Cai et al. Attribute-oriented induction in relational databases

M.-S. Chen et al. Data mining: an overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering (1996)

D.A. Chiang et al. Matching strengths of answers in fuzzy relational database. IEEE Transactions on Systems, Man, and Cybernetics (1998)

M. Demarest. Building the data mart. DBMS Magazine (1994)

V. Dhar et al. Discovering interesting patterns for investment decision making with GLOWER: genetic learner overlaid with entropy reduction. Data Mining and Knowledge Discovery (2000)

D. Dubois et al. Rough sets and fuzzy rough sets. International Journal of General Systems (1990)

U. Fayyad et al. The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM (1996)
