Abstract
Generally, a database encompasses various kinds of knowledge and is shared by many users, and different users may prefer different kinds of knowledge. It is therefore important for a data mining algorithm to output specific knowledge according to a user's current requirements (preferences). We call this kind of data mining requirement-oriented knowledge discovery (ROKD). When rough set theory is used in data mining, the ROKD problem is how to find a reduct, and the corresponding rules, that interest the user. Since reducts and rules are generated in the same way, this paper is concerned only with how to find a particular reduct. The user's requirement is described by an order of attributes, called an attribute order, which expresses the importance of the attributes to the user: more important attributes are placed before less important ones. The problem then becomes how to find a reduct that includes the attributes anterior in the attribute order. An approach to this problem is proposed, and its completeness for reducts is proved. Three kinds of attribute order are then developed to describe various user requirements.
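To make the idea of an attribute-ordered reduct concrete, here is a minimal Python sketch. It is not the algorithm proposed in the paper; it is plain backward elimination under stated assumptions. It assumes the decision table is a list of dicts, that the attribute order is a total order over all condition attributes (most important first), and that a "reduct" is a minimal attribute subset that still discerns every pair of objects with different decisions that the full attribute set discerns. Attributes are tried for removal from the last (least important) to the first, so anterior attributes are the last candidates to be dropped. The names `ordered_reduct`, `preserves_discernibility`, and `is_reduct` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence

# One object (row) of a decision table: attribute name -> value.
Row = Dict[str, Hashable]


def discerns(rows: Sequence[Row], attrs: Sequence[str], i: int, j: int) -> bool:
    """True if objects i and j differ on at least one attribute in attrs."""
    return any(rows[i][a] != rows[j][a] for a in attrs)


def preserves_discernibility(rows: Sequence[Row], attrs: Sequence[str],
                             full: Sequence[str], decision: str) -> bool:
    """True if attrs discerns every pair with different decisions that full discerns."""
    for i, j in combinations(range(len(rows)), 2):
        if (rows[i][decision] != rows[j][decision]
                and discerns(rows, full, i, j)
                and not discerns(rows, attrs, i, j)):
            return False
    return True


def ordered_reduct(rows: Sequence[Row], order: Sequence[str], decision: str) -> List[str]:
    """Backward elimination guided by an attribute order (most important first).

    Attributes are tried for removal from the last (least important) to the
    first; an attribute is dropped whenever discernibility is still preserved.
    """
    current = list(order)
    for a in reversed(order):
        candidate = [b for b in current if b != a]
        if preserves_discernibility(rows, candidate, order, decision):
            current = candidate
    return current


def is_reduct(rows: Sequence[Row], attrs: Sequence[str],
              full: Sequence[str], decision: str) -> bool:
    """attrs preserves discernibility and no single attribute can be dropped.

    Because preservation is monotone (any superset of a preserving set also
    preserves), checking single-attribute removals suffices for minimality.
    """
    if not preserves_discernibility(rows, attrs, full, decision):
        return False
    return all(
        not preserves_discernibility(rows, [b for b in attrs if b != a], full, decision)
        for a in attrs
    )


if __name__ == "__main__":
    # Toy table: d = a XOR b, and c happens to equal d.
    table = [
        {"a": 0, "b": 0, "c": 0, "d": 0},
        {"a": 1, "b": 0, "c": 1, "d": 1},
        {"a": 0, "b": 1, "c": 1, "d": 1},
        {"a": 1, "b": 1, "c": 0, "d": 0},
    ]
    for order in (["a", "b", "c"], ["c", "a", "b"]):
        r = ordered_reduct(table, order, "d")
        print(order, "->", r, is_reduct(table, r, order, "d"))
```

In the toy table both {a, b} and {c} are reducts. With the order [a, b, c] the sketch returns ['a', 'b'], while with [c, a, b] it returns ['c'], so the user's order decides which reduct is reported. Since the removal test is monotone, every attribute kept was indispensable when it was checked and remains so in the final set, which is why this particular sketch always ends with a reduct in the sense defined above.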
Additional information
This work is supported by the National Key Project for Prime Research on Image, Speech, Natural Language Understanding and Knowledge Mining (NKBRSF, Grant No. G 1998030508).
ZHAO Kai received his B.S. degree from Beijing Institute of Technology in 1993 and his Ph.D. degree from the Institute of Automation, the Chinese Academy of Sciences. His research interests include adaptation systems, genetic programming and data mining.
WANG Jue is a professor of computer science and artificial intelligence at the Institute of Automation, the Chinese Academy of Sciences. His research interests include artificial neural networks, machine learning and knowledge discovery in databases.
Cite this article
Zhao, K., Wang, J. A reduction algorithm meeting users’ requirements. J. Comput. Sci. & Technol. 17, 578–593 (2002). https://doi.org/10.1007/BF02948826