Abstract
We present a data mining method that integrates discretization, generalization, and rough set feature selection, reducing the data both horizontally and vertically. In the first phase, discretization and generalization are integrated: numeric attributes are discretized into a small number of intervals, the primitive values of symbolic attributes are replaced by higher-level concepts, and obviously superfluous or irrelevant symbolic attributes are eliminated. Horizontal reduction is achieved by merging tuples that become identical after an attribute value is replaced by its higher-level value in a predefined concept hierarchy (for symbolic attributes) or after discretization (for continuous, i.e., numeric, attributes). This phase greatly decreases the number of tuples considered in subsequent processing. In the second phase, a novel context-sensitive feature merit measure is used to rank the features, and a subset of relevant attributes is chosen based on rough set theory and the features' merit values. A reduced table is obtained by removing the attributes outside this subset, so the data set is further reduced vertically without changing the interdependence relationships between the classes and the attributes. Finally, the tuples in the reduced relation are transformed into different kinds of knowledge rules by different knowledge discovery algorithms. Based on these principles, a prototype knowledge discovery system, DBROUGH-II, has been constructed by integrating discretization, generalization, rough set feature selection, and a variety of data mining algorithms. Tests on a telecommunication customer data warehouse demonstrate that different kinds of knowledge rules, such as characteristic rules, discriminant rules, maximal generalized classification rules, and data evolution regularities, can be discovered efficiently and effectively.
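The horizontal-reduction step described above can be illustrated with a minimal sketch: symbolic values are replaced by their higher-level concepts from a concept hierarchy, numeric values are discretized into intervals, and tuples that become identical are merged. The hierarchy, cut points, attribute names, and sample tuples below are illustrative assumptions, not data from the paper.

```python
# A minimal sketch of first-phase horizontal reduction (illustrative only):
# generalize symbolic values, discretize a numeric attribute, merge duplicates.
from collections import Counter

# Hypothetical one-level concept hierarchy for a symbolic attribute.
CONCEPT_HIERARCHY = {"sparrow": "bird", "eagle": "bird", "trout": "fish"}

# Hypothetical cut points discretizing a numeric attribute into 3 intervals.
CUT_POINTS = [10, 100]

def discretize(x):
    """Map a numeric value to the label of the interval it falls in."""
    for i, cut in enumerate(CUT_POINTS):
        if x < cut:
            return f"interval_{i}"
    return f"interval_{len(CUT_POINTS)}"

def reduce_horizontally(tuples):
    """Replace primitives with higher-level concepts, discretize numerics,
    then merge identical tuples, keeping an occurrence count per tuple."""
    merged = Counter()
    for animal, weight, cls in tuples:
        key = (CONCEPT_HIERARCHY.get(animal, animal), discretize(weight), cls)
        merged[key] += 1
    return merged

data = [("sparrow", 5, "small"), ("eagle", 50, "medium"),
        ("trout", 5, "small"), ("sparrow", 7, "small")]
reduced = reduce_horizontally(data)
# ("bird", "interval_0", "small") now covers two of the four original tuples.
```

Keeping a count per merged tuple preserves how many original tuples each generalized tuple represents, which later rule-generation steps can use as a vote or support measure.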
Cite this article
Hu, X., Cercone, N. Data Mining via Discretization, Generalization and Rough Set Feature Selection. Knowledge and Information Systems 1, 33–60 (1999). https://doi.org/10.1007/BF03325090