New Techniques for Data Reduction in a Database System for Knowledge Discovery Applications

Kumar, Akhil

doi:10.1023/A:1008633406999

Published: January 1998

Volume 10, pages 31–48, (1998)

Journal of Intelligent Information Systems

Abstract

Databases store large amounts of information about consumer transactions and other kinds of transactions. This information can be used to deduce rules about consumer behavior, and the rules can in turn be used to determine company policies, for instance with regard to production, marketing, and several other areas. Since databases typically store millions of records, and each record may have 100 or more attributes, a necessary initial step is to reduce the size of the database by eliminating attributes that influence the decision only minimally or not at all. In this paper we present techniques that can be employed effectively for exact and approximate reduction in a database system. These techniques can be implemented efficiently using SQL (Structured Query Language) commands. We tested and validated their performance on a real data set. The results showed that classification performance actually improved with a reduced set of attributes compared with the case in which all the attributes were present. We also discuss how our techniques differ from statistical methods and from other data reduction methods such as rough sets.
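The abstract does not spell out the reduction procedure, so the following is only a minimal sketch of one way an exact-reduction check could be phrased in SQL, not the paper's method. It assumes that an attribute is redundant when dropping it never leaves two records that agree on the remaining attributes but disagree on the decision. The table `sales`, its columns, the sample rows, and the helper names `is_consistent` and `removable_attributes` are all hypothetical.

```python
import sqlite3


def is_consistent(conn, table, attrs, decision):
    """True if no combination of `attrs` values maps to more than one decision value.

    Identifiers are interpolated directly into the SQL text, so they are
    assumed to come from trusted code, never from user input.
    """
    cols = ", ".join(attrs)
    query = (
        f"SELECT COUNT(*) FROM ("
        f"SELECT {cols} FROM {table} "
        f"GROUP BY {cols} "
        f"HAVING COUNT(DISTINCT {decision}) > 1) AS conflicts"
    )
    return conn.execute(query).fetchone()[0] == 0


def removable_attributes(conn, table, attrs, decision):
    """Attributes that can each be dropped on their own without creating a conflict."""
    removable = []
    for attr in attrs:
        rest = [a for a in attrs if a != attr]
        if rest and is_consistent(conn, table, rest, decision):
            removable.append(attr)
    return removable


# Hypothetical toy data: `buys` is the decision attribute.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (age TEXT, region TEXT, color TEXT, buys TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("young", "north", "red", "yes"),
        ("young", "north", "blue", "yes"),
        ("young", "south", "blue", "yes"),
        ("old", "north", "red", "no"),
        ("old", "north", "blue", "no"),
        ("old", "south", "red", "no"),
    ],
)

result = removable_attributes(conn, "sales", ["age", "region", "color"], "buys")
print(result)  # ['region', 'color']
```

In this toy table `age` alone determines `buys`, so `region` and `color` can each be dropped, while dropping `age` makes the records with `north` and `red` disagree on the decision. An approximate variant could tolerate a small fraction of conflicting records instead of requiring none; the threshold would be a further assumption.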

References

Aasheim, O.T. and Solheim, H.G. (1996). Rough Sets as a Framework for Data Mining, Project Report, Knowledge Systems Group, The Norwegian University of Science and Technology, Trondheim.
Berenson, M., Levine, D., and Goldstein, M. (1983). Intermediate Statistical Methods and Applications, Prentice-Hall Publishers.
Breiman, L. et al. (1984). Classification and Regression Trees, Wadsworth Publishers.
Fayyad, U. et al. (1996). Advances in Knowledge Discovery and Data Mining, MIT Press.
Friedman, J.H. (1991). Multivariate Adaptive Regression Splines, The Annals of Statistics, 19, 1–141.
Korth, H. and Silberschatz, A. (1991). Database System Concepts (Second edition), McGraw-Hill Publishers.
Kretowski, M. and Stepaniuk, J. (1996). Selection of objects and attributes: a tolerance rough set approach. 9th Int. Symp. on Methodologies for Intelligent Systems, Poland.
Kumar, A., Rao, V.R., and Soni, H. (1995). An Empirical Comparison of Neural Network and Logistic Regression Models, Marketing Letters, 6, 251–263.
Kuncheva, L.I. (1992). Fuzzy Rough Sets: Applications to Feature Selection, Fuzzy Sets and Systems, 51, 147–153.
Mingers, J. (1989). An Empirical Comparison of Pruning Methods for Decision Tree Induction, Machine Learning, 4, 227–243.
Mollestad, T. and Skowron, A. (1996). A rough set framework for data mining of propositional default rules, 9th Int. Symp. on Methodologies for Intelligent Systems, Poland.
Nguyen S.H., Nguyen T.T., Polkowski L., Skowron A., Synak P., and Wroblewski J. (1996a). Decision rules for large data tables. Proc. CESA'96, France.
Nguyen S.H., Polkowski L., Skowron A., Synak P., and Wroblewski J. (1996b). Searching for approximate description of decision tables. Proc. 4th Int. Workshop on Rough Sets, Fuzzy Sets and Machine Discovery, Tokyo.
Pawlak, Z. (1991). Rough Sets, Kluwer Academic Publishers.
Piatetsky-Shapiro, G. and Frawley, W. (1991). Knowledge Discovery in Databases, MIT Press.
Quinlan, J.R. (1986). Induction of Decision Trees, Machine Learning, 1, 81–106.
Quinlan, J.R. (1993). C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers.
Rumelhart, D.E., Hinton, G.E., and Williams, R.J. (1986). Learning Internal Representations by Error Propagation. In D.E. Rumelhart, J.L. McClelland, and the PDP Research Group (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press.
Simoudis, E. et al. (1996). Integrating inductive and deductive reasoning for data mining. In (Fayyad et al., 1996).
Slowinski, R. (1992). Intelligent Decision Support: Handbook of Applications and Advances of Rough Set Theory, Kluwer Academic Publishers.
Slowinski, R. and Stefanowski, J. (1993). Handling various types of uncertainty in the rough set approach, Proc. Int. Workshop on Rough Sets, Fuzzy Sets and Machine Discovery, Alberta, Canada.
Stepaniuk, J. and Kretowski, M. (1996). Similarity based rough sets and learning. 4th Int. Workshop on Rough Sets, Fuzzy Sets and Machine Discovery, Tokyo.
Tanaka, H., Ishibuchi, H., and Shigenaga, T. (1992). Fuzzy inference system based on rough sets and its applications to medical diagnosis. In (Slowinski, 1992), pp. 111–117.
Weiss, S. and Kulikowski, C. (1991). Computer Systems that Learn, Morgan Kaufmann Publishers.
Yasdi, R. (1991). Learning Classification Rules from Database in the Context of Knowledge-Acquisition and Representation, IEEE Transactions on Knowledge and Data Engineering, 3(3), 293–306.
Ziarko, W. (1991). The Discovery, Analysis and Representation of Data Dependencies in Databases. In (Piatetsky-Shapiro and Frawley, 1991), pp. 195–209.

Cite this article

Kumar, A. New Techniques for Data Reduction in a Database System for Knowledge Discovery Applications. Journal of Intelligent Information Systems 10, 31–48 (1998). https://doi.org/10.1023/A:1008633406999
