
New Techniques for Data Reduction in a Database System for Knowledge Discovery Applications

Journal of Intelligent Information Systems

Abstract

Databases store large amounts of information about consumer transactions and other kinds of transactions. This information can be used to deduce rules about consumer behavior, and these rules can in turn inform company policy in areas such as production and marketing. Databases typically store millions of records, and each record may have 100 or more attributes. A necessary first step is therefore to reduce the size of the database by eliminating attributes that have no influence, or only minimal influence, on the decision. In this paper we present techniques for exact and approximate data reduction in a database system. These techniques can be implemented efficiently using SQL (Structured Query Language) commands. We tested and validated them on a real data set. The results showed that classification performance actually improved with the reduced set of attributes compared with using all of the attributes. We also discuss how our techniques differ from statistical methods and from other data reduction methods such as rough sets.
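The abstract does not spell out the reduction procedure, so what follows is only a hypothetical sketch of how attribute reduction can be pushed into SQL. It uses a generic consistency test, which is not necessarily the author's method: a set of attributes is sufficient if no two records agree on those attributes yet differ in the decision attribute. This criterion resembles the rough-set notion of a reduct, from which the paper distinguishes its own techniques. The table `customers`, its columns, the helper functions and the `tolerance` parameter are all illustrative assumptions. The sketch uses Python's built-in `sqlite3` module so that the grouping work is done by an SQL `GROUP BY ... HAVING` query.

```python
import sqlite3


def quote(name):
    # Quote an SQL identifier; this sketch assumes column/table names are trusted.
    return '"' + name.replace('"', '""') + '"'


def inconsistent_rows(conn, table, attrs, decision):
    """Count rows whose values on `attrs` also occur with a different decision value."""
    t, d = quote(table), quote(decision)
    if not attrs:
        # No condition attributes: all rows form one group.
        total, classes = conn.execute(
            f"SELECT COUNT(*), COUNT(DISTINCT {d}) FROM {t}").fetchone()
        return total if classes > 1 else 0
    cols = ", ".join(quote(a) for a in attrs)
    sql = (f"SELECT COALESCE(SUM(n), 0) FROM ("
           f"SELECT COUNT(*) AS n FROM {t} GROUP BY {cols} "
           f"HAVING COUNT(DISTINCT {d}) > 1)")
    return conn.execute(sql).fetchone()[0]


def reduce_attributes(conn, table, attrs, decision, tolerance=0.0):
    """Hypothetical greedy backward elimination.

    An attribute is dropped if, without it, the number of conflicting rows
    exceeds the full-attribute baseline by at most `tolerance` * total rows.
    tolerance=0.0 gives an exact reduction; larger values give an approximate one.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {quote(table)}").fetchone()[0]
    allowed = inconsistent_rows(conn, table, attrs, decision) + tolerance * total
    kept = list(attrs)
    for a in list(attrs):
        trial = [x for x in kept if x != a]
        if inconsistent_rows(conn, table, trial, decision) <= allowed:
            kept = trial
    return kept


# Tiny made-up table: bought == "yes" exactly when income == "high".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age_band TEXT, income TEXT, region TEXT, bought TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
    ("young", "low", "north", "no"),
    ("young", "high", "south", "yes"),
    ("old", "low", "south", "no"),
    ("old", "high", "north", "yes"),
    ("young", "high", "north", "yes"),
])
print(reduce_attributes(conn, "customers", ["age_band", "income", "region"], "bought"))
# → ['income']
```

Greedy elimination depends on the order in which attributes are tried and does not guarantee a minimal subset. Setting `tolerance` above zero accepts a small fraction of additional conflicting records, which is one simple way to express an approximate reduction.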




Cite this article

Kumar, A. New Techniques for Data Reduction in a Database System for Knowledge Discovery Applications. Journal of Intelligent Information Systems 10, 31–48 (1998). https://doi.org/10.1023/A:1008633406999
