Data Preprocessing and Data Mining as Generalization

  • Chapter

Part of the book series: Studies in Computational Intelligence (SCI, volume 118)

Summary

We present an abstract model in which the two stages of the Data Mining process, data preprocessing and data mining proper, are described as two different types of generalization. In the model, data mining and data preprocessing algorithms are defined as certain generalization operators. We use this framework to show that only three Data Mining operators (the classification, clustering, and association operators) are needed to express all Data Mining algorithms for classification, clustering, and association, respectively. We are also able to show formally that the generalization that occurs in the preprocessing stage is different from the generalization inherent in the data mining proper stage.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wasilewska, A., Menasalvas, E. (2008). Data Preprocessing and Data Mining as Generalization. In: Lin, T.Y., Xie, Y., Wasilewska, A., Liau, CJ. (eds) Data Mining: Foundations and Practice. Studies in Computational Intelligence, vol 118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78488-3_27

  • DOI: https://doi.org/10.1007/978-3-540-78488-3_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78487-6

  • Online ISBN: 978-3-540-78488-3

  • eBook Packages: Engineering, Engineering (R0)
