Data Preparation for Data Mining in Medical Data Sets

  • Chapter
Transactions on Rough Sets VI

Part of the book series: Lecture Notes in Computer Science (TRS, volume 4374)

Abstract

Data preparation is an important but time-consuming part of the data mining process. In this paper we describe a hierarchical method of text classification based on regular expressions. We use this method in the pre-processing stage of our data mining system to transform Latin free-text medical reports into a decision table. These decision tables serve as input to a rough-set-based rule induction subsystem. We also compare the accuracy and scalability of our method with a standard approach based on dictionary phrases.
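
The abstract does not give the details of the method, so the following is only a minimal sketch of what hierarchical, regular-expression-based classification into a decision table could look like. The Node class, the HIERARCHY tree, its labels, and its patterns are all hypothetical and chosen for illustration; they are not taken from the chapter.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of a (hypothetical) classification hierarchy."""
    label: str
    pattern: str
    children: list[Node] = field(default_factory=list)

    def classify(self, text: str) -> list[str]:
        """Return the labels from this node down to the deepest match."""
        if not re.search(self.pattern, text, flags=re.IGNORECASE):
            return []
        # Descend into the first child whose own pattern matches.
        for child in self.children:
            path = child.classify(text)
            if path:
                return [self.label] + path
        return [self.label]


# Assumed example hierarchy; real patterns would come from domain experts.
HIERARCHY = Node("heart_disease", r"cord|card", [
    Node("ischaemic", r"ischaem|ischem", [
        Node("infarction", r"infarct"),
    ]),
    Node("arrhythmia", r"arrhythm|fibrillatio"),
])


def all_labels(node: Node) -> list[str]:
    """Collect every label in the hierarchy, parents before children."""
    labels = [node.label]
    for child in node.children:
        labels.extend(all_labels(child))
    return labels


def to_decision_table(reports: dict[str, str], root: Node) -> list[dict]:
    """Turn free-text reports into rows with one 0/1 column per label."""
    columns = all_labels(root)
    rows = []
    for report_id, text in reports.items():
        matched = set(root.classify(text))
        row = {"id": report_id}
        row.update({col: int(col in matched) for col in columns})
        rows.append(row)
    return rows


if __name__ == "__main__":
    reports = {
        "r1": "Morbus ischaemicus cordis. Status post infarctum myocardii.",
        "r2": "Arrhythmia cordis: fibrillatio atriorum.",
        "r3": "Diabetes mellitus typus 2",
    }
    for row in to_decision_table(reports, HIERARCHY):
        print(row)
```

In this sketch a child pattern is only tried once its parent has matched, so each pattern can stay short and specific to its level. A flat dictionary of phrases, by contrast, would need one entry for every full phrase variant.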

Editor information

James F. Peters, Andrzej Skowron, Ivo Düntsch, Jerzy Grzymała-Busse, Ewa Orłowska, Lech Polkowski

Copyright information

© 2007 Springer Berlin Heidelberg

About this chapter

Cite this chapter

Ilczuk, G., Wakulicz-Deja, A. (2007). Data Preparation for Data Mining in Medical Data Sets. In: Peters, J.F., Skowron, A., Düntsch, I., Grzymała-Busse, J., Orłowska, E., Polkowski, L. (eds) Transactions on Rough Sets VI. Lecture Notes in Computer Science, vol 4374. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71200-8_6

  • DOI: https://doi.org/10.1007/978-3-540-71200-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71198-8

  • Online ISBN: 978-3-540-71200-8

  • eBook Packages: Computer Science, Computer Science (R0)
