Abstract
Data preparation is an important but time-consuming part of the data mining process. In this paper we describe a hierarchical method of text classification based on regular expressions. We use this method in the pre-processing stage of our data mining system to transform free-text medical reports written in Latin into a decision table. These decision tables serve as input for a rough-set-based rule induction subsystem. We also compare the accuracy and scalability of our method with a standard approach based on dictionary phrases.
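To make the idea of hierarchical, regular-expression-based classification concrete, here is a minimal Python sketch. It is not the authors' implementation: the hierarchy, the patterns, the labels, and the function names (`HIERARCHY`, `classify`, `all_labels`, `to_decision_row`) are all hypothetical and chosen only for illustration. The assumption is that child patterns are tried only when the parent pattern matches, and that each report becomes one row of binary attributes in the decision table.

```python
import re

# Hypothetical two-level hierarchy (illustrative only, not from the chapter).
# Each node has a label, a regular expression, and child nodes that refine
# the parent match.
HIERARCHY = [
    {
        "label": "hypertension",
        "pattern": r"hypertensio\w*",
        "children": [
            {
                "label": "hypertension_arterial",
                "pattern": r"hypertensio\w*\s+arterialis",
                "children": [],
            },
        ],
    },
    {
        "label": "diabetes",
        "pattern": r"diabetes\s+mellitus",
        "children": [
            {
                "label": "diabetes_type_2",
                "pattern": r"diabetes\s+mellitus\s+(?:typ\w*\s+)?(?:2|II)\b",
                "children": [],
            },
        ],
    },
]


def classify(text, nodes=HIERARCHY):
    """Return labels of all matching nodes; children are tested only if the parent matches."""
    labels = []
    for node in nodes:
        if re.search(node["pattern"], text, flags=re.IGNORECASE):
            labels.append(node["label"])
            labels.extend(classify(text, node["children"]))
    return labels


def all_labels(nodes=HIERARCHY):
    """Flatten the hierarchy into an ordered list of attribute names."""
    names = []
    for node in nodes:
        names.append(node["label"])
        names.extend(all_labels(node["children"]))
    return names


def to_decision_row(text):
    """Turn one free-text report into a row of binary attributes."""
    found = set(classify(text))
    return {label: int(label in found) for label in all_labels()}


if __name__ == "__main__":
    report = "Hypertensio arterialis. Diabetes mellitus typus 2."
    print(to_decision_row(report))
```

Testing the specific child patterns only under a matching parent keeps the number of regular expressions evaluated per report small, which is one plausible reason to prefer a hierarchy over a flat list of patterns.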
© 2007 Springer Berlin Heidelberg
About this chapter
Cite this chapter
Ilczuk, G., Wakulicz-Deja, A. (2007). Data Preparation for Data Mining in Medical Data Sets. In: Peters, J.F., Skowron, A., Düntsch, I., Grzymała-Busse, J., Orłowska, E., Polkowski, L. (eds) Transactions on Rough Sets VI. Lecture Notes in Computer Science, vol 4374. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71200-8_6
DOI: https://doi.org/10.1007/978-3-540-71200-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71198-8
Online ISBN: 978-3-540-71200-8
eBook Packages: Computer Science, Computer Science (R0)