Abstract
The discretization of values plays a critical role in data mining and knowledge discovery. At certain levels of knowledge, representing information through intervals is more concise and easier to understand than representing it by means of continuous values. In this paper, we propose a method for discretizing continuous attributes by means of fuzzy sets, which constitute a fuzzy partition of the domains of these attributes. The method carries out the fuzzy discretization in two stages. In the first stage, a fuzzy decision tree is used to propose an initial set of crisp intervals; in the second stage, a genetic algorithm is used to define the membership functions and the cardinality of the partitions. After defining the fuzzy partitions, we evaluate them and compare them with partitions previously proposed in the literature.
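To make the idea of a fuzzy partition built from crisp intervals concrete, the following is a minimal sketch and not the authors' implementation. It assumes trapezoidal membership functions and a single fixed `overlap` width around every cut point; the abstract specifies neither, and in the proposed method the shapes and the number of partitions are determined by the genetic algorithm. The names `build_partition`, `membership`, and `overlap` are hypothetical.

```python
from typing import List, Tuple

# A trapezoid (a, b, c, d): membership rises on [a, b], is 1 on [b, c],
# and falls on [c, d]. This shape is an assumption made for illustration.
Trapezoid = Tuple[float, float, float, float]


def build_partition(lo: float, hi: float, cuts: List[float],
                    overlap: float) -> List[Trapezoid]:
    """Turn crisp cut points into trapezoids that cross at 0.5 on each cut.

    `overlap` (hypothetical parameter) is the width of the fuzzy zone around
    each cut; it must be smaller than the narrowest crisp interval.
    """
    half = overlap / 2.0
    edges = [lo] + sorted(cuts) + [hi]
    last = len(edges) - 2
    traps = []
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        # The outermost sets have no neighbour, so their outer side is crisp.
        a = left - half if i > 0 else lo
        b = left + half if i > 0 else lo
        c = right - half if i < last else hi
        d = right + half if i < last else hi
        if b > c:
            raise ValueError("overlap is too wide for the given cut points")
        traps.append((a, b, c, d))
    return traps


def membership(x: float, t: Trapezoid) -> float:
    """Degree to which x belongs to trapezoid t."""
    a, b, c, d = t
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


if __name__ == "__main__":
    # One crisp cut at 4.0 on the domain [0, 10], fuzzified with overlap 2.0.
    partition = build_partition(0.0, 10.0, [4.0], 2.0)
    for x in (2.0, 3.5, 4.0, 6.0):
        print(x, [membership(x, t) for t in partition])
```

Because neighbouring trapezoids share the same fuzzy zone, the membership degrees of any value in the domain sum to 1 in this sketch, and a value lying exactly on a cut point belongs to both adjacent sets with degree 0.5.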

Acknowledgments
This work was supported by the projects TIN2008-06872-C04-03 and TIN2011-27696-C02-02 of the MICINN of Spain and the European Fund for Regional Development. Thanks also go to the “Fundación Séneca” (Spain) for the Funding Program for Research Groups of Excellence (04552/GERM/06) and for the support given to R. Martínez by the FPI scholarship program.
Cite this article
Cadenas, J.M., Garrido, M.C., Martínez, R. et al. OFP_CLASS: a hybrid method to generate optimized fuzzy partitions for classification. Soft Comput 16, 667–682 (2012). https://doi.org/10.1007/s00500-011-0778-0
DOI: https://doi.org/10.1007/s00500-011-0778-0