MEMOD: a novel multivariate evolutionary multi-objective discretization

Tahan, Marzieh Hajizadeh; Asadi, Shahrokh

doi:10.1007/s00500-016-2475-5

Methodologies and Application
Published: 05 January 2017

Volume 22, pages 301–323 (2018)

Soft Computing

Abstract

Discretization is an important preprocessing technique, especially in classification problems. It reduces and simplifies data, accelerates the learning process, and improves learner performance. The most challenging aspect of discretization is to reduce the number of discrete values while maintaining the accuracy of the classification algorithm and preventing information loss. In this paper, evolutionary multi-objective optimization is used to minimize the classification error (the first objective function) and the number of cut points (the second objective function) simultaneously. The third objective function favors low-frequency cut points so that less information is lost in the conversion from continuous to discrete values. To the best of our knowledge, this is the first paper to treat discretization as a multi-objective optimization problem. Previous discretization methods produce only a single solution. In real-world problems, however, decision makers often need several alternatives to make better decisions, a requirement these techniques cannot fulfill. Because the proposed algorithm is multi-objective, it generates a set of solutions (i.e., the Pareto front), allowing the user to select the most appropriate one according to the nuances of the problem. The performance of the proposed algorithm was tested on 20 benchmark data sets. Our results show that it outperforms other methods from the literature and therefore provides better discretization for classification problems.
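The abstract frames discretization as minimizing three objectives at once and returning a Pareto front instead of a single solution. The Python sketch below illustrates, under stated assumptions, two generic building blocks of that idea: mapping continuous values to intervals given a set of cut points, and filtering candidate discretizations down to the non-dominated set. It is not the MEMOD algorithm itself: the evolutionary search and the classifier used to measure error are omitted, all function names are hypothetical, and `cut_frequency_score` is only an assumed stand-in for the third (low-frequency) objective, whose exact definition is not given in the abstract.

```python
from bisect import bisect_right
from collections import Counter
from typing import List, Sequence, Tuple

# Objective vector for one candidate discretization, all to be minimized:
# (classification error, number of cut points, frequency score of the cut points).
Objectives = Tuple[float, int, float]


def discretize(values: Sequence[float], cut_points: Sequence[float]) -> List[int]:
    """Map each continuous value to the index of its interval.

    With sorted cut points c_0 < c_1 < ... < c_{k-1}, a value v is assigned
    interval i = number of cut points <= v, giving k + 1 intervals in total.
    """
    cuts = sorted(cut_points)
    return [bisect_right(cuts, v) for v in values]


def cut_frequency_score(values: Sequence[float], cut_points: Sequence[float]) -> int:
    """HYPOTHETICAL third objective: how many records take a chosen cut value.

    Lower is better, i.e. cut points placed on rarely occurring values are favored.
    This is an illustrative assumption, not the paper's definition.
    """
    counts = Counter(values)
    return sum(counts[c] for c in cut_points)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: Sequence[Objectives]) -> List[int]:
    """Indices of the non-dominated candidates (simple O(n^2) pairwise check)."""
    return [
        i for i, a in enumerate(scores)
        if not any(dominates(b, a) for j, b in enumerate(scores) if j != i)
    ]


if __name__ == "__main__":
    values = [0.5, 1.2, 2.8, 3.1, 4.9]
    print(discretize(values, [1.0, 3.0]))  # [0, 1, 1, 2, 2]
    # Made-up objective vectors for four candidate cut-point sets.
    scores = [(0.10, 5, 0.3), (0.12, 3, 0.2), (0.10, 6, 0.4), (0.20, 2, 0.1)]
    print(pareto_front(scores))  # [0, 1, 3]
```

In this toy run, candidate 2 is dropped because candidate 0 has the same error, fewer cut points, and a lower frequency score; the remaining three candidates are mutually non-dominated, which is the kind of trade-off set from which a decision maker would pick a solution.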

Author information

Authors and Affiliations

Faculty of Engineering, University of Tehran, Farabi Campus, Tehran, Iran
Marzieh Hajizadeh Tahan & Shahrokh Asadi

Corresponding author

Correspondence to Shahrokh Asadi.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Tahan, M.H., Asadi, S. MEMOD: a novel multivariate evolutionary multi-objective discretization. Soft Comput 22, 301–323 (2018). https://doi.org/10.1007/s00500-016-2475-5

Published: 05 January 2017
Issue Date: January 2018
DOI: https://doi.org/10.1007/s00500-016-2475-5