MEMOD: a novel multivariate evolutionary multi-objective discretization

  • Methodologies and Application
  • Published:
Soft Computing

Abstract

Discretization is an important preprocessing technique, especially in classification problems. It reduces and simplifies data, accelerates the learning process, and improves learner performance. The most challenging aspect of discretization is reducing the number of discretized values while preserving the accuracy of the classification algorithm and preventing information loss. In this paper, evolutionary multi-objective optimization is used to simultaneously minimize the classification error (the first objective function) and the number of cut points (the second objective function). The third objective function favors low-frequency cut points so that less information is lost in the conversion from continuous to discrete values. To the best of our knowledge, this is the first paper to treat discretization as a multi-objective optimization problem. Previous discretization methods produce only a single solution. In real-world problems, however, decision makers often need several alternatives to make better decisions, a requirement these techniques cannot fulfill. Because the proposed algorithm is multi-objective, it generates a set of solutions (i.e., the Pareto front), allowing the user to select the most appropriate one according to the nuances of the problem. The performance of the proposed algorithm was tested on 20 benchmark data sets. Our results show that the proposed algorithm outperforms the other methods from the literature against which it was compared, and thus provides better discretization for classification problems.
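To make the three objectives concrete, here is a minimal sketch. It is not the MEMOD algorithm, which the abstract describes only at a high level, and it omits the evolutionary search itself. It assumes that a candidate solution is a list of cut points for a single attribute (MEMOD is multivariate, so a real solution would cover every attribute), that all three objectives are minimized, and that the third objective can be approximated by a hypothetical `cut_point_frequency` measure (the number of samples whose value equals a chosen cut point). Classification error would come from training a classifier on the discretized data, so the candidate objective vectors below are made-up illustrative numbers, not results from the paper.

```python
from bisect import bisect_right
from collections import Counter


def discretize(values, cut_points):
    """Map each continuous value to an interval index.

    k sorted cut points split the real line into k + 1 intervals;
    a value equal to a cut point falls into the interval on its right.
    """
    cuts = sorted(cut_points)
    return [bisect_right(cuts, v) for v in values]


def cut_point_frequency(values, cut_points):
    """Hypothetical third objective (an assumption, not the paper's formula):
    total number of samples whose value equals a chosen cut point.
    Assumes cut points are drawn from observed values; lower is better,
    so minimizing it favors low-frequency cut points."""
    counts = Counter(values)
    return sum(counts[c] for c in cut_points)


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the (label, objectives) pairs that no other candidate dominates.
    A vector never dominates itself, so self-comparison is harmless."""
    return [
        (label, obj)
        for label, obj in candidates
        if not any(dominates(other, obj) for _, other in candidates)
    ]


if __name__ == "__main__":
    values = [1.0, 2.0, 2.0, 3.5, 4.0, 4.0, 4.0, 5.5]
    print(discretize(values, [2.0, 4.0]))           # [0, 1, 1, 1, 2, 2, 2, 2]
    print(cut_point_frequency(values, [2.0, 4.0]))  # 5

    # Illustrative (error, number of cut points, frequency) vectors, not real results.
    candidates = [
        ("A", (0.10, 4, 6)),
        ("B", (0.12, 2, 3)),
        ("C", (0.15, 3, 5)),  # dominated by B
        ("D", (0.09, 6, 9)),
    ]
    print([label for label, _ in pareto_front(candidates)])  # ['A', 'B', 'D']
```

The surviving candidates trade accuracy against simplicity (D has the lowest error but the most cut points, B the fewest cut points but a higher error), which is the kind of set of alternatives the abstract says a decision maker can choose from.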

Author information

Correspondence to Shahrokh Asadi.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Tahan, M.H., Asadi, S. MEMOD: a novel multivariate evolutionary multi-objective discretization. Soft Comput 22, 301–323 (2018). https://doi.org/10.1007/s00500-016-2475-5

  • DOI: https://doi.org/10.1007/s00500-016-2475-5
