Abstract
Feature selection and discretisation have proven effective for data preprocessing, especially for high-dimensional data with many irrelevant features. While feature selection keeps only the relevant features, feature discretisation finds a discrete representation of the data that retains enough information while ignoring minor fluctuations. These techniques are usually applied in two stages, discretisation followed by selection, since many feature selection methods work only on discrete features. The most commonly used discretisation methods are univariate, meaning each feature is discretised independently. As a result, the feature selection stage may not perform well, because information about feature interactions is not considered during discretisation. In this study, we propose a new method, called PSO-DFS, that uses bare-bone particle swarm optimisation (BBPSO) to perform discretisation and feature selection in a single stage. Results on ten high-dimensional datasets show that PSO-DFS achieves a substantial dimensionality reduction on all of them. Using the transformed “small” data obtained from PSO-DFS, classification performance is significantly improved or at least maintained on nine of the ten datasets. Compared with the two-stage approach, which applies PSO for feature selection to the discretised data, PSO-DFS achieves better performance on six datasets and similar performance on three, while selecting a much smaller number of features.
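The abstract says discretisation and selection are done in one BBPSO search but does not spell out the representation. The sketch below shows one plausible way this could work, under assumptions that are ours rather than the paper's: each particle holds one cut point per feature, each feature is binary-discretised at its cut point, and a cut point outside the feature's observed range means the feature is dropped. The fitness (leave-one-out 1-NN accuracy on the discretised data minus a small per-feature penalty) and the names `discretise`, `fitness` and `bbpso_dfs` are hypothetical. Only the position update follows Kennedy's bare-bones PSO: each coordinate is sampled from a Gaussian with mean (pbest + gbest)/2 and standard deviation |pbest - gbest|.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def discretise(data: Matrix, cuts: Sequence[float], lows: Sequence[float],
               highs: Sequence[float]) -> Tuple[List[List[int]], List[int]]:
    """Binary-discretise each feature at its cut point (assumed encoding).

    A cut strictly inside (low, high) splits the feature into two non-empty bins,
    so the feature is kept. Any other cut would put every value in one bin, so the
    feature is treated as not selected.
    """
    selected = [j for j, c in enumerate(cuts) if lows[j] < c < highs[j]]
    binary = [[1 if row[j] > cuts[j] else 0 for j in selected] for row in data]
    return binary, selected


def fitness(data: Matrix, labels: Sequence[int], cuts: Sequence[float],
            lows: Sequence[float], highs: Sequence[float],
            penalty: float = 0.001) -> float:
    """Hypothetical fitness: leave-one-out 1-NN accuracy (Hamming distance) on the
    discretised data, minus a small penalty per selected feature."""
    binary, selected = discretise(data, cuts, lows, highs)
    if not selected:
        return 0.0
    correct = 0
    for i, xi in enumerate(binary):
        best_dist, best_label = None, None
        for k, xk in enumerate(binary):
            if k == i:
                continue
            dist = sum(a != b for a, b in zip(xi, xk))
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[k]
        correct += best_label == labels[i]
    return correct / len(binary) - penalty * len(selected)


def bbpso_dfs(data: Matrix, labels: Sequence[int], n_particles: int = 20,
              n_iters: int = 50, margin: float = 0.1,
              seed: int = 0) -> Tuple[List[float], float, List[float], List[float]]:
    """Bare-bones PSO over cut points, one dimension per feature (a sketch)."""
    rng = random.Random(seed)
    d = len(data[0])
    lows = [min(row[j] for row in data) for j in range(d)]
    highs = [max(row[j] for row in data) for j in range(d)]
    # Widen the search box slightly so "out of range" (= feature dropped) is reachable.
    lo_b = [lows[j] - margin * (highs[j] - lows[j]) for j in range(d)]
    hi_b = [highs[j] + margin * (highs[j] - lows[j]) for j in range(d)]

    pbest = [[rng.uniform(lo_b[j], hi_b[j]) for j in range(d)] for _ in range(n_particles)]
    pfit = [fitness(data, labels, p, lows, highs) for p in pbest]
    g = max(range(n_particles), key=pfit.__getitem__)
    gbest, gfit = pbest[g][:], pfit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            # BBPSO has no velocity: each coordinate is drawn from a Gaussian centred
            # between the personal and global bests, with spread equal to their distance.
            new = []
            for j in range(d):
                mu = (pbest[i][j] + gbest[j]) / 2
                sigma = abs(pbest[i][j] - gbest[j])
                new.append(min(max(rng.gauss(mu, sigma), lo_b[j]), hi_b[j]))
            f = fitness(data, labels, new, lows, highs)
            if f > pfit[i]:
                pbest[i], pfit[i] = new, f
                if f > gfit:
                    gbest, gfit = new[:], f
    return gbest, gfit, lows, highs
```

Passing the returned cut points back to `discretise(data, cuts, lows, highs)` yields the binary "small" dataset and the indices of the kept features. Because the same number acts as both the cut point and the selection flag, one search decides both at once. This is the single-stage idea the abstract contrasts with discretising first and selecting afterwards.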
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Tran, B., Xue, B., Zhang, M. (2016). Bare-Bone Particle Swarm Optimisation for Simultaneously Discretising and Selecting Features for High-Dimensional Classification. In: Squillero, G., Burelli, P. (eds) Applications of Evolutionary Computation. EvoApplications 2016. Lecture Notes in Computer Science, vol 9597. Springer, Cham. https://doi.org/10.1007/978-3-319-31204-0_45
DOI: https://doi.org/10.1007/978-3-319-31204-0_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31203-3
Online ISBN: 978-3-319-31204-0
eBook Packages: Computer Science, Computer Science (R0)