Abstract
Instance selection and feature extraction are among the most important tasks in data mining, because huge amounts of data are constantly being produced in many fields. When a dataset is very large, most existing machine learning algorithms cannot be applied to it, or can be applied only at a high computational cost. Two approaches have been used to address this problem: scaling up the algorithms and reducing the data. Scaling up a data mining algorithm is not always feasible, but data reduction is possible. In this paper we use both instance selection and feature extraction for data reduction. Instance selection reduces the size of the original training data. Feature extraction maps input data from an m-dimensional space into a lower-dimensional space, i.e., it eliminates the components that contribute the least information. The cuttlefish optimization algorithm is used for instance selection, and principal component analysis is used for feature extraction. Combining feature extraction with instance selection substantially reduces the computational time needed to train the classifiers. The selected subset of data points and the reduced feature space yield a detection rate, accuracy rate, and false positive rate almost similar to those obtained with the original dataset, while requiring less computational time to train the classifiers.
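The two reduction steps described above can be illustrated with a minimal sketch. It is not the authors' method. The cuttlefish optimization algorithm is replaced by a plain random search over instance subsets, the classifier used for scoring is a nearest-centroid rule rather than a decision tree, and the data is synthetic. All function names and parameters (`pca_project`, `select_instances`, `keep`, `trials`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def pca_project(X, k):
    """Project X (n samples x m features) onto its top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    return Xc @ components.T, mean, components

def centroid_accuracy(X_train, y_train, X_val, y_val):
    """Fitness stand-in: validation accuracy of a nearest-centroid classifier."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_val[:, None, :] - centroids[None, :, :], axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_val).mean())

def select_instances(X, y, X_val, y_val, keep, trials=50, seed=0):
    """Random-search stand-in for a metaheuristic (NOT the cuttlefish algorithm):
    try `trials` random subsets of size `keep` and return the best-scoring one."""
    rng = np.random.default_rng(seed)
    n_classes = len(np.unique(y))
    best_idx, best_score = None, -1.0
    for _ in range(trials):
        idx = rng.choice(len(X), size=keep, replace=False)
        if len(np.unique(y[idx])) < n_classes:
            continue  # skip subsets that lose a class entirely
        score = centroid_accuracy(X[idx], y[idx], X_val, y_val)
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score

# Synthetic two-class data (assumption: stands in for a real training set).
rng = np.random.default_rng(42)
n, m = 200, 10
y = np.repeat([0, 1], n)
X = rng.normal(size=(2 * n, m))
X[y == 1, :3] += 3.0  # class 1 is shifted along the first three features
perm = rng.permutation(2 * n)
X, y = X[perm], y[perm]
X_train, y_train = X[:300], y[:300]
X_val, y_val = X[300:], y[300:]

# Feature extraction: reduce m = 10 dimensions to k = 3.
Z_train, mean, comps = pca_project(X_train, k=3)
Z_val = (X_val - mean) @ comps.T  # reuse the training mean and components

# Instance selection: keep 30 of the 300 training instances.
idx, score = select_instances(Z_train, y_train, Z_val, y_val, keep=30)
full_score = centroid_accuracy(Z_train, y_train, Z_val, y_val)
print(f"reduced set: {score:.3f}  full set: {full_score:.3f}")
```

The validation data is projected with the mean and components fitted on the training data, so the comparison between the reduced and full training sets uses the same feature space. A population-based optimizer would replace the random draws in `select_instances` with a guided search over candidate subsets.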
Cite this article
Suganthi, M., Karunakaran, V. Instance selection and feature extraction using cuttlefish optimization algorithm and principal component analysis using decision tree. Cluster Comput 22 (Suppl 1), 89–101 (2019). https://doi.org/10.1007/s10586-018-1821-z
DOI: https://doi.org/10.1007/s10586-018-1821-z