Instance selection and feature extraction using cuttlefish optimization algorithm and principal component analysis using decision tree

Cluster Computing

Abstract

Instance selection and feature extraction are among the most important tasks in data mining, because huge amounts of data are constantly being produced in many fields. When a dataset is very large, most existing machine learning algorithms cannot handle it, and the computational cost is high. Two approaches have been used to address this problem: scaling up the algorithms and reducing the data. Scaling up data mining algorithms is not always feasible, but data reduction is possible. In this paper we use both instance selection and feature extraction for data reduction. Instance selection is a technique that reduces the size of the original training data. Feature extraction maps input data from an m-dimensional space into a lower-dimensional space, i.e., it eliminates the components that contribute little information. In this paper, the cuttlefish optimization algorithm is used for instance selection, while principal component analysis is used for feature extraction. Combining feature extraction and instance selection substantially reduces the computational time needed to train the classifiers. The optimal extracted subset of data points and the reduced feature space yield nearly the same detection rate, accuracy rate, and false positive rate as the original dataset, while requiring less computational time to train the classifiers.
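
The two reduction steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cuttlefish optimization algorithm is replaced here by a plain random search over instance subsets, the decision tree is replaced by a 1-nearest-neighbour classifier to keep the code short, and principal component analysis is computed by power iteration with deflation. All function names, the synthetic data, and the parameters `trials`, `keep`, and `penalty` are assumptions made for illustration.

```python
import random


def pca_project(rows, k, rng, iters=200):
    """Project rows onto their top-k principal components (power iteration + deflation)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:  # no variance left in any direction
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate: remove the component just found from the covariance matrix.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]


def nearest_label(x, train_x, train_y):
    """1-NN prediction: label of the closest training point (squared Euclidean)."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train_x[i])))
    return train_y[best]


def accuracy(train_x, train_y, test_x, test_y):
    """Fraction of test points whose 1-NN prediction matches the true label."""
    hits = sum(nearest_label(x, train_x, train_y) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)


def select_instances(train_x, train_y, val_x, val_y, rng,
                     trials=50, keep=0.3, penalty=0.1):
    """Random-search stand-in for the optimizer: keep the subset with the best
    validation accuracy minus a penalty on the fraction of instances kept."""
    n = len(train_x)
    best_idx = list(range(n))
    best_score = accuracy(train_x, train_y, val_x, val_y) - penalty
    size = max(1, int(keep * n))
    for _ in range(trials):
        idx = sorted(rng.sample(range(n), size))
        score = accuracy([train_x[i] for i in idx], [train_y[i] for i in idx],
                         val_x, val_y) - penalty * size / n
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx


# Synthetic two-class data: features 0 and 1 are informative, 2 and 3 are noise.
rng = random.Random(0)
labels = [i % 2 for i in range(120)]
data = [[rng.gauss(4.0 * y, 1), rng.gauss(4.0 * y, 1),
         rng.gauss(0, 1), rng.gauss(0, 1)] for y in labels]

# Feature extraction (fitted on all rows for brevity), then instance selection.
reduced = pca_project(data, k=2, rng=rng)
train_x, train_y = reduced[:80], labels[:80]
val_x, val_y = reduced[80:], labels[80:]
selected = select_instances(train_x, train_y, val_x, val_y, rng)
subset_acc = accuracy([train_x[i] for i in selected],
                      [train_y[i] for i in selected], val_x, val_y)
print(len(data[0]), "->", len(reduced[0]), "features;",
      len(train_x), "->", len(selected), "instances;",
      "validation accuracy", subset_acc)
```

The fitness used in `select_instances` trades validation accuracy against subset size, which mirrors the goal stated in the abstract of keeping classification quality close to that of the full dataset while training on less data. A metaheuristic such as the cuttlefish algorithm would explore the same search space with a guided strategy instead of independent random draws.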

Author information

Corresponding author

Correspondence to V. Karunakaran.

About this article

Cite this article

Suganthi, M., Karunakaran, V. Instance selection and feature extraction using cuttlefish optimization algorithm and principal component analysis using decision tree. Cluster Comput 22 (Suppl 1), 89–101 (2019). https://doi.org/10.1007/s10586-018-1821-z

  • DOI: https://doi.org/10.1007/s10586-018-1821-z
