
Machine Learning Methods Based Preprocessing to Improve Categorical Data Classification

  • Conference paper
  • Intelligent Data Engineering and Automated Learning – IDEAL 2018 (IDEAL 2018)

Abstract

This study addresses large volumes of data with a high number of variables, most of which are categorical. The knowledge extraction process known as Knowledge Discovery in Databases (KDD) commonly includes a data pre-processing and dimensionality reduction stage, and high-quality data is key to extracting useful information. This paper proposes using the Pairwise and Listwise methods as part of the dimensionality reduction process when one or more variables have a high proportion of missing values. As part of the pre-processing, we generate n clusters with the Kohonen Self-Organizing Map (SOM) algorithm, using H2O in R. We then compare the performance and accuracy of classification algorithms applied to the complete data subset with those of the same algorithms applied to each cluster. As a case study, we analyze the characteristics that influence the schooling level of women of childbearing age.
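The abstract does not spell out the Pairwise and Listwise methods. Assuming they refer to the standard pairwise and listwise deletion strategies for missing data, the minimal Python sketch below (not the authors' R/H2O code) shows the difference: listwise deletion keeps only records that are complete on every variable, while pairwise deletion keeps, for each pair of variables, the records observed on both. The record layout, the use of `None` as the missing-value marker, the toy data, and the function names are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: None marks a missing categorical value (assumption)
Record = Dict[str, Optional[str]]


def listwise_delete(records: List[Record], variables: List[str]) -> List[Record]:
    """Keep only records that are complete on every listed variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


def pairwise_counts(records: List[Record], variables: List[str]) -> Dict[Tuple[str, str], int]:
    """For each pair of variables, count the records observed on both.

    Under pairwise deletion each pairwise statistic (e.g. a contingency table)
    uses its own subset, so different pairs may rest on different record counts.
    """
    counts: Dict[Tuple[str, str], int] = {}
    for a, b in combinations(variables, 2):
        counts[(a, b)] = sum(
            1 for r in records if r.get(a) is not None and r.get(b) is not None
        )
    return counts


def missing_rate(records: List[Record], variable: str) -> float:
    """Fraction of records in which the variable is missing."""
    return sum(r.get(variable) is None for r in records) / len(records)


# Hypothetical census-style toy data
data: List[Record] = [
    {"area": "urban", "language": "es", "schooling": "secondary"},
    {"area": "rural", "language": None, "schooling": "primary"},
    {"area": None, "language": "qu", "schooling": "primary"},
    {"area": "urban", "language": None, "schooling": None},
]
vars_ = ["area", "language", "schooling"]

print(len(listwise_delete(data, vars_)))  # 1
print(pairwise_counts(data, vars_))
# {('area', 'language'): 1, ('area', 'schooling'): 2, ('language', 'schooling'): 2}
print(missing_rate(data, "language"))  # 0.5
```

Listwise deletion leaves a single complete record here, while pairwise deletion retains two records for some pairs; the trade-off is that pairwise statistics no longer share one common set of records.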
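Assuming complete records (for example, those that remain after listwise deletion) whose categorical variables are one-hot encoded, the pure-Python sketch below trains a small Kohonen SOM and treats each grid node as one cluster. The 2-by-2 grid, Gaussian neighbourhood, linearly decaying learning rate, epoch count, seed, toy data, and all function names are illustrative assumptions; this is not the H2O-on-R workflow used in the paper.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, int]  # (row, col) position on the SOM grid

# Hypothetical helper names and hyperparameters (assumptions, not the paper's settings)


def one_hot(records: List[Dict[str, str]], variables: Sequence[str]) -> List[List[float]]:
    """Encode complete categorical records as 0/1 vectors, one column per category level."""
    levels = {v: sorted({r[v] for r in records}) for v in variables}
    vectors = []
    for r in records:
        vec: List[float] = []
        for v in variables:
            vec.extend(1.0 if r[v] == level else 0.0 for level in levels[v])
        vectors.append(vec)
    return vectors


def best_matching_unit(weights: Dict[Node, List[float]], x: List[float]) -> Node:
    """Grid node whose weight vector is closest to x (squared Euclidean distance)."""
    return min(weights, key=lambda n: sum((w - xi) ** 2 for w, xi in zip(weights[n], x)))


def train_som(data: List[List[float]], rows: int = 2, cols: int = 2,
              epochs: int = 100, lr0: float = 0.5, seed: int = 0) -> Dict[Node, List[float]]:
    """Train a small Kohonen SOM and return the weight vector of every grid node."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(i, j): [rng.random() for _ in range(dim)]
               for i in range(rows) for j in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)               # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.01  # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):  # visit records in shuffled order
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood weight
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])      # pull node toward the record
    return weights


def assign_clusters(weights: Dict[Node, List[float]],
                    data: List[List[float]]) -> Dict[Node, List[int]]:
    """Group record indices by best-matching node; each non-empty node is one cluster."""
    clusters: Dict[Node, List[int]] = {}
    for idx, x in enumerate(data):
        clusters.setdefault(best_matching_unit(weights, x), []).append(idx)
    return clusters


# Hypothetical complete records
records = [
    {"area": "urban", "language": "es"},
    {"area": "urban", "language": "es"},
    {"area": "rural", "language": "qu"},
    {"area": "rural", "language": "qu"},
]
X = one_hot(records, ["area", "language"])
clusters = assign_clusters(train_som(X), X)
print(clusters)
```

Each cluster's record indices could then be used to train a separate classifier and compare its accuracy with that of a classifier trained on the whole subset, roughly as the abstract describes; the classifiers are not shown, and the exact cluster layout depends on the seed and hyperparameters.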

Author information

Correspondence to Zoila Ruiz-Chavez.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Ruiz-Chavez, Z., Salvador-Meneses, J., Garcia-Rodriguez, J. (2018). Machine Learning Methods Based Preprocessing to Improve Categorical Data Classification. In: Yin, H., Camacho, D., Novais, P., Tallón-Ballesteros, A. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2018. IDEAL 2018. Lecture Notes in Computer Science, vol 11314. Springer, Cham. https://doi.org/10.1007/978-3-030-03493-1_32

  • DOI: https://doi.org/10.1007/978-3-030-03493-1_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03492-4

  • Online ISBN: 978-3-030-03493-1

  • eBook Packages: Computer Science, Computer Science (R0)
