Evaluating the Impact of Categorical Data Encoding and Scaling on Neural Network Classification Performance: The Case of Repeat Consumption of Identical Cultural Goods

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 311)

Abstract

This article investigated the impact of categorical input encoding and scaling approaches on neural network sensitivity and overall classification performance in the context of predicting the repeat viewing propensity of moviegoers. The results show that the out-of-sample minimum sensitivity and overall classification performance of the neural network are indifferent to the scaling of the categorical inputs. However, the encoding of the inputs had a significant impact on classification accuracy: using ordinal or thermometer encoding for categorical inputs significantly increased the out-of-sample accuracy of the neural network classifier. These findings confirm that the impact of categorical encoding is problem-specific for the ordinal approach, and they support thermometer encoding as the most suitable for categorical inputs. The classification performance of the neural networks was also compared against a logistic regression model; in this instance, the non-parametric approach offered no advantage over the standard statistical model.
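The encodings named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0-based level indices, the choice of n_levels - 1 thermometer columns, and the [-1, 1] rescaling are all assumptions made for illustration, since the abstract does not state which variants or scaling ranges were compared.

```python
import numpy as np


def ordinal_encode(levels, n_levels):
    """Ordinal encoding: one input column, level k mapped to k / (n_levels - 1)."""
    levels = np.asarray(levels, dtype=float)
    return (levels / (n_levels - 1)).reshape(-1, 1)


def one_hot_encode(levels, n_levels):
    """1-of-N encoding: one column per level, exactly one 1 in each row."""
    return np.eye(n_levels, dtype=int)[np.asarray(levels)]


def thermometer_encode(levels, n_levels):
    """Thermometer encoding (assumed variant): level k switches on the
    first k of n_levels - 1 columns, so higher levels keep lower bits set."""
    levels = np.asarray(levels).reshape(-1, 1)
    thresholds = np.arange(1, n_levels)  # 1, 2, ..., n_levels - 1
    return (levels >= thresholds).astype(int)


def rescale(encoded):
    """Map inputs from the [0, 1] range to [-1, 1] (assumed scaling choice)."""
    return 2 * np.asarray(encoded) - 1


# Hypothetical categorical input with four ordered levels (0..3).
visits = [0, 2, 3]
ordinal_inputs = ordinal_encode(visits, 4)
one_hot_inputs = one_hot_encode(visits, 4)
thermometer_inputs = thermometer_encode(visits, 4)
rescaled_inputs = rescale(thermometer_inputs)
```

With four ordered levels, level 2 becomes the row [1, 1, 0] under this thermometer variant and [0, 0, 1, 0] under 1-of-N encoding. The thermometer form preserves the ordering of the levels across several binary inputs, whereas 1-of-N treats the levels as unordered.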

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fitkov-Norris, E., Vahid, S., Hand, C. (2012). Evaluating the Impact of Categorical Data Encoding and Scaling on Neural Network Classification Performance: The Case of Repeat Consumption of Identical Cultural Goods. In: Jayne, C., Yue, S., Iliadis, L. (eds) Engineering Applications of Neural Networks. EANN 2012. Communications in Computer and Information Science, vol 311. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32909-8_35

  • DOI: https://doi.org/10.1007/978-3-642-32909-8_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32908-1

  • Online ISBN: 978-3-642-32909-8

  • eBook Packages: Computer Science, Computer Science (R0)
