Evaluating the Impact of Categorical Data Encoding and Scaling on Neural Network Classification Performance: The Case of Repeat Consumption of Identical Cultural Goods

Fitkov-Norris, Elena; Vahid, Samireh; Hand, Chris

doi:10.1007/978-3-642-32909-8_35


Conference paper


Part of the book series: Communications in Computer and Information Science (CCIS, volume 311)

Abstract

This article investigates the impact of categorical input encoding and scaling approaches on neural network sensitivity and overall classification performance in the context of predicting the repeat viewing propensity of moviegoers. The results show that the out-of-sample minimum sensitivity and overall classification performance of the neural network are insensitive to the scaling of the categorical inputs. However, the encoding of the inputs has a significant impact on classification accuracy: using ordinal or thermometer encoding for categorical inputs significantly increases the out-of-sample accuracy of the neural network classifier. These findings confirm that the impact of categorical encoding is problem specific for an ordinal approach, and support thermometer encoding as the most suitable for categorical inputs. The classification performance of the neural networks was also compared against a logistic regression model; the results show that, in this instance, the non-parametric approach offers no advantage over standard statistical models.
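
For readers unfamiliar with the terms used in the abstract, the sketch below illustrates the ordinal and thermometer encodings it names, alongside a one-of-N encoding for contrast, a simple rescaling of inputs, and minimum sensitivity taken as the lowest per-class recall. It is not the authors' implementation: the function names, the particular thermometer variant (n - 1 units for n levels), the [-1, 1] target range, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def ordinal_encode(levels):
    """Ordinal encoding: each 0-based level index becomes one numeric input."""
    return np.asarray(levels, dtype=float).reshape(-1, 1)


def one_of_n_encode(levels, n_levels):
    """One-of-N encoding: one input per category, exactly one set to 1."""
    levels = np.asarray(levels)
    return np.eye(n_levels)[levels]


def thermometer_encode(levels, n_levels):
    """Thermometer encoding (assumed variant): level k turns on the first k
    of n_levels - 1 inputs, so higher levels light up more units."""
    levels = np.asarray(levels)
    thresholds = np.arange(1, n_levels)
    return (levels[:, None] >= thresholds).astype(float)


def rescale_unit_interval(x, low, high):
    """Linearly map inputs that lie in [0, 1] onto [low, high]."""
    return low + (high - low) * np.asarray(x, dtype=float)


def minimum_sensitivity(y_true, y_pred):
    """Lowest per-class recall across the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(min(recalls))


if __name__ == "__main__":
    # Hypothetical 4-level ordered category (e.g. a frequency band), toy values.
    levels = [0, 2, 3]
    print(ordinal_encode(levels))
    print(one_of_n_encode(levels, 4))
    print(thermometer_encode(levels, 4))
    # Same thermometer inputs rescaled from {0, 1} to {-1, 1}.
    print(rescale_unit_interval(thermometer_encode(levels, 4), -1.0, 1.0))
    # Toy labels: class 0 recall is 2/3, class 1 recall is 1/2.
    print(minimum_sensitivity([0, 0, 0, 1, 1], [0, 0, 1, 1, 0]))
```

Because thermometer encoding preserves the ordering of levels while still giving each threshold its own input, it sits between the single-input ordinal scheme and the order-free one-of-N scheme, which is one plausible reading of why the choice of encoding matters more than the choice of scaling.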



Author information

Authors and Affiliations

Kingston University, Kingston Hill, Kingston-upon-Thames, KT2 7LB, UK
Elena Fitkov-Norris, Samireh Vahid & Chris Hand


Editor information

Editors and Affiliations

Coventry University, Priory Street, CV1 5FB, Coventry, UK
Chrisina Jayne
University of Lincoln, LN6 7TS, Lincoln, UK
Shigang Yue
University of Thrace, 193 Pandazidou st., 68200 N, Orestiada, Greece
Lazaros Iliadis


About this paper

Cite this paper

Fitkov-Norris, E., Vahid, S., Hand, C. (2012). Evaluating the Impact of Categorical Data Encoding and Scaling on Neural Network Classification Performance: The Case of Repeat Consumption of Identical Cultural Goods. In: Jayne, C., Yue, S., Iliadis, L. (eds) Engineering Applications of Neural Networks. EANN 2012. Communications in Computer and Information Science, vol 311. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32909-8_35


DOI: https://doi.org/10.1007/978-3-642-32909-8_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32908-1
Online ISBN: 978-3-642-32909-8
eBook Packages: Computer Science, Computer Science (R0)
