Abstract
This study investigates how data processing affects the performance of a predictive model by comparing models trained on processed data with models trained on unprocessed data. By unprocessed data, we mean data that include information on the influence of each mixture of transmissions from the input neurons. By processed data, we mean data in which the transmission from each input neuron is treated independently and its co-activity with other neurons is reduced to a single signal. Intuitively, predictions of the output should be less accurate when the input data undergo such processing. It is not immediately clear, however, which factors determine the degree of accuracy loss.
Using the simplest structure that supports the described data processing, namely two input neurons and one output neuron, we built predictive models that forecast the activity of the output neuron from the states of the input neurons, trained on a synthetic processed (vs. unprocessed) data set of historical system states. We found that accuracy decreases significantly when the conditional output firing probabilities are high for opposite inputs. The predicted significant effect of data processing was then demonstrated on a real breast cancer data set. The study highlights the importance of evaluating the impact of data processing, with possible applications to predictive models and to areas such as explainable AI.
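The effect described above can be illustrated with a minimal sketch. It is not the study's archived code, and it rests on several assumptions that the abstract does not state: the four joint input states are equally likely, the "unprocessed" predictor sees the joint state of both neurons, and the "processed" predictor is modelled as a naive-Bayes-style rule that scores each input neuron on its own. The names `joint_accuracy`, `independent_accuracy`, `xor_like` and `and_like`, and the example probability tables, are hypothetical.

```python
from fractions import Fraction
from itertools import product

STATES = list(product((0, 1), repeat=2))  # joint states of the two input neurons
WEIGHT = Fraction(1, len(STATES))         # assumption: input states are equally likely


def p_out(p_fire, y, s):
    """P(output = y | input state s), given the firing-probability table."""
    return p_fire[s] if y == 1 else 1 - p_fire[s]


def joint_accuracy(p_fire):
    """Expected accuracy when the predictor sees the joint input state."""
    return sum(WEIGHT * max(p_fire[s], 1 - p_fire[s]) for s in STATES)


def independent_accuracy(p_fire):
    """Expected accuracy when each neuron is scored on its own
    (naive-Bayes-style assumption), so co-activity information is lost."""
    def score(y, s):
        prior = sum(WEIGHT * p_out(p_fire, y, t) for t in STATES)  # P(y)
        if prior == 0:
            return Fraction(0)
        result = prior
        for i, v in enumerate(s):
            # P(x_i = v, y), marginalised over the other neuron
            joint = sum(WEIGHT * p_out(p_fire, y, t) for t in STATES if t[i] == v)
            result *= joint / prior
        return result

    total = Fraction(0)
    for s in STATES:
        predict = 1 if score(1, s) > score(0, s) else 0  # ties predict "silent"
        total += WEIGHT * p_out(p_fire, predict, s)
    return total


hi, lo = Fraction(9, 10), Fraction(1, 10)
# Hypothetical table: firing is likely for the opposite inputs (0, 1) and (1, 0).
xor_like = {(0, 0): lo, (0, 1): hi, (1, 0): hi, (1, 1): lo}
# Hypothetical table: firing is likely only when both inputs are active.
and_like = {(0, 0): lo, (0, 1): lo, (1, 0): lo, (1, 1): hi}

print(joint_accuracy(xor_like), independent_accuracy(xor_like))  # 9/10 1/2
print(joint_accuracy(and_like), independent_accuracy(and_like))  # 9/10 9/10
```

Under these assumptions, the independent predictor loses nothing on the second table but falls to chance level on the first, which matches the qualitative claim that the accuracy loss is large when firing probabilities are high for opposite inputs. Exact fractions are used so that ties between scores are not decided by floating-point rounding.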
Data Availability Statement
Publicly archived Python code and data sets analyzed or generated in this study can be accessed here
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Freedman, G.G. (2024). The Price of Data Processing: Impact on Predictive Model Performance. In: Nicosia, G., Ojha, V., La Malfa, E., La Malfa, G., Pardalos, P.M., Umeton, R. (eds) Machine Learning, Optimization, and Data Science. LOD 2023. Lecture Notes in Computer Science, vol 14505. Springer, Cham. https://doi.org/10.1007/978-3-031-53969-5_15
DOI: https://doi.org/10.1007/978-3-031-53969-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53968-8
Online ISBN: 978-3-031-53969-5
eBook Packages: Computer Science, Computer Science (R0)