Abstract
We propose a generalized model with configurable discretizer actuators to address the discretization of massive numerical datasets. The solution distributes the actuators concurrently and uses dynamic memory management schemes to provide a complete, scalable basis for the optimization strategy. This keeps processing from halting when memory is limited, minimizes discretization time, and accommodates new observations without re-scanning the old data. In experiments applying different discretization algorithms to publicly available massive datasets, our discretizer actuators combined with the Hellinger-based algorithm performed better, in terms of memory and computational resources, than the conventional discretization algorithms implemented in Hugin and Weka. By showing that massive numerical datasets can be discretized within limited memory and time, these results suggest integrating our configurable actuators into the learning process to reduce the computational complexity of modeling Bayesian networks to a minimum acceptable level.
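To make the idea of concurrent, incremental discretization concrete, the following is a minimal sketch and not the actuator model of the paper. It swaps in plain equal-width binning for the Hellinger-based algorithm, and every name in it (`IncrementalDiscretizer`, `bin_index`, `count_chunk`, the `lo`/`hi`/`n_bins` parameters) is hypothetical. It only illustrates how per-chunk bin counts can be computed by concurrent workers and merged, so that new observations update the counts without re-scanning earlier data.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def bin_index(value, lo, hi, n_bins):
    """Map a value to an equal-width bin index in [0, n_bins - 1]."""
    if value <= lo:
        return 0
    if value >= hi:
        return n_bins - 1
    # Clamp to guard against floating-point rounding at the upper edge.
    return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)


def count_chunk(chunk, lo, hi, n_bins):
    """Count how many values of one chunk fall into each bin."""
    return Counter(bin_index(v, lo, hi, n_bins) for v in chunk)


class IncrementalDiscretizer:
    """Hypothetical helper: assumes the value range [lo, hi] is known up front."""

    def __init__(self, lo, hi, n_bins, workers=4):
        self.lo, self.hi, self.n_bins = lo, hi, n_bins
        self.workers = workers
        self.counts = Counter()  # running bin counts over all data seen so far

    def update(self, chunks):
        """Bin each chunk in a worker thread and merge the partial counts."""
        def work(chunk):
            return count_chunk(chunk, self.lo, self.hi, self.n_bins)

        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            for partial in pool.map(work, chunks):
                self.counts.update(partial)

    def transform(self, values):
        """Replace each numeric value by its bin index."""
        return [bin_index(v, self.lo, self.hi, self.n_bins) for v in values]


disc = IncrementalDiscretizer(lo=0.0, hi=10.0, n_bins=5)
disc.update([[0.5, 1.5, 2.5], [9.9, 10.0]])  # initial data, two chunks
disc.update([[5.0]])                         # new observation, no re-scan
print(dict(disc.counts))
print(disc.transform([3.0, 7.0]))
```

Only one chunk per worker is held in memory at a time, and the running counts are the only state kept between updates. A supervised method such as the Hellinger-based one would also need class labels per bin, which this sketch omits.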
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Osunmakinde, I.O., Bagula, A. (2009). Supporting Scalable Bayesian Networks Using Configurable Discretizer Actuators. In: Kolehmainen, M., Toivanen, P., Beliczynski, B. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2009. Lecture Notes in Computer Science, vol 5495. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04921-7_33
DOI: https://doi.org/10.1007/978-3-642-04921-7_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04920-0
Online ISBN: 978-3-642-04921-7
eBook Packages: Computer Science (R0)