Abstract
In this paper we argue that parallel and/or distributed compute resources can be used differently: instead of using them to speed up algorithms, we propose using them to improve accuracy. In a nutshell, the goal is to tune data mining algorithms to produce better results in the same time, rather than similar results much faster. We discuss a number of generic ways of tuning data mining algorithms and elaborate on two prominent examples in more detail. A series of exemplary experiments illustrates the effect such a use of parallel resources can have.
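To make the idea concrete, below is a minimal, hypothetical sketch (not the authors' code) of one generic tuning strategy in this spirit: widening a greedy heuristic. Instead of committing to the single locally best choice at each step, a pool of workers refines several partial solutions concurrently and keeps the best few, so the extra compute buys search quality rather than speed. Greedy set cover serves as the example problem; the function names, the beam-style selection, and the parameter `beam_width` are illustrative assumptions.

```python
# Illustrative sketch: "widening" a greedy set-cover heuristic so that
# parallel workers improve solution quality instead of wall-clock time.
from concurrent.futures import ProcessPoolExecutor


def refine(state, sets):
    """Expand one partial cover: return all one-step extensions."""
    covered, chosen = state
    children = []
    for i, s in enumerate(sets):
        if i in chosen:
            continue
        if len(s - covered) > 0:  # only sets that add new elements
            children.append((covered | s, chosen | {i}))
    return children


def widened_greedy_cover(universe, sets, beam_width=4, workers=4):
    """Beam-widened greedy set cover: each level refines up to
    `beam_width` partial solutions concurrently and keeps the best."""
    beam = [(frozenset(), frozenset())]  # (covered elements, chosen set indices)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        while not any(c == universe for c, _ in beam):
            # Refine all current partial solutions in parallel.
            children = [st for kids in pool.map(refine, beam, [sets] * len(beam))
                        for st in kids]
            if not children:
                break  # instance not coverable
            # Keep the partial solutions covering most elements with fewest sets.
            children.sort(key=lambda st: (-len(st[0]), len(st[1])))
            beam = children[:beam_width]
    best = min((st for st in beam if st[0] == universe),
               key=lambda st: len(st[1]), default=None)
    return sorted(best[1]) if best else None


if __name__ == "__main__":
    universe = frozenset(range(10))
    sets = [frozenset({0, 1, 2, 3}), frozenset({3, 4, 5}),
            frozenset({5, 6, 7, 8}), frozenset({0, 9}),
            frozenset({1, 4, 6, 9}), frozenset({2, 7, 8})]
    print(widened_greedy_cover(universe, sets))
```

With `beam_width = 1` this reduces to the ordinary greedy heuristic; larger widths use the additional workers to explore alternatives that plain greedy would discard, which is exactly the "better, not faster" trade the paper advocates.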