Abstract
Greedy algorithms for finding a model, which underlie many data mining methods, risk getting caught in local extrema, i.e., suboptimal solutions. Widening is a technique for enhancing greedy algorithms by using parallel resources to broaden the search in the model space. The most important component of widening is the selector, a function that chooses the next models to refine. This selector ideally enforces diversity within the selected set of models in order to ensure that parallel workers explore sufficiently different parts of the model space and do not end up mimicking a simple beam search. Previous publications have shown that this works well for problems with a suitable distance measure for the models, but if no such measure is available, applying widening is challenging. In addition, these approaches require extensive sequential computations for diverse subset selection, making the entire process much slower than the original greedy algorithm. In this paper, we propose the bucket selector, a model-independent randomized selection strategy. We find that (a) the bucket selector is considerably faster and not significantly worse when a diversity measure exists and (b) it performs better than existing selection strategies in cases without a diversity measure.
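The widening loop described above can be illustrated with a minimal sketch on a toy set cover instance. This is not the paper's implementation: all names (`widened_set_cover`, `top_k_selector`, `random_bucket_selector`) are hypothetical, and the randomized selector is only one plausible reading of "randomized selection", assuming candidates are assigned uniformly at random to k buckets and the best-scoring model of each bucket survives. The top-k selector is included for contrast, since it reduces widening to a plain beam search.

```python
import random
from typing import Callable, FrozenSet, List, Sequence, Set

Model = FrozenSet[int]  # a partial cover: indices of the subsets chosen so far


def covered(model: Model, subsets: Sequence[Set[int]]) -> Set[int]:
    """Return all elements covered by the subsets in the model."""
    result: Set[int] = set()
    for i in model:
        result |= subsets[i]
    return result


def top_k_selector(candidates: List[Model], k: int,
                   score: Callable[[Model], int],
                   rng: random.Random) -> List[Model]:
    """Keep the k best-scoring candidates (equivalent to beam search)."""
    return sorted(candidates, key=score, reverse=True)[:k]


def random_bucket_selector(candidates: List[Model], k: int,
                           score: Callable[[Model], int],
                           rng: random.Random) -> List[Model]:
    """Assumed reading of a randomized selector: scatter candidates into
    k random buckets and keep the best model from each non-empty bucket."""
    buckets: List[List[Model]] = [[] for _ in range(k)]
    for model in candidates:
        buckets[rng.randrange(k)].append(model)
    return [max(bucket, key=score) for bucket in buckets if bucket]


def widened_set_cover(subsets: Sequence[Set[int]], k: int,
                      selector: Callable[..., List[Model]],
                      seed: int = 0) -> Model:
    """Refine up to k models per step until one covers the whole universe."""
    universe: Set[int] = set().union(*subsets)
    rng = random.Random(seed)

    def score(model: Model) -> int:
        return len(covered(model, subsets))

    frontier: List[Model] = [frozenset()]
    while True:
        complete = [m for m in frontier if covered(m, subsets) == universe]
        if complete:
            return complete[0]
        # Refinement: extend every model in the frontier by one unused subset.
        candidates = {m | {i} for m in frontier
                      for i in range(len(subsets)) if i not in m}
        # Sort for reproducibility, then let the selector pick the next frontier.
        frontier = selector(sorted(candidates, key=sorted), k, score, rng)


subsets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6}]
for selector in (top_k_selector, random_bucket_selector):
    cover = widened_set_cover(subsets, k=3, selector=selector)
    print(selector.__name__, sorted(cover))
```

The randomized selector in this sketch never compares two models with each other, so it needs no distance measure, and each bucket can be processed independently. Both properties match the motivation given in the abstract, but the actual bucket selector is defined in the paper itself.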
Notes
1. Due to the nature of the set cover problem and the chosen heuristic, models with the same score occur frequently and other tie-breaking methods may be feasible. This is, however, out of scope of this work.
2. http://openjdk.java.net/projects/code-tools/jmh/ (1/26/2017).
Acknowledgements
This work was partially funded by BMBF (grant 031A535C) and the Konstanz Research School Chemical Biology.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Fillbrunn, A., Wörteler, L., Grossniklaus, M., Berthold, M.R. (2017). Bucket Selection: A Model-Independent Diverse Selection Strategy for Widening. In: Adams, N., Tucker, A., Weston, D. (eds) Advances in Intelligent Data Analysis XVI. IDA 2017. Lecture Notes in Computer Science, vol 10584. Springer, Cham. https://doi.org/10.1007/978-3-319-68765-0_8
DOI: https://doi.org/10.1007/978-3-319-68765-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68764-3
Online ISBN: 978-3-319-68765-0
eBook Packages: Computer Science, Computer Science (R0)