Bucket Selection: A Model-Independent Diverse Selection Strategy for Widening

Fillbrunn, Alexander; Wörteler, Leonard; Grossniklaus, Michael; Berthold, Michael R.

doi:10.1007/978-3-319-68765-0_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10584)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

When using a greedy algorithm for finding a model, as is the case in many data mining algorithms, there is a risk of getting caught in local extrema, i.e., suboptimal solutions. Widening is a technique for enhancing greedy algorithms by using parallel resources to broaden the search in the model space. The most important component of widening is the selector, a function that chooses the next models to refine. This selector ideally enforces diversity within the selected set of models in order to ensure that parallel workers explore sufficiently different parts of the model space and do not end up mimicking a simple beam search. Previous publications have shown that this works well for problems with a suitable distance measure for the models, but if no such measure is available, applying widening is challenging. In addition, these approaches require extensive sequential computations for diverse subset selection, which makes the entire process much slower than the original greedy algorithm. In this paper, we propose the bucket selector, a model-independent randomized selection strategy. We find that (a) the bucket selector is considerably faster and not significantly worse when a diversity measure exists, and (b) it performs better than existing selection strategies in cases without a diversity measure.
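
The abstract does not spell out how the bucket selector works, so the following Python sketch is only an illustration of the general idea of widening with a randomized, model-independent selector. It rests on assumptions that are not taken from the paper: each refined model is dropped into one of k buckets uniformly at random and the best-scoring model of each non-empty bucket is kept. The set cover setting, the scoring heuristic (number of covered elements), and the names bucket_select and widened_set_cover are likewise hypothetical choices made for the example.

```python
import random
from typing import Callable, FrozenSet, Iterable, List, Sequence, Set, TypeVar

M = TypeVar("M")


def bucket_select(candidates: Iterable[M], score: Callable[[M], float],
                  k: int, rng: random.Random) -> List[M]:
    """Assumed bucketing scheme: assign every candidate to one of k buckets
    at random and keep the best-scoring candidate of each non-empty bucket.
    No distance measure between models is needed."""
    buckets: List[List[M]] = [[] for _ in range(k)]
    for candidate in candidates:
        buckets[rng.randrange(k)].append(candidate)
    return [max(bucket, key=score) for bucket in buckets if bucket]


def covered(model: FrozenSet[int], subsets: Sequence[Set[int]]) -> Set[int]:
    """Elements covered by the subsets whose indices form the model."""
    return set().union(*(subsets[i] for i in model))


def widened_set_cover(universe: Set[int], subsets: Sequence[Set[int]],
                      k: int = 4, seed: int = 0) -> List[int]:
    """Widened greedy search for a set cover; returns sorted subset indices."""
    rng = random.Random(seed)
    models: List[FrozenSet[int]] = [frozenset()]  # start from the empty model
    while True:
        # Refinement: extend each model by one subset that covers something new.
        candidates = set()
        for model in models:
            cov = covered(model, subsets)
            for i, subset in enumerate(subsets):
                if i not in model and not subset <= cov:
                    candidates.add(model | {i})
        if not candidates:
            raise ValueError("the universe cannot be covered by the given subsets")
        ordered = sorted(candidates, key=sorted)  # deterministic iteration order
        for model in ordered:
            if covered(model, subsets) >= universe:
                return sorted(model)
        # Selection: keep at most k models for the next round of refinement.
        models = bucket_select(ordered, lambda m: len(covered(m, subsets)), k, rng)


if __name__ == "__main__":
    example_subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
    print(widened_set_cover({1, 2, 3, 4, 5}, example_subsets))  # prints [0, 3]
```

In this sketch the overall best candidate always survives selection, because it is necessarily the best model in whichever bucket it lands in; the remaining k - 1 slots are filled by models that win a random bucket, which is where the diversity comes from. A single pass over the candidates suffices, in contrast to the pairwise distance computations that a distance-based diverse subset selection would need.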

Notes

1. Due to the nature of the set cover problem and the chosen heuristic, models with the same score occur frequently and other tie-breaking methods may be feasible. This is, however, out of scope of this work.
2. http://openjdk.java.net/projects/code-tools/jmh/ (1/26/2017).

Acknowledgements

This work was partially funded by BMBF (grant 031A535C) and the Konstanz Research School Chemical Biology.

Author information

Authors and Affiliations

Department of Computer and Information Science, University of Konstanz, 78457, Konstanz, Germany
Alexander Fillbrunn, Leonard Wörteler, Michael Grossniklaus & Michael R. Berthold
Konstanz Research School Chemical Biology (KoRS-CB), Konstanz, Germany
Alexander Fillbrunn & Michael R. Berthold
KNIME AG, 8005, Zurich, Switzerland
Michael R. Berthold

Corresponding author

Correspondence to Alexander Fillbrunn.

Editor information

Editors and Affiliations

Imperial College London, London, United Kingdom
Niall Adams
Brunel University London, Uxbridge, United Kingdom
Allan Tucker
Birkbeck, University of London, London, United Kingdom
David Weston

Cite this paper

Fillbrunn, A., Wörteler, L., Grossniklaus, M., Berthold, M.R. (2017). Bucket Selection: A Model-Independent Diverse Selection Strategy for Widening. In: Adams, N., Tucker, A., Weston, D. (eds) Advances in Intelligent Data Analysis XVI. IDA 2017. Lecture Notes in Computer Science, vol 10584. Springer, Cham. https://doi.org/10.1007/978-3-319-68765-0_8

DOI: https://doi.org/10.1007/978-3-319-68765-0_8
Published: 04 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68764-3
Online ISBN: 978-3-319-68765-0
eBook Packages: Computer Science, Computer Science (R0)
