
Careful Ranking of Multiple Solvers with Timeouts and Ties

  • Conference paper

Theory and Applications of Satisfiability Testing - SAT 2011 (SAT 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6695)

Abstract

In several fields, Satisfiability being one, there are regular competitions to compare multiple solvers in a common setting. Because some benchmarks of interest are too difficult for all solvers to complete within the available time, time-outs occur and must be accounted for.

Through some strange evolution, the number of time-outs became the only factor considered in evaluation. Previous work at SAT 2010 observed that this evaluation method is unreliable and lacks a way to attach statistical significance to its conclusions. However, the proposed alternative was quite complicated and is unlikely to see general use.

This paper describes a simpler system, called careful ranking, that permits a measure of statistical significance and still meets many of the practical requirements of an evaluation system. It incorporates one of the main ideas of the previous work: that outcomes must be freed of assumptions about timing distributions, so non-parametric methods are necessary. Unlike the previous work, it incorporates ties.

The careful ranking system has several important non-mathematical properties that are desired in an evaluation system: (1) the relative ranking of two solvers cannot be influenced by a third solver; (2) after the competition results are published, a researcher can run a new solver on the same benchmarks and determine where the new solver would have ranked; (3) small timing differences can be ignored; (4) the computations are easy to understand and reproduce. Voting systems proposed in the literature lack some or all of these properties.
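The abstract does not spell out the comparison rule, but one plausible reading of a non-parametric pairwise scheme with ties can be sketched as follows. All specifics here are assumptions: the 900-second time limit, the 5% noise margin below which two runtimes count as a tie, and the use of a sign test over the non-tied benchmarks for significance.

```python
import math

def compare_pair(times_a, times_b, timeout=900.0, noise=0.05):
    """Pairwise comparison of two solvers over the same benchmark set.

    Returns (wins_a, wins_b, ties). A benchmark is a tie when both
    solvers time out, or when their runtimes differ by less than the
    noise margin; a time-out loses to any finishing run. Comparing
    only which solver was faster (not by how much) keeps the result
    free of assumptions about timing distributions.
    """
    wins_a = wins_b = ties = 0
    for ta, tb in zip(times_a, times_b):
        a_out, b_out = ta >= timeout, tb >= timeout
        if a_out and b_out:
            ties += 1
        elif a_out:
            wins_b += 1
        elif b_out:
            wins_a += 1
        elif abs(ta - tb) <= noise * min(ta, tb):
            ties += 1          # difference too small to matter
        elif ta < tb:
            wins_a += 1
        else:
            wins_b += 1
    return wins_a, wins_b, ties

def sign_test_p(wins_a, wins_b):
    """Two-sided sign-test p-value over the non-tied benchmarks."""
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = max(wins_a, wins_b)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Note how such a scheme would deliver the listed properties: each head-to-head result depends only on the two solvers involved, so a third solver cannot influence it (property 1), and a new solver can be slotted in later by running only its own pairings (property 2).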

A property of careful ranking is that the pairwise ranking might contain cycles. Whether this is a bug or a feature is a matter of opinion. Whether it occurs among leaders in practice is a matter of experience.
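To see why pairwise majorities can cycle, consider a hypothetical Condorcet-style example with three solvers and three benchmarks (all names and runtimes invented; time-outs and tie margins omitted for brevity):

```python
from itertools import combinations

def pairwise_edges(times):
    """times: dict mapping solver name -> runtimes on a common benchmark
    list. Returns (winner, loser) edges decided by simple per-benchmark
    majority: the solver that is faster on more benchmarks wins the pair."""
    edges = set()
    for a, b in combinations(sorted(times), 2):
        wa = sum(x < y for x, y in zip(times[a], times[b]))
        wb = sum(y < x for x, y in zip(times[a], times[b]))
        if wa > wb:
            edges.add((a, b))
        elif wb > wa:
            edges.add((b, a))
    return edges

# On benchmark 1 the order is A < B < C, on benchmark 2 it is B < C < A,
# and on benchmark 3 it is C < A < B -- so each solver beats one rival
# 2-1 and loses to the other 2-1, forming the cycle A > B > C > A.
runtimes = {"A": [1.0, 3.0, 2.0], "B": [2.0, 1.0, 3.0], "C": [3.0, 2.0, 1.0]}
```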

The system is implemented and has been applied to the SAT 2009 Competition. No cycles occurred among the leaders, but there was a cycle among some low-ranking solvers. To measure robustness, the new and current systems were recomputed with a range of simulated time-outs to see how often the top rankings changed: times above the simulated time-out are reclassified as time-outs, and the rankings are recomputed with this data. Careful ranking exhibited far fewer changes.
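The simulated time-out experiment can be reproduced mechanically: pick a threshold below the real time limit, reclassify every longer runtime as a time-out at that threshold, and recompute the rankings. A minimal sketch, in which the ranking function itself is a hypothetical stand-in:

```python
def reclassify(times, simulated_timeout):
    """Clamp runtimes at a simulated time-out: any run at or above the
    threshold is treated as if it had timed out at that threshold."""
    return [min(t, simulated_timeout) for t in times]

def ranking_changes(all_times, thresholds, rank):
    """Count how often the ranking produced by `rank` (any function from
    {solver: runtimes} to an ordered tuple of solver names) changes as
    the simulated time-out is lowered -- a crude robustness measure."""
    changes = 0
    previous = None
    for limit in sorted(thresholds, reverse=True):
        clipped = {s: reclassify(ts, limit) for s, ts in all_times.items()}
        order = rank(clipped)
        if previous is not None and order != previous:
            changes += 1
        previous = order
    return changes
```

A more robust evaluation system is the one whose top ranking flips less often as the threshold is lowered; the paper reports exactly this kind of comparison between careful ranking and the current system.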



References

  1. Brglez, F., Li, X.Y., Stallmann, M.F.: On SAT instance classes and a method for reliable performance experiments with SAT solvers. Annals of Mathematics and Artificial Intelligence 43, 1–34 (2005)

  2. Brglez, F., Osborne, J.A.: Performance testing of combinatorial solvers with isomorph class instances. In: Workshop on Experimental Computer Science, San Diego. ACM, New York (2007) (co-located with FCRC 2007)

  3. Le Berre, D., Simon, L.: The essentials of the SAT 2003 competition. In: Proc. SAT (2004)

  4. Levin, J., Nalebuff, B.: An introduction to vote-counting schemes. The Journal of Economic Perspectives 9, 3–26 (1995)

  5. Moulin, H.: Condorcet's principle implies the no-show paradox. Journal of Economic Theory 45, 53–64 (1988)

  6. Nikolić, M.: Statistical methodology for comparison of SAT solvers. In: Strichman, O., Szeider, S. (eds.) SAT 2010. LNCS, vol. 6175, pp. 209–222. Springer, Heidelberg (2010)

  7. Pomerol, J.-C., Barba-Romero, S.: Multicriterion Decision in Management: Principles and Practice. Springer, Heidelberg (2000)

  8. Pulina, L.: Empirical evaluation of scoring methods. In: Third European Starting AI Researcher Symposium (2006)

  9. Schulze, M.: A new monotonic and clone-independent single-winner election method. In: Tideman, N. (ed.) Voting Matters, vol. 17, pp. 9–19 (October 2003), http://www.votingmatters.org.uk

  10. Tideman, N.: Collective Decisions and Voting: The Potential for Public Choice. Ashgate (2006)



Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Van Gelder, A. (2011). Careful Ranking of Multiple Solvers with Timeouts and Ties. In: Sakallah, K.A., Simon, L. (eds) Theory and Applications of Satisfiability Testing - SAT 2011. SAT 2011. Lecture Notes in Computer Science, vol 6695. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21581-0_25


  • DOI: https://doi.org/10.1007/978-3-642-21581-0_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21580-3

  • Online ISBN: 978-3-642-21581-0

  • eBook Packages: Computer Science (R0)
