From tinelli at cs.uiowa.edu Wed Feb 1 22:11:52 2006
From: tinelli at cs.uiowa.edu (Cesare Tinelli)
Date: Wed Feb 1 21:49:42 2006
Subject: [SMTCOMP] request for comment
In-Reply-To: <200601251621.k0PGL6805641@priam.cse.wustl.edu>
References: <200601251621.k0PGL6805641@priam.cse.wustl.edu>
Message-ID: <43E1A2A8.2080408@cs.uiowa.edu>

Hi Aaron,

I went over the draft. I find it quite good already, and I notice that it already incorporates some of the major changes suggested by various people since SMT-COMP'05. Here are some further comments and questions, in the hope that you may find them useful.

page 2
Is the decision to allow people to submit the results of their solver to the demonstration section a nod to the Microsoft people? Just curious.

page 2
I'm not sure how you are going to enforce the rule against entering two versions of the same solver if those versions are submitted in binary format.

page 2
The session for system presentations on the 21st is a good idea. Is it going to be part of CAV's program? If it is after CAV's sessions that day, have you already secured a room with the organizers? Just checking.

I would propose that you also informally organize an SMT-COMP dinner on the 21st or the day before, compatibly with the rest of the FLoC schedule. I think it would be a great way for the participants to get to know one another, and for building a sense of community. CASC does that and it has worked rather well, although I would not go as far as organizing an official dinner, with a dinner registration and fee, as they do.

page 3
The new version of SMT-LIB will be 1.2. It will be a minor revision of the current one. I think it will be available by the end of March, together with updated theories, logics and benchmarks as needed. For the bitvectors, Silvio and I will propose a temporary theory and logic that does not need the introduction of dependent types, so that we can already have the bitvectors division at this SMT-COMP.
Clark already has a good number of benchmarks in CVCL format that should be easy to translate into SMT-LIB 1.2.

page 3
This should first be discussed with Clark, I guess, but he has made good progress with CVCL in solving the majority of a very large set of quantified benchmarks from NASA. It might be good to have a division for these benchmarks, to stimulate work on quantifiers. CVCL would not be the only entrant here. Clark and one of his students have also defined a translator for Simplify, so Simplify could be entered as well. In addition, I think that the Vampire and the SPASS people might be interested in participating, as they have been working on those benchmarks too.

page 4
The classification into hard and easy benchmarks might be a source of trouble and contention, because there are no clear objective measures. You may want to spend more time here in devising a clean procedure for deciding the difficulty of a benchmark. Apart from that, the algorithm for choosing the benchmarks seems like a reasonably good one, which should shield you from accusations of favoritism.

Methodologically, though, the problem I see in your proposal is that if SMT-LIB happens to have, for a given division, a disproportionately large number of benchmarks from the same class of problems (such as synthetic benchmarks), picking randomly with a uniform distribution will give you a biased sample. I'm not sure how to address that properly, though.

About the scrambling issue: scrambling might be hard to justify if it changes the structure of the problem, given that structure is an important factor in practice. On the other hand, the only reasonable structure-preserving transformations might be renamings of free symbols, which are probably not a deterrent for cheaters. I suppose that the only good deterrent would be to have a large enough number of benchmarks in SMT-LIB. So, we need more benchmarks!

page 5
I see you now propose to give the same penalty to wrong unsat and sat answers.
Any reasons for the change?

I'm not sure that disqualifying a system after three wrong answers makes a lot of sense. You either allow any number of wrong answers, and rely on the scoring system to penalize, on average, buggy systems, or you completely discount the results of a system as soon as it gives you a wrong result. It is possible to make a good argument in favor of each of these two choices, but I cannot see how to justify your compromise solution.

Given that
- this is not the first edition of the competition,
- you plan to have benchmarks available well in advance,
- we expect to have a correct classification of the benchmarks, and
- a solver has the option to answer "unknown" (or timeout),
I would propose to go for immediate disqualification after the first wrong answer. Perhaps, though, not a disqualification from all divisions but only from those where wrong answers were given. The rationale here is that, since SMT solvers will most likely have specialized mechanisms for each division, it is possible that a solver is correct for one division and buggy for another.

Finally, in case you are not aware of it, you may want to take a look at what people are doing for the pseudo-boolean solver evaluation (http://www.cril.univ-artois.fr/PB06). It might be another useful source of suggestions.

Cheers,

Cesare

From kendroe at hotmail.com Mon Feb 6 19:18:27 2006
From: kendroe at hotmail.com (Kenneth Roe)
Date: Mon Feb 6 18:55:23 2006
Subject: [SMTCOMP] Bit vector problems
Message-ID:

Since we do not have a bit vector division currently, I think it would be good to get some of the problems well before the June 1st date proposed in the SMT-COMP rules. I know CVC Lite has a substantial library of problems. However, they all look like small test cases. It would be nice to get a few large test cases within a month or two. I've heard rumors that Intel might donate some test cases. Can someone fill me in on the plans?
Also, I've heard noises about quantifier problems and adding substantially to the other sections. What is the story here?

- Ken

From demoura at csl.sri.com Thu Feb 9 15:11:53 2006
From: demoura at csl.sri.com (Leonardo de Moura)
Date: Thu Feb 9 14:48:17 2006
Subject: [SMTCOMP] WiSA benchmarks
Message-ID:

Hi

I'm looking for more benchmarks from the Wisconsin Safety Analyzer (WiSA) project. SMT-LIB has only five instances of these benchmarks. Does anyone know where I can find them?

Thanks,
Leonardo

From stump at cse.wustl.edu Tue Feb 21 10:56:19 2006
From: stump at cse.wustl.edu (Aaron Stump)
Date: Tue Feb 21 10:32:40 2006
Subject: [SMTCOMP] final rules posted
Message-ID: <200602211856.k1LIuJQ06553@priam.cse.wustl.edu>

This is just to announce that we have revised the SMT-COMP 2006 rules according to the community feedback we received. Thanks to all who gave us suggestions on this. The rules are posted on the SMT-COMP 2006 site:

http://www.csl.sri.com/users/demoura/smt-comp/

Below is copied a summary of the suggestions we received ("--") and our responses ("**").

Best wishes,
Aaron
for Clark and Leonardo, too.

----------------------------------------------------------------------

Windows
-- could we also support Windows in addition to Linux?
** no, we should not try to support Windows. Running the competition on multiple platforms is likely to provoke criticism, and isn't it the case that for remote execution machines need to be running Remote Desktop Server?

Other results
-- Results reported to the organizers by tools in a demonstration division should not be reported unverified to the public.
** well, that is a point. We are dropping the "demonstration division".

Benchmarks
-- Make *all* benchmarks available in advance
** yes, we should definitely do this, and not spring any new benchmarks on people that they haven't had a chance to run.

Scrambling
-- do perform simple formula scrambling.
-- do not scramble structure; also there may be no point because simple scrambling can be easily defeated by an adversary
** we will design a simple pseudo-random scrambler, and publish it far enough in advance that people can make sure their tools handle its output.

Buggy solvers:
-- do eliminate solvers with more than some number of bugs.
-- either disqualify solvers with even one buggy answer, or else just use the point system to penalize them out of contention. The compromise we suggest is hard to justify.
** We respectfully disagree with the last point, and will stick with the proposed scheme (disqualify a tool from all divisions if in any one division it gets more than three wrong answers; penalize by 8 points for each wrong answer).

Benchmark selection
-- do allow secret benchmarks
-- uniformly random selection will be biased if the set of benchmarks contains a disproportionate number of benchmarks of a particular kind (e.g., "synthetic")
** no secret benchmarks, although it would be an intriguing twist to the competition; also, we must try to avoid skewing the selection to one kind of benchmark, as much as possible.

Community building
-- organize an informal dinner at FLoC.
** sounds good.

Bitvectors
-- the theory will probably be specified by end of March

Quantifiers
-- try to add a division with quantified formulas
** ok, if we have some formulas (as it sounds like we will).

From demoura at csl.sri.com Wed Feb 22 08:18:36 2006
From: demoura at csl.sri.com (Leonardo de Moura)
Date: Wed Feb 22 07:53:05 2006
Subject: [SMTCOMP] Scrambler
Message-ID:

This is just to announce that the benchmark scrambler is available on the SMT-COMP website.

http://www.csl.sri.com/users/demoura/smt-comp/

From demoura at csl.sri.com Wed Feb 22 11:14:52 2006
From: demoura at csl.sri.com (Leonardo de Moura)
Date: Wed Feb 22 10:49:21 2006
Subject: [SMTCOMP] Scrambler updated
Message-ID:

The benchmark scrambler has been updated.
This update fixes a bug in examples that contain big numbers.
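
For readers following the scrambling discussion in this thread, here is a minimal sketch of the kind of structure-preserving scrambling mentioned earlier: a consistent, seeded renaming of free symbols. It is not the SMT-COMP scrambler, whose internals are not described in these messages. The function name `scramble_symbols`, the token pattern, and the fresh names `x0`, `x1`, ... are all assumptions made for illustration. Numerals of any length are copied through unchanged; the messages only say that big numbers caused a bug, not how the real tool treats them.

```python
import random
import re

# Hypothetical token pattern: identifiers or numerals. Numerals are matched
# only so that they are skipped as whole tokens and never rewritten.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_.']*|\d+")


def scramble_symbols(text, symbols, seed):
    """Consistently rename the given free symbols using a seeded PRNG.

    `symbols` is the set of names the caller wants renamed; everything
    else (operators, keywords, numerals, parentheses) is left as-is.
    """
    rng = random.Random(seed)
    names = sorted(symbols)                    # fixed order: result depends only on the seed
    fresh = [f"x{i}" for i in range(len(names))]
    rng.shuffle(fresh)                         # pseudo-random assignment of new names
    mapping = dict(zip(names, fresh))

    def rename(match):
        tok = match.group(0)
        return mapping.get(tok, tok)           # unlisted tokens pass through unchanged

    return TOKEN.sub(rename, text)


if __name__ == "__main__":
    formula = "(and (< x 100000000000000000000) (= y x))"
    print(scramble_symbols(formula, {"x", "y"}, seed=2006))
```

Because every occurrence of a symbol maps to the same new name, the shape of the formula is untouched, which is also why, as noted in the thread, a renaming like this does little to deter a determined adversary. The sketch does not check that the fresh names avoid identifiers already present in the input.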