The `dlv` Project
Benchmarking

We're sometimes asked how to (correctly) benchmark LPNMR systems. Here you'll find some of our thoughts. If you have any further remarks or questions, please do not hesitate to contact us.

What should be measured?

You should measure all of the time that the system under test needs to solve a particular problem on a particular machine, and only that time.

This time should include all of the system's activities (e.g. the time needed to start the program and to load file contents), because this is what a user of the system will observe. On the other hand, it should as far as possible not be influenced by other computational tasks running concurrently on the machine, so that the benchmark is reproducible.

In other words, the ideal time measurement represents the time a user observes on a perfectly unloaded machine. In reality, machines are never completely unloaded, so we have to approximate this time as closely as possible.

How should one measure the duration of the computation?

Use the built-in timing commands of the operating system - for most systems that would be time or timex - and consider the sum of the "user" and "system" times.

Note 1: Some shells like tcsh provide built-in timing commands, but we recommend against using these.

Note 2: It is important not to ignore the "system" time. It constitutes a (sometimes significant) part of the total time taken for the computation.

For more information about CPU time and timing in general, see also the documentation of the GNU C library.
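
As a minimal sketch of the summation described above, the following helper adds up the "user" and "sys" lines of the POSIX time -p output format. The function name cpu_seconds is hypothetical, and the usage comment assumes an external time binary at /usr/bin/time, which may live elsewhere (or be missing) on your system.

```shell
#!/bin/sh
# cpu_seconds: read `time -p` style output on stdin and print user + sys.
# The POSIX -p format consists of three lines: "real N", "user N", "sys N".
# (cpu_seconds is a made-up name for this example.)
cpu_seconds() {
    awk '$1 == "user" || $1 == "sys" { total += $2 }
         END { printf "%.2f\n", total }'
}

# Usage sketch (assumes /usr/bin/time exists; the program's own output is
# discarded so that only the timing report reaches the pipe):
#   /usr/bin/time -p sh -c 'dl options >/dev/null 2>&1' 2>&1 | cpu_seconds
```

Lines that do not start with "user" or "sys" are ignored, so the "real" (wall-clock) figure never enters the sum.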

Well, why doesn't `dlv` print its timing?

In most cases the timing performed by a system itself will be inaccurate, or even completely wrong (and this is without the system's author intentionally twisting results). This is because necessarily only part of the computation is measured (the startup time of the executable cannot be measured in this way). Also, if you just compare two time stamps, the time consumed by concurrently running tasks will be included in the measurement.
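
To illustrate the time-stamp problem, here is a small sketch that times a command by comparing two time stamps, which is essentially what a self-timing program does. The helper name wall_seconds is made up for this example, and it assumes a date command that supports +%s, which is common but not guaranteed everywhere.

```shell
#!/bin/sh
# wall_seconds CMD [ARGS...]
# Naive timing: difference of two wall-clock time stamps, in whole seconds.
# (wall_seconds is a hypothetical helper; `date +%s` is assumed to work.)
wall_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Usage sketch:
#   wall_seconds sleep 1
```

For sleep 1 this reports at least one second, although the process consumed almost no CPU time. Any period the benchmarked program spends waiting while other tasks run is counted in the same way.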

Systems involving several pipelined programs

If a system consists of several programs that are combined via shell pipelines, the time command will not work correctly, since the shell splits off the parts separated by the pipeline symbols before time even comes into play.

The solution is to wrap a shell invocation around the pipelined commands, either by using

time sh -c "module1 | ... | moduleN"

or by generating a shell script and invoking time on this script.
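
The script variant could look like the following sketch. The helper make_pipeline_script and the file name bench.sh are placeholders chosen for this example; they are not part of any of the systems discussed here.

```shell
#!/bin/sh
# make_pipeline_script FILE PIPELINE
# Writes PIPELINE verbatim into an executable /bin/sh script at FILE.
# (Hypothetical helper; PIPELINE is passed as a single quoted string.)
make_pipeline_script() {
    printf '#!/bin/sh\n%s\n' "$2" > "$1" && chmod +x "$1"
}

# Usage sketch:
#   make_pipeline_script ./bench.sh 'module1 | ... | moduleN'
#   time ./bench.sh
```

Because the whole pipeline now runs inside one child process of time, the CPU time of every stage is accounted for.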

The time spent on invoking the shell is negligible on most modern systems. If you are in doubt, measure it by

time sh -c "exit"

and subtract the result from all benchmark timings.
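
The correction itself is plain arithmetic and can be sketched as below. The helper name net_time is hypothetical, and clamping negative results to zero is our own assumption, made to guard against measurement noise on very short runs.

```shell
#!/bin/sh
# net_time MEASURED OVERHEAD
# Prints MEASURED minus OVERHEAD (both in seconds) with two decimals.
# Negative differences are clamped to 0 (an assumption, see above).
net_time() {
    awk -v t="$1" -v o="$2" 'BEGIN {
        d = t - o
        if (d < 0) d = 0
        printf "%.2f\n", d
    }'
}

# Usage sketch: 2.50 s measured for the benchmark, 0.25 s for `sh -c "exit"`
#   net_time 2.50 0.25
```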

So how do I take the time for...

DeReS?
time deres options
dlv?
time dl options
smodels?
smodels itself already prints a timing by default, but this timing does not include the time spent in the grounding phase: smodels is a two-pass system consisting of two binaries, and only the time spent in the second binary is considered. For some examples, almost all of the computation time is spent in the first binary!
We thus suggest writing a small shell script sm like
#!/bin/sh
parse < $2 | smodels $1
and running time sm options. (See also the note about systems involving several pipelined programs above.)

If you know about any further systems, please drop us a note!