The dlv Project


We're sometimes asked how to (correctly) benchmark LPNMR systems. Here you'll find some of our thoughts. If you have any further remarks or questions, please do not hesitate to contact us.

What should be measured?

You should measure all of the time that the benchmarked system needs to solve a particular problem on a particular machine, and only that time.

This time should include all of the system's activities (for example, loading file contents and starting the program), because this is what a user of the system will observe. On the other hand, this time should, as far as possible, not be influenced by other computational tasks running concurrently on the machine, so that the benchmark is reproducible.

In other words, the ideal time measurement represents the time a user observes on a perfectly unloaded machine. In reality, machines are never completely unloaded, so we have to approximate this time as well as possible.

How should one measure the duration of the computation?

Use the timing commands provided by the operating system (on most systems, time or timex) and take the sum of the "user" and "system" times.

Note 1: Some shells like tcsh provide built-in timing commands, but we recommend against using these.

Note 2: It is important not to ignore the "system" time. It makes up a part, sometimes a significant one, of the total time taken by the computation.

For more information about CPU time and timing in general, see also the documentation of the GNU C library.
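A minimal sketch of this recommendation as a shell function: it runs a command under the external time utility in its POSIX output format (time -p) and prints the sum of the reported "user" and "sys" values. The function name cputime and the variable TIME_CMD are hypothetical, and the default path /usr/bin/time is an assumption; on some systems the utility lives elsewhere or has to be installed separately.

```shell
#!/bin/sh
# cputime CMD [ARGS...]  (hypothetical helper)
# Runs CMD under the external time utility with -p (POSIX output format)
# and prints user + sys CPU time in seconds.
# Assumption: the external utility is at /usr/bin/time; override with
# TIME_CMD=/path/to/time if it is installed elsewhere.
TIME_CMD=${TIME_CMD:-/usr/bin/time}

cputime() {
    if [ ! -x "$TIME_CMD" ]; then
        echo "cputime: no executable time utility at $TIME_CMD" >&2
        return 1
    fi
    # time -p writes its "real", "user" and "sys" lines to stderr.
    # Capture stderr; discard the benchmarked command's stdout.
    report=$("$TIME_CMD" -p "$@" 2>&1 >/dev/null)
    # Keep the last "user" and "sys" lines: the report is printed after
    # the command has finished, so it follows anything the command itself
    # wrote to stderr.
    printf '%s\n' "$report" |
        awk '$1 == "user" { u = $2 }
             $1 == "sys"  { s = $2 }
             END { printf "%.2f\n", u + s }'
}
```

For example, cputime dlv program.dl would print the CPU time dlv needs for a hypothetical input file program.dl. The sketch discards the command's standard output and does not pass on its exit status.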

Well, why doesn't dlv print its timing?

In most cases, timing performed by a system itself is inaccurate or even completely wrong, even when the system's author does not intentionally twist the results. This is because such a measurement necessarily covers only part of the computation; for example, the startup time of the executable cannot be measured this way. Also, if you simply compare two time stamps, the time consumed by concurrently running tasks is included in the measurement.
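To illustrate the last point, the sketch below times a command the way a program timing itself usually does, by comparing two time stamps, and reports whole elapsed seconds. The helper name elapsed_seconds is hypothetical, and it assumes a date command that supports +%s (GNU and BSD date do).

```shell
#!/bin/sh
# elapsed_seconds CMD [ARGS...]  (hypothetical helper)
# Prints the wall-clock seconds between two time stamps taken before and
# after CMD. Every second in that interval is counted, whether the CPU was
# working on CMD, on other processes, or was idle.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# sleep uses almost no CPU time, yet the time stamps report at least
# one full second.
elapsed_seconds sleep 1
```

By contrast, the external time utility run as time -p sleep 1 reports a "real" time of about one second but "user" and "sys" times close to zero. For a CPU-bound command on a loaded machine, the time stamp difference grows in the same way, because the command has to share the processor with other tasks.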

Systems involving several pipelined programs

If a system consists of several programs that are combined via shell pipelines, the time command will not work correctly: the shell splits the command line at the pipeline symbols before time even comes into play, so an external time command measures only the first program of the pipeline.

The solution is to wrap a shell invocation around the pipelined commands, either by using

time sh -c "module1 | ... | moduleN"
or by generating a shell script and invoking time on this script.
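The difference between timing only the first stage and timing the whole pipeline can be observed with a small experiment. The sketch below assumes the external time utility at /usr/bin/time and uses sleep 1 as a stand-in for a slow pipeline stage. It compares the "real" (elapsed) times only because sleep consumes almost no CPU time; for actual benchmarks, the sum of "user" and "system" time remains the relevant figure. The function names are hypothetical.

```shell
#!/bin/sh
# Compare timing only the first stage of a pipeline with timing the whole
# pipeline through "sh -c". Assumption: external time utility at
# /usr/bin/time (override with TIME_CMD).
TIME_CMD=${TIME_CMD:-/usr/bin/time}

# Print the last "real" value from a time -p report read on stdin.
real_seconds() {
    awk '$1 == "real" { r = $2 } END { print r }'
}

compare_pipeline_timing() {
    [ -x "$TIME_CMD" ] || { echo "no time utility at $TIME_CMD" >&2; return 1; }
    # The shell splits at "|" first, so time only runs "true";
    # "sleep 1" runs outside of the measurement.
    first_stage=$( { "$TIME_CMD" -p true | sleep 1; } 2>&1 | real_seconds )
    # Here time runs sh, which waits for every stage of the pipeline.
    whole=$( "$TIME_CMD" -p sh -c 'true | sleep 1' 2>&1 | real_seconds )
    echo "first stage only: $first_stage s"
    echo "whole pipeline:   $whole s"
}

compare_pipeline_timing
```

The first figure stays close to zero although the pipeline as a whole takes about a second; the second figure covers all stages.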

The time spent on invoking the shell is negligible on most modern systems. If you are in doubt, measure it by

time sh -c "exit"
and subtract the result from all benchmark timings.
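A single empty shell often finishes faster than the resolution of the reported times (time -p output frequently has only two decimal places). One way around this, sketched below under the assumption that the external time utility is at /usr/bin/time, is to time one outer shell that starts many empty shells and divide by their number. The function names and the default of 100 invocations are hypothetical choices.

```shell
#!/bin/sh
# Estimate the CPU cost (user + sys) of one "sh -c exit" by timing n of
# them together, then subtract it from a benchmark timing.
# Assumptions: external time utility at /usr/bin/time (override with
# TIME_CMD); n defaults to 100. All function names are hypothetical.
TIME_CMD=${TIME_CMD:-/usr/bin/time}

# Sum the last "user" and "sys" values of a time -p report on stdin.
user_plus_sys() {
    awk '$1 == "user" { u = $2 } $1 == "sys" { s = $2 }
         END { printf "%.4f\n", u + s }'
}

shell_overhead() {
    n=${1:-100}
    [ -x "$TIME_CMD" ] || { echo "no time utility at $TIME_CMD" >&2; return 1; }
    # One outer shell runs n empty shells; time reports the CPU time of
    # the outer shell plus all of its children.
    total=$("$TIME_CMD" -p sh -c '
        i=0
        while [ "$i" -lt "$1" ]; do
            sh -c exit
            i=$((i + 1))
        done' sh "$n" 2>&1 | user_plus_sys)
    awk -v t="$total" -v n="$n" 'BEGIN { printf "%.4f\n", t / n }'
}

# Subtract the per-invocation overhead ($2) from a measured timing ($1).
corrected() {
    awk -v m="$1" -v o="$2" 'BEGIN { printf "%.4f\n", m - o }'
}

overhead=$(shell_overhead 100) && echo "overhead per sh -c: $overhead s"
```

With measured set to the user + sys time reported for time sh -c "module1 | ... | moduleN", corrected "$measured" "$overhead" prints the adjusted timing. The estimate slightly overstates the overhead, since the outer shell's startup and loop are included in the total.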

So how do I take the time for...

If you know about any further systems, please drop us a note!

Last modified 2005-08-18