

The Sus Filter Tools

Output generated by experimentation programs can come in a myriad of forms. However, the output desired for analyzing or presenting results of experiments generally has one of two forms: a table of values or a graph.

Again, we do not wish to restrict users unnecessarily by insisting on a particular form of output that may be difficult to produce for certain programs. Instead, we provide a set of tools that can process text given in any form to produce a particular internal data format. This internal format, called sus (for "Script-readable User Statistics"), is text-based and easily readable by humans (Section [*]), so users can produce it directly from their programs if they wish. The tools that convert from the internal format can also process the data by performing mathematical calculations.

Users specify which data values to extract from a given input file using keywords and regular expressions. The Sus Filter Tools use Python expressions both for specifying regular expressions and for reformatting data. For details on Python, see http://www.python.org.

Two programs, table2sus and text2sus, convert data to the sus format. The first converts an ASCII table to the internal format; the second converts a text file in any format by selecting from the file the data values indicated by the user. For converting from the sus format, we provide further programs, among them sus2sus and sus2plot, which are used in the examples in this section.

All these programs act as filters by default (reading from standard input and writing to standard output), making it easy to convert any given data file to one of the supported output file formats. For example:

   cat prog.out | text2sus num_nodes time | sus2plot

Alternatively, input and output files can be specified for each program using the --input and --output options, respectively. For example, the above command could be rewritten as:

   text2sus --input prog.out num_nodes time | sus2plot

Multiple --input options may be given; the input files are then merged as if they had been concatenated one after another.

The other command-line options for these tools specify which data to extract from the file and the format in which to display the output. In the simplest form, the user gives only the keyword labels for the desired data, as in the above example. The program then extracts the next word or number after each label as the value to associate with that label in each data record. To extract a value other than the next word or number, regular expressions can be given on the command line to describe the desired value. See the Python documentation or Section [*] for more information on forming regular expressions. See Sections [*] and [*] for examples.
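The default extraction rule described above (take the next word or number after a label) can be approximated with a regular expression. The following is only a sketch of that idea, not the actual text2sus implementation; the function name extract_after_label, the sample line, and the choice to skip ":" and "=" separators are all assumptions made for illustration.

```python
import re

def extract_after_label(text, label):
    """Return the first token that follows `label` in `text`, or None."""
    # \b keeps the label "time" from matching inside "runtime".
    # Skipping ':' and '=' after the label is an assumption of this sketch.
    pattern = re.compile(r"\b" + re.escape(label) + r"\b[\s:=]*(\S+)")
    match = pattern.search(text)
    return match.group(1) if match else None

# Hypothetical line of program output.
sample = "num_nodes 128 finished, time 3.75 seconds"

# Build one data record from the labels given on the command line.
record = {label: extract_after_label(sample, label)
          for label in ("num_nodes", "time")}
print(record)
```

Values are kept as strings here, which matches the way the expressions later in this section convert them explicitly with float.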

The tools that convert from a sus file allow you to manipulate the data in several ways. In particular, one can

  1. add a new field (command-line option: --add).
    This can be useful to perform mathematical operations on the data values, or to reformat the output for pretty printing.
  2. filter out data (command-line option: --filter).
    Only records for which the expression evaluates to a non-empty, nonzero value (that is, neither "" nor 0) are kept.
  3. sort the data (command-line option: --sort).
    The records are sorted according to the sorting expression. Records with the same key are merged (in accordance with the value given for the --combine option). By default, the average of numeric values is taken; string values are always simply concatenated.
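The sort-and-merge behaviour described in item 3 can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the tools' own code: the helper names is_number and sort_and_combine are hypothetical, every record in a group is assumed to have the same fields, and only the default combination rule (average numeric values, concatenate strings) is modelled.

```python
from collections import defaultdict

def is_number(value):
    """True if the string `value` can be read as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False

def sort_and_combine(records, key):
    """Group records by key(record), merge each group, return in key order."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)

    merged = []
    for k in sorted(groups):
        group = groups[k]
        out = {}
        for field in group[0]:
            values = [rec[field] for rec in group]
            if all(is_number(v) for v in values):
                # Default rule: numeric values are averaged.
                out[field] = sum(float(v) for v in values) / len(values)
            else:
                # String values are simply concatenated.
                out[field] = "".join(values)
        merged.append(out)
    return merged

# Hypothetical records; two of them share the sort key 10.
records = [
    {"num_nodes": "20", "time": "4.0", "host": "a"},
    {"num_nodes": "10", "time": "1.0", "host": "b"},
    {"num_nodes": "10", "time": "3.0", "host": "c"},
]
result = sort_and_combine(records, key=lambda r: float(r["num_nodes"]))
print(result)
```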
For filtering, sorting, or adding new data fields, the full expressive power of Python can be used. This includes numerical expressions:
     sus2sus '--filter=float(upperbound) > float(lowerbound)*100'
(keep only those records for which the upper bound is more than 100 times the lower bound),

formatting expressions similar to printf in C:

     sus2sus '--add=formattedOpt="%-20f"%float(opt)'
(add a formatted opt value, left-justified in a field of 20 characters),

and string manipulation expressions, using Python's string module:

     sus2sus '--sort=float(split(time,":")[0])'
(sort the records by the number of hours in a time field with format hh:mm:ss. Note that the float function is needed so that values such as 9 and 10 are compared as numbers and therefore sorted in the correct order).
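To see what the three expressions above compute, they can be tried in plain Python on a single record. This sketch assumes a hypothetical record whose fields are stored as strings; it uses the Python 3 method form time.split(":") where the command-line example uses split(time,":") from the string module.

```python
# Hypothetical record; all field values are strings.
record = {"upperbound": "250", "lowerbound": "2",
          "opt": "3.5", "time": "10:05:30"}

# Numerical expression, as in the --filter example.
keep = float(record["upperbound"]) > float(record["lowerbound"]) * 100

# printf-style formatting, as in the --add example:
# left-justified in a field of 20 characters.
formatted_opt = "%-20f" % float(record["opt"])

# String manipulation, as in the --sort example: the hours of hh:mm:ss.
hours = float(record["time"].split(":")[0])

# Why float is needed: compared as strings, "10" sorts before "9".
times = ["9:00:00", "10:00:00"]
as_strings = sorted(times, key=lambda t: t.split(":")[0])
as_numbers = sorted(times, key=lambda t: float(t.split(":")[0]))

print(keep, repr(formatted_opt), hours)
print(as_strings, as_numbers)
```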

This string module, providing functions such as join, split, strip, and replace, is particularly useful for formatting output. We also provide some extensions:

iff(exp,then,else):

returns <then> if <exp> is not empty, otherwise <else>. A value counts as empty if it is '', '0', or not defined.

Example:

    sus2sus '--add=solvedText=iff(solved,"solved","not solved")'
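The behaviour of iff can be mimicked in a few lines of Python. This is a sketch of the rule as stated above, not the extension's source: the third parameter is named otherwise because else is a reserved word in Python, and None is used here as a stand-in for "not defined".

```python
def iff(exp, then, otherwise):
    """Return `then` unless `exp` is empty ('', '0', or undefined)."""
    # None models an undefined variable; that is an assumption of this sketch.
    empty = exp is None or exp == "" or exp == "0"
    return otherwise if empty else then

print(iff("1", "solved", "not solved"))
print(iff("0", "solved", "not solved"))
```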
format(val):

Formats a number. Optional parameters:

  length: minimal length of the output (negative values produce left-justified output)
  digits: number of digits after the decimal point
  min:    minimal value
  max:    maximal value
  pad:    padding character; may be
            '0' for zero padding
            '+' for +/-
            '-' for left-justified output

Examples:
Print opt left-justified in a field of minimal length 20 (produces the same result as the example illustrating string formatting above).

   sus2sus  '--add=formattedOpt=format(opt,length=-20)'

Round the time to two digits after the decimal point, with a minimum value of 0.01.

   sus2sus '--add=formattedTime=format(time,digits=2,min=0.01)'
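One plausible reading of the length, digits, min and max parameters is sketched below in Python. The function format_sketch is hypothetical and is not the tools' format function: min and max are renamed min_value and max_value to avoid shadowing Python built-ins, they are assumed to clamp the value, the default of six digits is assumed so that length=-20 matches "%-20f" as stated above, and the pad parameter is left out.

```python
def format_sketch(val, length=0, digits=6, min_value=None, max_value=None):
    """Clamp, round and pad a number given as a string or float."""
    number = float(val)
    # Assumed meaning of min/max: clamp the value into [min_value, max_value].
    if min_value is not None and number < min_value:
        number = min_value
    if max_value is not None and number > max_value:
        number = max_value
    # Fixed number of digits after the decimal point.
    text = "%.*f" % (digits, number)
    # A negative length left-justifies; a positive one right-justifies.
    return text.ljust(-length) if length < 0 else text.rjust(length)

print(repr(format_sketch("3.5", length=-20)))
print(format_sketch("0.004", digits=2, min_value=0.01))
```

Under these assumptions a very small time is reported as 0.01 instead of being rounded down to 0.00.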

If a variable that is used in an expression is not defined, it will take on '000' as a default value. The switch --eval changes this behaviour:

--eval=strict
    undefined variables cause an error.
--eval=warn
    undefined variables produce warnings.
--eval=debug
    every expression evaluation is printed.
--eval=invalid
    '--' is used as the default value (nice for sus2plot).
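The effect of these modes on an undefined variable can be sketched as a lookup function in Python. This is an illustration only: the function lookup and the mode name "default" are hypothetical, the value returned in warn mode is assumed to be the normal default, and the debug mode is not modelled.

```python
import warnings

DEFAULT = "000"   # default value for undefined variables, as stated above
INVALID = "--"    # default value used with --eval=invalid

def lookup(fields, name, mode="default"):
    """Return the value of `name`, handling undefined names per `mode`."""
    if name in fields:
        return fields[name]
    if mode == "strict":
        raise NameError("undefined variable: " + name)
    if mode == "warn":
        # Assumption: after warning, the normal default is still used.
        warnings.warn("undefined variable: " + name)
    return INVALID if mode == "invalid" else DEFAULT

# Hypothetical record without an 'opt' field.
fields = {"time": "3.75"}
print(lookup(fields, "time"))
print(lookup(fields, "opt"))
print(lookup(fields, "opt", mode="invalid"))
```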


Susan Hert
2002-08-29