File Compressor

The zfp executable in the bin directory is primarily intended for evaluating the rate-distortion trade-off (compression ratio versus quality) provided by the compressor, but since version 0.5.0 it also allows reading and writing compressed data sets. zfp takes as input a raw, binary array of floats, doubles, or integers in native byte order and optionally outputs a compressed array or a reconstructed array obtained after lossy compression followed by decompression. Various statistics on compression ratio and error are also displayed.

The uncompressed input and output files should be a flattened, contiguous sequence of scalars without any header information, generated for instance by

double* data = new double[nx * ny * nz];
// populate data
FILE* file = fopen("data.bin", "wb");
fwrite(data, sizeof(*data), nx * ny * nz, file);
fclose(file);

zfp requires a set of command-line options, the most important being the -i option that specifies that the input is uncompressed. When present, -i tells zfp to read an uncompressed input file and compress it to memory. If desired, the compressed stream can be written to an output file using -z. When -i is absent, on the other hand, -z names the compressed input (not output) file, which is then decompressed. In either case, -o can be used to output the reconstructed array resulting from lossy compression and decompression.

So, to compress a file, use -i file.in -z file.zfp. To later decompress the file, use -z file.zfp -o file.out. A single dash “-” can be used in place of a file name to denote standard input or output.
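The reconstructed file written with -o has the same headerless layout as the uncompressed input, so it can be read back with a few lines of C++. The following is a minimal sketch for double-precision data; the helper name read_raw_doubles is hypothetical and not part of zfp.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper (not part of zfp): read `count` doubles in native byte
// order from a headerless binary file, such as one written using -o.
// Returns an empty vector if the file cannot be opened or is too short.
std::vector<double> read_raw_doubles(const char* path, std::size_t count)
{
  std::vector<double> data(count);
  FILE* file = std::fopen(path, "rb");
  if (!file)
    return std::vector<double>();
  // fread returns the number of complete scalars actually read
  std::size_t n = std::fread(data.data(), sizeof(double), count, file);
  std::fclose(file);
  if (n != count)
    data.clear();
  return data;
}
```

For a file decompressed from a 100 × 100 × 100 array, this would be called as read_raw_doubles("file.out", 100 * 100 * 100).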

When reading uncompressed input, the scalar type must be specified using -f (float), -d (double), or -t (which also covers the integer types). In addition, the array dimensions must be specified using -1 (for 1D arrays), -2 (for 2D arrays), or -3 (for 3D arrays). For multidimensional arrays, x varies faster than y, which in turn varies faster than z. That is, a 3D input file corresponding to a flattened C array a[nz][ny][nx] is specified as -3 nx ny nz.
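The layout convention can be made concrete with a small indexing function. This is only an illustrative sketch; flat_index is a hypothetical name, not a zfp function.

```cpp
#include <cstddef>

// Offset of element (x, y, z) in a flattened C array a[nz][ny][nx],
// i.e., the position of a[z][y][x]. x varies fastest and z slowest.
// Illustrative helper only; not part of zfp.
std::size_t flat_index(std::size_t x, std::size_t y, std::size_t z,
                       std::size_t nx, std::size_t ny)
{
  return x + nx * (y + ny * z);
}
```

Stepping x by one moves to the adjacent scalar in the file, stepping y skips nx scalars, and stepping z skips nx * ny scalars.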

Note that -2 nx ny is not equivalent to -3 nx ny 1, even though the same number of values are compressed. One invokes the 2D codec, while the other uses the 3D codec, which in this example has to pad the input to an nx × ny × 4 array since arrays are partitioned into blocks of dimensions 4^d. Such padding usually negatively impacts compression.
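The cost of the padding can be estimated by counting how many values the codec has to process once each dimension is rounded up to a multiple of four. The sketch below assumes every dimension is padded this way; the function names are hypothetical, and how zfp fills the padded values is not covered here.

```cpp
#include <cstddef>

// Round a dimension up to the next multiple of 4 (the block edge length).
std::size_t padded(std::size_t n) { return (n + 3) / 4 * 4; }

// Number of values processed when the array is treated as 2D (-2 nx ny).
std::size_t coded_values_2d(std::size_t nx, std::size_t ny)
{
  return padded(nx) * padded(ny);
}

// Number of values processed when the array is treated as 3D (-3 nx ny nz).
std::size_t coded_values_3d(std::size_t nx, std::size_t ny, std::size_t nz)
{
  return padded(nx) * padded(ny) * padded(nz);
}
```

Under these assumptions, a 100 × 100 array involves 10,000 values with -2 100 100 but 40,000 values with -3 100 100 1, three quarters of which are padding.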

Moreover, -2 nx ny is not equivalent to -2 ny nx, i.e., with the dimensions transposed. It is crucial for accuracy and compression ratio that the array dimensions are listed in the order expected by zfp so that the array layout is correctly interpreted. See this discussion for more details.

Using -h, the array dimensions and type are stored in a header of the compressed stream so that they do not have to be specified on the command line during decompression. The header also stores compression parameters, which are described below. The compressor and decompressor must agree on whether headers are used, and it is up to the user to enforce this.

zfp accepts several options for specifying how the data is to be compressed. The most general of these, the -c option, takes four constraint parameters that together can be used to achieve various effects. These constraints are:

minbits: the minimum number of bits used to represent a block
maxbits: the maximum number of bits used to represent a block
maxprec: the maximum number of bit planes encoded
minexp:  the smallest bit plane number encoded

These parameters are discussed in detail in the section on compression modes. Options -r, -p, and -a provide a simpler interface to setting all of the above parameters by invoking fixed-rate (-r), fixed-precision (-p), and fixed-accuracy (-a) mode.

Usage

Below is a description of each command-line option accepted by zfp.

General options

-h

Read/write array and compression parameters from/to compressed header.

-q

Quiet mode; suppress diagnostic output.

-s

Evaluate and print the following error statistics:

  • rmse: The root mean square error.
  • nrmse: The root mean square error normalized to the range.
  • maxe: The maximum absolute pointwise error.
  • psnr: The peak signal to noise ratio in decibels.
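For readers who want to check these statistics independently, the sketch below computes them from an original and a reconstructed array. It is not zfp's implementation: the names are hypothetical, and the PSNR formula (20 log10 of the value range over the rmse) is an assumed convention, since the exact normalization zfp uses is not given in this section.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the four statistics listed above.
struct ErrorStats {
  double rmse;   // root mean square error
  double nrmse;  // rmse divided by the range of the original data
  double maxe;   // maximum absolute pointwise error
  double psnr;   // peak signal-to-noise ratio in decibels (assumed convention)
};

// Assumes orig and recon have the same nonzero length. A constant input or an
// exact reconstruction makes nrmse or psnr non-finite.
ErrorStats error_stats(const std::vector<double>& orig,
                       const std::vector<double>& recon)
{
  double sum2 = 0, maxe = 0;
  double lo = orig[0], hi = orig[0];
  for (std::size_t i = 0; i < orig.size(); i++) {
    double e = std::fabs(orig[i] - recon[i]);
    sum2 += e * e;
    maxe = std::max(maxe, e);
    lo = std::min(lo, orig[i]);
    hi = std::max(hi, orig[i]);
  }
  double range = hi - lo;
  ErrorStats s;
  s.rmse = std::sqrt(sum2 / orig.size());
  s.nrmse = s.rmse / range;
  s.maxe = maxe;
  s.psnr = 20 * std::log10(range / s.rmse);  // assumption: peak = value range
  return s;
}
```

The two input arrays would typically come from the file given to -i and the file written by -o.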

Input and output

-i <path>

Name of uncompressed binary input file. Use “-” for standard input.

-o <path>

Name of decompressed binary output file. Use “-” for standard output. May be used with either -i, -z, or both.

-z <path>

Name of compressed input (without -i) or output file (with -i). Use “-” for standard input or output.

When -i is specified, data is read from the corresponding uncompressed file, compressed, and written to the compressed file specified by -z (when present). Without -i, compressed data is read from the file specified by -z and decompressed. In either case, the reconstructed data can be written to the file specified by -o.

Array type and dimensions

-f

Single precision (float type). Shorthand for -t f32.

-d

Double precision (double type). Shorthand for -t f64.

-t <type>

Specify scalar type as one of i32, i64, f32, f64 for 32- or 64-bit integer or floating scalar type.

-1 <nx>

Dimensions of 1D C array a[nx].

-2 <nx> <ny>

Dimensions of 2D C array a[ny][nx].

-3 <nx> <ny> <nz>

Dimensions of 3D C array a[nz][ny][nx].

When -i is used, the scalar type and array dimensions must be specified. One of -f, -d, or -t specifies the input scalar type. -1, -2, or -3 specifies the array dimensions. The same parameters must be given when decompressing data (without -i), unless a header was stored using -h during compression.
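Because the uncompressed file has no header, a mismatch between the options and the actual file contents cannot be detected from the file itself. One simple sanity check is to compare the file size with the size implied by the type and dimensions. The helpers below are a hypothetical sketch, not part of zfp.

```cpp
#include <cstddef>
#include <string>

// Size in bytes of the scalar types accepted by -t; returns 0 if the
// type string is not one of i32, i64, f32, f64.
std::size_t scalar_bytes(const std::string& type)
{
  if (type == "i32" || type == "f32") return 4;
  if (type == "i64" || type == "f64") return 8;
  return 0;
}

// Expected size in bytes of a headerless file holding an nx * ny * nz array.
// Leave ny and nz at 1 for lower-dimensional arrays.
std::size_t expected_file_bytes(const std::string& type, std::size_t nx,
                                std::size_t ny = 1, std::size_t nz = 1)
{
  return scalar_bytes(type) * nx * ny * nz;
}
```

For example, -f -3 100 100 100 implies a 4,000,000-byte input file, and -d -1 1000000 implies 8,000,000 bytes.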

Compression parameters

-r <rate>

Specify fixed rate in terms of number of compressed bits per integer or floating-point value.

-p <precision>

Specify fixed precision in terms of number of uncompressed bits per value.

-a <tolerance>

Specify fixed accuracy in terms of absolute error tolerance.

-c <minbits> <maxbits> <maxprec> <minexp>

Specify expert mode parameters.

When -i is used, the compression parameters must be specified. The same parameters must be given when decompressing data (without -i), unless a header was stored using -h when compressing. See the section on compression modes for a discussion of these parameters.

Examples

  • -i file : read uncompressed file and compress to memory
  • -z file : read compressed file and decompress to memory
  • -i ifile -z zfile : read uncompressed ifile, write compressed zfile
  • -z zfile -o ofile : read compressed zfile, write decompressed ofile
  • -i ifile -o ofile : read ifile, compress, decompress, write ofile
  • -i file -s : read uncompressed file, compress to memory, print stats
  • -i - -o - -s : read stdin, compress, decompress, write stdout, print stats
  • -f -3 100 100 100 -r 16 : 2x fixed-rate compression of 100 × 100 × 100 floats
  • -d -1 1000000 -r 32 : 2x fixed-rate compression of 1,000,000 doubles
  • -d -2 1000 1000 -p 32 : 32-bit precision compression of 1000 × 1000 doubles
  • -d -1 1000000 -a 1e-9 : compression of 1,000,000 doubles with < 10^-9 max error
  • -d -1 1000000 -c 64 64 0 -1074 : 4x fixed-rate compression of 1,000,000 doubles
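The compression ratios quoted in the fixed-rate examples follow from simple arithmetic: the rate is the number of compressed bits per value, and a d-dimensional block holds 4^d values. The sketch below spells this out. The function names are hypothetical, and the calculation ignores header overhead, padding, and any rounding of the rate that zfp may apply.

```cpp
// Number of values in one block of a d-dimensional array: 4^d.
unsigned block_values(unsigned dims) { return 1u << (2 * dims); }

// Compressed bits per block implied by a fixed rate (bits per value).
double block_bits(double rate, unsigned dims)
{
  return rate * block_values(dims);
}

// Rate (bits per value) implied by a fixed number of bits per block,
// i.e., by setting minbits = maxbits with -c.
double rate_from_block_bits(unsigned maxbits, unsigned dims)
{
  return double(maxbits) / block_values(dims);
}

// Nominal compression ratio for scalars of the given width in bits.
double compression_ratio(unsigned scalar_bits, double rate)
{
  return scalar_bits / rate;
}
```

With these definitions, -f -r 16 gives 32 / 16 = 2x, -d -r 32 gives 64 / 32 = 2x, and -d -1 ... -c 64 64 ... gives 64 bits per 4-value block, or 16 bits per value, for 64 / 16 = 4x.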