File Compressor

This section describes zfp, a simple, file-based compression tool that is included in the zfp distribution. Other, third-party, file-based compression options are discussed in the Application Support section.

The zfp executable in the bin directory is primarily intended for evaluating the rate-distortion trade-off (compression ratio versus quality) provided by the compressor, but since version 0.5.0 it also allows reading and writing compressed data sets. zfp takes as input a raw, binary array of floats, doubles, or integers in native byte order and optionally outputs a compressed array or a reconstructed array obtained after lossy compression followed by decompression. Various statistics on compression ratio and error are also displayed.

The uncompressed input and output files should be a flattened, contiguous sequence of scalars without any header information, generated for instance by

double* data = new double[nx * ny * nz];
// populate data
FILE* file = fopen("data.bin", "wb");
fwrite(data, sizeof(*data), nx * ny * nz, file);
fclose(file);
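Because the reconstructed file written with -o uses the same headerless layout, it can be read back the same way. The following is a minimal sketch, not part of zfp: the helper name read_raw_doubles is hypothetical, and it assumes the file holds doubles in native byte order and that the caller already knows the number of values.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper (not part of zfp): read `count` doubles from a raw,
// headerless binary file. Returns an empty vector if the file cannot be
// opened or holds fewer than `count` values.
std::vector<double> read_raw_doubles(const char* path, std::size_t count)
{
  std::vector<double> data(count);
  std::FILE* file = std::fopen(path, "rb");
  if (!file)
    return {};
  std::size_t read = std::fread(data.data(), sizeof(double), count, file);
  std::fclose(file);
  if (read != count)
    return {};
  return data;
}
```

For a 3D array, count would be nx * ny * nz, matching the dimensions given on the command line.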

zfp requires a set of command-line options, the most important being the -i option that specifies that the input is uncompressed. When present, -i tells zfp to read an uncompressed input file and compress it to memory. If desired, the compressed stream can be written to an output file using -z. When -i is absent, on the other hand, -z names the compressed input (not output) file, which is then decompressed. In either case, -o can be used to output the reconstructed array resulting from lossy compression and decompression.

So, to compress a file, use -i file.in -z file.zfp. To later decompress the file, use -z file.zfp -o file.out. A single dash “-” can be used in place of a file name to denote standard input or output.

When reading uncompressed input, the scalar type must be specified using -f (float) or -d (double), or using -t for integer-valued data. In addition, the array dimensions must be specified using -1 (for 1D arrays), -2 (for 2D arrays), -3 (for 3D arrays), or -4 (for 4D arrays). For multidimensional arrays, x varies faster than y, which in turn varies faster than z, and so on. That is, a 4D input file corresponding to a flattened C array a[nw][nz][ny][nx] is specified as -4 nx ny nz nw.
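The layout convention above can be written as an explicit offset computation. This is an illustrative sketch only; flat_index is a hypothetical helper name, not a zfp function.

```cpp
#include <cstddef>

// Hypothetical helper: offset of element a[w][z][y][x] in a flattened C array
// declared as a[nw][nz][ny][nx]. x varies fastest, then y, then z, then w.
// The outermost extent nw is not needed to compute the offset.
std::size_t flat_index(std::size_t x, std::size_t y, std::size_t z, std::size_t w,
                       std::size_t nx, std::size_t ny, std::size_t nz)
{
  return x + nx * (y + ny * (z + nz * w));
}
```

With nx = 5, ny = 3, nz = 2, stepping x by one moves 1 element, stepping y moves 5, stepping z moves 15, and stepping w moves 30. An array stored this way would be passed as -4 5 3 2 nw.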

Note

-2 nx ny is not equivalent to -3 nx ny 1, even though the same number of values are compressed. One invokes the 2D codec, while the other uses the 3D codec, which in this example has to pad the input to an nx × ny × 4 array since d-dimensional arrays are partitioned into blocks of 4^d values. Such padding usually negatively impacts compression.

In addition to ensuring correct dimensionality, the order of dimensions also matters. For instance, -2 nx ny is not equivalent to -2 ny nx, i.e., with the dimensions transposed.
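To make the cost of padding concrete, the sketch below counts how many values each codec has to cover once every dimension is rounded up to a multiple of 4. The function names are hypothetical, and the sketch only counts values; it does not model how padding affects the compressed size.

```cpp
#include <cstddef>

// Number of 4-wide blocks needed to cover n values along one dimension.
std::size_t blocks_along(std::size_t n)
{
  return (n + 3) / 4;
}

// Values covered when an nx * ny array is padded to whole 4 x 4 blocks.
std::size_t padded_values_2d(std::size_t nx, std::size_t ny)
{
  return 16 * blocks_along(nx) * blocks_along(ny);
}

// Values covered when an nx * ny * nz array is padded to whole 4 x 4 x 4 blocks.
std::size_t padded_values_3d(std::size_t nx, std::size_t ny, std::size_t nz)
{
  return 64 * blocks_along(nx) * blocks_along(ny) * blocks_along(nz);
}
```

For a 100 × 100 array, padded_values_2d(100, 100) is 10,000 while padded_values_3d(100, 100, 1) is 40,000: the 3D codec processes the same number of blocks, but three quarters of each block is padding.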

Note

In multidimensional arrays, the order in which dimensions are specified is important. In zfp, the memory layout convention is such that x varies faster than y, which varies faster than z, and hence x should map to the innermost (rightmost) array dimension in a C array and to the leftmost dimension in a Fortran array. Getting the order of dimensions right is crucial for good compression and accuracy. See the discussion of dimensions and strides and FAQ #0 for further information.

Using -h, the array dimensions and type are stored in a header of the compressed stream so that they do not have to be specified on the command line during decompression. The header also stores compression parameters, which are described below. The compressor and decompressor must agree on whether headers are used, and it is up to the user to enforce this.

zfp accepts several options for specifying how the data is to be compressed. The most general of these, the -c option, takes four constraint parameters that together can be used to achieve various effects. These constraints are:

minbits: the minimum number of bits used to represent a block
maxbits: the maximum number of bits used to represent a block
maxprec: the maximum number of bit planes encoded
minexp:  the smallest bit plane number encoded

These parameters are discussed in detail in the section on compression modes. Options -r, -p, and -a provide a simpler interface to setting all of the above parameters by invoking fixed-rate (-r), fixed-precision (-p), and fixed-accuracy (-a) mode. Reversible mode for lossless compression is specified using -R.

Usage

Below is a description of each command-line option accepted by zfp.

General options

-h

Read/write array and compression parameters from/to compressed header.

-q

Quiet mode; suppress diagnostic output.

-s

Evaluate and print the following error statistics:

  • rmse: The root mean square error.
  • nrmse: The root mean square error normalized to the range.
  • maxe: The maximum absolute pointwise error.
  • psnr: The peak signal to noise ratio in decibels.
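For readers who want to reproduce these statistics outside the tool, here is a minimal sketch that compares an original array with its reconstruction. It is not zfp's implementation: the struct and function names are hypothetical, and the psnr line assumes the value range of the original data is used as the peak, which is one common convention; zfp's own formula may differ by a constant offset.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct ErrorStats {
  double rmse;   // root mean square error
  double nrmse;  // rmse divided by the range of the original data
  double maxe;   // maximum absolute pointwise error
  double psnr;   // peak signal-to-noise ratio in decibels (assumed convention)
};

// Hypothetical helper: assumes both arrays have the same nonzero length and
// that the original data is not constant (so its range is positive).
ErrorStats error_stats(const std::vector<double>& original,
                       const std::vector<double>& reconstructed)
{
  double sum_sq = 0.0;
  double maxe = 0.0;
  double lo = original[0];
  double hi = original[0];
  for (std::size_t i = 0; i < original.size(); i++) {
    double e = std::fabs(original[i] - reconstructed[i]);
    sum_sq += e * e;
    maxe = std::max(maxe, e);
    lo = std::min(lo, original[i]);
    hi = std::max(hi, original[i]);
  }
  double range = hi - lo;
  ErrorStats s;
  s.rmse = std::sqrt(sum_sq / static_cast<double>(original.size()));
  s.nrmse = s.rmse / range;
  s.maxe = maxe;
  // Assumption: peak = range of the original data.
  s.psnr = 20.0 * std::log10(range / s.rmse);
  return s;
}
```

The two arrays could be loaded from the file passed to -i and the file written with -o.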

Input and output

-i <path>

Name of uncompressed binary input file. Use “-” for standard input.

-o <path>

Name of decompressed binary output file. Use “-” for standard output. May be used with either -i, -z, or both.

-z <path>

Name of compressed input (without -i) or output file (with -i). Use “-” for standard input or output.

When -i is specified, data is read from the corresponding uncompressed file, compressed, and written to the compressed file specified by -z (when present). Without -i, compressed data is read from the file specified by -z and decompressed. In either case, the reconstructed data can be written to the file specified by -o.

Array type and dimensions

-f

Single precision (float type). Shorthand for -t f32.

-d

Double precision (double type). Shorthand for -t f64.

-t <type>

Specify scalar type as one of i32, i64, f32, f64 for 32- or 64-bit integer or floating-point scalar type.

-1 <nx>

Dimensions of 1D C array a[nx].

-2 <nx> <ny>

Dimensions of 2D C array a[ny][nx].

-3 <nx> <ny> <nz>

Dimensions of 3D C array a[nz][ny][nx].

-4 <nx> <ny> <nz> <nw>

Dimensions of 4D C array a[nw][nz][ny][nx].

When -i is used, the scalar type and array dimensions must be specified. One of -f, -d, or -t specifies the input scalar type. -1, -2, -3, or -4 specifies the array dimensions. The same parameters must be given when decompressing data (without -i), unless a header was stored using -h during compression.
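Since the raw input carries no header, a wrong type or dimension goes undetected unless the file size is checked. The sketch below computes the size in bytes that an uncompressed file should have for a given type and set of dimensions. Both function names are hypothetical; only the type names i32, i64, f32, and f64 come from the -t option described above.

```cpp
#include <cstddef>
#include <string>

// Size in bytes of one scalar for the type names accepted by -t.
// Returns 0 for an unrecognized name.
std::size_t scalar_bytes(const std::string& type)
{
  if (type == "i32" || type == "f32")
    return 4;
  if (type == "i64" || type == "f64")
    return 8;
  return 0;
}

// Expected size of a raw, headerless input file. Unused trailing dimensions
// default to 1, so the same function covers the -1, -2, -3, and -4 cases.
std::size_t expected_bytes(const std::string& type, std::size_t nx,
                           std::size_t ny = 1, std::size_t nz = 1,
                           std::size_t nw = 1)
{
  return scalar_bytes(type) * nx * ny * nz * nw;
}
```

For example, the input for -f -3 100 100 100 should be 4,000,000 bytes, and the input for -d -1 1000000 should be 8,000,000 bytes.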

Compression parameters

One of the following compression modes must be selected.

-r <rate>

Specify fixed rate in terms of number of compressed bits per integer or floating-point value.

-p <precision>

Specify fixed precision in terms of number of uncompressed bits per value.

-a <tolerance>

Specify fixed accuracy in terms of absolute error tolerance.

-R

Reversible (lossless) mode.

-c <minbits> <maxbits> <maxprec> <minexp>

Specify expert mode parameters.

When -i is used, the compression parameters must be specified. The same parameters must be given when decompressing data (without -i), unless a header was stored using -h when compressing. See the section on compression modes for a discussion of these parameters.

Execution parameters

-x <policy>

Specify execution policy and parameters. The default policy is -x serial for sequential execution. To enable OpenMP parallel compression, use the omp policy. Without parameters, -x omp selects OpenMP with default settings, which typically implies maximum concurrency available. Use -x omp=threads to request a specific number of threads (see also zfp_stream_set_omp_threads()). A thread count of zero is ignored and results in the default number of threads. Use -x omp=threads,chunk_size to specify the chunk size in number of blocks (see also zfp_stream_set_omp_chunk_size()). A chunk size of zero is ignored and results in the default size. Use -x cuda for parallel CUDA compression and decompression.

As of 0.5.4, the execution policy applies to both compression and decompression. If the execution policy is not supported for decompression, then zfp will attempt to fall back on serial decompression. This is done only when both compression and decompression are performed as part of a single execution, e.g., when specifying both -i and -o.

Examples

  • -i file : read uncompressed file and compress to memory
  • -z file : read compressed file and decompress to memory
  • -i ifile -z zfile : read uncompressed ifile, write compressed zfile
  • -z zfile -o ofile : read compressed zfile, write decompressed ofile
  • -i ifile -o ofile : read ifile, compress, decompress, write ofile
  • -i file -s : read uncompressed file, compress to memory, print stats
  • -i - -o - -s : read stdin, compress, decompress, write stdout, print stats
  • -f -3 100 100 100 -r 16 : 2x fixed-rate compression of 100 × 100 × 100 floats
  • -d -1 1000000 -r 32 : 2x fixed-rate compression of 1,000,000 doubles
  • -d -2 1000 1000 -p 32 : 32-bit precision compression of 1000 × 1000 doubles
  • -d -1 1000000 -a 1e-9 : compression of 1,000,000 doubles with < 10^-9 max error
  • -d -1 1000000 -c 64 64 0 -1074 : 4x fixed-rate compression of 1,000,000 doubles
  • -x omp=16,256 : parallel compression with 16 threads, 256-block chunks
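The compression ratios quoted in the fixed-rate examples follow from simple arithmetic, sketched below. The function names are hypothetical, and the calculation ignores any header or padding overhead. It assumes that expert mode with minbits equal to maxbits stores every block of 4^d values in exactly that many bits, as the -c 64 64 example suggests.

```cpp
// Compression ratio in fixed-rate mode: uncompressed bits per value divided
// by the rate (compressed bits per value).
double fixed_rate_ratio(unsigned scalar_bits, double rate)
{
  return scalar_bits / rate;
}

// Bits per value when every d-dimensional block of 4^d values occupies
// exactly block_bits bits (assumed: expert mode with minbits == maxbits).
double bits_per_value(unsigned block_bits, unsigned dims)
{
  unsigned block_values = 1u << (2 * dims);  // 4^d
  return static_cast<double>(block_bits) / block_values;
}
```

With these, fixed_rate_ratio(32, 16) and fixed_rate_ratio(64, 32) both give 2, matching the -f ... -r 16 and -d ... -r 32 examples. For -d -1 1000000 -c 64 64 0 -1074, bits_per_value(64, 1) is 16 bits per value, so fixed_rate_ratio(64, 16) gives the stated 4x.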