Introduction

zfp is an open-source library for representing multidimensional numerical arrays in compressed form to reduce storage and bandwidth requirements. zfp consists of four main components:

  • An efficient number format for representing small, fixed-size blocks of real values. The zfp format usually provides higher accuracy per bit stored than conventional number formats like IEEE 754 floating point.
  • A set of classes that implement storage and manipulation of a multidimensional array data type. zfp arrays support high-speed read and write random access to individual array elements and are a drop-in replacement for std::vector and native C/C++ arrays. zfp arrays provide accessors like proxy pointers, iterators, and views. zfp arrays allow specifying an exact memory footprint or an error tolerance.
  • A C library for streaming compression of partial or whole arrays of integers or floating-point numbers, e.g., for applications that read and write large data sets to and from disk. This library supports fast, parallel (de)compression via OpenMP and CUDA.
  • A command-line executable for compressing binary files of integer or floating-point arrays, e.g., as a substitute for general-purpose compressors like gzip.

As a compressor, zfp is primarily lossy, meaning that the numerical values are usually only approximately represented, though the user may specify error tolerances to limit the amount of loss. Fully lossless compression, where values are represented exactly, is also supported.

zfp is primarily written in C and C++ but also includes Python and Fortran bindings. zfp is being developed at Lawrence Livermore National Laboratory and is supported by the U.S. Department of Energy’s Exascale Computing Project.

Availability

zfp is freely available as open source on GitHub and is distributed under the terms of a permissive three-clause BSD license. zfp may be installed using CMake or GNU Make. Installation from source code is recommended for users who wish to configure the internals of zfp and select which components (e.g., programming models, language bindings) to install.

zfp is also available through several package managers, including Conda (both C/C++ and Python packages are available), pip, and Spack. RPM packages are available for several Linux distributions and may be installed using apt or yum.

Application Support

zfp has been incorporated into several independently developed applications, plugins, and formats. A separate list covers the software products that support zfp.

Usage

The typical user will interact with zfp via one or more of its components, specifically

  • Via the C API when doing I/O in an application or otherwise performing data (de)compression online. High-speed, parallel compression is supported via OpenMP and CUDA.
  • Via zfp’s in-memory compressed-array classes when performing computations on very large arrays that demand random access to array elements, e.g., in visualization, data analysis, or even in numerical simulation. These classes can often substitute for C/C++ arrays and STL vectors in applications with minimal code changes.
  • Via the zfp command-line tool when compressing binary files offline.
  • Via third-party I/O libraries or tools that support zfp.

Technology

zfp compresses d-dimensional (1D, 2D, 3D, and 4D) arrays of integer or floating-point values by partitioning the array into cubical blocks of 4^d values, i.e., 4, 16, 64, or 256 values for 1D, 2D, 3D, and 4D arrays, respectively. Each such block is independently compressed to a fixed- or variable-length bit string, and these bit strings may be concatenated into a single stream of bits.
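The block arithmetic above can be sketched in a few lines of Python. This is only an illustration, not part of zfp: the function names are hypothetical, and the rounding-up of dimensions that are not multiples of 4 is an assumption about how partial blocks are covered. The offset helper applies only to the fixed-length case, where every block occupies the same number of bits.

```python
from math import prod

BLOCK_SIDE = 4  # each block spans 4 values along every dimension


def values_per_block(d):
    """Number of values in one d-dimensional block: 4**d."""
    if d not in (1, 2, 3, 4):
        raise ValueError("only 1D to 4D arrays are considered here")
    return BLOCK_SIDE ** d


def block_count(shape):
    """Number of blocks needed to cover an array of the given shape.

    Assumption: a dimension that is not a multiple of 4 is rounded up,
    so the last block along that dimension is only partially filled.
    """
    values_per_block(len(shape))  # validates the dimensionality
    return prod((n + BLOCK_SIDE - 1) // BLOCK_SIDE for n in shape)


def block_bit_offset(block_index, bits_per_block):
    """Start of a block in the concatenated stream (fixed-length case only)."""
    return block_index * bits_per_block


shape = (100, 50, 30)               # hypothetical 3D array
print(values_per_block(len(shape)))  # 64
print(block_count(shape))            # 25 * 13 * 8 = 2600
print(block_bit_offset(3, 128))      # 384
```

With fixed-length bit strings, the offset of any block follows directly from its index, which is what makes random access to individual blocks cheap. Variable-length bit strings would need some additional index to locate a block.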

zfp usually truncates each per-block bit string, either to a fixed number of bits to meet a storage budget or to a variable length that meets a given error tolerance, as dictated by the compressibility of the data. The bit string representing any given block may be truncated at any point and still yield a valid approximation. The early bits are most important; later bits progressively refine the approximation, similar to how the last few bits in a floating-point number have less significance than the first several bits. The trailing bits can usually be discarded (zeroed) with limited impact on accuracy.
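The floating-point analogy can be made concrete with a small sketch. It operates on an ordinary IEEE 754 double, not on a zfp bit stream, and the helper name zero_trailing_bits is hypothetical. Zeroing the k least significant bits of the 64-bit pattern changes a normal value by a relative amount below 2^(k-52), so the approximation degrades gradually as more trailing bits are dropped.

```python
import math
import struct


def zero_trailing_bits(x, k):
    """Zero the k least significant bits of the 64-bit IEEE 754 pattern of x.

    Only significand bits are touched (0 <= k <= 52), so the sign and
    exponent of x are preserved.
    """
    if not 0 <= k <= 52:
        raise ValueError("k must be between 0 and 52")
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    bits &= ~((1 << k) - 1)  # clear the k trailing bits
    (y,) = struct.unpack("<d", struct.pack("<Q", bits))
    return y


x = math.pi
for k in (0, 20, 40):
    y = zero_trailing_bits(x, k)
    # Relative error stays below 2**(k - 52) for normal values.
    print(k, y, abs(x - y) / x)
```

The same qualitative behavior is what the text describes for a block's bit string: the leading bits carry most of the information, and cutting the tail trades a small, controlled loss of accuracy for fewer stored bits.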

zfp was originally designed for floating-point arrays only but has been extended to also support integer data, and could for instance be used to compress images and quantized volumetric data. To achieve high compression ratios, zfp generally uses lossy but optionally error-bounded compression. Bit-for-bit lossless compression is also possible through one of zfp’s compression modes.

zfp works best for 2D-4D arrays that exhibit spatial correlation, such as continuous fields from physics simulations, images, regularly sampled terrain surfaces, etc. Although zfp also provides support for 1D arrays, e.g., for audio signals or even unstructured floating-point streams, the compression scheme has not been well optimized for this use case, and compression ratio and quality may not be competitive with floating-point compressors designed specifically for 1D streams.

In all use cases, it is important to know how to use zfp’s compression modes as well as what the limitations of zfp are. Although it is not critical to understand the compression algorithm itself, some familiarity with its major components may help you understand what to expect and how zfp’s parameters influence the result.

Resources

zfp is based on the algorithm described in the following paper:

Peter Lindstrom
IEEE Transactions on Visualization and Computer Graphics
20(12):2674-2683, December 2014

zfp has evolved since the original publication; the algorithm implemented in the current version is described in:

James Diffenderfer, Alyson Fox, Jeffrey Hittinger, Geoffrey Sanders, Peter Lindstrom
SIAM Journal on Scientific Computing
41(3):A1867-A1898, 2019

For more information on zfp, please see the zfp website. For bug reports, please consult the GitHub issue tracker. For questions, comments, and requests, please contact us.