Python Bindings¶
zfp 0.5.5 adds zfPy: Python bindings that allow compressing and decompressing NumPy integer and floating-point arrays. The zfPy implementation is based on Cython and requires both NumPy and Cython to be installed. Currently, zfPy supports only serial execution.
The zfPy API is limited to two functions, for compression and decompression, which are described below.
Compression¶
- zfpy.compress_numpy(arr, tolerance=-1, rate=-1, precision=-1, write_header=True)¶
Compress NumPy array, arr, and return a compressed byte stream. The non-expert compression mode is selected by setting one of tolerance, rate, or precision. If none of these arguments is specified, then reversible mode is used. By default, a header that encodes array shape and scalar type as well as compression parameters is prepended, which can be omitted by setting write_header to False. If this function fails for any reason, an exception is thrown.
zfPy compression currently requires a NumPy array
(ndarray)
populated with the data to be compressed. The array metadata (i.e.,
shape, strides, and scalar type) are used to automatically populate the
zfp_field
structure passed to zfp_compress()
. By default,
all that is required to be passed to the compression function is the
NumPy array; this will result in a stream that includes a header and is
losslessly compressed using the reversible mode.
For example:
import zfpy
import numpy as np
my_array = np.arange(1, 20)
compressed_data = zfpy.compress_numpy(my_array)
decompressed_array = zfpy.decompress_numpy(compressed_data)
# confirm lossless compression/decompression
np.testing.assert_array_equal(my_array, decompressed_array)
Using the fixed-accuracy, fixed-rate, or fixed-precision modes simply requires setting one of the tolerance, rate, or precision arguments, respectively. For example:
compressed_data = zfpy.compress_numpy(my_array, tolerance=1e-3)
decompressed_array = zfpy.decompress_numpy(compressed_data)
# Note the change from "equal" to "allclose" due to the lossy compression
np.testing.assert_allclose(my_array, decompressed_array, atol=1e-3)
Since NumPy arrays are C-ordered by default (i.e., the rightmost index
varies fastest) and zfp_compress()
assumes Fortran ordering
(i.e., the leftmost index varies fastest), compress_numpy()
automatically reverses the order of dimensions and strides in order to
improve the expected memory access pattern during compression.
The decompress_numpy()
function also reverses the order of
dimensions and strides, and therefore decompression will restore the
shape of the original array. Note, however, that the zfp stream does
not encode the memory layout of the original NumPy array, and therefore
layout information like strides, contiguity, and C vs. Fortran order
may not be preserved. Nevertheless, zfPy correctly compresses NumPy
arrays with any memory layout, including Fortran ordering and non-contiguous
storage.
Byte streams produced by compress_numpy()
can be decompressed
by the zfp command-line tool. In general, they cannot
be deserialized as compressed arrays, however.
Note
decompress_numpy()
requires a header to decompress properly, so do
not set write_header = False during compression if you intend to
decompress the stream with zfPy.
Decompression¶
- zfpy.decompress_numpy(compressed_data)¶
Decompress a byte stream, compressed_data, produced by
compress_numpy()
(with header enabled) and return the decompressed NumPy array. This function throws on exception upon error.
decompress_numpy()
consumes a compressed stream that includes a
header and produces a NumPy array with metadata populated based on the
contents of the header. Stride information is not stored in the zfp
header, so decompress_numpy()
assumes that the array was compressed
with the first (leftmost) dimension varying fastest (typically referred to as
Fortran-ordering). The returned NumPy array is in C-ordering (the default
for NumPy arrays), so the shape of the returned array is reversed from
the shape information stored in the embedded header. For example, if the
header declares the array to be of shape (nx, ny, nz) = (2, 4, 8),
then the returned NumPy array will have a shape of (8, 4, 2).
Since the compress_numpy()
function also reverses the order of
dimensions, arrays both compressed and decompressed with zfPy will have
compatible shape.
Note
Decompressing a stream without a header requires using the
internal _decompress()
Python function (or the
C API).
- zfpy._decompress(compressed_data, ztype, shape, out=None, tolerance=-1, rate=-1, precision=-1)¶
Decompress a headerless compressed stream (if a header is present in the stream, it will be incorrectly interpreted as compressed data). ztype specifies the array scalar type while shape specifies the array dimensions; both must be known by the caller. The compression mode is selected by specifying one (or none) of tolerance, rate, and precision, as in
compress_numpy()
, and also must be known by the caller. If out = None, a new NumPy array is allocated. Otherwise, out specifies the NumPy array or memory buffer to decompress into. Regardless, the decompressed NumPy array is returned unless an error occurs, in which case an exception is thrown.
In _decompress()
, ztype is one of the zfp supported scalar types
(see zfp_type
), which are available in zfPy as
type_int32 = zfp_type_int32
type_int64 = zfp_type_int64
type_float = zfp_type_float
type_double = zfp_type_double
These can be manually specified (e.g., zfpy.type_int32
) or generated
from a NumPy dtype (e.g., zfpy.dtype_to_ztype(array.dtype)
).
If out is specified, the data is decompressed into the out buffer.
out can be a NumPy array or a pointer to memory large enough to hold the
decompressed data. Regardless of the type of out and whether it is provided,
_decompress()
always returns a NumPy array. If out is not
provided, then the array is allocated for the user. If out is provided,
then the returned NumPy array is just a pointer to or wrapper around the
user-supplied out. If out is a NumPy array, then its shape and scalar
type must match the required arguments shape and ztype. To avoid this
constraint check, use out = ndarray.data
rather than
out = ndarray
when calling _decompress()
.
Warning
_decompress()
is an “experimental” function currently used
internally for testing. It does allow decompression of streams without
headers, but providing too small of an output buffer or incorrectly
specifying the shape or strides can result in segmentation faults.
Use with care.