Zarr Dataset I/O
To customize data write settings on a per-dataset basis, HDMF supports wrapping of data arrays using DataIO. To support defining settings specific to Zarr, hdmf-zarr provides the corresponding ZarrDataIO class.
Create an example DynamicTable Container
As a simple example, we first create a DynamicTable container to store some arbitrary data columns.
# Import DynamicTable and get the ROOT_NAME
from hdmf.common.table import DynamicTable, VectorData
from hdmf_zarr.backend import ROOT_NAME
from hdmf_zarr import ZarrDataIO
import numpy as np
# Setup a DynamicTable for managing data about users
data = np.arange(50).reshape(10, 5)
column = VectorData(
    name='test_data_default_settings',
    description='Some 2D test data',
    data=data)
test_table = DynamicTable(
    name=ROOT_NAME,
    description='a table containing data/metadata about users, one user per row',
    columns=(column, ),
    colnames=(column.name, )
)
Defining Data I/O settings
To define custom settings for write (e.g., for chunking and compression), we simply wrap our data array using ZarrDataIO.
from numcodecs import Blosc
data_with_data_io = ZarrDataIO(
    data=data * 3,
    chunks=(10, 10),
    fillvalue=0,
    compressor=Blosc(cname='zstd', clevel=1, shuffle=Blosc.SHUFFLE)
)
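The chunk shape requested above, (10, 10), is larger than the (10, 5) array along the second axis. As a rough illustration of how a regular chunk grid divides an array, the sketch below counts the chunks along each dimension using only the standard library. The helper name chunk_grid is hypothetical and not part of hdmf-zarr or Zarr, and the (4, 5) chunk shape is an assumed alternative used only for comparison.

```python
import math


def chunk_grid(shape, chunks):
    """Return the number of chunks along each dimension of a regular chunk grid.

    Hypothetical helper for illustration only; not part of hdmf-zarr or Zarr.
    """
    if len(shape) != len(chunks):
        raise ValueError("shape and chunks must have the same number of dimensions")
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


# Shape of the example array and the chunk shape passed to ZarrDataIO
shape = (10, 5)
requested_chunks = (10, 10)

grid = chunk_grid(shape, requested_chunks)
n_chunks = math.prod(grid)
print(grid, n_chunks)  # (1, 1) 1

# A smaller, assumed chunk shape splits the same array into more pieces
print(chunk_grid(shape, (4, 5)))  # (3, 1)
```

With the settings used in this tutorial the whole array fits into a single chunk, so chunking only starts to matter for larger arrays or smaller chunk shapes.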
Adding the data to our table:
test_table.add_column(
    name='test_data_zstd_compression',
    description='Some 2D test data',
    data=data_with_data_io)
Next, we add a column where we explicitly disable compression:
data_without_compression = ZarrDataIO(
    data=data * 5,
    compressor=False)
test_table.add_column(
    name='test_data_nocompression',
    description='Some 2D test data',
    data=data_without_compression)
Note
To control linking to other datasets, see the link_data parameter of ZarrDataIO.
Note
In the case of Data (or here VectorData), we can also set the DataIO object to use via the set_dataio() method.
Writing and Reading
Reading and writing data with filters works as usual. See the ZarrIO Overview tutorial for details.
from hdmf.common import get_manager
from hdmf_zarr.backend import ZarrIO
zarr_dir = "example_data.zarr"
with ZarrIO(path=zarr_dir, manager=get_manager(), mode='w') as zarr_io:
    zarr_io.write(test_table)
Reading the table from Zarr:
zarr_io = ZarrIO(path=zarr_dir, manager=get_manager(), mode='r')
intable = zarr_io.read()
intable.to_dataframe()
Check the dataset settings used:
for c in intable.columns:
    print("Name=%s, Chunks=%s, Compressor=%s" %
          (c.name,
           str(c.data.chunks),
           str(c.data.compressor)))
Name=test_data_default_settings, Chunks=(10, 5), Compressor=Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Name=test_data_zstd_compression, Chunks=(10, 10), Compressor=Blosc(cname='zstd', clevel=1, shuffle=SHUFFLE, blocksize=0)
Name=test_data_nocompression, Chunks=(10, 5), Compressor=None