Zarr Dataset I/O

To customize data write settings on a per-dataset basis, HDMF supports wrapping of data arrays using DataIO. To support defining settings specific to Zarr, hdmf-zarr provides the corresponding ZarrDataIO class.

Create an example DynamicTable Container

As a simple example, we first create a DynamicTable container to store some arbitrary data columns.

# Import DynamicTable and get the ROOT_NAME
from hdmf.common.table import DynamicTable, VectorData
from hdmf_zarr.backend import ROOT_NAME
from hdmf_zarr import ZarrDataIO
import numpy as np

# Set up a DynamicTable for managing data about users
data = np.arange(50).reshape(10, 5)
column = VectorData(
    name='test_data_default_settings',
    description='Some 2D test data',
    data=data)
test_table = DynamicTable(
    name=ROOT_NAME,
    description='a table containing data/metadata about users, one user per row',
    columns=(column, ),
    colnames=(column.name, )
)

Defining Data I/O settings

To define custom settings for write (e.g., chunking and compression), we simply wrap our data array using ZarrDataIO.

from numcodecs import Blosc

data_with_data_io = ZarrDataIO(
    data=data * 3,
    chunks=(10, 10),
    fillvalue=0,
    compressor=Blosc(cname='zstd', clevel=1, shuffle=Blosc.SHUFFLE)
)

Adding the data to our table

test_table.add_column(
    name='test_data_zstd_compression',
    description='Some 2D test data',
    data=data_with_data_io)

Next, we add a column for which we explicitly disable compression.

data_without_compression = ZarrDataIO(
    data=data * 5,
    compressor=False)
test_table.add_column(
    name='test_data_nocompression',
    description='Some 2D test data',
    data=data_without_compression)
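
Beyond a single compressor, Zarr can also apply a pipeline of filters that run before compression. The following is a minimal sketch (not added to the table in this tutorial); it assumes that your installed hdmf-zarr version exposes a filters parameter on ZarrDataIO and uses the numcodecs Delta codec purely for illustration.

from numcodecs import Blosc, Delta

# Sketch: apply a Delta filter before Blosc compression.
# The `filters` parameter is assumed to be available on ZarrDataIO in
# your installed hdmf-zarr version; check the API docs if unsure.
data_with_filters = ZarrDataIO(
    data=data * 7,
    chunks=(10, 5),
    filters=[Delta(dtype=data.dtype)],
    compressor=Blosc(cname='zstd', clevel=1, shuffle=Blosc.SHUFFLE)
)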

Note

To control linking to other datasets, see the link_data parameter of ZarrDataIO.
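
For instance, a minimal sketch of that option, assuming an already existing Zarr array on disk (the path below is hypothetical):

import zarr

# Sketch: wrap an existing Zarr array so that ZarrIO links to it rather
# than copying the data into the new store. 'some_other_data.zarr' is a
# hypothetical path that must point to an existing Zarr array.
existing_array = zarr.open('some_other_data.zarr', mode='r')
linked_data = ZarrDataIO(data=existing_array, link_data=True)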

Note

In the case of Data (or, here, VectorData), we can also set the DataIO object to use via the set_dataio() function.

Writing and Reading

Reading and writing data with filters works as usual. See the ZarrIO Overview tutorial for details.

from hdmf.common import get_manager
from hdmf_zarr.backend import ZarrIO

zarr_dir = "example_data.zarr"
with ZarrIO(path=zarr_dir, manager=get_manager(), mode='w') as zarr_io:
    zarr_io.write(test_table)

Reading the table from Zarr

zarr_io = ZarrIO(path=zarr_dir, manager=get_manager(), mode='r')
intable = zarr_io.read()
intable.to_dataframe()
   test_data_default_settings  test_data_zstd_compression    test_data_nocompression
id
0             [0, 1, 2, 3, 4]            [0, 3, 6, 9, 12]         [0, 5, 10, 15, 20]
1             [5, 6, 7, 8, 9]        [15, 18, 21, 24, 27]       [25, 30, 35, 40, 45]
2        [10, 11, 12, 13, 14]        [30, 33, 36, 39, 42]       [50, 55, 60, 65, 70]
3        [15, 16, 17, 18, 19]        [45, 48, 51, 54, 57]       [75, 80, 85, 90, 95]
4        [20, 21, 22, 23, 24]        [60, 63, 66, 69, 72]  [100, 105, 110, 115, 120]
5        [25, 26, 27, 28, 29]        [75, 78, 81, 84, 87]  [125, 130, 135, 140, 145]
6        [30, 31, 32, 33, 34]       [90, 93, 96, 99, 102]  [150, 155, 160, 165, 170]
7        [35, 36, 37, 38, 39]   [105, 108, 111, 114, 117]  [175, 180, 185, 190, 195]
8        [40, 41, 42, 43, 44]   [120, 123, 126, 129, 132]  [200, 205, 210, 215, 220]
9        [45, 46, 47, 48, 49]   [135, 138, 141, 144, 147]  [225, 230, 235, 240, 245]


Check the dataset settings that were used.

for c in intable.columns:
    print("Name=%s, Chunks=% s, Compressor=%s" %
          (c.name,
           str(c.data.chunks),
           str(c.data.compressor)))
Name=test_data_default_settings, Chunks=(10, 5), Compressor=Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Name=test_data_zstd_compression, Chunks=(10, 10), Compressor=Blosc(cname='zstd', clevel=1, shuffle=SHUFFLE, blocksize=0)
Name=test_data_nocompression, Chunks=(10, 5), Compressor=None
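
The same information can also be read straight from the store with the zarr library. A minimal sketch, assuming a zarr-python v2-style API and that the column datasets sit at the top level of the store (the DynamicTable is the root container here):

import zarr

# Sketch: open the written store directly and report chunking/compression.
# The dataset paths are an assumption based on the table being the root container.
root = zarr.open(zarr_dir, mode='r')
for name in ('test_data_default_settings',
             'test_data_zstd_compression',
             'test_data_nocompression'):
    arr = root[name]
    print(name, arr.chunks, arr.compressor)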
