Zarr Dataset I/O

To customize data write settings on a per-dataset basis, HDMF supports wrapping of data arrays using DataIO. To support defining settings specific to Zarr, hdmf-zarr provides the corresponding ZarrDataIO class.

Create an example DynamicTable Container

As a simple example, we first create a DynamicTable container to store some arbitrary data columns.

# Import DynamicTable and get the ROOT_NAME
from hdmf.common.table import DynamicTable, VectorData
from hdmf_zarr.backend import ROOT_NAME
from hdmf_zarr import ZarrDataIO
import numpy as np

# Set up a DynamicTable for managing data about users
data = np.arange(50).reshape(10, 5)
column = VectorData(
    name='test_data_default_settings',
    description='Some 2D test data',
    data=data)
test_table = DynamicTable(
    name=ROOT_NAME,
    description='a table containing data/metadata about users, one user per row',
    columns=(column, ),
    colnames=(column.name, )
)

Defining Data I/O settings

To define custom settings for write (e.g., chunking and compression), we simply wrap our data array using ZarrDataIO.

from numcodecs import Blosc

data_with_data_io = ZarrDataIO(
    data=data * 3,
    chunks=(10, 10),
    fillvalue=0,
    compressor=Blosc(cname='zstd', clevel=1, shuffle=Blosc.SHUFFLE)
)

Adding the data to our table

test_table.add_column(
    name='test_data_zstd_compression',
    description='Some 2D test data',
    data=data_with_data_io)

Next, we add a column for which we explicitly disable compression.

data_without_compression = ZarrDataIO(
    data=data * 5,
    compressor=False)
test_table.add_column(
    name='test_data_nocompression',
    description='Some 2D test data',
    data=data_without_compression)
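
Beyond a single compressor, Zarr can also apply a pipeline of filters that run before compression. The following is a minimal sketch (not added to the table in this tutorial); it assumes that your installed hdmf-zarr version exposes a filters parameter on ZarrDataIO and uses the numcodecs Delta codec purely for illustration.

from numcodecs import Blosc, Delta

# Sketch: apply a Delta filter before Blosc compression.
# The `filters` parameter is assumed to be available on ZarrDataIO in
# your installed hdmf-zarr version; check the API docs if unsure.
data_with_filters = ZarrDataIO(
    data=data * 7,
    chunks=(10, 5),
    filters=[Delta(dtype=data.dtype)],
    compressor=Blosc(cname='zstd', clevel=1, shuffle=Blosc.SHUFFLE)
)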

Note

To control linking to other datasets, see the link_data parameter of ZarrDataIO.
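
For instance, a minimal sketch of that option, assuming an already existing Zarr array on disk (the path below is hypothetical):

import zarr

# Sketch: wrap an existing Zarr array so that ZarrIO links to it rather
# than copying the data into the new store. 'some_other_data.zarr' is a
# hypothetical path that must point to an existing Zarr array.
existing_array = zarr.open('some_other_data.zarr', mode='r')
linked_data = ZarrDataIO(data=existing_array, link_data=True)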

Note

In the case of Data (or, here, VectorData), we can also set the DataIO object to use via the set_dataio() function.

Writing and Reading

Reading and writing data with filters works as usual. See the ZarrIO Overview tutorial for details.

from hdmf.common import get_manager
from hdmf_zarr.backend import ZarrIO

zarr_dir = "example_data.zarr"
with ZarrIO(path=zarr_dir, manager=get_manager(), mode='w') as zarr_io:
    zarr_io.write(test_table)

Reading the table from Zarr

zarr_io = ZarrIO(path=zarr_dir, manager=get_manager(), mode='r')
intable = zarr_io.read()
intable.to_dataframe()
   test_data_default_settings  test_data_zstd_compression    test_data_nocompression
id
0             [0, 1, 2, 3, 4]            [0, 3, 6, 9, 12]         [0, 5, 10, 15, 20]
1             [5, 6, 7, 8, 9]        [15, 18, 21, 24, 27]       [25, 30, 35, 40, 45]
2        [10, 11, 12, 13, 14]        [30, 33, 36, 39, 42]       [50, 55, 60, 65, 70]
3        [15, 16, 17, 18, 19]        [45, 48, 51, 54, 57]       [75, 80, 85, 90, 95]
4        [20, 21, 22, 23, 24]        [60, 63, 66, 69, 72]  [100, 105, 110, 115, 120]
5        [25, 26, 27, 28, 29]        [75, 78, 81, 84, 87]  [125, 130, 135, 140, 145]
6        [30, 31, 32, 33, 34]       [90, 93, 96, 99, 102]  [150, 155, 160, 165, 170]
7        [35, 36, 37, 38, 39]   [105, 108, 111, 114, 117]  [175, 180, 185, 190, 195]
8        [40, 41, 42, 43, 44]   [120, 123, 126, 129, 132]  [200, 205, 210, 215, 220]
9        [45, 46, 47, 48, 49]   [135, 138, 141, 144, 147]  [225, 230, 235, 240, 245]


Check the dataset settings that were used.

for c in intable.columns:
    print("Name=%s, Chunks=% s, Compressor=%s" %
          (c.name,
           str(c.data.chunks),
           str(c.data.compressor)))
Name=test_data_default_settings, Chunks=(10, 5), Compressor=Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Name=test_data_zstd_compression, Chunks=(10, 10), Compressor=Blosc(cname='zstd', clevel=1, shuffle=SHUFFLE, blocksize=0)
Name=test_data_nocompression, Chunks=(10, 5), Compressor=None
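
The same information can also be read straight from the store with the zarr library. A minimal sketch, assuming a zarr-python v2-style API and that the column datasets sit at the top level of the store (the DynamicTable is the root container here):

import zarr

# Sketch: open the written store directly and report chunking/compression.
# The dataset paths are an assumption based on the table being the root container.
root = zarr.open(zarr_dir, mode='r')
for name in ('test_data_default_settings',
             'test_data_zstd_compression',
             'test_data_nocompression'):
    arr = root[name]
    print(name, arr.chunks, arr.compressor)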
