ZarrIO Overview¶
The ZarrIO backend generally behaves like the standard HDF5IO backend available with HDMF. It is an adaptation of that backend that uses Zarr instead of HDF5.
Create an example DynamicTable Container¶
As a simple example, we create a basic DynamicTable for describing user data.
Note
When writing a DynamicTable (or any Container that is normally not intended to be the root of a file), we need to use hdmf_zarr.backend.ROOT_NAME as the name of the Container so that ZarrIO creates link paths correctly. The top-level Container used during I/O is written as the root of the file, so its name does not appear in the path used to locate it.
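The effect described in the note can be illustrated with a small, library-free sketch. The `Node` class and `path_in_file` function below are hypothetical stand-ins written for this illustration; they are not part of HDMF or hdmf-zarr and do not reproduce how ZarrIO builds paths internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a container; not an HDMF class."""
    name: str
    parent: Optional["Node"] = None


def path_in_file(node: Node) -> str:
    """Build the path by walking up to the root, whose own name is skipped."""
    parts = []
    while node.parent is not None:
        parts.append(node.name)
        node = node.parent
    return "/" + "/".join(reversed(parts))


root = Node(name="any_name_at_all")
column = Node(name="first_name", parent=root)

print(path_in_file(root))    # /
print(path_in_file(column))  # /first_name
```

Whatever name the root carries, it never shows up in a path inside the file, which is why a fixed, agreed-upon name such as ROOT_NAME keeps paths consistent.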
# Import DynamicTable and get the ROOT_NAME
from hdmf.common.table import DynamicTable
from hdmf_zarr.backend import ROOT_NAME
# Setup a DynamicTable for managing data about users
users_table = DynamicTable(
    name=ROOT_NAME,
    description='a table containing data/metadata about users, one user per row',
)
users_table.add_column(
    name='first_name',
    description='the first name of the user',
)
users_table.add_column(
    name='last_name',
    description='the last name of the user',
)
users_table.add_column(
    name='phone_number',
    description='the phone number of the user',
    index=True,
)
# Add some simple example data to our table
users_table.add_row(
    first_name='Grace',
    last_name='Hopper',
    phone_number=['123-456-7890']
)
users_table.add_row(
    first_name='Alan',
    last_name='Turing',
    phone_number=['555-666-7777', '888-111-2222']
)
# Show the table for validation
users_table.to_dataframe()
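The phone_number column is created with index=True because each row can hold a different number of values. A common way to store such a ragged column is one flat list of values plus one end offset per row. The sketch below shows that layout in plain Python; the names `flat_values`, `end_offsets`, and `row_values` are made up for this illustration and are not HDMF attributes.

```python
# Flat storage for the ragged 'phone_number' column from the table above.
flat_values = ['123-456-7890', '555-666-7777', '888-111-2222']
# One entry per row: the end offset of that row's values in flat_values.
end_offsets = [1, 3]


def row_values(row: int) -> list:
    """Return the slice of flat_values that belongs to the given row."""
    start = 0 if row == 0 else end_offsets[row - 1]
    return flat_values[start:end_offsets[row]]


print(row_values(0))  # ['123-456-7890']
print(row_values(1))  # ['555-666-7777', '888-111-2222']
```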
Writing the table to Zarr¶
from hdmf.common import get_manager
from hdmf_zarr.backend import ZarrIO
zarr_dir = "example.zarr"
with ZarrIO(path=zarr_dir, manager=get_manager(), mode='w') as zarr_io:
    zarr_io.write(users_table)
Reading the table from Zarr¶
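# Use a context manager so the file is closed once the table has been read
with ZarrIO(path=zarr_dir, manager=get_manager(), mode='r') as zarr_io:
    intable = zarr_io.read()
    intable_df = intable.to_dataframe()
intable_df  # display the table in the gallery output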
Converting to/from HDF5 using export¶
Exporting the Zarr file to HDF5¶
To convert our Zarr file to HDF5, we can simply read the file with our ZarrIO backend and then export it using HDMF’s HDF5IO backend.
from hdmf.backends.hdf5 import HDF5IO
with ZarrIO(path=zarr_dir, manager=get_manager(), mode='r') as zarr_read_io:
    with HDF5IO(path="example.h5", manager=get_manager(), mode='w') as hdf5_export_io:
        hdf5_export_io.export(src_io=zarr_read_io, write_args=dict(link_data=False))  # use export!
Note
When converting between backends, we need to set link_data=False because linking between different storage backends (here from HDF5 to Zarr and vice versa) is not supported.
Check that the HDF5 file is correct
with HDF5IO(path="example.h5", manager=get_manager(), mode='r') as hdf5_read_io:
    intable_from_hdf5 = hdf5_read_io.read()
    intable_hdf5_df = intable_from_hdf5.to_dataframe()
intable_hdf5_df  # display the table in the gallery output
Exporting the HDF5 file to Zarr¶
In the same way, we can convert our HDF5 file back to Zarr by reading the HDF5 file using HDMF’s HDF5IO backend and then exporting it using the ZarrIO backend.
with HDF5IO(path="example.h5", manager=get_manager(), mode='r') as hdf5_read_io:
    with ZarrIO(path="example_exp.zarr", manager=get_manager(), mode='w') as zarr_export_io:
        zarr_export_io.export(src_io=hdf5_read_io, write_args=dict(link_data=False))  # use export!
Check that the Zarr file is correct
with ZarrIO(path="example_exp.zarr", manager=get_manager(), mode='r') as zarr_read_io:
    intable_from_zarr = zarr_read_io.read()
    intable_zarr_df = intable_from_zarr.to_dataframe()
intable_zarr_df  # display the table in the gallery output
Using custom Zarr storage backends¶
ZarrIO supports a subset of the data stores available for Zarr, e.g., DirectoryStore, TempStore, and NestedDirectoryStore from zarr.storage. The supported stores are defined in SUPPORTED_ZARR_STORES. The main obstacle to supporting all possible Zarr stores in ZarrIO is that Zarr does not support links and references.
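Because Zarr itself has no link type, a backend has to emulate links on top of what a store does offer, for example by recording the target location as JSON metadata. The sketch below illustrates that general idea only. The attribute name `hypothetical_link`, the dictionary layout, and both helper functions are assumptions made for this example and are not the format that ZarrIO writes.

```python
import json

# In-memory stand-in for a key-value store holding JSON attribute documents.
store = {}


def write_link(store: dict, group_path: str, target_file: str, target_path: str) -> None:
    """Record a link as plain JSON, since the store has no native link type."""
    attrs = {"hypothetical_link": {"source": target_file, "path": target_path}}
    store[group_path + "/.zattrs"] = json.dumps(attrs)


def resolve_link(store: dict, group_path: str) -> tuple:
    """Read the JSON back and return (file, path) of the link target."""
    attrs = json.loads(store[group_path + "/.zattrs"])
    link = attrs["hypothetical_link"]
    return link["source"], link["path"]


write_link(store, "my_link", target_file=".", target_path="/first_name")
print(resolve_link(store, "my_link"))  # ('.', '/first_name')
```

Resolving such an entry means locating and opening the target through the store, so an emulation of this kind has to be worked out and tested for each store type.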
To use a store other than the default, we simply need to instantiate the store and pass it to ZarrIO via the path parameter.
Here we use a NestedDirectoryStore to write a simple hdmf.common.CSRMatrix container to disk.
from zarr.storage import NestedDirectoryStore
from hdmf.common import CSRMatrix
zarr_nsd_dir = "example_nested_store.zarr"
# Instantiate the store for the directory we want to write to
store = NestedDirectoryStore(zarr_nsd_dir)
csr_container = CSRMatrix(
    name=ROOT_NAME,
    data=[1, 2, 3, 4, 5, 6],
    indices=[0, 2, 2, 0, 1, 2],
    indptr=[0, 2, 3, 6],
    shape=(3, 3))
# Write the csr_container to Zarr, passing the NestedDirectoryStore via path
with ZarrIO(path=store, manager=get_manager(), mode='w') as zarr_io:
    zarr_io.write(csr_container)
# Read the CSR matrix back through the same store to confirm the data was written correctly
with ZarrIO(path=store, manager=get_manager(), mode='r') as zarr_io:
    csr_read = zarr_io.read()
    print(" data=%s\n indices=%s\n indptr=%s\n shape=%s" %
          (str(csr_read.data), str(csr_read.indices), str(csr_read.indptr), str(csr_read.shape)))
data=[1 2 3 4 5 6]
indices=[0 2 2 0 1 2]
indptr=[0 2 3 6]
shape=[3 3]
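To check by eye that these arrays describe the intended matrix, it can help to expand the compressed sparse row (CSR) layout into a dense matrix. The following is a minimal plain-Python sketch of the standard CSR convention, using the same values as the example above; it does not use HDMF or Zarr.

```python
data = [1, 2, 3, 4, 5, 6]
indices = [0, 2, 2, 0, 1, 2]
indptr = [0, 2, 3, 6]
shape = (3, 3)

# Row r owns the entries data[indptr[r]:indptr[r + 1]];
# indices[k] gives the column of data[k].
dense = [[0] * shape[1] for _ in range(shape[0])]
for row in range(shape[0]):
    for k in range(indptr[row], indptr[row + 1]):
        dense[row][indices[k]] = data[k]

for row in dense:
    print(row)
# [1, 0, 2]
# [0, 0, 3]
# [4, 5, 6]
```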