Converting NWB HDF5 files to/from Zarr

This tutorial illustrates how to convert data between HDF5 and Zarr using a Neurodata Without Borders (NWB) file from the DANDI data archive as an example. We will convert the example file from HDF5 to Zarr and then back again to HDF5. The NWB standard is defined using HDMF, which by default stores data via its HDF5IO backend.

Setup

Here we use a small NWB file from DANDIset 000009 of the DANDI neurophysiology data archive as an example. To download the file directly from DANDI we can use:

import os
from dandi.dandiapi import DandiAPIClient

dandiset_id = "000009"
filepath = "sub-anm00239123/sub-anm00239123_ses-20170627T093549_ecephys+ogen.nwb"   # ~0.5MB file
with DandiAPIClient() as client:
    asset = client.get_dandiset(dandiset_id, 'draft').get_asset_by_path(filepath)
    s3_path = asset.get_content_url(follow_redirects=1, strip_query=True)
    filename = os.path.basename(asset.path)
asset.download(filename)

In this tutorial we use a local copy of this small file from the DANDIset as an example:

import os
import shutil
from pynwb import NWBHDF5IO
from hdmf_zarr.nwb import NWBZarrIO
from contextlib import suppress

# Input file to convert
basedir = "resources"
filename = os.path.join(basedir, "sub_anm00239123_ses_20170627T093549_ecephys_and_ogen.nwb")
# Zarr file to generate for converting from HDF5 to Zarr
zarr_filename = "test_zarr_" + os.path.basename(filename) + ".zarr"
# HDF5 file to generate for converting from Zarr to HDF5
hdf_filename = "test_hdf5_" + os.path.basename(filename)

# Delete the converted HDF5 and Zarr files left over from previous runs of this notebook
for fname in [zarr_filename, hdf_filename]:
    if os.path.exists(fname):
        print("Removing %s" % fname)
        if os.path.isfile(fname):  # Remove a single file (here the HDF5 file)
            os.remove(fname)
        else:  # Remove the whole directory tree (here the Zarr store)
            shutil.rmtree(fname)

Convert the NWB file from HDF5 to Zarr

To convert files between storage backends, we use HDMF's export functionality. As this is an NWB file, we here use the pynwb.NWBHDF5IO backend to read the file from HDF5 and the NWBZarrIO backend to export it to Zarr.

with NWBHDF5IO(filename, 'r', load_namespaces=False) as read_io:  # Create HDF5 IO object for read
    with NWBZarrIO(zarr_filename, mode='w') as export_io:         # Create Zarr IO object for write
        export_io.export(src_io=read_io, write_args=dict(link_data=False))   # Export from HDF5 to Zarr

Note

When converting between backends we need to set link_data=False, as linking from Zarr to HDF5 (and vice versa) is not supported.
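Before reading the file back with PyNWB, we can peek at the exported store directly with the zarr package. This is a small sketch added for illustration (it is not part of the original workflow); it assumes zarr v2, where opening the store directory read-only yields a Group whose tree() method renders the hierarchy.

import zarr

# Open the exported directory store read-only and print its hierarchy.
# The group layout mirrors the structure of the NWB file.
root = zarr.open(zarr_filename, mode='r')
print(root.tree())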

Read the Zarr file back in

Now we can read the Zarr file back in using NWBZarrIO. The basic behavior of the resulting NWBFile object is the same.

zr = NWBZarrIO(zarr_filename, mode='r')  # Create Zarr IO object for read
zf = zr.read()                           # Read the NWBFile

# Print the NWBFile to illustrate that the contents are the same
print(zf)
root pynwb.file.NWBFile at 0x140645900222912
Fields:
  devices: {
    ADunit <class 'pynwb.device.Device'>,
    laser <class 'pynwb.device.Device'>
  }
  electrode_groups: {
    ADunit_32 <class 'pynwb.ecephys.ElectrodeGroup'>
  }
  electrodes: electrodes <class 'hdmf.common.table.DynamicTable'>
  experiment_description: N/A
  experimenter: ['Zengcai Guo']
  file_create_date: [datetime.datetime(2019, 10, 7, 15, 10, 30, 595741, tzinfo=tzoffset(None, -18000))]
  identifier: anm00239123_2017-06-27_09-35-49
  institution: Janelia Research Campus
  intervals: {
    trials <class 'pynwb.epoch.TimeIntervals'>
  }
  keywords: <zarr.core.Array '/general/keywords' (6,) object read-only>
  ogen_sites: {
    left-ALM <class 'pynwb.ogen.OptogeneticStimulusSite'>
  }
  related_publications: ['doi:10.1038/nature22324']
  session_description: Extracellular ephys recording of mouse doing discrimination task(lick left/right), with optogenetic stimulation plus pole and auditory stimulus
  session_start_time: 2017-06-27 09:35:49-05:00
  subject: subject pynwb.file.Subject at 0x140645894236096
Fields:
  genotype: Ai32 x PV-Cre
  sex: M
  species: Mus musculus
  subject_id: anm00239123

  timestamps_reference_time: 2017-06-27 09:35:49-05:00
  trials: trials <class 'pynwb.epoch.TimeIntervals'>
  units: units <class 'pynwb.misc.Units'>

The main difference is that datasets are now represented by Zarr arrays, rather than the h5py Datasets we get when reading from HDF5.

print(type(zf.trials['start_time'].data))
<class 'zarr.core.Array'>
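Despite the different type, the access patterns carry over: like h5py Datasets, Zarr arrays support numpy-style slicing, and [:] loads a column fully into memory as a numpy array. A brief sketch added for illustration, reusing the same start_time column as above:

start_times = zf.trials['start_time'].data   # a zarr.core.Array
print(start_times[0:3])       # numpy-style slicing returns a numpy array
print(start_times[:].mean())  # [:] reads the whole column into memory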

For illustration, we show a few columns of the trials table and then close the file.

zf.trials.to_dataframe()[['start_time', 'stop_time', 'type', 'photo_stim_type']]
zr.close()

Convert the Zarr file back to HDF5

Using the same approach as above, we can now convert our Zarr file back to HDF5.

with suppress(Exception):  # TODO: This is a temporary ignore on the convert_dtype exception.
    with NWBZarrIO(zarr_filename, mode='r') as read_io:  # Create Zarr IO object for read
        with NWBHDF5IO(hdf_filename, 'w') as export_io:  # Create HDF5 IO object for write
            export_io.export(src_io=read_io, write_args=dict(link_data=False))  # Export from Zarr to HDF5
/home/docs/checkouts/readthedocs.org/user_builds/hdmf-zarr/envs/latest/lib/python3.9/site-packages/hdmf/build/objectmapper.py:259: DtypeConversionWarning: Spec 'Units/spike_times_index': Value with data type int32 is being converted to data type uint32 (min specification: uint8).
  warnings.warn(full_warning_msg, DtypeConversionWarning)
/home/docs/checkouts/readthedocs.org/user_builds/hdmf-zarr/envs/latest/lib/python3.9/site-packages/hdmf/build/objectmapper.py:259: DtypeConversionWarning: Spec 'Units/electrodes_index': Value with data type int32 is being converted to data type uint32 (min specification: uint8).
  warnings.warn(full_warning_msg, DtypeConversionWarning)
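These DtypeConversionWarnings are expected here: as the messages state, HDMF up-casts the int32 index columns of the Units table to uint32 to satisfy the NWB specification (which requires at least uint8); the index values themselves are unchanged.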

Read the new HDF5 file back

Now our file has been converted from HDF5 to Zarr and back again to HDF5. Here we check that we can still read that file.

with suppress(Exception):  # TODO: This is a temporary ignore on the convert_dtype exception.
    with NWBHDF5IO(hdf_filename, 'r') as hr:
        hf = hr.read()
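As a final sanity check (an addition for this tutorial, not part of the original), we can mirror the earlier type check to confirm that the round trip restored the HDF5 representation of the data:

with suppress(Exception):  # TODO: same temporary ignore as above
    with NWBHDF5IO(hdf_filename, 'r') as hr:
        hf = hr.read()
        # The column that was a zarr.core.Array above is an h5py Dataset again
        print(type(hf.trials['start_time'].data))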
