Converting NWB HDF5 files to/from Zarr
This tutorial illustrates how to convert data between HDF5 and Zarr using
a Neurodata Without Borders (NWB) file from the DANDI data archive as an example.
We will convert the example file from HDF5 to Zarr and then
back again to HDF5. The NWB standard is defined using HDMF and uses
HDMF's HDF5IO backend for storage.
Setup
Here we use a small NWB file from DANDIset 000009 in the DANDI neurophysiology data archive as an example. To download the file directly from DANDI we can use:
import os
from dandi.dandiapi import DandiAPIClient

dandiset_id = "000009"
filepath = "sub-anm00239123/sub-anm00239123_ses-20170627T093549_ecephys+ogen.nwb"  # ~0.5MB file
with DandiAPIClient() as client:
    asset = client.get_dandiset(dandiset_id, 'draft').get_asset_by_path(filepath)
    s3_path = asset.get_content_url(follow_redirects=1, strip_query=True)
    filename = os.path.basename(asset.path)
asset.download(filename)
In this tutorial we instead use a local copy of this small file (stored in the resources directory) as an example:
import os
import shutil
from contextlib import suppress

from pynwb import NWBHDF5IO
from hdmf_zarr.nwb import NWBZarrIO

# Input file to convert
basedir = "resources"
filename = os.path.join(basedir, "sub_anm00239123_ses_20170627T093549_ecephys_and_ogen.nwb")

# Zarr file to generate for converting from HDF5 to Zarr
zarr_filename = "test_zarr_" + os.path.basename(filename) + ".zarr"

# HDF5 file to generate for converting from Zarr to HDF5
hdf_filename = "test_hdf5_" + os.path.basename(filename)

# Delete our converted HDF5 and Zarr files from previous runs of this notebook
for fname in [zarr_filename, hdf_filename]:
    if os.path.exists(fname):
        print("Removing %s" % fname)
        if os.path.isfile(fname):  # Remove a single file (here the HDF5 file)
            os.remove(fname)
        else:                      # Remove the whole directory and subtree (here the Zarr file)
            shutil.rmtree(fname)
Convert the NWB file from HDF5 to Zarr
To convert files between storage backends, we use HDMF's export functionality.
As this is an NWB file, we here use the pynwb.NWBHDF5IO
backend for reading the file from HDF5 and use the NWBZarrIO
backend to export the file to Zarr.
with NWBHDF5IO(filename, 'r', load_namespaces=False) as read_io:  # Create HDF5 IO object for read
    with NWBZarrIO(zarr_filename, mode='w') as export_io:  # Create Zarr IO object for write
        export_io.export(src_io=read_io, write_args=dict(link_data=False))  # Export from HDF5 to Zarr
Note
When converting between backends we need to set link_data=False, as linking from Zarr to HDF5 (and vice versa) is not supported.
Read the Zarr file back in
zr = NWBZarrIO(zarr_filename, 'r')
zf = zr.read()
The basic behavior of the NWBFile object is the same.
# Print the NWBFile to illustrate that
print(zf)
root pynwb.file.NWBFile at 0x140645900222912
Fields:
  devices: {
    ADunit <class 'pynwb.device.Device'>,
    laser <class 'pynwb.device.Device'>
  }
  electrode_groups: {
    ADunit_32 <class 'pynwb.ecephys.ElectrodeGroup'>
  }
  electrodes: electrodes <class 'hdmf.common.table.DynamicTable'>
  experiment_description: N/A
  experimenter: ['Zengcai Guo']
  file_create_date: [datetime.datetime(2019, 10, 7, 15, 10, 30, 595741, tzinfo=tzoffset(None, -18000))]
  identifier: anm00239123_2017-06-27_09-35-49
  institution: Janelia Research Campus
  intervals: {
    trials <class 'pynwb.epoch.TimeIntervals'>
  }
  keywords: <zarr.core.Array '/general/keywords' (6,) object read-only>
  ogen_sites: {
    left-ALM <class 'pynwb.ogen.OptogeneticStimulusSite'>
  }
  related_publications: ['doi:10.1038/nature22324']
  session_description: Extracellular ephys recording of mouse doing discrimination task(lick left/right), with optogenetic stimulation plus pole and auditory stimulus
  session_start_time: 2017-06-27 09:35:49-05:00
  subject: subject pynwb.file.Subject at 0x140645894236096
  Fields:
    genotype: Ai32 x PV-Cre
    sex: M
    species: Mus musculus
    subject_id: anm00239123
  timestamps_reference_time: 2017-06-27 09:35:49-05:00
  trials: trials <class 'pynwb.epoch.TimeIntervals'>
  units: units <class 'pynwb.misc.Units'>
The main difference is that datasets are now represented by Zarr arrays, rather than the h5py Datasets we get when reading from HDF5.
print(type(zf.trials['start_time'].data))
<class 'zarr.core.Array'>
For illustration purposes, here we show a few columns of the trials table.
zf.trials.to_dataframe()[['start_time', 'stop_time', 'type', 'photo_stim_type']]
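to_dataframe() returns a plain pandas DataFrame, so the usual pandas operations apply. A sketch using a stand-in table (the column names mirror the trials table above, but the values are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for zf.trials.to_dataframe()
trials = pd.DataFrame({
    "start_time": [0.0, 5.2, 10.4],
    "stop_time": [4.8, 9.9, 15.1],
    "type": ["lick left", "lick right", "lick left"],
})

# Select a subset of columns, as done with the real trials table above
subset = trials[["start_time", "stop_time"]]
print(subset.shape)  # (3, 2)
```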
zr.close()
Convert the Zarr file back to HDF5
Using the same approach as above, we can now convert our Zarr file back to HDF5.
with suppress(Exception):  # TODO: This is a temporary ignore on the convert_dtype exception.
    with NWBZarrIO(zarr_filename, mode='r') as read_io:  # Create Zarr IO object for read
        with NWBHDF5IO(hdf_filename, 'w') as export_io:  # Create HDF5 IO object for write
            export_io.export(src_io=read_io, write_args=dict(link_data=False))  # Export from Zarr to HDF5
/home/docs/checkouts/readthedocs.org/user_builds/hdmf-zarr/envs/latest/lib/python3.9/site-packages/hdmf/build/objectmapper.py:259: DtypeConversionWarning: Spec 'Units/spike_times_index': Value with data type int32 is being converted to data type uint32 (min specification: uint8).
warnings.warn(full_warning_msg, DtypeConversionWarning)
/home/docs/checkouts/readthedocs.org/user_builds/hdmf-zarr/envs/latest/lib/python3.9/site-packages/hdmf/build/objectmapper.py:259: DtypeConversionWarning: Spec 'Units/electrodes_index': Value with data type int32 is being converted to data type uint32 (min specification: uint8).
warnings.warn(full_warning_msg, DtypeConversionWarning)
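The DtypeConversionWarning messages above are informational: HDMF is up-casting the index dtypes to satisfy the minimum required by the spec. If desired, such warnings can be silenced with Python's standard warnings machinery. A hedged sketch using a stand-in warning class (in a real session you would filter HDMF's own DtypeConversionWarning instead):

```python
import warnings

# Stand-in for HDMF's DtypeConversionWarning (assumption for illustration)
class DtypeConversionWarning(UserWarning):
    pass

def convert():
    # Mimics a library call that emits a dtype-conversion warning
    warnings.warn("int32 is being converted to uint32", DtypeConversionWarning)
    return "converted"

# Suppress only this warning category while converting
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DtypeConversionWarning)
    result = convert()
print(result)  # converted
```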
Read the new HDF5 file back
Now our file has been converted from HDF5 to Zarr and back again to HDF5. Here we check that we can still read that file.