Converting NWB HDF5 files to/from Zarr
This tutorial illustrates how to convert data between HDF5 and Zarr, using
a Neurodata Without Borders (NWB) file from the DANDI data archive as an example.
We will convert the example file from HDF5 to Zarr and then back to HDF5.
The NWB standard is defined using HDMF and uses HDMF’s HDF5IO backend for storage.
Setup
Here we use a small NWB file from Dandiset 001333 in the DANDI neurophysiology data archive as an example. To download the file directly from DANDI, we can use:
import os
from dandi.dandiapi import DandiAPIClient

dandiset_id = "001333"
filepath = "sub-healthy-simulated-beta/sub-healthy-simulated-beta_ses-162_ecephys.nwb"  # 220 KiB file
with DandiAPIClient() as client:
    asset = client.get_dandiset(dandiset_id, 'draft').get_asset_by_path(filepath)

s3_path = asset.get_content_url(follow_redirects=1, strip_query=True)
filename = os.path.basename(asset.path)
asset.download(filename)
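Re-running the snippet above downloads the file again each time. The following is a minimal sketch of a guard that skips the download when the file is already present. The helper name `download_if_missing` is hypothetical, and the sketch assumes only that the asset object provides the `download(filepath)` method used above.

```python
import os


def download_if_missing(asset, filename):
    """Download ``asset`` to ``filename`` unless that file already exists.

    Hypothetical helper: ``asset`` is assumed to expose a ``download(filepath)``
    method, like the DANDI asset returned by ``get_asset_by_path``.
    Returns True if a download was performed, False if it was skipped.
    """
    if os.path.exists(filename):
        return False  # keep the existing local copy
    asset.download(filename)
    return True
```

In the snippet above, `asset.download(filename)` could then be replaced by `download_if_missing(asset, filename)`.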
Here we use a local copy of a small file from this Dandiset:
import os
import shutil
from pynwb import NWBHDF5IO
from hdmf_zarr.nwb import NWBZarrIO

# Input file to convert
basedir = "resources"
filename = os.path.join(basedir, "sub-healthy-simulated-beta_ses-162_ecephys.nwb")
# Zarr file to generate for converting from HDF5 to Zarr
zarr_filename = "test_zarr_" + os.path.basename(filename) + ".zarr"
# HDF5 file to generate for converting from Zarr to HDF5
hdf_filename = "test_hdf5_" + os.path.basename(filename)

# Delete our converted HDF5 and Zarr file from previous runs of this notebook
for fname in [zarr_filename, hdf_filename]:
    if os.path.exists(fname):
        print("Removing %s" % fname)
        if os.path.isfile(fname):  # Remove a single file (here the HDF5 file)
            os.remove(fname)
        else:  # Remove whole directory and subtree (here the Zarr file)
            shutil.rmtree(fname)
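The cleanup loop has to handle two cases because an HDF5 file is a single file on disk, whereas a Zarr store written to a local path is a directory tree of metadata and chunk files. The same logic can be packaged as a small helper. The sketch below uses `pathlib`, and the name `remove_path` is hypothetical.

```python
import shutil
from pathlib import Path


def remove_path(path):
    """Remove ``path`` whether it is a single file (HDF5) or a directory tree (Zarr).

    Returns True if something was removed, False if ``path`` did not exist.
    """
    p = Path(path)
    if not p.exists():
        return False
    if p.is_dir():
        shutil.rmtree(p)  # a local Zarr store is a directory with subdirectories
    else:
        p.unlink()  # an HDF5 file is a single file
    return True
```

With this helper, the body of the loop above reduces to `remove_path(fname)`.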
Convert the NWB file from HDF5 to Zarr
To convert files between storage backends, we use HDMF’s export functionality.
As this is an NWB file, we use the pynwb.NWBHDF5IO backend to read the file from
HDF5 and the NWBZarrIO backend to export it to Zarr.
with NWBHDF5IO(filename, 'r') as read_io:  # Create HDF5 IO object for read
    with NWBZarrIO(zarr_filename, 'w') as export_io:  # Create Zarr IO object for write
        export_io.export(src_io=read_io, write_args=dict(link_data=False))  # Export from HDF5 to Zarr
Note
When converting between backends, we need to set link_data=False, as linking
from Zarr to HDF5 (and vice versa) is not supported.
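The same export call is used in both directions: HDF5 to Zarr here, and Zarr to HDF5 later in this tutorial. Because of that, it can be wrapped in a helper that takes the two IO classes as parameters. This is a sketch, not part of the hdmf-zarr API. `convert_nwb` is a hypothetical name, and the sketch assumes only that both IO classes accept `(path, mode)` and work as context managers, as `NWBHDF5IO` and `NWBZarrIO` do in this tutorial.

```python
def convert_nwb(src_path, dst_path, src_io_cls, dst_io_cls):
    """Export the NWB file at ``src_path`` to ``dst_path`` with another backend.

    Hypothetical helper: ``src_io_cls`` and ``dst_io_cls`` are IO classes such
    as ``NWBHDF5IO`` or ``NWBZarrIO``.
    """
    with src_io_cls(src_path, 'r') as read_io:  # open the source for reading
        with dst_io_cls(dst_path, 'w') as export_io:  # open the target for writing
            # Links across backends are not supported, so copy data instead of linking
            export_io.export(src_io=read_io, write_args=dict(link_data=False))
```

For example, `convert_nwb(filename, zarr_filename, NWBHDF5IO, NWBZarrIO)` performs the HDF5-to-Zarr export above. Swapping the classes, as in `convert_nwb(zarr_filename, hdf_filename, NWBZarrIO, NWBHDF5IO)`, performs the reverse.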
Read the Zarr file back in
zarr_io = NWBZarrIO(zarr_filename, 'r')
nwb_zarr = zarr_io.read()
The NWBFile object behaves the same way as when it is read from HDF5.
# Print the NWBFile to illustrate that
print(nwb_zarr)
root pynwb.file.NWBFile at 0x135214958219200
Fields:
devices: {
NEURON_Simulator <class 'pynwb.device.Device'>
}
electrode_groups: {
shank0 <class 'pynwb.ecephys.ElectrodeGroup'>,
shank1 <class 'pynwb.ecephys.ElectrodeGroup'>,
shank2 <class 'pynwb.ecephys.ElectrodeGroup'>,
shank3 <class 'pynwb.ecephys.ElectrodeGroup'>
}
electrodes: electrodes <class 'pynwb.ecephys.ElectrodesTable'>
experiment_description: The PESD dataset is generated from a cortico-basal-ganglia network for a Parkinsonian computational model. The computational model of the cortico-basal-ganglia is originally presented by Fleming et al. in the article: 'Simulation of Closed-Loop Deep Brain Stimulation Control Schemes for Suppression of Pathological Beta Oscillations in Parkinson's Disease'.
experimenter: ['Ananna Biswas']
file_create_date: [datetime.datetime(2025, 3, 27, 16, 53, 28, 55430, tzinfo=tzoffset(None, -14400))]
identifier: 7a68ea11-865a-481a-a5fd-d91fe6def653
institution: Michigan Technological University
keywords: <zarr.core.Array '/general/keywords' (4,) object read-only>
lab: BrainX Lab
processing: {
ecephys <class 'pynwb.base.ProcessingModule'>
}
related_publications: ['https://arxiv.org/abs/2407.17756' 'DOI: 10.3389/fnins.2020.00166']
session_description: Parkinson's Electrophysiological Signal Dataset (PESD) Generated from Simulation
session_start_time: 2025-03-27 16:53:27.990500-04:00
subject: subject pynwb.file.Subject at 0x135214958217072
Fields:
age: P0D
age__reference: birth
description: This is a simulated dataset generated from a computational model.
sex: U
species: Homo sapiens
subject_id: healthy-simulated-beta
timestamps_reference_time: 2025-03-27 16:53:27.990500-04:00
The main difference is that datasets are now represented by Zarr arrays rather than by the h5py Datasets returned when reading from HDF5.
print(type(nwb_zarr.electrodes['label'].data))
<class 'zarr.core.Array'>
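Both Zarr arrays and h5py Datasets read data on access and support NumPy-style slicing. As a result, code that indexes into `data` (for example `data[:]` to load everything into memory) generally works with either backend. When it matters which backend produced a dataset, one option is to inspect the module of its type. The helper below is a hypothetical sketch of that idea.

```python
def backend_of(dataset):
    """Return the top-level package that defines the type of ``dataset``.

    Hypothetical helper: for data read with NWBZarrIO this is 'zarr'; for data
    read with NWBHDF5IO it is typically 'h5py'.
    """
    return type(dataset).__module__.split('.')[0]
```

For the column printed above, `backend_of(nwb_zarr.electrodes['label'].data)` returns `'zarr'`, because its class is defined in `zarr.core`.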
For illustration purposes, here we show the NWB electrodes table.
print(nwb_zarr.electrodes.to_dataframe())
zarr_io.close()
location ... label
id ...
0 Simulated Cortico-basal-ganglia network of brain ... shank0_elec0
1 Simulated Cortico-basal-ganglia network of brain ... shank0_elec1
2 Simulated Cortico-basal-ganglia network of brain ... shank0_elec2
3 Simulated Cortico-basal-ganglia network of brain ... shank1_elec0
4 Simulated Cortico-basal-ganglia network of brain ... shank1_elec1
5 Simulated Cortico-basal-ganglia network of brain ... shank1_elec2
6 Simulated Cortico-basal-ganglia network of brain ... shank2_elec0
7 Simulated Cortico-basal-ganglia network of brain ... shank2_elec1
8 Simulated Cortico-basal-ganglia network of brain ... shank2_elec2
9 Simulated Cortico-basal-ganglia network of brain ... shank3_elec0
10 Simulated Cortico-basal-ganglia network of brain ... shank3_elec1
11 Simulated Cortico-basal-ganglia network of brain ... shank3_elec2
[12 rows x 4 columns]
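The `label` column in the table above follows a `shankN_elecM` naming pattern. As a small illustration of working with column values directly, the sketch below groups labels by shank. The helper name is hypothetical, and it assumes that naming pattern, which is specific to this dataset.

```python
from collections import defaultdict


def group_labels_by_shank(labels):
    """Group electrode labels of the form '<shank>_<electrode>' by shank.

    Hypothetical helper: assumes the 'shankN_elecM' labels used in this dataset.
    """
    groups = defaultdict(list)
    for label in labels:
        shank, _, _ = label.partition('_')  # text before the first underscore
        groups[shank].append(label)
    return dict(groups)
```

Before `zarr_io.close()` is called, the input could be `nwb_zarr.electrodes['label'].data[:]`. For this file, the result should contain four shanks with three labels each, matching the table above.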
Convert the Zarr file back to HDF5
Using the same approach as above, we can now convert our Zarr file back to HDF5.
with NWBZarrIO(zarr_filename, 'r') as read_io:  # Create Zarr IO object for read
    with NWBHDF5IO(hdf_filename, 'w') as export_io:  # Create HDF5 IO object for write
        export_io.export(src_io=read_io, write_args=dict(link_data=False))  # Export from Zarr to HDF5
Read the new HDF5 file back
Now our file has been converted from HDF5 to Zarr and back again to HDF5. Here we check that we can still read that file.
with NWBHDF5IO(hdf_filename, 'r') as hdf5_io:
    nwb_hdf5 = hdf5_io.read()
    print(nwb_hdf5)
root pynwb.file.NWBFile at 0x135214957317008
Fields:
devices: {
NEURON_Simulator <class 'pynwb.device.Device'>
}
electrode_groups: {
shank0 <class 'pynwb.ecephys.ElectrodeGroup'>,
shank1 <class 'pynwb.ecephys.ElectrodeGroup'>,
shank2 <class 'pynwb.ecephys.ElectrodeGroup'>,
shank3 <class 'pynwb.ecephys.ElectrodeGroup'>
}
electrodes: electrodes <class 'pynwb.ecephys.ElectrodesTable'>
experiment_description: The PESD dataset is generated from a cortico-basal-ganglia network for a Parkinsonian computational model. The computational model of the cortico-basal-ganglia is originally presented by Fleming et al. in the article: 'Simulation of Closed-Loop Deep Brain Stimulation Control Schemes for Suppression of Pathological Beta Oscillations in Parkinson's Disease'.
experimenter: ['Ananna Biswas']
file_create_date: [datetime.datetime(2025, 3, 27, 16, 53, 28, 55430, tzinfo=tzoffset(None, -14400))]
identifier: 7a68ea11-865a-481a-a5fd-d91fe6def653
institution: Michigan Technological University
keywords: <StrDataset for HDF5 dataset "keywords": shape (4,), type "|O">
lab: BrainX Lab
processing: {
ecephys <class 'pynwb.base.ProcessingModule'>
}
related_publications: ['https://arxiv.org/abs/2407.17756' 'DOI: 10.3389/fnins.2020.00166']
session_description: Parkinson's Electrophysiological Signal Dataset (PESD) Generated from Simulation
session_start_time: 2025-03-27 16:53:27.990500-04:00
subject: subject pynwb.file.Subject at 0x135214957319024
Fields:
age: P0D
age__reference: birth
description: This is a simulated dataset generated from a computational model.
sex: U
species: Homo sapiens
subject_id: healthy-simulated-beta
timestamps_reference_time: 2025-03-27 16:53:27.990500-04:00
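Printing the file gives a quick visual check. For a more systematic comparison, a few scalar metadata fields of the original and round-tripped files can be compared programmatically. The helper below is a hypothetical sketch. It compares values with `==`, which suits strings and datetimes, and it leaves out array-valued fields such as `keywords` because they come back as backend-specific objects.

```python
def compare_fields(a, b, names):
    """Return the names of the fields whose values differ between ``a`` and ``b``.

    Hypothetical helper, only suitable for scalar metadata (strings, datetimes).
    """
    return [name for name in names if getattr(a, name) != getattr(b, name)]


# Example usage (assumes pynwb is installed and the files from this tutorial exist):
# from pynwb import NWBHDF5IO
# fields = ["identifier", "session_description", "session_start_time",
#           "institution", "lab", "experiment_description"]
# with NWBHDF5IO(filename, 'r') as orig_io, NWBHDF5IO(hdf_filename, 'r') as new_io:
#     print(compare_fields(orig_io.read(), new_io.read(), fields))  # [] if these fields match
```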