hdmf_zarr package

Submodules

Module contents

class hdmf_zarr.ZarrIO(path, mode, manager=None, synchronizer=None, object_codec_class=None, storage_options=None, force_overwrite=False)

Bases: HDMFIO

Parameters:
  • path (str or Path or DirectoryStore or TempStore or NestedDirectoryStore) – the path to the Zarr file or a supported Zarr store

  • mode (str) – the mode to open the Zarr file with, one of (“w”, “r”, “r+”, “a”, “r-”). The mode r- is used to force opening in read-only mode without consolidated metadata.

  • manager (BuildManager) – the BuildManager to use for I/O

  • synchronizer (ProcessSynchronizer or ThreadSynchronizer or bool) – Zarr synchronizer to use for parallel I/O. If set to True a ProcessSynchronizer is used.

  • object_codec_class (None) – Set the numcodecs object codec class to be used to encode objects. Use numcodecs.pickles.Pickle by default.

  • storage_options (dict) – Zarr storage options to read remote folders

  • force_overwrite (bool) – force overwriting an existing object when in ‘w’ mode. The existing file or directory will be deleted before opening (even if the object is not Zarr, e.g., an HDF5 file)
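
As a minimal sketch of how these constructor arguments fit together, the helper below checks the mode against the values listed above before constructing a ZarrIO. The names open_zarr_io, is_supported_mode and ZARR_IO_MODES are hypothetical and not part of hdmf_zarr; the import is deferred so that the validation logic can be used without the package installed.

```python
# Modes listed in the parameter documentation above.
ZARR_IO_MODES = ("w", "r", "r+", "a", "r-")


def is_supported_mode(mode):
    """Return True if ``mode`` is one of the documented ZarrIO modes."""
    return mode in ZARR_IO_MODES


def open_zarr_io(path, mode="r", **kwargs):
    """Hypothetical helper: validate ``mode``, then construct a ZarrIO.

    Extra keyword arguments (e.g. manager, storage_options) are passed
    through to the ZarrIO constructor unchanged.
    """
    if not is_supported_mode(mode):
        raise ValueError(f"mode must be one of {ZARR_IO_MODES}, got {mode!r}")
    from hdmf_zarr import ZarrIO  # deferred import; requires hdmf_zarr

    return ZarrIO(path, mode, **kwargs)
```

Checking the mode up front is a design choice of this sketch only; ZarrIO performs its own argument validation.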

static can_read(path)

Determines whether a given path is readable by this HDMFIO class

static generate_dataset_html(dataset)

Generates an HTML representation for a dataset for the ZarrIO class.

This method extracts metadata from a Zarr array using its info_items() method and formats it as an HTML table for display in Jupyter notebooks and other HTML-based interfaces.

Parameters:

dataset (zarr.core.Array) – The Zarr array for which to generate an HTML representation

Returns:

HTML representation of the dataset

Return type:

str

property path

The path to the Zarr file as set by the user

property abspath

The absolute path to the Zarr file

property synchronizer
property object_codec_class
property mode

The mode specified by the user when creating the ZarrIO instance.

NOTE: The Zarr library may not honor the mode. E.g., DirectoryStore in Zarr uses append mode and does not allow setting a file to read-only mode.

open()

Open the Zarr file

close()

Close the Zarr file

is_remote()

Return True if the file is remote, False otherwise

classmethod load_namespaces(namespace_catalog, path=None, file=None, storage_options=None, namespaces=None)

Load cached namespaces from a file.

Parameters:
Returns:

dict mapping the names of the loaded namespaces to a dict mapping included namespace names and the included data types

Return type:

dict

load_namespaces_io(namespace_catalog, namespaces=None)

Load cached namespaces from this ZarrIO object itself.

Parameters:
  • namespace_catalog (NamespaceCatalog or TypeMap) – the NamespaceCatalog or TypeMap to load namespaces into

  • namespaces (list) – the namespaces to load

Returns:

dict mapping the names of the loaded namespaces to a dict mapping included namespace names and the included data types

Return type:

dict

write(container, cache_spec=True, link_data=True, exhaust_dci=True, number_of_jobs=1, max_threads_per_process=None, multiprocessing_context=None, consolidate_metadata=True)

Override the write method to add support for caching the specification and parallelization.

Parameters:
  • container (Container) – the Container object to write

  • cache_spec (bool) – cache specification to file

  • link_data (bool) – Unless specified otherwise, link (True) or copy (False) Datasets

  • exhaust_dci (bool) – exhaust DataChunkIterators one at a time. If False, add them to the internal queue self.__dci_queue and exhaust them concurrently at the end

  • number_of_jobs (int) – Number of jobs to use in parallel during write (only works with GenericDataChunkIterator-wrapped datasets).

  • max_threads_per_process (int) – Limits the number of threads used by each process. The default is None (no limits).

  • multiprocessing_context (str) – Context for multiprocessing. It can be None (default), ‘fork’ or ‘spawn’. Note that ‘fork’ is only available on UNIX systems (not Windows).

  • consolidate_metadata (bool) – Consolidate metadata into a single .zmetadata file in the root group to accelerate read.
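
The parallelization arguments above can be assembled and sanity-checked before calling write(). The following is a sketch under stated assumptions: parallel_write_kwargs is a hypothetical helper, and its checks simply restate the constraints documented in the parameter list (allowed context values, ‘fork’ not being available on Windows).

```python
import sys


def parallel_write_kwargs(number_of_jobs=1, max_threads_per_process=None,
                          multiprocessing_context=None):
    """Hypothetical helper: assemble keyword arguments for ZarrIO.write()."""
    if number_of_jobs < 1:
        raise ValueError("number_of_jobs must be at least 1")
    if multiprocessing_context not in (None, "fork", "spawn"):
        raise ValueError("multiprocessing_context must be None, 'fork' or 'spawn'")
    if multiprocessing_context == "fork" and sys.platform.startswith("win"):
        raise ValueError("'fork' is not available on Windows")
    return {
        "number_of_jobs": number_of_jobs,
        "max_threads_per_process": max_threads_per_process,
        "multiprocessing_context": multiprocessing_context,
        "consolidate_metadata": True,
    }


# Usage (assumes `io` is an open ZarrIO and `container` is a Container):
# io.write(container, **parallel_write_kwargs(number_of_jobs=4,
#                                             multiprocessing_context="spawn"))
```

Note that, per the parameter description, number_of_jobs only has an effect for datasets wrapped in a GenericDataChunkIterator.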

export(src_io, container=None, write_args={}, clear_cache=False, cache_spec=True, number_of_jobs=1, max_threads_per_process=None, multiprocessing_context=None, consolidate_metadata=True)

Export data read from a file from any backend to Zarr. See hdmf.backends.io.HDMFIO.export() for more details.

Parameters:
  • src_io (HDMFIO) – the HDMFIO object for reading the data to export

  • container (Container) – the Container object to export. If None, then the entire contents of the HDMFIO object will be exported

  • write_args (dict) – arguments to pass to write_builder()

  • clear_cache (bool) – whether to clear the build manager cache

  • cache_spec (bool) – whether to cache the specification to file

  • number_of_jobs (int) – Number of jobs to use in parallel during write (only works with GenericDataChunkIterator-wrapped datasets).

  • max_threads_per_process (int) – Limits the number of threads used by each process. The default is None (no limits).

  • multiprocessing_context (str) – Context for multiprocessing. It can be None (default), ‘fork’ or ‘spawn’. Note that ‘fork’ is only available on UNIX systems (not Windows).

  • consolidate_metadata (bool) – Consolidate metadata into a single .zmetadata file in the root group to accelerate read.

get_written(builder, check_on_disk=False)

Return True if this builder has been written to (or read from) disk by this IO object, False otherwise.

Parameters:
  • builder (Builder) – Builder object to get the written flag for

  • check_on_disk (bool) – Check that the builder has been physically written to disk not just flagged as written by this I/O backend

Returns:

True if the builder is found in self._written_builders using the builder ID, False otherwise. If check_on_disk is enabled, then the function additionally calls get_builder_exists_on_disk to verify that the builder has indeed been written to disk.
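
The two-level check described above (an in-memory flag, optionally confirmed on disk) can be illustrated with a toy model. This is not hdmf_zarr code: WrittenTracker, mark_written and the disk_path argument are hypothetical stand-ins used only to show the logic.

```python
import os
import tempfile


class WrittenTracker:
    """Toy stand-in for the written-builder bookkeeping (not hdmf_zarr code)."""

    def __init__(self):
        self._written_ids = set()

    def mark_written(self, builder):
        # Record the builder by identity, analogous to a lookup by builder ID.
        self._written_ids.add(id(builder))

    def get_written(self, builder, disk_path=None, check_on_disk=False):
        flagged = id(builder) in self._written_ids
        if flagged and check_on_disk:
            # Stricter check: the flag alone is not trusted.
            return disk_path is not None and os.path.exists(disk_path)
        return flagged


tracker = WrittenTracker()
builder = object()  # placeholder for a Builder
tracker.mark_written(builder)

# A path inside a fresh temporary directory that is never created.
missing_path = os.path.join(tempfile.mkdtemp(), "missing")
flag_only = tracker.get_written(builder)
flag_and_disk = tracker.get_written(builder, disk_path=missing_path,
                                    check_on_disk=True)
```

Here flag_only reflects only the bookkeeping, while flag_and_disk also requires the path to exist.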

get_builder_exists_on_disk(builder)

Convenience function to check whether a given builder exists on disk in this Zarr file.

Parameters:

builder (Builder) – The builder of interest

get_builder_disk_path(builder, filepath=None)
Parameters:
  • builder (Builder) – The builder of interest

  • filepath (str) – The path to the Zarr file or None for this file

write_builder(builder, link_data=True, exhaust_dci=True, export_source=None, consolidate_metadata=True)

Write a builder to disk.

Parameters:
  • builder (GroupBuilder) – the GroupBuilder object representing the NWBFile

  • link_data (bool) – Unless specified otherwise, link (True) or copy (False) Zarr Datasets

  • exhaust_dci (bool) – Exhaust DataChunkIterators one at a time. If False, add them to the internal queue self.__dci_queue and exhaust them concurrently at the end

  • export_source (str) – The source of the builders when exporting

  • consolidate_metadata (bool) – Consolidate metadata into a single .zmetadata file in the root group to accelerate read.

write_group(parent, builder, link_data=True, exhaust_dci=True, export_source=None)

Write a GroupBuilder to file

Parameters:
  • parent (Group) – the parent Zarr object

  • builder (GroupBuilder) – the GroupBuilder to write

  • link_data (bool) – Unless specified otherwise, link (True) or copy (False) Zarr Datasets

  • exhaust_dci (bool) – exhaust DataChunkIterators one at a time. If False, add them to the internal queue self.__dci_queue and exhaust them concurrently at the end

  • export_source (str) – The source of the builders when exporting

Returns:

the Group that was created

Return type:

Group

write_attributes(obj, attributes)

Set (i.e., write) the attributes on a given Zarr Group or Array.

Parameters:
  • obj (Group or Array) – the Zarr object to add attributes to

  • attributes (dict) – a dict containing the attributes on the Group or Dataset, indexed by attribute name

static get_zarr_parent_path(zarr_object)

Get the absolute Unix path to the parent of a zarr_object from the root of the Zarr file

Parameters:

zarr_object (Zarr Group or Array) – Object for which we are looking up the path

Returns:

String with the path

resolve_ref(zarr_ref)

Get the full path to the object linked to by the zarr reference

The function only constructs the links to the target object, but it does not check if the object exists

Parameters:

zarr_ref – Dict with source and path keys or a ZarrReference object

Returns:

  1. name of the target object

  2. the target zarr object within the target file

write_link(parent, builder, export_source=None)
Parameters:
  • parent (Group) – the parent Zarr object

  • builder (LinkBuilder) – the LinkBuilder to write

  • export_source (str) – The source of the builders when exporting

write_dataset(parent, builder, link_data=True, exhaust_dci=True, force_data=None, export_source=None)
Parameters:
  • parent (Group) – the parent Zarr object

  • builder (DatasetBuilder) – the DatasetBuilder to write

  • link_data (bool) – Unless specified otherwise, link (True) or copy (False) Zarr Datasets

  • exhaust_dci (bool) – exhaust DataChunkIterators one at a time. If False, add them to the internal queue self.__dci_queue and exhaust them concurrently at the end

  • force_data (None) – Used internally to force the data being used when we have to load the data

  • export_source (str) – The source of the builders when exporting

Returns:

the Zarr array that was created

Return type:

Array

classmethod get_type(data)
read_builder()
Returns:

a GroupBuilder representing the NWB Dataset

Return type:

GroupBuilder

get_container(zarr_obj)

Get the container for the corresponding Zarr Group or Dataset

Raises:

ValueError – When no builder has been constructed yet for the given Zarr object

Parameters:

zarr_obj (Array or Group) – the Zarr object for which to get the corresponding Container/Data object

get_builder(zarr_obj)

Get the builder for the corresponding Group or Dataset

Raises:

ValueError – When no builder has been constructed

Parameters:

zarr_obj (Array or Group) – the Zarr object for which to get the corresponding Builder object

class hdmf_zarr.ZarrDataIO(data, chunks=None, fillvalue=None, compressor=None, filters=None, link_data=False)

Bases: DataIO

Wrap data arrays for write via ZarrIO to customize I/O behavior, such as compression and chunking for data arrays.

Parameters:
  • data (ndarray or list or tuple or Array or Iterable) – the data to be written. NOTE: If a zarr.Array is used, all other settings but link_data will be ignored as the dataset will either be linked to or copied as is in ZarrIO.

  • chunks (list or tuple) – Chunk shape

  • fillvalue (None) – Value to be returned when reading uninitialized parts of the dataset

  • compressor (Codec or bool) – Zarr compressor filter to be used. Set to True to use the Zarr default. Set to False to disable compression.

  • filters (list or tuple) – One or more Zarr-supported codecs used to transform data prior to compression.

  • link_data (bool) – If data is a zarr.Array, whether it should be linked to (True) or copied (False). NOTE: This parameter is only allowed if data is a zarr.Array
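
As an illustration of the parameters above, the sketch below wraps an array so that it is written chunked and compressed. The helper name make_compressed_dataset is hypothetical, and the choice of Blosc with zstd is only one example of a numcodecs codec; imports are deferred so the function can be defined without hdmf_zarr or numcodecs installed.

```python
def make_compressed_dataset(data, chunk_shape):
    """Hypothetical helper: wrap ``data`` for chunked, compressed writing."""
    from numcodecs import Blosc       # deferred; requires numcodecs
    from hdmf_zarr import ZarrDataIO  # deferred; requires hdmf_zarr

    return ZarrDataIO(
        data=data,
        chunks=tuple(chunk_shape),  # chunk shape, one entry per dimension
        fillvalue=0,                # value returned for uninitialized regions
        compressor=Blosc(cname="zstd", clevel=1, shuffle=Blosc.SHUFFLE),
    )
```

The returned object can then be used in place of the raw array when building a container that is later written with ZarrIO.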

property io_settings: dict

Dict with the io settings to use

get_io_params() → dict

Returns a dict with the I/O parameters specified in this DataIO.

static from_h5py_dataset(h5dataset, **kwargs)

Factory method to create a ZarrDataIO instance from an h5py.Dataset. The ZarrDataIO object wraps the h5py.Dataset, and the I/O filter settings are inferred from the filters used in h5py such that the options in Zarr match (if possible) the options used in HDF5.

Parameters:
  • h5dataset (h5py.Dataset) – h5py.Dataset object that should be wrapped

  • kwargs – Other keyword arguments to pass to ZarrDataIO.__init__

Returns:

ZarrDataIO object wrapping the dataset

static hdf5_to_zarr_filters(h5dataset) → list

From the given h5py.Dataset infer the corresponding filters to use in Zarr

static is_h5py_dataset(obj)

Check if the object is an instance of h5py.Dataset without requiring import of h5py
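
One way to perform such a check without importing h5py is to compare the module and name of the object's class as strings. The sketch below shows that general technique; it is not necessarily how hdmf_zarr implements the method, and looks_like_h5py_dataset and FakeDataset are hypothetical names.

```python
def looks_like_h5py_dataset(obj):
    """Return True if the class of ``obj`` is named like h5py.Dataset.

    Only strings are compared, so h5py itself is never imported.
    Subclasses with a different name or module are not detected.
    """
    cls = type(obj)
    return cls.__name__ == "Dataset" and cls.__module__.split(".")[0] == "h5py"


# Stand-in class whose module and name mimic h5py.Dataset (demonstration only).
FakeDataset = type("Dataset", (), {"__module__": "h5py._hl.dataset"})

is_fake_detected = looks_like_h5py_dataset(FakeDataset())
is_list_detected = looks_like_h5py_dataset([1, 2, 3])
```

Avoiding the import keeps h5py an optional dependency for code paths that never touch HDF5 data.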

class hdmf_zarr.NWBZarrIO(path, mode, manager=None, synchronizer=None, object_codec_class=None, storage_options=None, force_overwrite=False, load_namespaces=True, extensions=None)

Bases: ZarrIO

IO backend for PyNWB for writing NWB files

This class is similar to the NWBHDF5IO class in PyNWB. The main purpose of this class is to perform default setup for the BuildManager, loading of namespaces, etc., in the context of the NWB format.

Parameters:
  • path (str or Path or DirectoryStore or TempStore or NestedDirectoryStore) – the path to the Zarr file or a supported Zarr store

  • mode (str) – the mode to open the Zarr file with, one of (“w”, “r”, “r+”, “a”, “r-”). The mode r- is used to force opening in read-only mode without consolidated metadata.

  • manager (BuildManager) – the BuildManager to use for I/O

  • synchronizer (ProcessSynchronizer or ThreadSynchronizer or bool) – Zarr synchronizer to use for parallel I/O. If set to True a ProcessSynchronizer is used.

  • object_codec_class (None) – Set the numcodecs object codec class to be used to encode objects. Use numcodecs.pickles.Pickle by default.

  • storage_options (dict) – Zarr storage options to read remote folders

  • force_overwrite (bool) – force overwriting an existing object when in ‘w’ mode. The existing file or directory will be deleted before opening (even if the object is not Zarr, e.g., an HDF5 file)

  • load_namespaces (bool) – whether or not to load cached namespaces from given path - not applicable in write mode

  • extensions (str or TypeMap or list) – a path to a namespace, a TypeMap, or a list consisting of paths to namespaces and TypeMaps

export(src_io, nwbfile=None, write_args={})
Parameters:
  • src_io (HDMFIO) – the HDMFIO object for reading the data to export

  • nwbfile (NWBFile) – the NWBFile object to export. If None, then the entire contents of src_io will be exported

  • write_args (dict) – arguments to pass to write_builder()
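
A common use of export is converting an NWB HDF5 file to Zarr. The following is a minimal sketch, assuming PyNWB is installed and that both paths are supplied by the caller; convert_hdf5_to_zarr is a hypothetical helper name.

```python
def convert_hdf5_to_zarr(hdf5_path, zarr_path):
    """Hypothetical helper: export an NWB HDF5 file to a Zarr store."""
    # Deferred imports; both pynwb and hdmf_zarr must be installed.
    from pynwb import NWBHDF5IO
    from hdmf_zarr import NWBZarrIO

    with NWBHDF5IO(hdf5_path, mode="r") as read_io:
        with NWBZarrIO(zarr_path, mode="w") as export_io:
            # link_data=False requests copying datasets instead of
            # linking back to the source file.
            export_io.export(src_io=read_io, write_args={"link_data": False})
    return zarr_path
```

Leaving nwbfile unset exports the entire contents of src_io, as described in the parameter list.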

static read_nwb(path)

Helper factory method for reading an NWB file and returning the NWBFile object

Parameters:

path (str or Path or DirectoryStore or TempStore or NestedDirectoryStore) – the path to the Zarr file or a supported Zarr store