hdmf_zarr.utils module

Collection of utility I/O classes for the ZarrIO backend store.

class hdmf_zarr.utils.ZarrIODataChunkIteratorQueue(number_of_jobs: int = 1, max_threads_per_process: None | int = None, multiprocessing_context: None | Literal['fork', 'spawn'] = None)

Bases: deque

Helper class used by ZarrIO to manage the write for DataChunkIterators.

Each queue element must be a tuple of two elements: 1) the dataset to write to and 2) the AbstractDataChunkIterator with the data.

Parameters:

number_of_jobs (int) – The number of jobs used to write the datasets. The default is 1.
max_threads_per_process (int or None) – Limits the number of threads used by each process. The default is None (no limits).
multiprocessing_context (str or None) – Context for multiprocessing. It can be None (default), “fork” or “spawn”. Note that “fork” is only available on UNIX systems (not Windows).

exhaust_queue()

Read and write from any queued DataChunkIterators.

With a single job, the queue is processed in a round-robin fashion across the queued iterators. With multiple jobs, a single dataset is processed at a time.
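The round-robin behavior for a single job can be illustrated with a minimal, standard-library-only sketch. It is not the hdmf_zarr implementation: `RoundRobinChunkQueue` and `chunks` are hypothetical names, plain Python lists stand in for Zarr arrays, and a generator yielding `(slice, values)` pairs stands in for an AbstractDataChunkIterator.

```python
from collections import deque


class RoundRobinChunkQueue(deque):
    """Hypothetical stand-in for a queue of (dataset, chunk_iterator) pairs."""

    def exhaust_queue(self):
        # Take one chunk from the iterator at the front of the queue, write it,
        # then move the pair to the back so the next iterator gets its turn.
        while self:
            dataset, chunk_iter = self.popleft()
            try:
                selection, values = next(chunk_iter)
            except StopIteration:
                continue  # this iterator is finished; do not re-queue it
            dataset[selection] = values
            self.append((dataset, chunk_iter))


def chunks(values, size):
    """Yield (slice, values) pairs; a plain generator used in place of a real iterator."""
    for start in range(0, len(values), size):
        yield slice(start, start + size), values[start:start + size]


# Two pre-allocated "datasets" (plain lists here, Zarr arrays in practice).
first, second = [0] * 4, [0] * 6
queue = RoundRobinChunkQueue()
queue.append((first, chunks([1, 2, 3, 4], 2)))
queue.append((second, chunks([5, 6, 7, 8, 9, 10], 2)))
queue.exhaust_queue()
```

Note that the sketch queues a single tuple through `deque.append`, whereas the documented `append` method takes the dataset and the data as two separate arguments.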

append(dataset, data): Append a value to the queue :param dataset: The dataset where the DataChunkIterator is written to :type dataset: Zarr array :param data: DataChunkIterator with the data to be written :type data: AbstractDataChunkIterator

static initializer_wrapper(operation_to_run: callable, process_initialization: callable, initialization_arguments: Iterable, max_threads_per_process: int | None = None)

Needed as part of a bug fix for cloud memory leaks discovered by the SpikeInterface team.

The recommended fix is to have global wrappers for the working initializer that limit the threads used per process.

static function_wrapper(args: Tuple[str, str, AbstractDataChunkIterator, Tuple[slice, ...]])

Needed as part of a bug fix for cloud memory leaks discovered by the SpikeInterface team.

The recommended fix is to have a global wrapper at the executor.map level.

class hdmf_zarr.utils.ZarrSpecWriter(group)

Bases: SpecWriter

Class used to write format specs to Zarr

Parameters:

group (Group) – the Zarr file to write specs to

static stringify(spec): Converts a spec into a JSON string to write to a dataset
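As a rough illustration of what converting a spec to a JSON string involves, the sketch below uses only the standard `json` module. It assumes the spec can be represented as plain dicts, lists, and scalars; the compact separators and the example spec contents are assumptions for illustration, not confirmed details of ZarrSpecWriter.

```python
import json


def stringify(spec):
    # Assumption: the spec is built from plain dicts, lists, and scalars,
    # so the standard JSON encoder can serialize it directly.
    return json.dumps(spec, separators=(",", ":"))


# Hypothetical spec content, used only to show the round trip.
example_spec = {"data_type_def": "Example", "doc": "A hypothetical spec"}
encoded = stringify(example_spec)
decoded = json.loads(encoded)
```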

write_spec(spec, path): Write a spec to the given path

write_namespace(namespace, path): Write a namespace to the given path

class hdmf_zarr.utils.ZarrSpecReader(group)

Bases: SpecReader

Class to read format specs from Zarr

Parameters:

group (Group) – the Zarr file to read specs from

read_spec(spec_path): Read a spec from the given path

read_namespace(ns_path): Read a namespace from the given path

class hdmf_zarr.utils.ZarrDataIO(data, chunks=None, fillvalue=None, compressor=None, filters=None, link_data=False)

Bases: DataIO

Wrap data arrays for write via ZarrIO to customize I/O behavior, such as compression and chunking for data arrays.

Parameters:

data (ndarray or list or tuple or Array or Iterable) – the data to be written. NOTE: If a zarr.Array is used, all settings other than link_data will be ignored, as the dataset will either be linked to or copied as is in ZarrIO.
chunks (list or tuple) – Chunk shape
fillvalue (None) – Value to be returned when reading uninitialized parts of the dataset
compressor (Codec or bool) – Zarr compressor filter to be used. Set to True to use the Zarr default. Set to False to disable compression.
filters (list or tuple) – One or more Zarr-supported codecs used to transform data prior to compression.
link_data (bool) – If data is a zarr.Array, whether it should be linked to or copied. NOTE: This parameter is only allowed if data is a zarr.Array

property link_data: bool

Bool indicating whether the data should be linked to or copied. NOTE: Only applies to zarr.Array type data

property io_settings: dict: Dict with the io settings to use

get_io_params() → dict: Returns a dict with the I/O parameters specified in this DataIO.

static from_h5py_dataset(h5dataset, **kwargs)

Factory method to create a ZarrDataIO instance from an h5py.Dataset. The ZarrDataIO object wraps the h5py.Dataset, and the I/O filter settings are inferred from the filters used in h5py so that the options in Zarr match (if possible) the options used in HDF5.

Parameters:

h5dataset (h5py.Dataset) – h5py.Dataset object that should be wrapped
kwargs – Other keyword arguments to pass to ZarrDataIO.__init__

Returns:

ZarrDataIO object wrapping the dataset

static hdf5_to_zarr_filters(h5dataset) → list: From the given h5py.Dataset infer the corresponding filters to use in Zarr

static is_h5py_dataset(obj): Check if the object is an instance of h5py.Dataset without requiring import of h5py
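One common way to perform such a check without importing the optional dependency is to compare the module and class name of the object's type. The sketch below shows that technique. It assumes that the class is defined in the module `h5py._hl.dataset`; `looks_like_h5py_dataset` and `FakeDataset` are hypothetical names, and this is not necessarily how hdmf_zarr implements the method.

```python
def looks_like_h5py_dataset(obj):
    # Compare the type's module and name instead of calling isinstance(),
    # so h5py never has to be imported.
    # Assumption: h5py defines Dataset in the module "h5py._hl.dataset".
    cls = type(obj)
    return (cls.__module__, cls.__name__) == ("h5py._hl.dataset", "Dataset")


# A fake class with a matching module and name, used only to exercise the check.
FakeDataset = type("Dataset", (), {"__module__": "h5py._hl.dataset"})

matches_fake = looks_like_h5py_dataset(FakeDataset())
matches_list = looks_like_h5py_dataset([1, 2, 3])
```

A trade-off of this approach is that it matches on exact type identity by name, so subclasses defined in other modules would not be recognized.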

class hdmf_zarr.utils.ZarrReference(source=None, path=None, object_id=None, source_object_id=None)

Bases: dict

Data structure to describe a reference to another container used with the ZarrIO backend

Parameters:

source (str) – Source of referenced object. Usually the relative path to the Zarr file containing the referenced object
path (str) – Path of referenced object within the source
object_id (str) – Object_id of the referenced object (if available)
source_object_id (str) – Object_id of the source (should always be available)

property source: str

property path: str

property object_id: str

property source_object_id: str
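Because the class derives from dict, a reference can be handled both through its properties and as an ordinary mapping, for example when serializing it to JSON. The sketch below illustrates that pattern with a hypothetical stand-in class, `ReferenceSketch`. It mirrors the documented constructor arguments and property names but is not the library's implementation, and the example values are made up.

```python
import json


def _key_property(key):
    """Build a property that reads and writes the given dict key."""
    def getter(self):
        return self[key]

    def setter(self, value):
        self[key] = value

    return property(getter, setter)


class ReferenceSketch(dict):
    """Hypothetical stand-in for a dict-based reference description."""

    def __init__(self, source=None, path=None, object_id=None, source_object_id=None):
        super().__init__(
            source=source,
            path=path,
            object_id=object_id,
            source_object_id=source_object_id,
        )

    source = _key_property("source")
    path = _key_property("path")
    object_id = _key_property("object_id")
    source_object_id = _key_property("source_object_id")


# Made-up values for illustration only.
ref = ReferenceSketch(source=".", path="/group/data", source_object_id="abc-123")
ref.object_id = "def-456"           # property access updates the underlying dict
as_json = json.dumps(ref)           # works because the object is a dict
round_trip = json.loads(as_json)
```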