hdmf_zarr.backend module¶
Module with the Zarr-based I/O backend for HDMF
- hdmf_zarr.backend.ROOT_NAME = 'root'¶
Name of the root builder for read/write
- hdmf_zarr.backend.SPEC_LOC_ATTR = '.specloc'¶
Reserved attribute storing the path to the Group where the schemas for the file are cached
- hdmf_zarr.backend.DEFAULT_SPEC_LOC_DIR = 'specifications'¶
Default name of the group where specifications should be cached
- hdmf_zarr.backend.SUPPORTED_ZARR_STORES = (<class 'zarr.storage.DirectoryStore'>, <class 'zarr.storage.TempStore'>, <class 'zarr.storage.NestedDirectoryStore'>)¶
Tuple listing all Zarr storage backends supported by ZarrIO
- class hdmf_zarr.backend.ZarrIO(path, mode, manager=None, synchronizer=None, object_codec_class=None)¶
Bases:
HDMFIO
- Parameters:
  - path (str or DirectoryStore or TempStore or NestedDirectoryStore) – the path to the Zarr file or a supported Zarr store
  - mode (str) – the mode to open the Zarr file with, one of (“w”, “r”, “r+”, “a”, “w-”)
  - manager (BuildManager) – the BuildManager to use for I/O
  - synchronizer (ProcessSynchronizer or ThreadSynchronizer or bool) – Zarr synchronizer to use for parallel I/O. If set to True, a ProcessSynchronizer is used.
  - object_codec_class (None) – set the numcodecs object codec class to be used to encode objects. Uses numcodecs.pickles.Pickle by default.
- static can_read(path)¶
Determines whether a given path is readable by this HDMFIO class
- property file¶
The Zarr zarr.hierarchy.Group (or zarr.core.Array) opened by the backend. May be None in case open has not been called yet, e.g., if no data has been read or written yet via this instance.
- property path¶
The path to the Zarr file as set by the user
- property abspath¶
The absolute path to the Zarr file
- property synchronizer¶
- property object_codec_class¶
- open()¶
Open the Zarr file
- close()¶
Close the Zarr file
- classmethod load_namespaces(namespace_catalog, path, namespaces=None)¶
Load cached namespaces from a file.
- Parameters:
  - namespace_catalog (NamespaceCatalog or TypeMap) – the NamespaceCatalog or TypeMap to load namespaces into
  - path (str or DirectoryStore or TempStore or NestedDirectoryStore) – the path to the Zarr file or a supported Zarr store
  - namespaces (list) – the namespaces to load
- write(container, cache_spec=True, link_data=True, exhaust_dci=True, number_of_jobs=1, max_threads_per_process=None, multiprocessing_context=None)¶
Overwrite the write method to add support for caching the specification and parallelization.
- Parameters:
  - container (Container) – the Container object to write
  - cache_spec (bool) – whether to cache the specification to the file
  - link_data (bool) – if not specified otherwise, link (True) or copy (False) Datasets
  - exhaust_dci (bool) – exhaust DataChunkIterators one at a time. If False, add them to the internal queue self.__dci_queue and exhaust them concurrently at the end.
  - number_of_jobs (int) – number of jobs to use in parallel during write (only works with GenericDataChunkIterator-wrapped datasets)
  - max_threads_per_process (int) – limits the number of threads used by each process. The default is None (no limits).
  - multiprocessing_context (str) – context for multiprocessing. It can be None (default), ‘fork’, or ‘spawn’. Note that ‘fork’ is only available on UNIX systems (not Windows).
- export(src_io, container=None, write_args={}, clear_cache=False, cache_spec=True, number_of_jobs=1, max_threads_per_process=None, multiprocessing_context=None)¶
- Export data read from a file by any other backend to Zarr.
See hdmf.backends.io.HDMFIO.export() for more details.
- Parameters:
  - src_io (HDMFIO) – the HDMFIO object for reading the data to export
  - container (Container) – the Container object to export. If None, the entire contents of the HDMFIO object will be exported.
  - write_args (dict) – arguments to pass to write_builder()
  - clear_cache (bool) – whether to clear the build manager cache
  - cache_spec (bool) – whether to cache the specification to file
  - number_of_jobs (int) – number of jobs to use in parallel during write (only works with GenericDataChunkIterator-wrapped datasets)
  - max_threads_per_process (int) – limits the number of threads used by each process. The default is None (no limits).
  - multiprocessing_context (str) – context for multiprocessing. It can be None (default), ‘fork’, or ‘spawn’. Note that ‘fork’ is only available on UNIX systems (not Windows).
- get_written(builder, check_on_disk=False)¶
Return True if this builder has been written to (or read from) disk by this IO object, False otherwise.
- Parameters:
  - builder (Builder) – the Builder object to get the written flag for
  - check_on_disk (bool) – check that the builder has been physically written to disk, not just flagged as written by this I/O backend
- Returns:
True if the builder is found in self._written_builders using the builder ID, False otherwise. If check_on_disk is enabled, the function calls get_builder_exists_on_disk in addition, to verify that the builder has indeed been written to disk.
- get_builder_exists_on_disk(builder)¶
Convenience function to check whether a given builder exists on disk in this Zarr file.
- Parameters:
builder (
Builder
) – The builder of interest
- get_builder_disk_path(builder, filepath=None)¶
- write_builder(builder, link_data=True, exhaust_dci=True, export_source=None)¶
Write a builder to disk.
- Parameters:
  - builder (GroupBuilder) – the GroupBuilder object representing the NWBFile
  - link_data (bool) – if not specified otherwise, link (True) or copy (False) Zarr Datasets
  - exhaust_dci (bool) – exhaust DataChunkIterators one at a time. If False, add them to the internal queue self.__dci_queue and exhaust them concurrently at the end.
  - export_source (str) – the source of the builders when exporting
- write_group(parent, builder, link_data=True, exhaust_dci=True, export_source=None)¶
Write a GroupBuilder to the file
- Parameters:
  - parent (Group) – the parent Zarr object
  - builder (GroupBuilder) – the GroupBuilder to write
  - link_data (bool) – if not specified otherwise, link (True) or copy (False) Zarr Datasets
  - exhaust_dci (bool) – exhaust DataChunkIterators one at a time. If False, add them to the internal queue self.__dci_queue and exhaust them concurrently at the end.
  - export_source (str) – the source of the builders when exporting
- Returns:
the Group that was created
- Return type:
Group
- write_attributes(obj, attributes, export_source=None)¶
Set (i.e., write) the attributes on a given Zarr Group or Array.
- static get_zarr_paths(zarr_object)¶
For a Zarr object, find (1) the path to the main Zarr file it is in and (2) the path to the object within the file.
- Parameters:
  - zarr_object (Zarr Group or Array) – object for which we are looking up the path
- Returns:
Tuple of two strings: (1) the path of the Zarr file and (2) the full path to the object within the Zarr file
- static get_zarr_parent_path(zarr_object)¶
Get the location of the parent of a zarr_object within the file.
- Parameters:
  - zarr_object (Zarr Group or Array) – object for which we are looking up the path
- Returns:
String with the path
- static is_zarr_file(path)¶
Check if the given path defines a Zarr file.
- Parameters:
  - path – full path to the main directory
- Returns:
Bool
- resolve_ref(zarr_ref)¶
Get the full path to the object linked to by the Zarr reference.
The function only constructs the path to the target object; it does not check whether the object exists.
- Parameters:
  - zarr_ref – dict with source and path keys or a ZarrReference object
- Returns:
  1. the name of the target object
  2. the target Zarr object within the target file
- write_link(parent, builder)¶
- Parameters:
  - parent (Group) – the parent Zarr object
  - builder (LinkBuilder) – the LinkBuilder to write
- write_dataset(parent, builder, link_data=True, exhaust_dci=True, force_data=None, export_source=None)¶
- Parameters:
  - parent (Group) – the parent Zarr object
  - builder (DatasetBuilder) – the DatasetBuilder to write
  - link_data (bool) – if not specified otherwise, link (True) or copy (False) Zarr Datasets
  - exhaust_dci (bool) – exhaust DataChunkIterators one at a time. If False, add them to the internal queue self.__dci_queue and exhaust them concurrently at the end.
  - force_data (None) – used internally to force the data being used when we have to load the data
  - export_source (str) – the source of the builders when exporting
- Returns:
the Zarr array that was created
- Return type:
Array
- classmethod get_type(data)¶
- read_builder()¶
- Returns:
a GroupBuilder representing the NWB Dataset
- Return type:
GroupBuilder
- get_container(zarr_obj)¶
Get the container for the corresponding Zarr Group or Dataset
- Raises ValueError:
When no builder has been constructed yet for the given Zarr object