.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/plot_s3_streaming.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorials_plot_s3_streaming.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_plot_s3_streaming.py:


.. _s3_streaming_tutorial:

Streaming NWB Zarr files from S3
=================================

This tutorial demonstrates how to stream NWB files stored in Zarr format from Amazon S3 cloud storage.
Streaming from S3 allows you to access large datasets without downloading the entire file, which is
particularly useful for exploring data, reading specific subsets, or working with datasets too large
for local storage.

Prerequisites
-------------

To stream data from S3, you need the optional dependencies ``fsspec`` and ``s3fs``.
Installing hdmf-zarr with the ``full`` extra pulls them in:

.. code-block:: bash

    pip install hdmf-zarr[full]

Or install the dependencies separately:

.. code-block:: bash

    pip install fsspec s3fs
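
Before opening a remote store, it can help to confirm that both packages are
importable. The following is a minimal sketch using only the standard library;
the helper name ``missing_streaming_deps`` is hypothetical and not part of
hdmf-zarr.

.. code-block:: python

    from importlib.util import find_spec


    def missing_streaming_deps(packages=("fsspec", "s3fs")):
        """Return the names of the given packages that cannot be imported.

        Hypothetical helper for illustration; not an hdmf-zarr function.
        """
        return [name for name in packages if find_spec(name) is None]


    missing = missing_streaming_deps()
    if missing:
        print(f"Missing optional dependencies: {', '.join(missing)}")
    else:
        print("fsspec and s3fs are available.")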


.. GENERATED FROM PYTHON SOURCE LINES 31-41

Streaming from a Public S3 Bucket
----------------------------------

To read an NWB Zarr file from a public S3 bucket, you can provide the S3 URL
to :py:class:`~hdmf_zarr.nwb.NWBZarrIO`. For HTTPS URLs (``https://``), no
additional configuration is needed. For ``s3://`` protocol URLs, you need to
specify ``storage_options=dict(anon=True)`` to enable anonymous access.

Here we demonstrate reading from a public dataset in the DANDI Archive using
an HTTPS URL:

.. GENERATED FROM PYTHON SOURCE LINES 41-58

.. code-block:: Python


    from hdmf_zarr import NWBZarrIO

    # Public S3 URL from DANDI Archive (DANDISET 000719)
    # Path: sub-R6_ses-20200206T210000_behavior+ophys_DirectoryStore_rechunked.nwb.zarr
    s3_url = "https://dandiarchive.s3.amazonaws.com/zarr/c8c6b848-fbc6-4f58-85ff-e3f2618ee983/"

    # Open the file from S3
    try:
        with NWBZarrIO(s3_url, mode="r") as io:
            nwbfile = io.read()
            print(f"Session Description: {nwbfile.session_description}")
            print(f"Identifier: {nwbfile.identifier}")
            print(f"Subject ID: {nwbfile.subject.subject_id if nwbfile.subject else 'N/A'}")
    except Exception as e:
        print(f"Note: Could not access S3 file (network access may be required): {e}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Session Description: TwoTower_foraging_002_002
    Identifier: 7208f856-f527-479f-973d-e6e72326a8ea
    Subject ID: R6


.. GENERATED FROM PYTHON SOURCE LINES 59-69

.. note::

    For S3 URLs with the ``s3://`` protocol, you need to provide the ``storage_options``
    parameter explicitly. For example:

    .. code-block:: python

        s3_path = "s3://your-bucket/path/to/file.nwb.zarr/"
        with NWBZarrIO(s3_path, mode="r", storage_options=dict(anon=True)) as io:
            nwbfile = io.read()
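
The rule described above (no extra options for ``https://`` URLs,
``anon=True`` for public ``s3://`` URLs) can be captured in a small helper.
This is only a sketch: ``public_read_kwargs`` is a hypothetical name, not an
hdmf-zarr function, and it assumes the store is publicly readable.

.. code-block:: python

    from urllib.parse import urlparse


    def public_read_kwargs(url):
        """Return extra keyword arguments for reading a *public* store at ``url``.

        Hypothetical helper for illustration; not an hdmf-zarr function.
        """
        if urlparse(url).scheme == "s3":
            # s3:// URLs need anonymous access requested explicitly
            return {"storage_options": {"anon": True}}
        # https:// URLs work without additional configuration
        return {}


    print(public_read_kwargs("s3://your-bucket/path/to/file.nwb.zarr/"))
    print(public_read_kwargs("https://dandiarchive.s3.amazonaws.com/zarr/c8c6b848-fbc6-4f58-85ff-e3f2618ee983/"))

The result can then be unpacked into the constructor, for example
``NWBZarrIO(url, mode="r", **public_read_kwargs(url))``.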

.. GENERATED FROM PYTHON SOURCE LINES 71-103

Accessing Private S3 Buckets
-----------------------------

To access files in private S3 buckets, you need to provide AWS credentials.
There are several ways to do this:

**Option 1: Use AWS credentials from the environment or ~/.aws/credentials**

If your AWS credentials are configured via environment variables
(``AWS_ACCESS_KEY_ID``, ``AWS_SECRET_ACCESS_KEY``) or in the AWS credentials file
(``~/.aws/credentials``), open the ``s3://`` URL without the ``anon=True`` option
and the credentials are picked up automatically:

.. code-block:: python

    s3_path = "s3://your-bucket/path/to/file.nwb.zarr/"
    with NWBZarrIO(s3_path, mode="r") as io:
        nwbfile = io.read()

**Option 2: Provide credentials explicitly**

You can also pass credentials directly via the ``storage_options`` parameter.
The strings below are placeholders that show which keys are expected:

.. code-block:: python

    storage_options = {
        'key': 'YOUR_ACCESS_KEY_ID',
        'secret': 'YOUR_SECRET_ACCESS_KEY',
    }
    with NWBZarrIO(s3_path, mode="r", storage_options=storage_options) as io:
        nwbfile = io.read()

**Note:** Never hardcode real credentials in your scripts. Load them from
environment variables or the AWS credentials file instead.
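
One way to follow this advice while still passing credentials explicitly is to
build ``storage_options`` from the environment at run time. The sketch below
uses only the standard library. ``storage_options_from_env`` is a hypothetical
helper, and the optional ``token`` entry assumes the session-token argument
accepted by ``s3fs``.

.. code-block:: python

    import os


    def storage_options_from_env(env=None):
        """Build s3fs-style storage options from AWS environment variables.

        Hypothetical helper for illustration; not an hdmf-zarr function.
        """
        env = os.environ if env is None else env
        key = env.get("AWS_ACCESS_KEY_ID")
        secret = env.get("AWS_SECRET_ACCESS_KEY")
        if not key or not secret:
            raise RuntimeError(
                "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must both be set"
            )
        options = {"key": key, "secret": secret}
        # Temporary credentials also carry a session token (assumed s3fs argument)
        token = env.get("AWS_SESSION_TOKEN")
        if token:
            options["token"] = token
        return options


    # Demonstration with a stand-in mapping instead of the real environment
    example_env = {
        "AWS_ACCESS_KEY_ID": "example-id",
        "AWS_SECRET_ACCESS_KEY": "example-secret",
    }
    print(sorted(storage_options_from_env(example_env)))

Called without arguments, the helper reads ``os.environ``. The returned
dictionary can be passed as ``storage_options`` exactly as in Option 2.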

.. GENERATED FROM PYTHON SOURCE LINES 105-119

The Importance of Consolidated Metadata
----------------------------------------

Zarr files store metadata for each array and group in separate files. When reading
from S3, each metadata access requires a separate network request, which can
significantly slow down file opening and data access.

**Consolidated metadata** addresses this by storing all metadata in a single
``.zmetadata`` file at the root of the Zarr store. A reader can then fetch the
metadata for the whole hierarchy in one request instead of issuing a separate
S3 request for each array and group.

By default, :py:class:`~hdmf_zarr.nwb.NWBZarrIO` consolidates metadata when
writing files, and automatically uses consolidated metadata when available
during read operations.
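
To get a feel for how many separate metadata objects a store contains, you can
count them in a local copy before uploading. This is a rough sketch that
assumes a Zarr v2 directory layout (``.zgroup``, ``.zarray``, and ``.zattrs``
files). ``count_metadata_files`` and the path ``myfile.nwb.zarr`` are
hypothetical.

.. code-block:: python

    import os

    # Per-node metadata file names used by the Zarr v2 format
    METADATA_NAMES = {".zgroup", ".zarray", ".zattrs"}


    def count_metadata_files(store_path):
        """Count per-group and per-array metadata files in a local Zarr store.

        Hypothetical helper for illustration; not an hdmf-zarr function.
        """
        count = 0
        for _dirpath, _dirnames, filenames in os.walk(store_path):
            count += sum(1 for name in filenames if name in METADATA_NAMES)
        return count


    # Placeholder path; the count is 0 if the directory does not exist
    n = count_metadata_files("myfile.nwb.zarr")
    print(f"{n} metadata files would be replaced by a single .zmetadata read")

Each counted file is an object that a reader may otherwise have to request
individually, so the number gives an upper bound on the metadata requests that
consolidation saves.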

.. GENERATED FROM PYTHON SOURCE LINES 121-148

Generating and Updating Consolidated Metadata
----------------------------------------------

Whenever a Zarr file is created or modified, its metadata should be consolidated
so that readers, especially those streaming from S3, can open it with few requests.
Writing with :py:class:`~hdmf_zarr.nwb.NWBZarrIO` does this automatically; the
behavior is controlled by the ``consolidate_metadata`` parameter of the
:py:meth:`~hdmf_zarr.nwb.NWBZarrIO.write` method.

.. note::

    If you modify a Zarr file after creation (e.g., by directly using zarr APIs),
    you need to manually update the consolidated metadata:

    .. code-block:: python

        import zarr
        path = "myfile.nwb.zarr"
        zarr.consolidate_metadata(path)

    This ensures that the ``.zmetadata`` file reflects the current state of the
    Zarr store. This step is critical before uploading modified files to S3.

    For more details on consolidated metadata, see the
    :zarr-docs:`Zarr documentation <tutorial.html#consolidating-metadata>` and the
    :ref:`sec-zarr-storage` section of the hdmf-zarr documentation.
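
Before uploading, a quick local check can catch a stale ``.zmetadata`` file.
The sketch below assumes the Zarr v2 consolidated format, in which
``.zmetadata`` is a JSON document whose ``"metadata"`` entry maps keys such as
``"group/.zarray"`` to their contents. It compares key names only, so it detects
added or removed arrays and groups but not edited attributes. The helper name
and the path are hypothetical.

.. code-block:: python

    import json
    import os

    # Per-node metadata file names used by the Zarr v2 format
    METADATA_NAMES = {".zgroup", ".zarray", ".zattrs"}


    def consolidated_metadata_is_current(store_path):
        """Check that .zmetadata lists exactly the metadata files on disk.

        Hypothetical helper for illustration; not part of zarr or hdmf-zarr.
        """
        zmetadata_path = os.path.join(store_path, ".zmetadata")
        if not os.path.isfile(zmetadata_path):
            return False
        with open(zmetadata_path) as f:
            consolidated_keys = set(json.load(f)["metadata"])

        on_disk_keys = set()
        for dirpath, _dirnames, filenames in os.walk(store_path):
            for name in filenames:
                if name in METADATA_NAMES:
                    rel = os.path.relpath(os.path.join(dirpath, name), store_path)
                    # Zarr keys always use forward slashes
                    on_disk_keys.add(rel.replace(os.sep, "/"))
        return consolidated_keys == on_disk_keys


    # Placeholder path; returns False if the store or .zmetadata is missing
    print(consolidated_metadata_is_current("myfile.nwb.zarr"))

If the check returns ``False``, rerunning ``zarr.consolidate_metadata(path)``
as shown in the note above regenerates the file.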

.. GENERATED FROM PYTHON SOURCE LINES 150-155

Using the Convenience Method
----------------------------

:py:class:`~hdmf_zarr.nwb.NWBZarrIO` provides a convenience static method
:py:meth:`~hdmf_zarr.nwb.NWBZarrIO.read_nwb` for quick read access:

.. GENERATED FROM PYTHON SOURCE LINES 155-163

.. code-block:: Python


    # Read file directly using the convenience static method
    try:
        nwbfile = NWBZarrIO.read_nwb(s3_url)
        print(f"Session Start Time: {nwbfile.session_start_time}")
    except Exception as e:
        print(f"Note: Could not access S3 file (network access may be required): {e}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Session Start Time: 2020-02-06 21:00:00-08:00


.. GENERATED FROM PYTHON SOURCE LINES 164-169

.. note::

    PyNWB also provides a more general :py:func:`~pynwb.read_nwb` function that
    can automatically detect and use the appropriate IO class (HDF5 or Zarr) based
    on the file path or URL.

.. GENERATED FROM PYTHON SOURCE LINES 171-192

Best Practices for S3 Streaming
--------------------------------

1. **Always use consolidated metadata** for files stored on S3. This is the default
   when writing with :py:class:`~hdmf_zarr.nwb.NWBZarrIO`.

2. **Use HTTPS URLs** (``https://``) for public buckets when possible, as they
   work without additional configuration.

3. **For private buckets**, configure AWS credentials properly using environment
   variables or the AWS credentials file rather than hardcoding them.

4. **After modifying Zarr files**, always run ``zarr.consolidate_metadata(path)``
   before uploading to S3.

5. **Test your S3 URLs** to ensure they are accessible before sharing them with
   collaborators.

6. **Consider network costs**: Streaming is convenient, but data that you read
   repeatedly is typically transferred again on each access. For such workloads,
   downloading the file once for local access may be more efficient.
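
For item 5, a lightweight check is to try opening the store's ``.zmetadata``
object over HTTPS. The sketch below uses only the standard library and assumes
a public store with consolidated metadata that is reachable through an
``https://`` URL; ``s3://`` URLs are not understood by ``urllib`` and are
reported as unreachable. ``store_is_reachable`` is a hypothetical helper.

.. code-block:: python

    from urllib.request import urlopen


    def store_is_reachable(store_url, timeout=10):
        """Return True if the store's consolidated metadata can be opened.

        Hypothetical helper for illustration; not an hdmf-zarr function.
        """
        metadata_url = store_url.rstrip("/") + "/.zmetadata"
        try:
            # Opening the URL is enough; the body is not read
            with urlopen(metadata_url, timeout=timeout):
                return True
        except (OSError, ValueError):
            # URLError, HTTPError (e.g. 403 or 404), and timeouts are OSErrors;
            # ValueError covers strings that are not URLs at all
            return False

For example, ``store_is_reachable(s3_url)`` with the HTTPS URL used earlier in
this tutorial reports whether that store can currently be reached from your
machine.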


.. _sphx_glr_download_tutorials_plot_s3_streaming.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_s3_streaming.ipynb <plot_s3_streaming.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_s3_streaming.py <plot_s3_streaming.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_s3_streaming.zip <plot_s3_streaming.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_