Share Zarr and NetCDF data hosted on EDITO data storage

There are many ways to manipulate and share files in EDITO, and you can use your favorite tools to do it. This page describes a quick way to publicly share Zarr and NetCDF data hosted on your EDITO data storage, so that others can open it with the Python library xarray.

Share Zarr files

Upload a Zarr folder to your personal bucket

Upload the Zarr folder to your personal bucket on EDITO and name it foobar.zarr. You can do this either from the EDITO datalab UI, or from wherever your Zarr data is stored, using your favorite tools.

Here is an example using xarray.Dataset and s3fs:

from os import environ
from xarray import Dataset
import s3fs

# Credentials are read from the environment variables of your session
fs = s3fs.S3FileSystem(
    client_kwargs={'endpoint_url': 'https://minio.dive.edito.eu'},
    key=environ["AWS_ACCESS_KEY_ID"],
    secret=environ["AWS_SECRET_ACCESS_KEY"],
    token=environ["AWS_SESSION_TOKEN"]
)

dataset: Dataset = ... # a dataset containing values of a variable my_var over dimensions x and y

x_chunk: int = ... # maximum number of values in each chunk over dimension x
y_chunk: int = ... # maximum number of values in each chunk over dimension y
# the chunk shape follows the dimension order of my_var: (x, y)
encoding = {'my_var': {'chunks': (x_chunk, y_chunk)}}
out_store = s3fs.S3Map(root="oidc-[YOUR-USERNAME]/foobar.zarr", s3=fs, create=True)
dataset.to_zarr(
    store=out_store,
    consolidated=True,
    mode='w',
    encoding=encoding
)
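
The example above leaves x_chunk and y_chunk open. One possible way to pick them is sketched below: it computes a roughly square chunk that stays under a target size in bytes. The helper name square_chunks, the grid shape, the 4-byte item size and the 1 MB target are all assumptions made for illustration; suitable values depend on your data and on how it will be read.

```python
import math


def square_chunks(shape, itemsize, target_bytes):
    """Return (x_chunk, y_chunk) so that one chunk holds at most target_bytes.

    Hypothetical helper, not part of xarray, zarr or s3fs.
    """
    x_size, y_size = shape
    # Number of values that fit into the target chunk size
    max_values = max(1, target_bytes // itemsize)
    # Start from a square chunk, then cap each side at the dimension length
    side = max(1, math.isqrt(max_values))
    return min(side, x_size), min(side, y_size)


# Assumed example: a 4000 x 3000 grid of 4-byte floats, chunks of about 1 MB
x_chunk, y_chunk = square_chunks(shape=(4000, 3000), itemsize=4, target_bytes=1_000_000)
encoding = {'my_var': {'chunks': (x_chunk, y_chunk)}}
print(encoding)
```

The resulting encoding dictionary has the same shape as the one passed to dataset.to_zarr above.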

Make your data publicly accessible

To do that, either click on the “closed eye” icon next to the Zarr folder in the datalab file browser, or use the mc anonymous command: launch a JupyterLab service on EDITO (click here to launch one), open a terminal in it, and run:

mc anonymous set download s3/oidc-[YOUR-USERNAME]/foobar.zarr

Open it with xarray

In a Python environment (either on your machine or within an EDITO service like this one) in which xarray, dask and zarr are installed, open a Python console or notebook and run:

import xarray
xarray.open_dataset("https://minio.dive.edito.eu/oidc-[YOUR-USERNAME]/foobar.zarr", engine="zarr")
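
The public URL used above follows the pattern endpoint / oidc-username / object name. As a minimal sketch, the URL to send to collaborators could be built as follows. The helper public_url and the username "jdoe" are hypothetical and only serve as illustration.

```python
def public_url(username: str, object_name: str,
               endpoint: str = "https://minio.dive.edito.eu") -> str:
    """Build the public HTTPS URL of an object in a personal bucket.

    Hypothetical helper; it only formats a string and does not check
    that the object exists or that it is publicly readable.
    """
    bucket = f"oidc-{username}"
    return f"{endpoint.rstrip('/')}/{bucket}/{object_name.lstrip('/')}"


# "jdoe" is a placeholder username
print(public_url("jdoe", "foobar.zarr"))
# prints: https://minio.dive.edito.eu/oidc-jdoe/foobar.zarr
```

The returned string can be passed to xarray.open_dataset with engine="zarr", as in the example above.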

Share a NetCDF file

Upload your foobar.nc NetCDF file and make it publicly accessible as described in the sections above. Then, in a Python environment (either on your machine or within an EDITO service like this one) in which xarray and netCDF4 are installed, open a Python console or notebook and run:

import xarray
xarray.open_dataset("https://minio.dive.edito.eu/oidc-[YOUR-USERNAME]/foobar.nc#mode=bytes", engine="netcdf4")
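
The #mode=bytes suffix in the URL above asks the underlying netCDF library to read the remote file through HTTP byte-range requests. As a small sketch, the suffix could be appended programmatically when it is missing. The helper with_byte_range_mode is hypothetical, and rejecting URLs that already carry another fragment is a simplifying assumption.

```python
from urllib.parse import urldefrag


def with_byte_range_mode(url: str) -> str:
    """Return url with the '#mode=bytes' fragment appended if it is missing.

    Hypothetical helper, not part of xarray or netCDF4.
    """
    base, fragment = urldefrag(url)
    if fragment == "mode=bytes":
        return url  # already in the expected form
    if fragment:
        # Simplifying assumption: other fragments are not handled here
        raise ValueError(f"unexpected URL fragment: {fragment!r}")
    return f"{base}#mode=bytes"


print(with_byte_range_mode("https://minio.dive.edito.eu/oidc-[YOUR-USERNAME]/foobar.nc"))
```

The result is the same string as the one passed to xarray.open_dataset with engine="netcdf4" above.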