From 631fc640292102aab63b412290d5dd5d91942fbc Mon Sep 17 00:00:00 2001
From: Ewan Short
Date: Sat, 13 Dec 2025 18:59:03 +1100
Subject: [PATCH 1/6] amend zarr encoding specification doc GH8749

---
 doc/internals/zarr-encoding-spec.rst | 13 +++++++------
 xarray/backends/zarr.py | 3 +++
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/doc/internals/zarr-encoding-spec.rst b/doc/internals/zarr-encoding-spec.rst
index c34c2f21ddd..d016431c1c7 100644
--- a/doc/internals/zarr-encoding-spec.rst
+++ b/doc/internals/zarr-encoding-spec.rst
@@ -36,6 +36,9 @@ When writing data to Zarr V2, Xarray sets this attribute on all variables based
 variable dimensions. This attribute is visible when accessing arrays directly with
 zarr-python.

+**NCZarr Format:**
+Xarray uses the ``dimrefs`` field in the custom ``.zarray`` metadata file. Note Xarray can not write NCZarr groups.
+
 **Zarr V3 Format:**
 Xarray uses the native ``dimension_names`` field in the array metadata. This is part
 of the official Zarr V3 specification and is not stored as a regular attribute.
@@ -62,14 +65,12 @@ Compatibility and Reading
 Because of these encoding choices, Xarray cannot read arbitrary Zarr arrays, but only
 Zarr data with valid dimension metadata. Xarray supports:

-- Zarr V2 arrays with ``_ARRAY_DIMENSIONS`` attributes
-- Zarr V3 arrays with ``dimension_names`` metadata
-- `NCZarr `_ format
+1. Zarr V3 arrays with ``dimension_names`` metadata
+2. Zarr V2 arrays with ``_ARRAY_DIMENSIONS`` attributes
+3. `NCZarr `_ format (dimension names are defined in the ``.zarray`` file)

-After decoding the dimension information and assigning the variable dimensions,
-Xarray proceeds to [optionally] decode each variable using its standard CF decoding
-machinery used for NetCDF data.
+When reading a Zarr group, Xarray checks each of these three conventions, in the order given above.
After decoding the dimension information and assigning the variable dimensions, Xarray proceeds to [optionally] decode each variable using its standard CF decoding machinery used for NetCDF data.

 Finally, it's worth noting that Xarray writes (and attempts to read)
 "consolidated metadata" by default (the ``.zmetadata`` file), which is another
diff --git a/xarray/backends/zarr.py b/xarray/backends/zarr.py
index fe004c212b6..b37989e6bbd 100644
--- a/xarray/backends/zarr.py
+++ b/xarray/backends/zarr.py
@@ -355,6 +355,9 @@ def _determine_zarr_chunks(enc_chunks, var_chunks, ndim, name):


 def _get_zarr_dims_and_attrs(zarr_obj, dimension_key, try_nczarr):
+    # Check for attributes and dimension name metadata as discussed in the Zarr encoding
+    # specification https://docs.xarray.dev/en/stable/internals/zarr-encoding-spec.html
+
     # Zarr V3 explicitly stores the dimension names in the metadata
     try:
         # if this exists, we are looking at a Zarr V3 array

From 86c925141b3e35806ba53efee4faf1f97b2d64e9 Mon Sep 17 00:00:00 2001
From: Ewan Short
Date: Sat, 13 Dec 2025 22:45:33 +1100
Subject: [PATCH 2/6] additional edits to zarr encoding spec

---
 doc/internals/zarr-encoding-spec.rst | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/doc/internals/zarr-encoding-spec.rst b/doc/internals/zarr-encoding-spec.rst
index d016431c1c7..5ec7e38b933 100644
--- a/doc/internals/zarr-encoding-spec.rst
+++ b/doc/internals/zarr-encoding-spec.rst
@@ -25,9 +25,9 @@ the name of each array's dimensions. However, starting with Zarr v3, the
 NetCDF data model in Zarr.

 Dimension Encoding in Zarr Formats
------------------------------------
+-----------------------------------------------

-Xarray encodes array dimensions differently depending on the Zarr format version:
+Xarray encodes/decodes array dimensions differently depending on the Zarr format version:

 **Zarr V2 Format:**
 Xarray uses a special Zarr array attribute: ``_ARRAY_DIMENSIONS``.
The value of this
@@ -36,9 +36,6 @@ When writing data to Zarr V2, Xarray sets this attribute on all variables based
 variable dimensions. This attribute is visible when accessing arrays directly with
 zarr-python.

-**NCZarr Format:**
-Xarray uses the ``dimrefs`` field in the custom ``.zarray`` metadata file. Note Xarray can not write NCZarr groups.
-
 **Zarr V3 Format:**
 Xarray uses the native ``dimension_names`` field in the array metadata. This is part
 of the official Zarr V3 specification and is not stored as a regular attribute.
@@ -46,9 +43,9 @@ When accessing arrays with zarr-python, this information is available in the arr
 metadata but not in the attributes dictionary.

 When reading a Zarr group, Xarray looks for dimension information in the appropriate
-location based on the format version, raising an error if it can't be found. The
+location based on the inferred format version, raising an error if it can't be found. The
 dimension information is used to define the variable dimension names and then
-(for Zarr V2) removed from the attributes dictionary returned to the user.
+(for Zarr V2) is removed from the attributes dictionary returned to the user.

 CF Conventions
 --------------
@@ -62,15 +59,14 @@ used to describe metadata in NetCDF and Zarr.
 Compatibility and Reading
 -------------------------

-Because of these encoding choices, Xarray cannot read arbitrary Zarr arrays, but only
-Zarr data with valid dimension metadata. Xarray supports:
+Because of these encoding choices, Xarray cannot read arbitrary Zarr groups, but only
+Zarr groups with valid dimension metadata. Xarray supports:

-1. Zarr V3 arrays with ``dimension_names`` metadata
-2. Zarr V2 arrays with ``_ARRAY_DIMENSIONS`` attributes
-3. `NCZarr `_ format
- (dimension names are defined in the ``.zarray`` file)
+1. Zarr V3 groups with ``dimension_names`` metadata
+2. Zarr V2 groups with ``_ARRAY_DIMENSIONS`` attributes
+3.
`NCZarr `_ format (dimension names are defined in the ``dimrefs`` field in the custom ``.zarray`` file)

-When reading a Zarr group, Xarray checks each of these three conventions, in the order given above. After decoding the dimension information and assigning the variable dimensions, Xarray proceeds to [optionally] decode each variable using its standard CF decoding machinery used for NetCDF data.
+Xarray checks each of these three conventions, in the order given above, when looking for dimension name metadata. Note that while Xarray can read NCZarr groups, it currently does not write NCZarr groups. After decoding the dimension information and assigning the variable dimensions, Xarray proceeds to [optionally] decode each variable using its standard CF decoding machinery used for NetCDF data.

 Finally, it's worth noting that Xarray writes (and attempts to read)
 "consolidated metadata" by default (the ``.zmetadata`` file), which is another

From a54cfa82d675b2aaed74f512272a9ec529037a02 Mon Sep 17 00:00:00 2001
From: Ewan Short
Date: Sat, 13 Dec 2025 22:49:53 +1100
Subject: [PATCH 3/6] minor doc edits for consistency

---
 doc/internals/zarr-encoding-spec.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/internals/zarr-encoding-spec.rst b/doc/internals/zarr-encoding-spec.rst
index 5ec7e38b933..8b7a59b791f 100644
--- a/doc/internals/zarr-encoding-spec.rst
+++ b/doc/internals/zarr-encoding-spec.rst
@@ -25,7 +25,7 @@ the name of each array's dimensions. However, starting with Zarr v3, the
 NetCDF data model in Zarr.
 Dimension Encoding in Zarr Formats
------------------------------------------------
+-----------------------------------

 Xarray encodes/decodes array dimensions differently depending on the Zarr format version:

From ff5502e3a656dc5de4cd5c79378b6ef667070744 Mon Sep 17 00:00:00 2001
From: Ewan Short
Date: Sun, 14 Dec 2025 10:09:55 +1100
Subject: [PATCH 4/6] amend whats-new.rst

---
 doc/whats-new.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/whats-new.rst b/doc/whats-new.rst
index 7e3badc7143..2e0ce3f1b34 100644
--- a/doc/whats-new.rst
+++ b/doc/whats-new.rst
@@ -29,6 +29,9 @@ Bug Fixes
 - Ensure that ``keep_attrs='drop'`` and ``keep_attrs=False`` remove attrs from result, even when there is only one xarray object given to ``apply_ufunc`` (:issue:`10982` :pull:`10997`).
   By `Julia Signell `_.
+- Slightly amend `Xarray's Zarr Encoding Specification doc `_ for clarity, and provide a code comment in ``xarray.backends.zarr._get_zarr_dims_and_attrs`` referencing the doc (:issue:`8749`).
+  By `Ewan Short `_.
+

 Documentation
 ~~~~~~~~~~~~~

From 1ae0da3e21e93afafe6efdc469b6f0e743075a78 Mon Sep 17 00:00:00 2001
From: Ewan Short
Date: Sun, 14 Dec 2025 18:57:19 +1100
Subject: [PATCH 5/6] add pull number to whats-new.rst

---
 doc/whats-new.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/whats-new.rst b/doc/whats-new.rst
index 2e0ce3f1b34..aa8391b8d4f 100644
--- a/doc/whats-new.rst
+++ b/doc/whats-new.rst
@@ -29,7 +29,7 @@ Bug Fixes
 - Ensure that ``keep_attrs='drop'`` and ``keep_attrs=False`` remove attrs from result, even when there is only one xarray object given to ``apply_ufunc`` (:issue:`10982` :pull:`10997`).
   By `Julia Signell `_.
-- Slightly amend `Xarray's Zarr Encoding Specification doc `_ for clarity, and provide a code comment in ``xarray.backends.zarr._get_zarr_dims_and_attrs`` referencing the doc (:issue:`8749`).
+- Slightly amend `Xarray's Zarr Encoding Specification doc `_ for clarity, and provide a code comment in ``xarray.backends.zarr._get_zarr_dims_and_attrs`` referencing the doc (:issue:`8749` :pull:`11013`).
   By `Ewan Short `_.


From 0b067d9fa77229f2852b8ca756e3d7a68ea86121 Mon Sep 17 00:00:00 2001
From: Ewan Short
Date: Mon, 15 Dec 2025 13:03:27 +1100
Subject: [PATCH 6/6] fix incorrect usage of 'groups'

---
 doc/internals/zarr-encoding-spec.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/internals/zarr-encoding-spec.rst b/doc/internals/zarr-encoding-spec.rst
index 8b7a59b791f..7bbf8ab3bd4 100644
--- a/doc/internals/zarr-encoding-spec.rst
+++ b/doc/internals/zarr-encoding-spec.rst
@@ -27,7 +27,7 @@ NetCDF data model in Zarr.
 Dimension Encoding in Zarr Formats
 -----------------------------------

-Xarray encodes/decodes array dimensions differently depending on the Zarr format version:
+Xarray encodes array dimensions differently depending on the Zarr format version:

 **Zarr V2 Format:**
 Xarray uses a special Zarr array attribute: ``_ARRAY_DIMENSIONS``. The value of this
@@ -60,10 +60,10 @@ Compatibility and Reading
 -------------------------

 Because of these encoding choices, Xarray cannot read arbitrary Zarr groups, but only
-Zarr groups with valid dimension metadata. Xarray supports:
+Zarr groups containing arrays with valid dimension metadata. Xarray supports:

-1. Zarr V3 groups with ``dimension_names`` metadata
-2. Zarr V2 groups with ``_ARRAY_DIMENSIONS`` attributes
+1. Zarr V3 arrays with ``dimension_names`` metadata
+2. Zarr V2 arrays with ``_ARRAY_DIMENSIONS`` attributes
 3. `NCZarr `_ format (dimension names are defined in the ``dimrefs`` field in the custom ``.zarray`` file)

 Xarray checks each of these three conventions, in the order given above, when looking for dimension name metadata. Note that while Xarray can read NCZarr groups, it currently does not write NCZarr groups.
After decoding the dimension information and assigning the variable dimensions, Xarray proceeds to [optionally] decode each variable using its standard CF decoding machinery used for NetCDF data.
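
Reviewer note: the lookup order described in the amended doc (Zarr V3 ``dimension_names``, then Zarr V2 ``_ARRAY_DIMENSIONS``, then NCZarr ``dimrefs``) can be illustrated with a minimal sketch. This is not the body of ``_get_zarr_dims_and_attrs``. Plain dictionaries stand in for the Zarr array metadata and attributes, ``find_dimension_names`` is a hypothetical helper, and the assumption that ``dimrefs`` entries are slash-separated paths whose last component is the dimension name is mine, not something the patch states.

```python
# Illustrative sketch only. Plain dicts stand in for Zarr metadata/attributes;
# find_dimension_names is a hypothetical helper, not an Xarray or zarr-python API.


def find_dimension_names(metadata, attrs, nczarr_metadata=None):
    """Return (dims, remaining_attrs), checking the three conventions in order."""
    # 1. Zarr V3: dimension names are a native field of the array metadata.
    #    (Simplification: a missing or null field falls through to the next check.)
    dims = metadata.get("dimension_names")
    if dims is not None:
        return tuple(dims), dict(attrs)

    # 2. Zarr V2: dimension names are stored in the _ARRAY_DIMENSIONS attribute,
    #    which is removed from the attributes returned to the user.
    if "_ARRAY_DIMENSIONS" in attrs:
        remaining = {k: v for k, v in attrs.items() if k != "_ARRAY_DIMENSIONS"}
        return tuple(attrs["_ARRAY_DIMENSIONS"]), remaining

    # 3. NCZarr: dimension references sit in a "dimrefs" field of the custom
    #    .zarray metadata (assumed here to be paths such as "/time").
    if nczarr_metadata is not None and "dimrefs" in nczarr_metadata:
        dims = tuple(ref.split("/")[-1] for ref in nczarr_metadata["dimrefs"])
        return dims, dict(attrs)

    # No convention matched: the array has no usable dimension metadata.
    raise KeyError("no dimension name metadata found")


if __name__ == "__main__":
    v2_attrs = {"_ARRAY_DIMENSIONS": ["time", "lat"], "units": "K"}
    print(find_dimension_names({}, v2_attrs))
```

The early returns make the precedence explicit: an array carrying both V3 metadata and a stray ``_ARRAY_DIMENSIONS`` attribute resolves through the V3 field in this sketch, which matches the ordering the amended doc states.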