feat: subchunk write order by ilan-gold · Pull Request #3826 · zarr-developers/zarr-python

ilan-gold · 2026-03-24T11:04:30Z

To encourage ecosystem compatibility and reserve runtime setting strings/enums (see zarrs/zarrs-python#160), this PR expands the subchunk write order from morton-only to also include lexicographic, colexicographic, and unordered (which is randomized).
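For readers unfamiliar with the four orders, here is a minimal sketch of what each one could mean for a small grid of subchunks. It is not the implementation in this PR: the function names, the `seed` parameter, and the Morton bit convention (axis 0 in the least significant bit) are assumptions made for illustration.

```python
import itertools
import random


def lexicographic(shape):
    # Last axis varies fastest (C order).
    return list(itertools.product(*(range(n) for n in shape)))


def colexicographic(shape):
    # First axis varies fastest (Fortran order).
    reversed_coords = itertools.product(*(range(n) for n in reversed(shape)))
    return [tuple(reversed(c)) for c in reversed_coords]


def morton_key(coord, nbits):
    # Interleave the bits of each coordinate; axis 0 takes the lowest bit
    # (an assumed convention, other conventions exist).
    key = 0
    for bit in range(nbits):
        for dim, c in enumerate(coord):
            key |= ((c >> bit) & 1) << (bit * len(coord) + dim)
    return key


def morton(shape):
    # Sort every in-bounds coordinate by its Z-order key.
    nbits = max(1, max(n - 1 for n in shape).bit_length())
    return sorted(lexicographic(shape), key=lambda c: morton_key(c, nbits))


def unordered(shape, seed):
    # Randomized, but reproducible for a fixed seed.
    coords = lexicographic(shape)
    random.Random(seed).shuffle(coords)
    return coords
```

Every function returns the same set of coordinates; only the sequence differs, which is what determines where each subchunk lands inside the shard.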

TODO:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in docs/user-guide/*.md
Changes documented as a new file in changes/
GitHub Actions have all passed
Test coverage is 100% (Codecov passes)

d-v-b · 2026-03-24T11:57:39Z

src/zarr/codecs/sharding.py

    end = "end"


+class SubchunkWriteOrder(Enum):


advantage of an enum over Literal["morton", "unordered", "lexicographic", "colexicographic"]?

Just copied what was done for ShardingCodecIndexLocation!

I'm not a huge fan of enums in Python (including ShardingCodecIndexLocation), so unless you object I think it would be better to use a simple Literal plus a final tuple of strings, like:

    SubchunkWriteOrder = Literal["morton", "unordered", "lexicographic", "colexicographic"]
    SUBCHUNK_WRITE_ORDER: Final[tuple[str, str, str, str]] = ("morton", "unordered", "lexicographic", "colexicographic")

Done (hopefully)!
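One reason the Literal-plus-tuple pattern works well is that the tuple can validate runtime strings while the alias keeps static type checking. A minimal sketch, where `parse_subchunk_write_order` is a hypothetical helper and not something confirmed to be in this PR:

```python
from typing import Final, Literal, get_args

SubchunkWriteOrder = Literal["morton", "unordered", "lexicographic", "colexicographic"]
SUBCHUNK_WRITE_ORDER: Final[tuple[str, str, str, str]] = (
    "morton",
    "unordered",
    "lexicographic",
    "colexicographic",
)

# Guard against the alias and the tuple drifting apart.
assert get_args(SubchunkWriteOrder) == SUBCHUNK_WRITE_ORDER


def parse_subchunk_write_order(value: object) -> SubchunkWriteOrder:
    """Hypothetical helper: validate a runtime value against the allowed strings."""
    if isinstance(value, str) and value in SUBCHUNK_WRITE_ORDER:
        return value  # type: ignore[return-value]
    raise ValueError(f"Expected one of {SUBCHUNK_WRITE_ORDER}, got {value!r}")
```

Compared with an enum, callers pass plain strings such as `"morton"`, and a type checker can still reject a misspelled literal at the call site.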


ilan-gold · 2026-03-27T16:36:34Z

src/zarr/codecs/sharding.py


        if self._is_complete_shard_write(indexer, chunks_per_shard):
-            shard_dict = dict.fromkeys(morton_order_iter(chunks_per_shard))
+            shard_dict = dict.fromkeys(np.ndindex(chunks_per_shard))


cc @mkitti

Here and below, I don't think there is any need to construct the dict in morton order, right? There should be no correctness or performance hit here?

@d-v-b This now ensures we shuffle only once in the unordered case, so the test is nice and clean: write once and get the order, create a new codec with the same seed, create the iterator from that codec, and check that the orders match.

In Python, dicts are ordered, and the last time I examined this I think the optimal iteration order needed to be encoded in the dict. I was just trying to preserve the behavior from before my edits.

027c469

So this wasn't about dictionary order. In the vectorized case, the order passed to ShardReader.to_dict_vectorized had to match the order ShardReader was generating internally, which turned out to be morton order. I'm glad I caught this, because I think it means data was being corrupted for the other orders (which weren't being hypothesis-tested).

So I'm going to add something to the hypothesis tests for this.
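To make the failure mode concrete, here is a small sketch of how pairing coordinates with fetched data by position corrupts the mapping when the two sides assume different orders. It does not use the real ShardReader code; all names are illustrative, and lexicographic versus colexicographic stand in for "caller order" versus "internal order".

```python
import itertools

# Two different iteration orders over the same 2x3 grid of subchunks.
coords_lex = list(itertools.product(range(2), range(3)))
coords_colex = sorted(coords_lex, key=lambda c: c[::-1])

# Ground truth: each subchunk's payload is labelled with its own coordinate.
stored = {coord: f"chunk{coord}" for coord in coords_lex}

# The reader fetches data in its internal order (colexicographic here).
fetched = [stored[coord] for coord in coords_colex]

# Buggy: the caller assumes a different order and pairs by position.
buggy = dict(zip(coords_lex, fetched))

# Robust: pair the data with the coordinates that were actually used to fetch it.
robust = dict(zip(coords_colex, fetched))
```

Here `robust` equals `stored`, while `buggy` silently assigns some payloads to the wrong coordinates without raising any error, which is why a property-based test over all orders is useful.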

I initially had the same feeling that the dictionary order mattered, but it turns out the final call to _encode_shard_dict handles the ordering of the output buffer for us. Writes to the intermediate shard_dict can happen in any order, as long as the final buffer is written in the correct order.
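The point above can be sketched as follows: if the encoder walks its own write order and looks subchunks up by key, the insertion order of the intermediate dict has no effect on the output. `encode_shard` is a hypothetical stand-in for illustration, not the actual `_encode_shard_dict`.

```python
import itertools
import random


def encode_shard(shard_dict, write_order):
    """Concatenate subchunks in write_order; return (buffer, index)."""
    buffer = bytearray()
    index = {}
    for coord in write_order:
        data = shard_dict.get(coord)
        if data is None:
            continue  # missing subchunk: no bytes and no index entry
        index[coord] = (len(buffer), len(data))  # (offset, nbytes)
        buffer += data
    return bytes(buffer), index


coords = list(itertools.product(range(2), range(2)))
payload = {coord: bytes([i]) * (i + 1) for i, coord in enumerate(coords)}

# Fill one intermediate dict in lexicographic order...
filled_in_order = dict.fromkeys(coords)
for coord in coords:
    filled_in_order[coord] = payload[coord]

# ...and another in a shuffled order.
shuffled = coords.copy()
random.Random(0).shuffle(shuffled)
filled_shuffled = {coord: payload[coord] for coord in shuffled}

# The encoder's own order (colexicographic here) decides the byte layout.
write_order = sorted(coords, key=lambda c: c[::-1])
out_a = encode_shard(filled_in_order, write_order)
out_b = encode_shard(filled_shuffled, write_order)
```

Both calls produce the same buffer and index, so only `write_order` needs to be correct.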

ilan-gold added 6 commits March 24, 2026 11:53

feat: subchunk write order

5477d70

chore: export SubchunkWriteOrder

2e36679

chore: docs

c6498b2

chore: relnote

58e071c

Merge branch 'main' into ig/shard_order

417df78

rename

11b94c0

d-v-b reviewed Mar 24, 2026


ilan-gold added 2 commits March 24, 2026 15:32

refactor: no enums

b0c622d

Merge branch 'main' into ig/shard_order

22a5dda

ilan-gold commented Mar 24, 2026

src/zarr/codecs/sharding.py

Merge branch 'main' into ig/shard_order

39634f0

ilan-gold requested a review from d-v-b March 24, 2026 17:51

Merge branch 'main' into ig/shard_order

f4498a6

ilan-gold mentioned this pull request Mar 26, 2026

chore: prepare for release zarrs/zarrs-python#162

Merged

d-v-b reviewed Mar 26, 2026

src/zarr/codecs/sharding.py (outdated)

d-v-b reviewed Mar 26, 2026

docs/user-guide/performance.md (outdated)

ilan-gold and others added 5 commits March 27, 2026 09:30

Update docs/user-guide/performance.md

be7ac83

Co-authored-by: Davis Bennett <davis.v.bennett@gmail.com>

Merge branch 'main' into ig/shard_order

a89249a

Merge branch 'main' into ig/shard_order

5ea1cf3

feat: deterministic but random order

7b663ff

Merge branch 'ig/shard_order' of github.com:ilan-gold/zarr-python int…

eae06dd

…o ig/shard_order

ilan-gold commented Mar 27, 2026


ilan-gold requested a review from d-v-b March 27, 2026 16:36

ilan-gold added 3 commits March 27, 2026 18:23

Merge branch 'main' into ig/shard_order

f36ea93

fix: make vectorized fetching less reliant on matching order

027c469

chore: add hypothesis

3f53182

ilan-gold mentioned this pull request Mar 27, 2026

Add store methods for writing into an existing buffer #3831

Open

ilan-gold wants to merge 18 commits into zarr-developers:main from ilan-gold:ig/shard_order


3 participants
