Skip to content

380 alphanumeric well images#71

Open
jluethi wants to merge 9 commits intoome:mainfrom
jluethi:380_alphanumeric_well_images
Open

380 alphanumeric well images#71
jluethi wants to merge 9 commits intoome:mainfrom
jluethi:380_alphanumeric_well_images

Conversation

@jluethi
Copy link
Contributor

@jluethi jluethi commented Jan 21, 2026

This PR suggests relaxing the allowed names of images in wells, following the discussion here: ome/ngff#380

While I understand restrictions on names for rows and columns of HCS plates, the images in HCS plates can be many things. I don't think it's reasonable to restrict it more than general Zarr node naming restrictions. As proposed by @d-v-b , this PR proposes following the Zarr spec recommendations for characters in node names.

I don't know what the rule on linking to other pages like the Zarr spec page is / should be, so we can also remove that link if needed.

@thewtex Could you also have a look at this and whether this would interfere with anything ngff-zarr related?

We already implemented this in ngio (we actually had always been using _ in path names and only noticed last fall that this clashes with the strict path definitions in HCS wells.

cc @imagejan @lorenzocerrone

@github-actions
Copy link

Automated Review URLs

Copy link
Contributor

@thewtex thewtex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM!

Copy link
Member

@will-moore will-moore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍

Copy link
Contributor

@jo-mueller jo-mueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jluethi,

thanks for the update! Maybe as a follow-up here, could you add/change the following (some of this is non-obvious and didn't occur to me before either ^^"):

  • Add some test.json with valid/invalid plate/well metadata under tests/attributes/spec/{valid/invalid}/{well/plate} to make sure these are correctly flagged as valid or invalid by the CI?
  • Do you have some example json to upload somewhere on S3 that one could point the validator to? We'd need a copy of ome/ome-ngff-validator#48 from @will-moore to point at this branch for this to work, though.
  • Small mention of the proposed change in version_history.md :)

@d-v-b
Copy link
Contributor

d-v-b commented Jan 27, 2026

Do you have some example json to upload somewhere on S3 that one could point the validator to?

You can just commit JSON to git, and use the github url to the raw JSON object.

edit: or use a gist.

@jo-mueller
Copy link
Contributor

You can just commit JSON to git, and use the github url to the raw JSON object.

In that case, adding valid/invalid examples for the validation CI is perfectly enough. Probably even better 👍

@jluethi
Copy link
Contributor Author

jluethi commented Jan 30, 2026

@jo-mueller Thanks for those follow-ups! I added relevant test jsons for wells, both valid and invalid. I think we can cover all the new option in 1 valid part, but should test the 4 ways to be invalid separately (such that we would catch each issue, not just a combination thereof).

This does not affect the plate schemas, because they only point to where the well is, not where the paths in the well are.

I also added something to version_history. I'd leave it to you to move this from unreleased to a specific version with the relevant version updates, depending on when it makes its way to main.
(side observation: It's a super minute update & detailed description compared with some past descriptions that are implementations of whole RFCs. But nothing against switching to more detailed changelogs)

@jluethi jluethi requested a review from jo-mueller January 30, 2026 14:07
@jo-mueller
Copy link
Contributor

I also added something to version_history. I'd leave it to you to move this from unreleased to a specific version with the relevant version updates, depending on when it makes its way to main.

Sounds good to me 👍

but should test the 4 ways to be invalid separately (such that we would catch each issue, not just a combination thereof).

Jup!

@tlambert03
Copy link

tlambert03 commented Feb 2, 2026

i agree! missed this and was just proposing similar in #433... 100% agree, and my use case also centered around using _ in field names.

Some quick questions/thoughts about this proposal though:

                  "path": {
                    "description": "The path for this field of view subgroup",
                    "type": "string",
                    "minLength": 1,
                    "not": {
                      "anyOf": [
                        { "const": "." },
                        { "const": ".." },
                        { "pattern": "^__" },
                        { "pattern": "/" }
                      ]
                    }

This might be too permissive? The current ^[A-Za-z0-9]+$ is definitely overly restrictive, but it's bullet proof. This new pattern allows:

  • \ or ":" - windows path sep or drive letters
  • < > " | ? * invalid on windows
  • spaces - not strictly illegal, but probably will cause someone a bad day in a bash script somewhere someday :)
  • leading trailing spaces or trailing period (problematic on various file systems)
  • Non-ASCII characters - more potential glyph confusions

It's worth pointing out that dataset.path has similar problems (which should probably be fixed)... but perhaps we should take advantage of this "opportunity" of over-restriction on field names and not swing too far back? How about just "^[A-Za-z0-9_]+$"? possibly with dashes "^[A-Za-z0-9_-]+$" ... and maybe periods? "^[A-Za-z0-9_.-]+$"? ... and then you can add not.anyOf to avoid those other cases if needed

@jluethi
Copy link
Contributor Author

jluethi commented Feb 2, 2026

Funny how we converge on things here!

This might be too permissive? The current ^[A-Za-z0-9]+$ is definitely overly restrictive, but it's bullet proof.

Personally, I'd be totally fine with this being more restrictive. "^[A-Za-z0-9_.-]+$" is pretty convincing to me, as it would be nice to e.g. have my_image.ome.zarr (or some other file ending) as a image name in a plate (to make it easier for users to recognize the image).

My motivation for going as broad as suggested above was to follow the Zarr recommended node names (https://github.com/zarr-developers/zarr-specs/blob/main/docs/v3/core/index.rst#node-names) and to have the same level of OME-Zarr spec restrictions on these image names as for other, standalone images.

The Zarr rules also include the should for alphanumeric and .-_. Thus I added:

The path SHOULD only use characters in the sets a-z, A-Z, 0-9, -, _, ..

I initially drafted schemas that would enforce this, but wasn't sure if someone had important use cases for some extra characters that are allowed (even if not recommended) in Zarr. And in some discussions, I wasn't sure we'd want to restrict image in well much differently than another standalone images.

@tlambert03
Copy link

My motivation for going as broad as suggested above was to follow the Zarr recommended node names (https://github.com/zarr-developers/zarr-specs/blob/main/docs/v3/core/index.rst#node-names)

that's a good motivation! And for zarr (a generic, totally agnostic storage layer) i can see why they set their minimum constraints extremely broadly (you don't want to impose constraints that would artificially limit valid use cases right off the bat). It does, however, definitely leave open the possibility for fundamental incompatibility on certain file systems. Zarr is basically saying "look, if you want to make something that can't be used on windows or in bash scripts, we're not going to stop you".

but I think the point of a domain-specific subschema like ngff is to add constraints that enable interoperability. and the zarr spec acknowledges this sort of thing:

he underlying store might pose additional restriction on node names, such as the following:...

And the zarr node name spec itself offers a recommendation:

To ensure consistent behaviour across different storage systems and programming languages, we recommend users to only use characters in the sets a-z, A-Z, 0-9, -, _, .

and I think that's the "right" constraint for something like ngff. 👍

@jluethi
Copy link
Contributor Author

jluethi commented Feb 2, 2026

Agreed on all of that. And personally, I'd be fine to limit the image in well spec to just that.

The discussion on how all OME-Zarrs can be named across the whole spec seems to be a broader discussion though that I didn't intend to open for this PR. While I'm also not super happy about only having a should that stops people from having \:#? or other weird stuff in their image names, it feels inconsistent to be very specific on the image in well level and say nothing for other OME-Zarr images. Thus the current proposal of require the minimum and suggest ("should") the constraint behavior.

@tlambert03
Copy link

tlambert03 commented Feb 2, 2026

Agreed. And I definitely wasn't suggesting broadening the discussion to the whole spec (agree we should only be discussing well fields at the moment). But given that well field paths are already super strictly defined, it seems less breaking on the longer term to just slightly relax the spec now, and accept the asymmetry with dataset.paths, rather than fully open it up (for symmetry with dataset paths) and then break both of them in the future to enforce file system compatibility.

In other words: it's always easier to be more permissive, but it's a breaking change to retract permissions.

Am I understanding your point correctly? Or were you talking about something different there? In other words: what's your current preference for well image paths at the moment: go with your permissive spec that uses the general zarr node spec? Or just slightly relax, going with the zarr node recommendation (allow just under score, dash, period)

@jluethi
Copy link
Contributor Author

jluethi commented Feb 3, 2026

You understood my point well.

In other words: it's always easier to be more permissive, but it's a breaking change to retract permissions.

This seems very convincing to me. I'll update the PR accordingly.

@jluethi
Copy link
Contributor Author

jluethi commented Feb 3, 2026

@tlambert03 @jo-mueller I've updated the proposal according to the discussion above, limiting the allowed characters to alphanumeric . _ -.

@imagejan @thewtex @will-moore Given that you reviewed this in an earlier form, pinging you again for this. Instead of allowing almost any characters in well paths, we decided it would be more reasonable to relax the requirements less drastically to allow "only" ._- (dot, underline, dash) in the image paths in wells. The main motivation is that many of the other potential characters cause too many issues across different storage systems / operating systems and we don't want to overly broaden the allowed spec (from only alphanumeric) if it would have to be restricted again later due to such issues. See detailed discussion above

Copy link
Member

@will-moore will-moore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - makes sense 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants