Skip to content

Validate NX_class against known NeXus class names#68

Merged
SimonHeybrock merged 6 commits into
mainfrom
validate-nx-class-name
Jun 24, 2026
Merged

Validate NX_class against known NeXus class names#68
SimonHeybrock merged 6 commits into
mainfrom
validate-nx-class-name

Conversation

@SimonHeybrock

Copy link
Copy Markdown
Member

Summary

Adds a NX_class_invalid validator that errors when a group's NX_class attribute is not a known NeXus class name. Catches typos such as NXPositioner vs NXpositioner, which previously slipped through entirely: NX_class_attr_missing only checked presence, and downstream validators keying off specific names (e.g. chopper_frequency_units_invalid) silently did not apply.

The allow-list is the full set of NeXus base classes pulled from nexusformat/definitions plus NXcanSAS. Application and contributed definitions are intentionally not included: a raw ESS file declaring e.g. NXapm_paraprobe_clusterer_results is almost certainly a mistake and should fail validation. Extend nexus_application_classes when a legitimate new application class is needed.

The base class list lives in its own module (src/chexus/nexus_base_classes.py) with the upstream URL and a gh api refresh command in the docstring.

Deprecated base classes (NXgeometry, NXorientation, NXshape, NXtranslation) remain in the allow-list so they continue to be reported only by the dedicated NX_class_is_legacy validator, not flagged twice.

Test plan

  • Existing test suite passes (72 tests)
  • New tests cover the motivating typo, other unknown names, the known-good NXcanSAS, and the no-NX_class-attr applies-to case

Catches typos like NXPositioner vs NXpositioner that previously slipped
through unchecked (NX_class_attr_missing only verified presence, and
downstream validators keying off specific names silently did not apply).

The allow-list contains all NeXus base classes (pulled from
nexusformat/definitions) plus NXcanSAS. Application/contributed
definitions are intentionally not included: a raw ESS file declaring,
say, NXapm_paraprobe_clusterer_results is almost certainly a mistake
and should fail validation. Extend nexus_application_classes when a
legitimate new application class is needed.

The base class list lives in its own module with the upstream URL and a
gh api refresh command in the docstring.
@SimonHeybrock SimonHeybrock marked this pull request as ready for review May 20, 2026 05:23

@jl-wynen jl-wynen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good. Just comments about the data file.

Comment thread src/chexus/nexus_base_classes.py Outdated
Refresh with:

gh api repos/nexusformat/definitions/contents/base_classes \
--jq '.[].name' | grep '\\.nxdl\\.xml$' | sed 's/\\.nxdl\\.xml$//' | sort

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This outputs nothing for me. Is your grep an alias for something else? I need

grep '.nxdl.xml$' | sed 's/.nxdl.xml$//'

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably some quoting issue, replacing with a jq only command.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. Single backslashes work for me, too. So is this an issue with Python strings? You could simply use a raw string.

Comment thread src/chexus/nexus_base_classes.py Outdated
Refresh with:

gh api repos/nexusformat/definitions/contents/base_classes \
--jq '.[].name' | grep '\\.nxdl\\.xml$' | sed 's/\\.nxdl\\.xml$//' | sort

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this command does not generate the complete Python module file here, how about you pipe the result into a text file and load that from Python? That would simplify updating the list.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd need to ship the text file in the wheel etc., is it worth it? Alternative: add a script that writes the file.

@jl-wynen jl-wynen May 20, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We ship text files in other packages, too. It's pretty simple. But a separate script in a tools folder would work, too.

SimonHeybrock and others added 3 commits May 20, 2026 08:27
The previous refresh command relied on `\.` in a single-quoted regex,
which several shell/grep combinations interpret differently or which
copy-paste from the rendered docstring mangles. Doing everything in jq
removes the shell-escaping hazard.
The 142-entry list is now regenerated by tools/refresh_nexus_base_classes.py
which fetches the directory listing via `gh api` and rewrites the Python
module in place. Avoids hand-editing the literal on every upstream update.

Per reviewer suggestion; chose a script over a separate .txt data file to
avoid runtime resource-loading machinery for a static list.
Comment thread tools/refresh_nexus_base_classes.py Outdated
)

HEADER = '''# SPDX-License-Identifier: BSD-3-Clause
# Copyright (c) 2023 Scipp contributors (https://github.com/scipp)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sticking with 2023 now? I've noticed this in a couple of your files but wasn't sure if it's on purpose.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is a while since I looked at headers, just copy and paste from other file I suppose :D

Comment thread src/chexus/nexus_base_classes.py Outdated
Comment thread src/chexus/validators.py Outdated
Comment thread tools/refresh_nexus_base_classes.py Outdated
Comment thread tools/refresh_nexus_base_classes.py Outdated
Co-authored-by: Simon Heybrock <12912489+SimonHeybrock@users.noreply.github.com>
@SimonHeybrock SimonHeybrock enabled auto-merge May 20, 2026 13:49
@nvaytet

nvaytet commented Jun 23, 2026

Copy link
Copy Markdown
Member

@SimonHeybrock the changes were approved but the CI is failing. Can you fix so we can get it merged?

The refresh script uses subprocess with constant args and prints status —
both fine for dev-time tooling but flagged by S603, S607, and T201.
@SimonHeybrock SimonHeybrock merged commit bcc27a0 into main Jun 24, 2026
4 checks passed
@SimonHeybrock SimonHeybrock deleted the validate-nx-class-name branch June 24, 2026 04:53
@jl-wynen

Copy link
Copy Markdown
Member

This check may fail in the future. ECDC are thinking of using 'wrong' class names when testing things to hide groups from our software. E.g., when they want to add an additional monitor for some tests, that monitor might have class _NXmonitor.
Not sure we should accommodate this in chexus. Just posting it for information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants