Validate NX_class against known NeXus class names#68
Conversation
Catches typos like NXPositioner vs NXpositioner that previously slipped through unchecked (NX_class_attr_missing only verified presence, and downstream validators keying off specific names silently did not apply). The allow-list contains all NeXus base classes (pulled from nexusformat/definitions) plus NXcanSAS. Application/contributed definitions are intentionally not included: a raw ESS file declaring, say, NXapm_paraprobe_clusterer_results is almost certainly a mistake and should fail validation. Extend nexus_application_classes when a legitimate new application class is needed. The base class list lives in its own module with the upstream URL and a gh api refresh command in the docstring.
jl-wynen
left a comment
There was a problem hiding this comment.
The code looks good. Just comments about the data file.
| Refresh with: | ||
|
|
||
| gh api repos/nexusformat/definitions/contents/base_classes \ | ||
| --jq '.[].name' | grep '\\.nxdl\\.xml$' | sed 's/\\.nxdl\\.xml$//' | sort |
There was a problem hiding this comment.
This outputs nothing for me. Is your grep an alias for something else? I need
grep '.nxdl.xml$' | sed 's/.nxdl.xml$//'
There was a problem hiding this comment.
Probably some quoting issue, replacing with a jq only command.
There was a problem hiding this comment.
Ah, right. Single backslashes work for me, too. So is this an issue with Python strings? You could simply use a raw string.
| Refresh with: | ||
|
|
||
| gh api repos/nexusformat/definitions/contents/base_classes \ | ||
| --jq '.[].name' | grep '\\.nxdl\\.xml$' | sed 's/\\.nxdl\\.xml$//' | sort |
There was a problem hiding this comment.
Since this command does not generate the complete Python module file here, how about you pipe the result into a text file and load that from Python? That would simplify updating the list.
There was a problem hiding this comment.
We'd need to ship the text file in the wheel etc., is it worth it? Alternative: add a script that writes the file.
There was a problem hiding this comment.
We ship text files in other packages, too. It's pretty simple. But a separate script in a tools folder would work, too.
The previous refresh command relied on `\.` in a single-quoted regex, which several shell/grep combinations interpret differently or which copy-paste from the rendered docstring mangles. Doing everything in jq removes the shell-escaping hazard.
The 142-entry list is now regenerated by tools/refresh_nexus_base_classes.py which fetches the directory listing via `gh api` and rewrites the Python module in place. Avoids hand-editing the literal on every upstream update. Per reviewer suggestion; chose a script over a separate .txt data file to avoid runtime resource-loading machinery for a static list.
| ) | ||
|
|
||
| HEADER = '''# SPDX-License-Identifier: BSD-3-Clause | ||
| # Copyright (c) 2023 Scipp contributors (https://github.com/scipp) |
There was a problem hiding this comment.
Are we sticking with 2023 now? I've noticed this in a couple of your files but wasn't sure if it's on purpose.
There was a problem hiding this comment.
I think is a while since I looked at headers, just copy and paste from other file I suppose :D
Co-authored-by: Simon Heybrock <12912489+SimonHeybrock@users.noreply.github.com>
|
@SimonHeybrock the changes were approved but the CI is failing. Can you fix so we can get it merged? |
The refresh script uses subprocess with constant args and prints status — both fine for dev-time tooling but flagged by S603, S607, and T201.
|
This check may fail in the future. ECDC are thinking of using 'wrong' class names when testing things to hide groups from our software. E.g., when they want to add an additional monitor for some tests, that monitor might have class |
Summary
Adds a
NX_class_invalidvalidator that errors when a group'sNX_classattribute is not a known NeXus class name. Catches typos such asNXPositionervsNXpositioner, which previously slipped through entirely:NX_class_attr_missingonly checked presence, and downstream validators keying off specific names (e.g.chopper_frequency_units_invalid) silently did not apply.The allow-list is the full set of NeXus base classes pulled from
nexusformat/definitionsplusNXcanSAS. Application and contributed definitions are intentionally not included: a raw ESS file declaring e.g.NXapm_paraprobe_clusterer_resultsis almost certainly a mistake and should fail validation. Extendnexus_application_classeswhen a legitimate new application class is needed.The base class list lives in its own module (
src/chexus/nexus_base_classes.py) with the upstream URL and agh apirefresh command in the docstring.Deprecated base classes (
NXgeometry,NXorientation,NXshape,NXtranslation) remain in the allow-list so they continue to be reported only by the dedicatedNX_class_is_legacyvalidator, not flagged twice.Test plan
NX_class-attr applies-to case