Skip to content

Fix IndexedTarStore writer reload truncation#150

Merged
janickm merged 1 commit into
NVIDIA:mainfrom
fallintoplace:fix/itar-reload-resources
Jun 16, 2026
Merged

Fix IndexedTarStore writer reload truncation#150
janickm merged 1 commit into
NVIDIA:mainfrom
fallintoplace:fix/itar-reload-resources

Conversation

@fallintoplace

Copy link
Copy Markdown
Contributor

Summary

IndexedTarStore.reload_resources() reopened write-mode archives with wb+, which truncates the archive during an in-progress write. This changes the writer reload path to reopen the existing archive with rb+ so already-written tar payloads remain intact.

I also added a regression test that writes one entry, reloads writer resources, verifies the file size and payload are preserved, writes a second entry, and then reopens the archive in read mode to check both entries.

Test

PYTHONPATH=$PWD uv run --no-project --python 3.12 --with 'pytest==8.3.4' --with 'cbor2==5.9.0' --with 'numcodecs==0.15.1' --with 'numpy==1.26.4' --with 'parameterized==0.9.0' --with 'universal-pathlib==0.3.6' --with 'zarr==2.18.4' pytest -q ncore/impl/data/stores_test.py

Result: 26 passed

@copy-pr-bot

copy-pr-bot Bot commented Jun 13, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@janickm janickm self-assigned this Jun 15, 2026
@janickm janickm self-requested a review June 15, 2026 08:42
@janickm

janickm commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

/ok to test 78dd098

@janickm

janickm commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

This is great, thanks for fixing @fallintoplace - LGTM.

Can you please still correct the commit message to conform to the conventional-commit requirements (maybe use fix(itar-store): ... as the commit message prefix). If you can, please also sign the commit according to the DCO.

I think we never ran into this as we only ever reloaded resources when reading (e.g., when training multi-process @ torch dataloader, where the stores are loaded in the main process, requiring a reload in the forked workers) - maybe I ask in which setting you ran into this while writing (wondering how you could have a setting where you'd need to reload the resources while writing)?

@janickm janickm left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks great - just the commit message requires fixes + signature are missing. Maybe adding details to the commit body would also be useful.

@janickm janickm assigned fallintoplace and unassigned janickm Jun 15, 2026
reload_resources() reopened write-mode stores with wb+, which truncates an in-progress archive while the in-memory index still references entries that were already written.

Reopen write-mode stores with rb+ during resource reload so existing tar payloads are preserved, and cover the behavior with a regression test that writes before and after reload.

Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>
@fallintoplace fallintoplace force-pushed the fix/itar-reload-resources branch from 78dd098 to 3dba779 Compare June 15, 2026 16:18
@janickm

janickm commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

/ok to test 3dba779

@janickm janickm enabled auto-merge June 16, 2026 07:59
@janickm

janickm commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

thanks @fallintoplace! approved, merging once CI succeeds

do you mind still sharing in which types of setups you ran into this problem (reloading resources while writing?)

@janickm janickm added this pull request to the merge queue Jun 16, 2026
Merged via the queue into NVIDIA:main with commit bfef84d Jun 16, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants