Optimised database registration logic in CLEM workflow#787
Open
… reference for when updating grid squares
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##             main     #787      +/-   ##
==========================================
+ Coverage   50.72%   50.88%   +0.16%
==========================================
  Files          96       96
  Lines       10059    10089      +30
  Branches     1322     1337      +15
==========================================
+ Hits         5102     5134      +32
+ Misses       4690     4686       -4
- Partials      267      269       +2
…te a new ImagingSite entry, but should fail loudly if no entry was found; move the 'commit()' command out of the for loop
Fixes three issues with the database registration logic in the CLEM workflow:
Handle the possibility of there being multiple atlas images associated with a grid
The CLEM workflow currently assumes that all of the datasets stored under the top-level folder belong to the same grid. It also uses the field of view of the incoming dataset to decide whether to treat it as an atlas image or as a grid square. We thus occasionally run into a situation where more than one dataset in a grid gets registered as an atlas, at which point the workflow breaks because the SQL query expects either a single matching database entry or no result.
This can be resolved by adjusting the logic slightly so that the database query returns all matching atlas entries, sorted by insertion order. The workflow then uses the latest one to perform the ISPyB updates. Because the grid squares and data collection groups are re-updated for every new image set received by the Murfey backend, all entries for a given sample will be re-updated correctly with reference to the last atlas-class image registered.
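The "return all matches, use the latest" approach described above can be sketched as follows. This is a minimal illustration and not the code in this PR: it uses the standard-library `sqlite3` module as a stand-in for the real database layer, and the `atlas` table, its columns, and the `latest_atlas` helper are hypothetical names chosen for the example. It also assumes that an auto-incrementing primary key reflects insertion order.

```python
import sqlite3


def latest_atlas(conn, grid_name):
    """Return the most recently inserted atlas row for a grid, or None.

    Hypothetical helper: the table and column names are assumptions.
    """
    rows = conn.execute(
        "SELECT id, image_path FROM atlas WHERE grid_name = ? ORDER BY id ASC",
        (grid_name,),
    ).fetchall()
    # Zero, one, or many matches are all acceptable; the newest entry wins.
    return rows[-1] if rows else None


# Demo: two datasets from the same grid were both classified as atlases.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE atlas ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, grid_name TEXT, image_path TEXT)"
)
conn.executemany(
    "INSERT INTO atlas (grid_name, image_path) VALUES (?, ?)",
    [("grid_1", "overview_a.tiff"), ("grid_1", "overview_b.tiff")],
)
conn.commit()

chosen = latest_atlas(conn, "grid_1")
print(chosen)
```

A query that demands exactly one row or none would raise on the second insert here, whereas fetching every match and taking the last one degrades gracefully when extra atlas-class images turn up.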
Accidental insert when updating `ImagingSite` entries after registering `GridSquare` entries
In the `_register_dcg_and_atlas` function, we previously created an `ImagingSite` entry if none was found. In production, this logic led to the accidental creation of partially populated duplicate rows due to race conditions when registering `ImagingSite` entries in the database (one worker thinks there is no entry and tries to create a duplicate row before the transaction from another worker has been committed). `_register_dcg_and_atlas` should not create any new `ImagingSite` rows, and should instead raise an error if such a race condition crops up.
Aggressive commits
In `register_grid_square`, `db.commit()` was previously run multiple times within a single `for` loop. Moving it out of the loop, so that it runs once after the loop finishes, should improve performance.
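The last two points, failing loudly instead of inserting and committing once after the loop, can be sketched together. Again this is only an illustration under stated assumptions: `sqlite3` stands in for the real database session, and the `imaging_site` and `grid_square` tables, their columns, and both helper functions are hypothetical rather than taken from the Murfey code.

```python
import sqlite3


def update_imaging_site(conn, site_name, atlas_id):
    """Update an existing imaging_site row; never create one (hypothetical)."""
    row = conn.execute(
        "SELECT id FROM imaging_site WHERE name = ?", (site_name,)
    ).fetchone()
    if row is None:
        # Fail loudly rather than inserting a partially populated duplicate.
        raise LookupError(f"No imaging_site row named {site_name!r}")
    conn.execute(
        "UPDATE imaging_site SET atlas_id = ? WHERE id = ?", (atlas_id, row[0])
    )


def register_grid_squares(conn, squares, atlas_id):
    """Insert grid squares and update their sites, committing once at the end."""
    for site_name, x, y in squares:
        conn.execute(
            "INSERT INTO grid_square (site_name, x, y) VALUES (?, ?, ?)",
            (site_name, x, y),
        )
        update_imaging_site(conn, site_name, atlas_id)
    # A single commit after the loop instead of one per iteration.
    conn.commit()


# Demo setup with two pre-existing imaging sites.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE imaging_site (id INTEGER PRIMARY KEY, name TEXT, atlas_id INTEGER)"
)
conn.execute(
    "CREATE TABLE grid_square (id INTEGER PRIMARY KEY, site_name TEXT, x REAL, y REAL)"
)
conn.executemany(
    "INSERT INTO imaging_site (name) VALUES (?)", [("site_a",), ("site_b",)]
)
conn.commit()

register_grid_squares(conn, [("site_a", 0.0, 1.0), ("site_b", 2.0, 3.0)], atlas_id=7)
```

Because nothing is committed until the loop completes, a caller that catches the `LookupError` can roll back the whole batch, so a missing row does not leave half-registered grid squares behind.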