
Unnecessarily high Slurm RPC count when running under Slurm #6

Description

@goodsonjr

When running CellViT-Inference in a Slurm job, the tool checks memory usage constantly via a subprocess call to sstat. Each sstat call performs relatively expensive RPCs to the Slurm controller. When batches are small and/or many jobs run simultaneously, this generates an extremely large volume of Slurm RPCs, which causes problems for busy Slurm controllers that are also handling normal job load.
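
One way to bound this call rate is to throttle the probe itself. The sketch below assumes the existing memory check can be wrapped as a zero-argument callable (for example, the function that runs sstat). `ThrottledProbe`, `min_interval`, and `clock` are hypothetical names for illustration, not part of CellViT-Inference. Any call made within `min_interval` seconds of the last real probe reuses the cached value, so the number of sstat invocations scales with wall-clock time rather than with the number of batches.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ThrottledProbe(Generic[T]):
    """Hypothetical wrapper: run an expensive probe (e.g. an sstat subprocess)
    at most once per `min_interval` seconds and reuse the last value otherwise."""

    def __init__(self, probe: Callable[[], T], min_interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._probe = probe
        self._min_interval = min_interval
        self._clock = clock
        self._last_time: Optional[float] = None
        self._last_value: Optional[T] = None
        self.calls = 0  # number of times the underlying probe actually ran

    def __call__(self) -> T:
        now = self._clock()
        # Only hit the expensive probe on the first call or once the interval has elapsed.
        if self._last_time is None or now - self._last_time >= self._min_interval:
            self._last_value = self._probe()
            self._last_time = now
            self.calls += 1
        return self._last_value


# Usage: simulate 100 memory checks, one per second, with a fake clock.
fake_now = [0.0]
probe = ThrottledProbe(lambda: 42, min_interval=30.0, clock=lambda: fake_now[0])
for step in range(100):
    fake_now[0] = float(step)
    probe()
print(probe.calls)  # 4 real probes (t = 0, 30, 60, 90)
```

The trade-off is that a memory reading can be up to `min_interval` seconds stale, which is why the interval would ideally be configurable.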

This should either prioritize cheap cgroup reads, with the expensive sstat call as a fallback, or provide a configuration option to disable sstat calls or lower their frequency. If you would like a pull request for either approach, let me know.
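
A minimal sketch of the cgroup-first approach, assuming Linux with either cgroup v2 (`memory.current`) or the cgroup v1 memory controller (`memory.usage_in_bytes`) mounted under `/sys/fs/cgroup`. All function names are hypothetical, and the sstat invocation (`--format=MaxRSS` on the `.batch` step) is an assumption that would need to match whatever the tool currently runs. The cgroup read touches only local files, so it costs no controller RPCs; sstat is used only when no cgroup counter can be read.

```python
import os
import subprocess
from pathlib import Path
from typing import Optional

# Slurm size suffixes are binary multiples (K = 1024 bytes).
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_slurm_size(text: str) -> Optional[int]:
    """Convert an sstat size such as '1234K' to bytes; None if unparseable."""
    text = text.strip()
    if not text:
        return None
    unit = text[-1].upper()
    try:
        if unit in _UNITS:
            return int(float(text[:-1]) * _UNITS[unit])
        return int(float(text))  # assumed: no suffix means bytes
    except ValueError:
        return None


def cgroup_memory_bytes(proc_cgroup: Path = Path("/proc/self/cgroup"),
                        cgroup_root: Path = Path("/sys/fs/cgroup")) -> Optional[int]:
    """Cheap path: read this process's cgroup memory counter from local files."""
    try:
        lines = proc_cgroup.read_text().splitlines()
    except OSError:
        return None
    for line in lines:
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        hier_id, controllers, rel = parts[0], parts[1], parts[2].lstrip("/")
        if hier_id == "0" and controllers == "":
            # cgroup v2 unified hierarchy
            candidate = cgroup_root / rel / "memory.current"
        elif "memory" in controllers.split(","):
            # cgroup v1 memory controller
            candidate = cgroup_root / "memory" / rel / "memory.usage_in_bytes"
        else:
            continue
        try:
            return int(candidate.read_text().strip())
        except (OSError, ValueError):
            continue
    return None


def sstat_memory_bytes(job_id: str) -> Optional[int]:
    """Expensive fallback: each call costs RPCs to the Slurm controller."""
    cmd = ["sstat", "--noheader", "--parsable2", "--format=MaxRSS",
           "--jobs", f"{job_id}.batch"]  # assumed invocation
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=30, check=True)
    except (OSError, subprocess.SubprocessError):
        return None
    out_lines = result.stdout.strip().splitlines()
    return parse_slurm_size(out_lines[0]) if out_lines else None


def memory_usage_bytes() -> Optional[int]:
    """Prefer the cgroup read; call sstat only when it is unavailable."""
    usage = cgroup_memory_bytes()
    if usage is not None:
        return usage
    job_id = os.environ.get("SLURM_JOB_ID")
    return sstat_memory_bytes(job_id) if job_id else None
```

Two caveats: the cgroup counters report current usage (including page cache), which is not the same quantity as sstat's MaxRSS, and under Slurm's cgroup plugin the process's own cgroup may be a step or task cgroup rather than the whole job, so any memory thresholds may need adjusting.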

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
