When CellViT-Inference runs inside a Slurm job, it checks memory usage constantly by making a subprocess call to `sstat`. Each `sstat` call issues relatively expensive RPCs to the Slurm controller. With small batches, or with many jobs running simultaneously, this generates an extremely large volume of Slurm RPCs, which causes problems for busy Slurm controllers that are also handling normal job load.
The tool should either prefer the cheap cgroup data reads and use the expensive `sstat` call only as a fallback, or provide a configuration option to disable `sstat` calls or lower their frequency. If you would like a pull request for either approach, let me know.
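To make the proposal concrete, here is a rough Python sketch of both ideas combined: try the cgroup files first, and rate-limit `sstat` when it is needed as a fallback. This is not CellViT-Inference's actual code. All function and class names (`cgroup_memory_file_candidates`, `MemoryProbe`, and so on) are hypothetical, the cgroup mount point and directory layout are assumptions that vary between clusters, and the parsing of `sstat` output assumes sizes such as `2048K`. It also does not handle selecting a specific job step.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, List, Optional

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumption: the usual cgroup mount point
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def cgroup_memory_file_candidates(proc_cgroup_text: str, root: Path = CGROUP_ROOT) -> List[Path]:
    """Map the text of /proc/self/cgroup to memory usage files worth trying."""
    candidates = []
    for line in proc_cgroup_text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        controllers, rel = parts[1], parts[2].lstrip("/")
        if controllers == "":  # cgroup v2 (unified hierarchy)
            candidates.append(root / rel / "memory.current")
        elif "memory" in controllers.split(","):  # cgroup v1 memory controller
            candidates.append(root / "memory" / rel / "memory.usage_in_bytes")
    return candidates


def read_cgroup_memory_bytes(candidates: List[Path]) -> Optional[int]:
    """Cheap path: read the first usable cgroup file; no Slurm RPC involved."""
    for path in candidates:
        try:
            return int(path.read_text().strip())
        except (OSError, ValueError):
            continue
    return None


def parse_slurm_size(text: str) -> Optional[int]:
    """Parse a size such as '2048K' into bytes (assumed sstat output format)."""
    text = text.strip()
    if not text:
        return None
    suffix = text[-1].upper()
    number = text[:-1] if suffix in _UNITS else text
    try:
        return int(float(number) * _UNITS.get(suffix, 1))
    except ValueError:
        return None


def sstat_max_rss_bytes(job_id: str) -> Optional[int]:
    """Expensive path: a single sstat call, returning the largest MaxRSS seen."""
    cmd = ["sstat", "--jobs", job_id, "--format=MaxRSS", "--noheader", "--parsable2"]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=30, check=True).stdout
    except (OSError, subprocess.SubprocessError):
        return None
    sizes = [parse_slurm_size(line) for line in out.splitlines()]
    sizes = [s for s in sizes if s is not None]
    return max(sizes) if sizes else None


class MemoryProbe:
    """Prefer the cheap reader; call the expensive one at most once per interval."""

    def __init__(
        self,
        cheap: Callable[[], Optional[int]],
        expensive: Callable[[], Optional[int]],
        min_interval_s: float = 60.0,  # hypothetical default, would be configurable
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cheap = cheap
        self._expensive = expensive
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_call: Optional[float] = None
        self._last_value: Optional[int] = None

    def read(self) -> Optional[int]:
        value = self._cheap()
        if value is not None:
            return value
        now = self._clock()
        if self._last_call is None or now - self._last_call >= self._min_interval_s:
            self._last_call = now
            self._last_value = self._expensive()
        # Between expensive calls, return the cached (possibly stale) value.
        return self._last_value
```

In a job, the cheap reader would be built from `Path("/proc/self/cgroup").read_text()` and the expensive one from the `SLURM_JOB_ID` environment variable. Note that the two sources are not equivalent: the cgroup files report current usage for the whole cgroup, while `MaxRSS` is a peak value, so the tool would need to decide which semantics it actually needs.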