
Low GPU utilization and very long inference times #5

Description

@LoveNordling

Hi,

I am experiencing extremely low GPU utilization when running cellvit-inference on whole slide images. Even though GPU memory is fully allocated, the GPU utilization stays around ~10%, and CPU usage is also low.

As a result, inference is very slow.

Observed performance

~1 hour per WSI

GPU utilization: ~10%

CPU utilization: low

GPU memory: mostly allocated
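Numbers like these can be sampled while inference is running. The following is a minimal sketch, assuming `nvidia-smi` is available on the PATH; the helper names `parse_gpu_sample` and `sample_gpu` are illustrative and are not part of cellvit-inference.

```python
import subprocess

# Fields requested from `nvidia-smi --query-gpu`.
QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_sample(line):
    """Parse one CSV line such as '10, 23500, 24576' into a dict."""
    util, used, total = (float(x.strip()) for x in line.split(","))
    return {"util_pct": util, "mem_used_mib": used, "mem_total_mib": total}


def sample_gpu(index=0):
    """Return current utilization and memory for one GPU (needs nvidia-smi)."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={index}", f"--query-gpu={QUERY}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    # With a single --id there is one line of output per query.
    return parse_gpu_sample(out.strip().splitlines()[0])
```

Calling `sample_gpu(0)` every few seconds during a run and logging the result gives a utilization trace that can be attached to a report like this one.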

This occurs consistently across multiple GPUs:

RTX 3090

NVIDIA A40

NVIDIA A100

Since the behavior is the same on all three, raw GPU compute performance does not appear to be the bottleneck.

Dataset

H&E whole slide images

format: NDPI

typical file size: ~0.5 GB

Troubleshooting attempted

I tried different combinations of:

cpu_count

ray_worker

ray_remote_cpus

but none of these significantly changed GPU utilization.
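For anyone who wants to repeat this sweep systematically, the combinations can be enumerated as below. This is only a sketch: the candidate values in `GRID` are hypothetical placeholders (not the exact values tried), and `iter_system_settings` is an illustrative helper, not a cellvit-inference function.

```python
from itertools import product

# Hypothetical candidate values; the keys are the system settings named above.
GRID = {
    "cpu_count": [8, 16, 32],
    "ray_worker": [4, 8],
    "ray_remote_cpus": [1, 2],
}


def iter_system_settings(grid):
    """Yield one dict per combination of the candidate values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


if __name__ == "__main__":
    for settings in iter_system_settings(GRID):
        # Each dict would be written into the `system:` section of the config
        # before timing one run on the same slide.
        print(settings)
```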

Questions

Is ~1 hour per WSI expected for CellViT inference?

Could NDPI format or OpenSlide I/O be a bottleneck here?

Is there a recommended configuration for maximizing GPU utilization during WSI inference?

Are there preprocessing steps that should be performed before inference to avoid slow tile loading?
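One way to check the I/O question independently of CellViT is to time raw tile reads on a single slide. The sketch below assumes `openslide-python` is installed; the tile size of 1024 and the limit of 200 tiles are arbitrary illustration values (not the patch size cellvit-inference uses), and all function names are hypothetical.

```python
import time


def tile_origins(width, height, tile, limit=None):
    """Return level-0 (x, y) origins of a non-overlapping tile grid."""
    origins = [(x, y)
               for y in range(0, height - tile + 1, tile)
               for x in range(0, width - tile + 1, tile)]
    return origins[:limit] if limit else origins


def time_tile_reads(read_tile, origins):
    """Call read_tile(x, y) for each origin and return tiles per second."""
    start = time.perf_counter()
    for x, y in origins:
        read_tile(x, y)
    elapsed = time.perf_counter() - start
    return len(origins) / elapsed if elapsed > 0 else float("inf")


def benchmark_slide(path, tile=1024, limit=200):
    """Time raw OpenSlide reads on one slide (needs openslide-python)."""
    import openslide  # imported lazily so the helpers above work without it

    slide = openslide.OpenSlide(path)
    try:
        width, height = slide.dimensions
        origins = tile_origins(width, height, tile, limit)
        # read_region(location, level, size) returns an RGBA PIL image.
        return time_tile_reads(
            lambda x, y: slide.read_region((x, y), 0, (tile, tile)), origins)
    finally:
        slide.close()
```

Note that the first tiles of the grid come from the top-left corner, which may be mostly background, so sampling origins from the tissue area would give a more representative figure. If the measured tiles per second is far below what the GPU can consume, slide reading is a plausible bottleneck.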

At the moment, inference on a cohort takes multiple weeks, and the low GPU utilization also causes issues on shared GPU clusters where jobs are terminated due to inefficient hardware usage.

CUDA 12.1
PyTorch 2.1.2

Config:

==========================

CellViT Inference Config

==========================

Model selection (REQUIRED)

model: "SAM"

Nuclei classification taxonomy (OPTIONAL)

If you want just nuclei segmentation without types, use "binary".

Otherwise keep default-like behavior with "pannuke".

nuclei_taxonomy: "pannuke"

==========================

Inference Settings (OPTIONAL)

==========================

inference:
  gpu: 0
  #enforce_amp: true
  #batch_size: 48

==========================

Output Settings (REQUIRED outdir)

==========================

output_format:
  outdir: "./TLS_BOMI1_wholeslides_cellvitoutput"
  geojson: true
  graph: false
  compression: false

==========================

Processing Mode (Choose One)

==========================

process_dataset:
  wsi_folder: "./BOMI1_wsi/"
  wsi_extension: "ndpi"

Optional overrides (normally auto-read from slide metadata via OpenSlide):

wsi_mpp: 0.441

wsi_magnification: 20

==========================

System Settings (OPTIONAL)

==========================

system:
  #cpu_count: 16
  #ray_worker: 8
  #ray_remote_cpus: 1

memory: 64000

debug: false
