error when testing #37

@VitaLemonTea1

I trained the baseline on a single A100-40G using ./tools/dist_train.sh ./projects/configs/bevformer/bevformer_base_occ.py 1.
After 24 epochs, I tried to test with ./tools/dist_test.py ./projects/configs/bevformer/bevformer_base_occ.py work_dirs/bevformer_base_occ/epoch_24.pth 1.
After loading the checkpoint and evaluating the 6019 tasks, I saw memory usage increase from 18G to 42G, and then the run suddenly failed with the error: torch.distributed.elastic.multiprocessing.api:failed.
How can I fix this?
