Hi, thanks for open-sourcing this great work! I'm excited to try it out.
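I ran the code on the GenImage dataset on a single RTX 4090, using the configuration from the provided train.sh: training on stable_diffusion_v_1_4 and evaluating on the GenImage subsets (the tables below also include the training subset itself and Chameleon).
The paper reports 96.68% accuracy on the ADM subset of GenImage, but I'm only getting about 71-72% (70.94% after epoch 5, 72.18% after epoch 10). Almost all of the gap comes from the fake images: real accuracy on ADM stays around 95%, while fake accuracy is below 50%.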
Could you help clarify:
1. Are there any differences between the released code / train.sh configuration and the setup used for the paper that could explain this gap?
2. Does the paper's GenImage result use a different recipe for precomputing the embeddings?
Results after training epoch 5:
subset accuracy real_accuracy fake_accuracy ap f1
ADM 0.709417 0.954333 0.464500 0.849598 0.615164
BigGAN 0.879917 0.959167 0.800667 0.962611 0.869581
glide 0.927583 0.953333 0.901833 0.982136 0.925669
Midjourney 0.931667 0.952833 0.910500 0.984518 0.930189
stable_diffusion_v_1_4 0.976417 0.959500 0.993333 0.998553 0.976809
VQDM 0.931417 0.956667 0.906167 0.982783 0.929640
wukong 0.970417 0.958333 0.982500 0.996912 0.970770
Chameleon 0.832367 0.745744 0.947628 0.961499 0.829091
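As a sanity check on the columns: the GenImage rows are consistent with balanced real/fake test sets and with "fake" as the positive class for F1. Accuracy is the mean of real_accuracy and fake_accuracy, and F1 can be derived from the two per-class accuracies. The helper below is my own hypothetical sketch (not from the repo) and reproduces the ADM rows. It does not reproduce the Chameleon row, presumably because that set is not balanced. It shows that ADM's low F1 is driven by missed fakes (recall 0.4645), not by false positives on real images (precision is about 0.91).

```python
def balanced_metrics(real_acc: float, fake_acc: float) -> dict:
    """Hypothetical helper: derive accuracy and F1 (fake = positive class)
    from per-class accuracies, ASSUMING equal numbers of real and fake images."""
    # With n real and n fake images (n cancels out, so use n = 1):
    #   TP = fake_acc          (fakes correctly flagged as fake)
    #   FP = 1 - real_acc      (reals wrongly flagged as fake)
    #   FN = 1 - fake_acc      (fakes missed)
    tp = fake_acc
    fp = 1.0 - real_acc
    fn = 1.0 - fake_acc
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # equals fake_acc
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (real_acc + fake_acc) / 2  # balanced classes only
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # ADM row after epoch 5, copied from the table above
    for name, value in balanced_metrics(real_acc=0.954333, fake_acc=0.464500).items():
        print(f"{name}: {value:.4f}")
```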
Results after training epoch 10:
subset accuracy real_accuracy fake_accuracy ap f1
ADM 0.721833 0.948167 0.495500 0.852250 0.640457
BigGAN 0.896167 0.950167 0.842167 0.968281 0.890240
glide 0.934583 0.944667 0.924500 0.984731 0.933917
Midjourney 0.934917 0.947667 0.922167 0.984949 0.934076
stable_diffusion_v_1_4 0.970000 0.945000 0.995000 0.998506 0.970732
VQDM 0.937750 0.948833 0.926667 0.985311 0.937052
wukong 0.968083 0.949667 0.986500 0.997146 0.968660
Chameleon 0.818269 0.714190 0.956759 0.962727 0.818770
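Since ADM's AP (about 0.85) is well above its accuracy while fake accuracy is under 50%, part of the gap might be a decision-threshold issue rather than purely a ranking one. I haven't checked how the released eval code picks its threshold; the sketch below assumes per-image "fake" scores in [0, 1] with label 1 = fake and a default threshold of 0.5. The function names are hypothetical. It compares accuracy at 0.5 with the best accuracy achievable by any threshold (an "oracle" threshold, so it is only a diagnostic, not a fair test number).

```python
import numpy as np


def accuracy_at_threshold(scores, labels, threshold=0.5):
    """Accuracy when predicting 'fake' (label 1) for scores >= threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    preds = (scores >= threshold).astype(int)
    return float((preds == labels).mean())


def best_threshold(scores, labels):
    """Return (threshold, accuracy) maximizing accuracy over candidate
    thresholds taken from the observed scores; +inf means 'predict all real'."""
    scores = np.asarray(scores, dtype=float)
    candidates = np.unique(np.concatenate([scores, [np.inf]]))  # sorted
    accs = [accuracy_at_threshold(scores, labels, t) for t in candidates]
    i = int(np.argmax(accs))  # first (lowest) threshold reaching the max
    return float(candidates[i]), accs[i]


if __name__ == "__main__":
    # Toy scores: 3 real images (label 0) then 3 fakes (label 1)
    toy_scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.9]
    toy_labels = [0, 0, 0, 1, 1, 1]
    print("acc@0.5:", accuracy_at_threshold(toy_scores, toy_labels))
    print("best:", best_threshold(toy_scores, toy_labels))
```

If the best-threshold accuracy on the ADM scores were close to the paper's number, that would point to calibration; if it stayed near 72%, the embeddings themselves would be the more likely culprit.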